Skip to content

fix(api): bust metagraph edge cache on neuron snapshot refresh, not health prober#2181

Open
andriypolanski wants to merge 1 commit into
JSONbored:mainfrom
andriypolanski:fix/neurons-routes-edge-cache-stamp
Open

fix(api): bust metagraph edge cache on neuron snapshot refresh, not health prober#2181
andriypolanski wants to merge 1 commit into
JSONbored:mainfrom
andriypolanski:fix/neurons-routes-edge-cache-stamp

Conversation

@andriypolanski

Copy link
Copy Markdown
Contributor

Summary

  • Key edge cache for neurons-tier routes (subnet-metagraph, subnet-validators, subnet-concentration) on a neurons freshness stamp instead of health:meta.last_run_at.
  • Source stamp from D1 MAX(captured_at) for the subnet, or a lightweight KV pointer updated by loadStagedNeurons.
  • Add regression test: neuron data changes bust cache without a health prober tick.

Closes #2180

Why

Agents polling metagraph/validators for stake and rank can see data up to ~12 minutes stale at the edge despite 3-minute neuron refreshes in D1.

Test plan

  • npm test -- tests/analytics-edge-cache.test.mjs
  • Cache hit when stamp unchanged; miss when neuron captured_at advances
  • Health-trends routes still bust on last_run_at only

@superagent-security

Copy link
Copy Markdown

Superagent didn't find any vulnerabilities or security issues in this PR.

@codecov

codecov Bot commented Jun 27, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.23%. Comparing base (a3c4230) to head (92b7a48).

Files with missing lines Patch % Lines
workers/request-handlers/analytics.mjs 85.71% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2181      +/-   ##
==========================================
- Coverage   93.25%   93.23%   -0.02%     
==========================================
  Files          51       51              
  Lines        8076     8088      +12     
  Branches     2962     2968       +6     
==========================================
+ Hits         7531     7541      +10     
  Misses         92       92              
- Partials      453      455       +2     
Files with missing lines Coverage Δ
workers/api.mjs 94.57% <100.00%> (ø)
workers/request-handlers/analytics.mjs 95.57% <85.71%> (-0.69%) ⬇️
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@gittensory-orb

gittensory-orb Bot commented Jun 27, 2026

Copy link
Copy Markdown

Tip

🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩

✅ Gittensory review — safe to merge

3 files · 1 AI reviewers · no blockers · readiness 55/100 · CI green · clean

✅ Approved — safe to merge

Review summary
This PR correctly fixes the metagraph/validators/concentration edge cache staleness bug by decoupling those routes' cache keys from the 15-minute health prober tick and keying them on the per-subnet neuron `captured_at` stamp instead. The `withEdgeCache` extension via an optional `resolveCacheStamp` callback is backward-compatible; all three call sites in `api.mjs` are correctly migrated with matching `netuid` values; the null-MAX guard in `readSubnetNeuronsCacheStamp` (`Number.isInteger(capturedAt) && capturedAt > 0`) correctly suppresses caching when the table is cold or empty. The one material concern is that `readSubnetNeuronsCacheStamp` issues a D1 `MAX(captured_at)` query on every cache-eligible GET, including hits — the original design avoided all D1 on hits via a per-isolate-memoized KV read, and that property is now lost for neurons-tier routes.

Signal Result Evidence
Code review ✅ No blockers 1 reviewers, synthesized
Linked issue ✅ Linked #2180
Related work ⚠️ 2 scoped overlaps Top overlaps are listed below; lower-confidence bulk is hidden.
Review load ❌ 8/20 Readiness component derived from cached public PR metadata and labels.
Validation evidence ❌ 5/25 Cached preflight status is hold.
Open PR queue ❌ 3/10 15 open PR(s), 13 likely reviewable, 2 unlinked.
Contributor context ✅ Confirmed Gittensor contributor andriypolanski; Gittensor profile; 76 PR(s), 7 issue(s).
Gate result ✅ Passing No configured blocker found.
Nits — 6 non-blocking
  • analytics.mjs `readSubnetNeuronsCacheStamp`: fires a D1 `MAX(captured_at)` query on every cache-eligible request, including hits — the updated comment ('no D1 aggregation at all for the handler body') acknowledges the regression, but the stamp lookup is itself a D1 round-trip per request; the PR description mentioned a KV pointer updated by `loadStagedNeurons` as the intended source, which would restore the all-KV-on-hit property.
  • tests/analytics-edge-cache.test.mjs first `neurons-tier edge cache` test: the three-route loop accumulates `cache.putKeys` across iterations without resetting, so the negative assertion (`!cache.putKeys.some(key => key.includes(LAST_RUN_AT))`) covers all prior iterations and makes per-route failures harder to isolate — a `putKeys.length` reset or per-route sub-assertion would tighten the signal.
  • analytics.mjs `withNeuronsEdgeCache`: no comment explains the parameter order (netuid precedes keyParts, inverting the `withEdgeCache` signature); a one-liner noting the inversion would prevent callers from miscounting positional args.
  • tests/analytics-edge-cache.test.mjs: the cold-table path (D1 returns `{captured_at: null}` for an empty `neurons` table → stamp null → caching skipped) has no direct unit coverage for `readSubnetNeuronsCacheStamp`; the existing mock always returns a non-null `capturedAt`.
  • Write `neurons:max_captured_at:<netuid>` to KV inside `loadStagedNeurons` (workers/api.mjs scheduler) so `readSubnetNeuronsCacheStamp` can do a KV get instead of a D1 aggregation query, restoring the zero-D1-on-cache-hit property that motivated the memoized KV design in the first place.
  • PR author also opened the linked issue — Link an issue that was opened by a different contributor, or provide a rationale for why this self-authored issue represents genuine discovery work.
Review context
Contributor next steps
  • Review top overlaps.
  • Add scope summary.
  • Fix blocker.
  • Expect slower review.
  • Refresh registry data or choose a registered active repo.
  • Check active issues and PRs before submitting.
Signal definitions
  • Related work = same linked issue, overlapping active PRs, or title/path similarity.
  • Review load = cached public PR metadata such as size labels, changed paths, and preflight status.
  • Open PR queue = repo-wide review pressure; it is not a PR quality failure.
  • Contributor context = public GitHub/Gittensor identity context; non-Gittensor status is not a blocker.
Review details

Generated from public PR metadata and the diff. Advisory only; deterministic signals remain authoritative.

This PR correctly fixes the metagraph/validators/concentration edge cache staleness bug by decoupling those routes' cache keys from the 15-minute health prober tick and keying them on the per-subnet neuron `captured_at` stamp instead. The `withEdgeCache` extension via an optional `resolveCacheStamp` callback is backward-compatible; all three call sites in `api.mjs` are correctly migrated with matching `netuid` values; the null-MAX guard in `readSubnetNeuronsCacheStamp` (`Number.isInteger(capturedAt) && capturedAt > 0`) correctly suppresses caching when the table is cold or empty. The one material concern is that `readSubnetNeuronsCacheStamp` issues a D1 `MAX(captured_at)` query on every cache-eligible GET, including hits — the original design avoided all D1 on hits via a per-isolate-memoized KV read, and that property is now lost for neurons-tier routes.

Nits (5)

  • analytics.mjs `readSubnetNeuronsCacheStamp`: fires a D1 `MAX(captured_at)` query on every cache-eligible request, including hits — the updated comment ('no D1 aggregation at all for the handler body') acknowledges the regression, but the stamp lookup is itself a D1 round-trip per request; the PR description mentioned a KV pointer updated by `loadStagedNeurons` as the intended source, which would restore the all-KV-on-hit property.
  • tests/analytics-edge-cache.test.mjs first `neurons-tier edge cache` test: the three-route loop accumulates `cache.putKeys` across iterations without resetting, so the negative assertion (`!cache.putKeys.some(key => key.includes(LAST_RUN_AT))`) covers all prior iterations and makes per-route failures harder to isolate — a `putKeys.length` reset or per-route sub-assertion would tighten the signal.
  • analytics.mjs `withNeuronsEdgeCache`: no comment explains the parameter order (netuid precedes keyParts, inverting the `withEdgeCache` signature); a one-liner noting the inversion would prevent callers from miscounting positional args.
  • tests/analytics-edge-cache.test.mjs: the cold-table path (D1 returns `{captured_at: null}` for an empty `neurons` table → stamp null → caching skipped) has no direct unit coverage for `readSubnetNeuronsCacheStamp`; the existing mock always returns a non-null `capturedAt`.
  • Write `neurons:max_captured_at:<netuid>` to KV inside `loadStagedNeurons` (workers/api.mjs scheduler) so `readSubnetNeuronsCacheStamp` can do a KV get instead of a D1 aggregation query, restoring the zero-D1-on-cache-hit property that motivated the memoized KV design in the first place.

🟩 Safe / merged · 🟦 Advisory · 🟨 Held for review · 🟥 Blocked / closed


💰 Earn for open-source contributions like this. Gittensor lets GitHub contributors earn for the work they already do — register to start earning →.

Checked by Gittensory, a quiet PR intelligence layer for OSS maintainers.

  • Re-run Gittensory review

@gittensory-orb gittensory-orb Bot added gittensor Gittensor contributor context gittensor:bug Bug labels Jun 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gittensor:bug Bug gittensor Gittensor contributor context

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Neurons-tier routes edge-cache bust on health prober, not neuron refresh

1 participant