Skip to content

fix(backfill): compare PR updatedAt by instant so the metadata gate skips#197

Merged
entrius merged 2 commits into
testfrom
fix/backfill-metadata-gate-timestamp-compare
Jun 25, 2026
Merged

fix(backfill): compare PR updatedAt by instant so the metadata gate skips#197
entrius merged 2 commits into
testfrom
fix/backfill-metadata-gate-timestamp-compare

Conversation

@anderdc

@anderdc anderdc commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Problem

The nightly incremental backfill (#195) gates the PR_METADATA re-fetch on whether GitHub's updatedAt moved, via needsMetadataRefresh:

stored.updatedAt === updatedAt   // stored: from DB, updatedAt: GitHub ISO string

stored.updatedAt is read back from a timestamptz column, which TypeORM hydrates into a Date (verified: pg parses OID 1184 → Date; TypeORM's normalizeHydratedDate returns a Date in both branches). The entity mis-annotated it as string. The incoming side is GitHub's ISO string (2026-06-01T00:00:00Z).

So the comparison is effectively Date === stringalways false. Even if it were stringified, pg's text form (2026-06-01 00:00:00+00) differs from GitHub's ISO form. The round-trip through the column destroys the exact original string, so the gate matched nothing and re-enqueued a metadata job for every PR in the window, every night.

Evidence (prod, night of 6/25)

Every repo's [backfill-summary] showed meta_skipped=0 with meta_enqueued = prs_in_window — e.g. e35ventura/taopedia-articles: prs_in_window=5447 meta_enqueued=5447 meta_skipped=0. The files gate (keyed on scoring_data_stored + head/base SHA, not updated_at) worked correctly: files_skipped=5447.

Note: 6/25 was also the first backfill run to ever populate updated_at, so that night alone is ambiguous (first-run also yields meta_skipped=0). This fix is established from the code/format round-trip, which fails regardless of first-run.

Impact

Benign for the rate-limit fix that shipped — PR_METADATA jobs are cheap (1 GraphQL call) and the SHA/content gate is what tamed the flood (0 graphql_rate_limit lines in the last 24h). But the metadata half of the incremental optimisation was dead code.

Fix

  • Normalise both sides to an epoch-ms instant (toEpochMs) before comparing; handles Date, ISO string, and tz-equivalent forms.
  • Fail safe (re-fetch) on null/unparseable input.
  • Correct the entity annotation to Date | string | null.
  • Add spec coverage for the Date-vs-ISO-string reality the old string-only tests never exercised.

Verification

  • npm run build clean, eslint clean, full suite 25/25 pass (4 new Date-vs-string cases).
  • Post-merge/deploy, expect meta_skipped to become large for unchanged firehose repos on steady-state nights (the falsifiable check).

anderdc added 2 commits June 25, 2026 12:56
…kips

The nightly incremental backfill's metadata gate compared the stored PR
`updatedAt` against GitHub's value with raw `===`. The stored side is read
back from a `timestamptz` column, which TypeORM hydrates into a Date (and
the entity mis-annotated as `string`), while the incoming side is GitHub's
ISO string. `Date === string` is always false, and even pg's text form
(`...+00`) differs from GitHub's ISO form (`...Z`), so the gate matched
nothing and re-enqueued a PR_METADATA job for every PR in the window every
night — `meta_skipped` stayed 0.

Benign for the rate-limit fix (metadata jobs are cheap; the SHA/content
gate is what tamed the flood), but the metadata half of the optimisation
was dead code.

Normalise both sides to an epoch-ms instant before comparing, fail safe on
null/unparseable input, and correct the entity annotation to `Date | string
| null`. Adds spec coverage for the Date-vs-ISO-string reality that the
old string-only tests never exercised.
@entrius entrius merged commit eb214a6 into test Jun 25, 2026
3 checks passed
@entrius entrius deleted the fix/backfill-metadata-gate-timestamp-compare branch June 25, 2026 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants