fix(backfill): compare PR updatedAt by instant so the metadata gate skips #197
Merged
The nightly incremental backfill's metadata gate compared the stored PR `updatedAt` against GitHub's value with raw `===`. The stored side is read back from a `timestamptz` column, which TypeORM hydrates into a `Date` (the entity mis-annotated it as `string`), while the incoming side is GitHub's ISO string. `Date === string` is always false, and even pg's text form (`...+00`) differs from GitHub's ISO form (`...Z`), so the gate matched nothing and re-enqueued a PR_METADATA job for every PR in the window every night; `meta_skipped` stayed 0.

Benign for the rate-limit fix (metadata jobs are cheap; the SHA/content gate is what tamed the flood), but the metadata half of the optimisation was dead code.

Normalise both sides to an epoch-ms instant before comparing, fail safe on null/unparseable input, and correct the entity annotation to `Date | string | null`. Adds spec coverage for the Date-vs-ISO-string reality that the old string-only tests never exercised.
entrius
approved these changes
Jun 25, 2026
Problem
The nightly incremental backfill (#195) gates the `PR_METADATA` re-fetch on whether GitHub's `updatedAt` moved, via `needsMetadataRefresh`.

`stored.updatedAt` is read back from a `timestamptz` column, which TypeORM hydrates into a `Date` (verified: pg parses OID 1184 → Date; TypeORM's `normalizeHydratedDate` returns a Date in both branches). The entity mis-annotated it as `string`. The incoming side is GitHub's ISO string (`2026-06-01T00:00:00Z`).

So the comparison is effectively `Date === string` → always false. Even if it were stringified, pg's text form (`2026-06-01 00:00:00+00`) differs from GitHub's ISO form. The round-trip through the column destroys the exact original string, so the gate matched nothing and re-enqueued a metadata job for every PR in the window, every night.

Evidence (prod, night of 6/25)
Every repo's `[backfill-summary]` showed `meta_skipped=0` with `meta_enqueued = prs_in_window`, e.g. `e35ventura/taopedia-articles`: `prs_in_window=5447 meta_enqueued=5447 meta_skipped=0`. The files gate (keyed on `scoring_data_stored` + head/base SHA, not `updated_at`) worked correctly: `files_skipped=5447`.

Impact
Benign for the rate-limit fix that shipped: `PR_METADATA` jobs are cheap (1 GraphQL call) and the SHA/content gate is what tamed the flood (0 `graphql_rate_limit` lines in the last 24h). But the metadata half of the incremental optimisation was dead code.

Fix
- Normalise both sides to an epoch-ms instant (`toEpochMs`) before comparing; handles Date, ISO string, and tz-equivalent forms.
- Correct the entity annotation to `Date | string | null`.

Verification
- `npm run build` clean, `eslint` clean, full suite 25/25 pass (4 new Date-vs-string cases).
- Expect `meta_skipped` to become large for unchanged firehose repos on steady-state nights (the falsifiable check).
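
For readers without the diff open, here is a minimal sketch of the instant-based comparison this PR describes. It is not the merged code: the names `toEpochMs` and `needsMetadataRefresh` come from the description, but their signatures, the `Timestamp` alias, and the sample values are assumptions made for illustration.

```typescript
// Hypothetical alias: what the column may hold at runtime versus what GitHub sends.
type Timestamp = Date | string | null | undefined;

// Convert a Date or a date string to epoch milliseconds.
// Returns null for missing or unparseable input.
function toEpochMs(value: Timestamp): number | null {
  if (value === null || value === undefined) return null;
  const ms = value instanceof Date ? value.getTime() : Date.parse(value);
  return Number.isNaN(ms) ? null : ms;
}

// Assumed signature. Fails safe: if either side cannot be read as an instant,
// report that a refresh is needed rather than risk skipping a changed PR.
function needsMetadataRefresh(stored: Timestamp, incoming: Timestamp): boolean {
  const storedMs = toEpochMs(stored);
  const incomingMs = toEpochMs(incoming);
  if (storedMs === null || incomingMs === null) return true;
  return storedMs !== incomingMs;
}

// The old failure mode: the value is a Date at runtime but typed as string,
// so the compiler accepts a comparison that can never be true.
const hydrated = new Date("2026-06-01T00:00:00Z") as unknown as string;
const fromGitHub = "2026-06-01T00:00:00Z";

const oldGateMatches = hydrated === fromGitHub; // false: Date vs string
const newGateSkips = !needsMetadataRefresh(hydrated, fromGitHub); // true: same instant
```

Comparing instants also treats tz-equivalent ISO strings such as `2026-06-01T02:00:00+02:00` as unchanged. The sketch deliberately does not feed pg's text form (`2026-06-01 00:00:00+00`) through `Date.parse`, because parsing of non-ISO strings is implementation-defined in JavaScript.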