Skip to content

Text migration: stop dual-write to legacy text table (Deploy 1/3)#650

Open
mircealungu wants to merge 1 commit into
masterfrom
text-migration-deploy1
Open

Text migration: stop dual-write to legacy text table (Deploy 1/3)#650
mircealungu wants to merge 1 commit into
masterfrom
text-migration-deploy1

Conversation

@mircealungu
Copy link
Copy Markdown
Member

This is Deploy 1 of 3 in finishing the long-deferred migration from the legacy text table to new_text + bookmark_context. Plan: ~/code/zeeguu/docs/ (and the working plan in ~/.claude/plans/wondrous-honking-gadget.md).

What this PR does

Stops the dual-write. Every bookmark today writes the same content to both the legacy text table (via Text.find_or_create) and the new bookmark_contextnew_text chain. This PR keeps only the new-model write. New bookmarks get text_id = NULL; the column stays as a safety net until Deploy 2.

Re-points every bookmark.text.* read that NULL-text new bookmarks would otherwise crash on:

  • Bookmark.url() and Bookmark.article_id() helpers, deriving the article via Article.find_by_source_id(source_id)url.as_string() (mirroring what old Text.url() did).
  • Bookmark.to_json / UserWord.as_dictionary URL and article_id use the helpers.
  • reading_sessions.translations_in_interval raw SQL now joins bookmark → bookmark_context → new_text (the literal new_text becomes text in Deploy 3).
  • User.bookmarks_by_url_by_date had zero callers anywhere — deleted.

Edit flow (/translation + update_bookmark.find_or_create_context) reworked to thread only the BookmarkContext through — no more (text, context) tuple.

Drive-by cleanup: unused article_id parameter dropped from Bookmark.find_or_create (it only fed the now-removed Text.find_or_create call), along with the dead article_id = request.json.get("articleID", ...) in /translation.

Intentionally deferred to Deploy 2

  • get_context()'s self.text.content fallback (only fires for legacy null-context bookmarks; prod has 0 such rows today).
  • The Bookmark.text relationship + text_id column + find_all_for_text_and_user + the joinedload(Bookmark.text) hints in user.py / basicSR.py.

Verification

Prod read-only queries via query-prod:

  • SELECT COUNT(*) FROM bookmark_context WHERE text_id IS NULL0 (no translation.py:535-style NULL deref risk).
  • SELECT COUNT(*) FROM bookmark WHERE text_id IS NOT NULL AND context_id IS NULL0 (no legacy bookmarks drop out of translations_in_interval).
  • SELECT COUNT(*) FROM bookmark WHERE text_id IS NULL AND context_id IS NULL0 (no dual-NULL rows that would hit the deferred get_context() fallback).
  • bookmark.text_id IS_NULLABLEYES (NULL writes succeed at the schema level).

Tests: 319 passed, 1 skipped — matches bare master exactly.

One behaviour note

context_in_content is now hardcoded True in UserWord.as_dictionary (old text.in_content had no new-model equivalent). The web frontend reads this field zero times in src/ — only one assignment in EditBookmarkButton.js that propagates server→local. Title bookmarks are distinguished structurally in the API response (articleInfo.tokenized_title_new.past_bookmarks is a separate array from each paragraph's past_bookmarks), not via this flag. So the constant is harmless, but the field could be stripped from the payload later.

What's next

  • Deploy 2: data-parity gate (Phase-0 queries), drop bookmark.text_id FK + column, DROP TABLE text, delete model/text.py, swap article_pruning.PROTECTING_TABLES "text" for a bookmark → source → article block.
  • Deploy 3: RENAME TABLE new_text → text, class NewText → Text, update the three FK-holders.

🤖 Generated with Claude Code

Long ago Zeeguu introduced `new_text` + `bookmark_context` as the
canonical store for fragment/caption/context text, intended to replace
the old `text` table. The migration was never finished: every bookmark
still dual-wrote both, and `Bookmark.text` was still read from several
hot paths.

This is Deploy 1 of a three-deploy plan: stop the dual-write and
re-point every `Bookmark.text` read onto the new context/source model.
The `bookmark.text_id` column and the `text` table remain in place as
a safety net; Deploy 2 drops them, Deploy 3 renames `new_text → text`.

Writes removed
- `Bookmark.find_or_create` no longer calls `Text.find_or_create`;
  new bookmarks get `text_id = NULL` (column already nullable in prod).
- `update_bookmark.find_or_create_context` reworked to return only
  the `BookmarkContext` (was a `(text, context)` tuple). The dead
  `context_or_word_changed` re-pointed off old `Text` for the same
  reason.
- `/translation` endpoint no longer assigns `bookmark.text`.
- Dropped the unused `Text` import in `bookmarks_and_words.py`.

Reads re-pointed (every `bookmark.text.*` site that NULL-text new
bookmarks would otherwise crash on)
- New `Bookmark.url()` / `Bookmark.article_id()` helpers derive via
  `Article.find_by_source_id(source_id)` → `url.as_string()`.
- `Bookmark.to_json` URL uses the new helper.
- `UserWord.as_dictionary` (high-traffic exercise serialization) uses
  the new helpers; `context_in_content` hardcoded `True` — web has
  zero readers and title bookmarks are distinguished structurally via
  `articleInfo.tokenized_title_new.past_bookmarks`, so the
  per-bookmark flag was vestigial.
- `User.bookmarks_by_url_by_date` had zero callers anywhere; deleted.
- `reading_sessions.translations_in_interval` raw SQL now joins
  `bookmark -> bookmark_context -> new_text`. Deploy 3 renames
  `new_text` to `text` here.

Drive-by cleanup
- Dropped the unused `article_id` parameter from
  `Bookmark.find_or_create` (it only fed the now-deleted
  `Text.find_or_create` call) and the dead `article_id` request-JSON
  read in `/translation`.

Intentionally left for Deploy 2
- `get_context()`'s `self.text.content` fallback fires only for
  legacy null-context bookmarks; today's prod has zero such rows and
  new bookmarks always have a context. Removed alongside the column.
- `Bookmark.text` relationship + `text_id` column, the
  `find_all_for_text_and_user` classmethod, the
  `joinedload(Bookmark.text)` hints in `user.py` / `basicSR.py`.

Verification
- Phase-0 prod queries: zero rows for all three risk sets
  (`bookmark_context.text_id IS NULL`,
   `bookmark.text_id NOT NULL AND context_id IS NULL`,
   dual-NULL bookmarks).
- Full test suite: 319 passed, 1 skipped — matches bare master.

Plan and review findings: ~/.claude/plans/wondrous-honking-gadget.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

ArchLens detected architectural changes in the following views:
diff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant