Text migration: stop dual-write to legacy text table (Deploy 1/3)#650
Open
mircealungu wants to merge 1 commit into
Open
Text migration: stop dual-write to legacy text table (Deploy 1/3)#650mircealungu wants to merge 1 commit into
text table (Deploy 1/3)#650mircealungu wants to merge 1 commit into
Conversation
Long ago Zeeguu introduced `new_text` + `bookmark_context` as the canonical store for fragment/caption/context text, intended to replace the old `text` table. The migration was never finished: every bookmark still dual-wrote both, and `Bookmark.text` was still read from several hot paths. This is Deploy 1 of a three-deploy plan: stop the dual-write and re-point every `Bookmark.text` read onto the new context/source model. The `bookmark.text_id` column and the `text` table remain in place as a safety net; Deploy 2 drops them, Deploy 3 renames `new_text → text`. Writes removed - `Bookmark.find_or_create` no longer calls `Text.find_or_create`; new bookmarks get `text_id = NULL` (column already nullable in prod). - `update_bookmark.find_or_create_context` reworked to return only the `BookmarkContext` (was a `(text, context)` tuple). The dead `context_or_word_changed` re-pointed off old `Text` for the same reason. - `/translation` endpoint no longer assigns `bookmark.text`. - Dropped the unused `Text` import in `bookmarks_and_words.py`. Reads re-pointed (every `bookmark.text.*` site that NULL-text new bookmarks would otherwise crash on) - New `Bookmark.url()` / `Bookmark.article_id()` helpers derive via `Article.find_by_source_id(source_id)` → `url.as_string()`. - `Bookmark.to_json` URL uses the new helper. - `UserWord.as_dictionary` (high-traffic exercise serialization) uses the new helpers; `context_in_content` hardcoded `True` — web has zero readers and title bookmarks are distinguished structurally via `articleInfo.tokenized_title_new.past_bookmarks`, so the per-bookmark flag was vestigial. - `User.bookmarks_by_url_by_date` had zero callers anywhere; deleted. - `reading_sessions.translations_in_interval` raw SQL now joins `bookmark -> bookmark_context -> new_text`. Deploy 3 renames `new_text` to `text` here. Drive-by cleanup - Dropped the unused `article_id` parameter from `Bookmark.find_or_create` (it only fed the now-deleted `Text.find_or_create` call) and the dead `article_id` request-JSON read in `/translation`. Intentionally left for Deploy 2 - `get_context()`'s `self.text.content` fallback fires only for legacy null-context bookmarks; today's prod has zero such rows and new bookmarks always have a context. Removed alongside the column. - `Bookmark.text` relationship + `text_id` column, the `find_all_for_text_and_user` classmethod, the `joinedload(Bookmark.text)` hints in `user.py` / `basicSR.py`. Verification - Phase-0 prod queries: zero rows for all three risk sets (`bookmark_context.text_id IS NULL`, `bookmark.text_id NOT NULL AND context_id IS NULL`, dual-NULL bookmarks). - Full test suite: 319 passed, 1 skipped — matches bare master. Plan and review findings: ~/.claude/plans/wondrous-honking-gadget.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

This is Deploy 1 of 3 in finishing the long-deferred migration from the legacy
texttable tonew_text+bookmark_context. Plan:~/code/zeeguu/docs/(and the working plan in~/.claude/plans/wondrous-honking-gadget.md).What this PR does
Stops the dual-write. Every bookmark today writes the same content to both the legacy
texttable (viaText.find_or_create) and the newbookmark_context→new_textchain. This PR keeps only the new-model write. New bookmarks gettext_id = NULL; the column stays as a safety net until Deploy 2.Re-points every
bookmark.text.*read that NULL-text new bookmarks would otherwise crash on:Bookmark.url()andBookmark.article_id()helpers, deriving the article viaArticle.find_by_source_id(source_id)→url.as_string()(mirroring what oldText.url()did).Bookmark.to_json/UserWord.as_dictionaryURL and article_id use the helpers.reading_sessions.translations_in_intervalraw SQL now joinsbookmark → bookmark_context → new_text(the literalnew_textbecomestextin Deploy 3).User.bookmarks_by_url_by_datehad zero callers anywhere — deleted.Edit flow (
/translation+update_bookmark.find_or_create_context) reworked to thread only theBookmarkContextthrough — no more(text, context)tuple.Drive-by cleanup: unused
article_idparameter dropped fromBookmark.find_or_create(it only fed the now-removedText.find_or_createcall), along with the deadarticle_id = request.json.get("articleID", ...)in/translation.Intentionally deferred to Deploy 2
get_context()'sself.text.contentfallback (only fires for legacy null-context bookmarks; prod has 0 such rows today).Bookmark.textrelationship +text_idcolumn +find_all_for_text_and_user+ thejoinedload(Bookmark.text)hints inuser.py/basicSR.py.Verification
Prod read-only queries via
query-prod:SELECT COUNT(*) FROM bookmark_context WHERE text_id IS NULL→ 0 (notranslation.py:535-style NULL deref risk).SELECT COUNT(*) FROM bookmark WHERE text_id IS NOT NULL AND context_id IS NULL→ 0 (no legacy bookmarks drop out oftranslations_in_interval).SELECT COUNT(*) FROM bookmark WHERE text_id IS NULL AND context_id IS NULL→ 0 (no dual-NULL rows that would hit the deferredget_context()fallback).bookmark.text_id IS_NULLABLE→ YES (NULL writes succeed at the schema level).Tests: 319 passed, 1 skipped — matches bare
masterexactly.One behaviour note
context_in_contentis now hardcodedTrueinUserWord.as_dictionary(oldtext.in_contenthad no new-model equivalent). The web frontend reads this field zero times insrc/— only one assignment inEditBookmarkButton.jsthat propagates server→local. Title bookmarks are distinguished structurally in the API response (articleInfo.tokenized_title_new.past_bookmarksis a separate array from each paragraph'spast_bookmarks), not via this flag. So the constant is harmless, but the field could be stripped from the payload later.What's next
bookmark.text_idFK + column,DROP TABLE text, deletemodel/text.py, swaparticle_pruning.PROTECTING_TABLES "text"for abookmark → source → articleblock.RENAME TABLE new_text → text, classNewText → Text, update the three FK-holders.🤖 Generated with Claude Code