Problem
Supersession filtering is currently handled inconsistently across our data discovery endpoints:
| Endpoint | Current Behavior |
|---|---|
| POST /score-sets/search | Auto-excludes score sets with published superseding versions via build_search_score_sets_query_filter() |
| POST /me/score-sets/search | Same query-level filter as public search |
| GET /score-sets/recently-published | Blanks superseding_score_set based on permission, but no chain filtering |
| GET /experiments/{urn}/score-sets | Filters via ~ScoreSet.superseding_score_set.has() + find_superseded_score_set_tail() dedup |
| POST /target-genes/search, GET /target-genes | Uses find_superseded_score_set_tail() to keep only chain heads |
| POST /me/target-genes/search | Simple filter: superseding_score_set is None |
| Variant endpoints (/variants/*) | No supersession filtering at all; returns data from superseded score sets |
There are at least three different filtering strategies in use (query-level join filter, tail-walk dedup, and a simple is None check). Some endpoints, notably the variant endpoints, expose superseded data with no opt-out, while the others hide it with no opt-in. API consumers have no way to control this behavior.
Proposal
Introduce a lineage mode query parameter across all data discovery endpoints that controls how superseded score set chains are represented in results.
Query Parameter
| Parameter | Values | Default |
|---|---|---|
| lineage | collapsed, expanded | collapsed |
Behavior
- collapsed (default): Return only data associated with the head node (latest accessible version) of each supersession chain. A score set that has never been superseded is its own head. This is the "clean" view: consumers see one representative per lineage.
- expanded: Return all data regardless of supersession status. Every score set and its associated variants/targets are included, even if the score set has been superseded by a published replacement.
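The sketch below shows the two modes in plain Python. It assumes the input is every score set version the caller is allowed to see. That should not be narrowed to the versions that matched a search filter, because doing so could promote an older version to head.

LineageMode mirrors the enum suggested under Implementation Considerations. ScoreSet, its superseded_urn field, and apply_lineage() are hypothetical stand-ins, not existing MaveDB code. A real implementation would operate on the SQLAlchemy models instead.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Union


class LineageMode(str, Enum):
    COLLAPSED = "collapsed"  # default: one head per supersession chain
    EXPANDED = "expanded"    # every version, superseded or not


@dataclass(frozen=True)
class ScoreSet:
    # Hypothetical stand-in for the ScoreSet model.
    urn: str
    superseded_urn: Optional[str] = None  # older version this one replaces


def apply_lineage(
    score_sets: Iterable[ScoreSet],
    mode: Union[LineageMode, str] = LineageMode.COLLAPSED,
) -> list[ScoreSet]:
    """Filter caller-visible score sets according to the lineage mode.

    Raw query-string values such as "expanded" are accepted; any other
    value raises ValueError from the enum constructor.
    """
    mode = LineageMode(mode)
    items = list(score_sets)
    if mode is LineageMode.EXPANDED:
        return items
    # A version is a head if no other visible version supersedes it.
    superseded = {s.superseded_urn for s in items if s.superseded_urn is not None}
    return [s for s in items if s.urn not in superseded]
```

Take a chain v1 ← v2 ← v3 plus one unrelated score set. Collapsed mode returns only v3 and the unrelated set, and expanded returns all four.

Now suppose v3 is a draft the caller cannot see. It is then absent from the input, so v2 becomes the head. This matches the "latest accessible version" wording above.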
Affected Endpoints
All data discovery / listing / search endpoints should respect this parameter:
- POST /score-sets/search
- POST /me/score-sets/search
- POST /score-sets/search/filter-options
- GET /score-sets/recently-published
- GET /experiments/{urn}/score-sets
- POST /target-genes/search
- GET /target-genes
- POST /me/target-genes/search
- Variant search/list endpoints that return data across score sets
Single-resource fetch endpoints (GET /score-sets/{urn}, GET /target-genes/{id}, etc.) are not affected — they should continue to return the requested resource regardless of supersession status, with the superseding_score_set / superseded_score_set fields populated as they are today.
Implementation Considerations
- Centralize the filtering logic. Today we have build_search_score_sets_query_filter(), find_superseded_score_set_tail(), find_publish_or_private_superseded_score_set_tail(), and fetch_superseding_score_set_in_search_result() in src/mavedb/lib/score_sets.py, plus inline filters in routers. The new parameter should flow through a single, shared mechanism.
- Public endpoints should resolve chain heads deterministically. Today the "head" of a chain is resolved per-user based on permissions. A contributor with an unpublished superseding draft sees that draft as the head, while an anonymous user sees the latest published version. As a result, two users hitting the same public search endpoint can see different results.
  - Public discovery endpoints should always resolve the chain head as the latest published version, regardless of who is making the request.
  - Authenticated /me/ endpoints should continue to show the user's full picture, including drafts.
  - This simplifies the query-level vs. post-query tradeoff. Public endpoints can do pure query-level filtering with no per-user permission checks, while /me/ endpoints handle permission-aware post-query resolution.
- Search request models. ScoreSetsSearch and other Pydantic search models will need a lineage field. Consider an enum (LineageMode) in a shared location.
- Variant endpoints. Currently, variant search has no supersession awareness at all. In collapsed mode, variants belonging to superseded score sets should be excluded, or mapped to their head score set, depending on the desired semantics. This is the biggest behavioral change.
- Backwards compatibility. Since collapsed is the proposed default and most endpoints already exhibit collapsed-like behavior, the default experience should remain largely unchanged. The exceptions are the variant endpoints and GET /score-sets/recently-published. Neither does chain filtering today, so both would start collapsing by default. The expanded mode is the new capability.
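The next sketch shows one way the shared mechanism could look, under simplifying assumptions:
- Visibility is reduced to a published flag plus a single owner. Real MaveDB permissions involve more than ownership.
- Variants use the "exclude" semantics rather than being remapped to the head.
- All names (ScoreSet, Variant, public_visibility, me_visibility, chain_heads, filter_variants) are hypothetical.

Public endpoints pass a predicate that ignores the caller, so every user gets the same heads and the rule could be pushed down into a query-level filter. /me/ endpoints pass a per-user predicate that also admits the caller's own drafts.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ScoreSet:
    urn: str
    published: bool
    owner: str  # simplification: real permissions are richer than one owner
    superseded_urn: Optional[str] = None  # older version this one replaces


@dataclass(frozen=True)
class Variant:
    urn: str
    score_set_urn: str


Visibility = Callable[[ScoreSet], bool]


def public_visibility(score_set: ScoreSet) -> bool:
    """Public discovery: only published versions count, whoever is asking."""
    return score_set.published


def me_visibility(user: str) -> Visibility:
    """/me/ discovery: published versions plus the caller's own drafts."""
    return lambda s: s.published or s.owner == user


def chain_heads(score_sets: Iterable[ScoreSet], visible: Visibility) -> set[str]:
    """URNs of visible score sets that no other visible score set supersedes."""
    shown = [s for s in score_sets if visible(s)]
    superseded = {s.superseded_urn for s in shown if s.superseded_urn is not None}
    return {s.urn for s in shown if s.urn not in superseded}


def filter_variants(
    variants: Iterable[Variant],
    score_sets: Iterable[ScoreSet],
    visible: Visibility,
    lineage: str = "collapsed",
) -> list[Variant]:
    """Apply the lineage mode to variant results using "exclude" semantics."""
    if lineage == "expanded":
        # Every visible version contributes variants, superseded or not.
        keep = {s.urn for s in score_sets if visible(s)}
    elif lineage == "collapsed":
        # Only variants from chain heads survive.
        keep = chain_heads(score_sets, visible)
    else:
        raise ValueError(f"unknown lineage mode: {lineage!r}")
    return [v for v in variants if v.score_set_urn in keep]
```

Consider a chain where published v1 and v2 are followed by alice's draft v3:
- The public predicate yields head v2 for everyone.
- alice's /me/ predicate yields v3.

Mapping variants to their head score set, instead of excluding them, would replace the membership test with a lookup from each score set URN to its chain head.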
Current Code References
- Supersession model fields: src/mavedb/models/score_set.py (superseded_score_set_id, superseded_score_set, superseding_score_set)
- Core filtering utilities: src/mavedb/lib/score_sets.py (build_search_score_sets_query_filter(), find_superseded_score_set_tail(), fetch_superseding_score_set_in_search_result())
- Score set search router: src/mavedb/routers/score_sets.py
- Experiment score sets: src/mavedb/routers/experiments.py
- Target gene search: src/mavedb/routers/target_genes.py
- View models: src/mavedb/view_models/score_set.py