Summary
The populate_hgvs_for_score_set worker job currently skips score sets with multiple targets: get_target_coding_info() raises NotImplementedError, and the job catches it and returns JobExecutionOutcome.skipped(). Multi-target score sets should be supported, since each variant belongs to exactly one target and target membership can be inferred from the hgvs_nt or hgvs_pro prefix.
Background
Variant.hgvs_nt and Variant.hgvs_pro are formatted differently depending on the number of targets; multi-target score sets use the fully qualified HGVS form:
- Single-target: c.1A>G (no prefix)
- Multi-target: TARGET_NAME:c.1A>G (target name prefix before :)
Each variant pertains to exactly one target. Multi-variant allele IDs (comma-separated ClinGen IDs) are already skipped by the job.
Each TargetGene has its own post_mapped_metadata with per-target transcript accessions and coding information. The HGVS population job needs the correct transcript accession per-target to query ClinGen accurately.
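The prefix convention described above can be sketched as a small helper. This is only an illustration: split_target_prefix is a hypothetical name, not an existing function in the codebase, and it assumes target names never contain a colon.

```python
from typing import Optional


def split_target_prefix(hgvs: str) -> tuple[Optional[str], str]:
    """Split 'TARGET_NAME:c.1A>G' into ('TARGET_NAME', 'c.1A>G').

    Hypothetical helper for illustration. Assumes the target name itself
    contains no ':'; an unprefixed string yields (None, hgvs).
    """
    # partition() splits on the first ':' only and reports whether it was found.
    prefix, sep, rest = hgvs.partition(":")
    if not sep:
        # Single-target form: no prefix present.
        return None, hgvs
    return prefix, rest
```

For a single-target variant the first element is None, so the caller can tell the two formats apart without inspecting the score set.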
Current behavior
get_target_coding_info() in src/mavedb/lib/target_genes.py rejects multi-target score sets with NotImplementedError when len(score_set.target_genes) != 1. The job catches this and returns JobExecutionOutcome.skipped().
Proposed implementation
- Refactor get_target_coding_info() to return a dict[str, tuple[bool, Optional[str]]] keyed by target name, iterating over all targets instead of rejecting multi-target score sets.
- Update the job query to also select Variant.hgvs_nt and Variant.hgvs_pro so the loop can determine target membership.
- Add target resolution in the job loop. For multi-target score sets, split hgvs_nt on : (falling back to hgvs_pro when hgvs_nt is None) to extract TARGET_NAME, then look up (is_coding, transcript_accession) in the target info dict. For single-target score sets, use the sole target directly (no prefix parsing needed).
- Handle edge cases: both Variant.hgvs_nt and Variant.hgvs_pro are None (skip, no target inferable); target name from the prefix not found in the target info dict (skip with annotation status).
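A rough sketch of the target resolution step and its edge cases follows. It is not the job's actual code: resolve_target and the TargetCodingInfo alias are hypothetical names, the dict stands in for the proposed return value of get_target_coding_info(), and preferring hgvs_nt over hgvs_pro is an assumption.

```python
from typing import Optional

# Hypothetical alias for the proposed return type:
# target name -> (is_coding, transcript_accession).
TargetCodingInfo = dict[str, tuple[bool, Optional[str]]]


def resolve_target(
    hgvs_nt: Optional[str],
    hgvs_pro: Optional[str],
    target_info: TargetCodingInfo,
) -> Optional[tuple[bool, Optional[str]]]:
    """Return (is_coding, transcript_accession), or None if the variant should be skipped."""
    # Single-target score sets carry no prefix, so use the sole target directly.
    if len(target_info) == 1:
        return next(iter(target_info.values()))

    # Assumption: prefer hgvs_nt and fall back to hgvs_pro.
    hgvs = hgvs_nt or hgvs_pro
    if hgvs is None:
        # Neither string is present, so no target can be inferred.
        return None

    name, sep, _ = hgvs.partition(":")
    if not sep:
        # Multi-target variant without a prefix: target is ambiguous.
        return None

    # Unknown target names yield None, which the caller treats as a skip.
    return target_info.get(name)
```

In the job loop, a None result would map to skipping the variant (with an annotation status where the prefix names an unknown target); any other result supplies the transcript accession for the ClinGen query.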
Files to modify
src/mavedb/lib/target_genes.py — get_target_coding_info() signature and implementation
src/mavedb/worker/jobs/external_services/hgvs.py — job query, loop, and target resolution logic
tests/worker/jobs/conftest.py — add multi-target fixtures (score set with 2+ targets, variants with prefixed HGVS)
tests/worker/jobs/external_services/test_hgvs.py — add multi-target test cases
Relevant code references
src/mavedb/lib/validation/dataframe/column.py — validate_variant_formatting() shows the target:hgvs parsing pattern
src/mavedb/lib/validation/dataframe/variant.py — validate_transgenic_variant() splits on : for fully-qualified variants
tests/helpers/constants.py — TEST_MINIMAL_MULTI_TARGET_SCORESET for multi-target test data patterns