Add multi-target score set support to HGVS population job #712

@bencap

Summary

The populate_hgvs_for_score_set worker job currently returns JobExecutionOutcome.skipped() for score sets with multiple targets, because get_target_coding_info() raises NotImplementedError for them. Multi-target score sets should be supported: each variant belongs to exactly one target, and target membership can be inferred from the hgvs_nt or hgvs_pro prefix.

Background

Multi-target score sets use fully-qualified HGVS format on Variant.hgvs_nt/Variant.hgvs_pro:

  • Single-target: c.1A>G (no prefix)
  • Multi-target: TARGET_NAME:c.1A>G (target name prefix before :)

Each variant pertains to exactly one target. Multi-variant allele IDs (comma-separated ClinGen IDs) are already skipped by the job.
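The two formats above can be told apart with a small helper. This is a minimal sketch, not existing MaveDB code: the name split_target_prefix is hypothetical, and it assumes the variant portion itself never contains a colon, so splitting on the last colon is safe.

```python
from typing import Optional


def split_target_prefix(hgvs: str) -> tuple[Optional[str], str]:
    """Split a possibly fully-qualified HGVS string into (target_name, variant).

    Hypothetical helper. Assumes the variant part (e.g. "c.1A>G") contains
    no colon, so the last colon separates the target name from the variant.
    """
    prefix, sep, variant = hgvs.rpartition(":")
    if not sep:
        # No colon at all: single-target format, nothing to extract.
        return None, hgvs
    return prefix, variant


print(split_target_prefix("c.1A>G"))
print(split_target_prefix("TARGET_NAME:c.1A>G"))
```

A None target name signals the single-target format, so callers can fall back to the sole target without inspecting the string further.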

Each TargetGene has its own post_mapped_metadata with per-target transcript accessions and coding information. The HGVS population job needs the correct transcript accession per-target to query ClinGen accurately.

Current behavior

get_target_coding_info() in src/mavedb/lib/target_genes.py rejects multi-target score sets with NotImplementedError when len(score_set.target_genes) != 1. The job catches this and returns JobExecutionOutcome.skipped().

Proposed implementation

  1. Refactor get_target_coding_info to return a dict[str, tuple[bool, Optional[str]]] keyed by target name, iterating over all targets instead of rejecting multi-target score sets.

  2. Update the job query to also select Variant.hgvs_nt and Variant.hgvs_pro so the loop can determine target membership.

  3. Add target resolution in the job loop: for multi-target score sets, split hgvs_nt (or hgvs_pro when hgvs_nt is None) on : to extract TARGET_NAME, then look up (is_coding, transcript_accession) in the target info dict. For single-target score sets, use the sole target directly (no prefix parsing needed).

  4. Handle edge cases: both Variant.hgvs_nt and Variant.hgvs_pro are None (skip, no target inferable); target name from the prefix is not found in the target info dict (skip with annotation status).
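The steps above could look roughly like the following sketch. It rests on several assumptions: FakeTarget is a stand-in for TargetGene (the real post_mapped_metadata layout is not shown in this issue), the function names coding_info_by_target and resolve_target_info are hypothetical, the accession is a placeholder, and returning None stands in for "skip this variant".

```python
from dataclasses import dataclass
from typing import Optional

# (is_coding, transcript_accession), as in the proposed return type.
TargetInfo = tuple[bool, Optional[str]]


@dataclass
class FakeTarget:
    """Stand-in for TargetGene; the real model derives these fields from
    post_mapped_metadata, whose structure is not assumed here."""
    name: str
    is_coding: bool
    transcript_accession: Optional[str]


def coding_info_by_target(targets: list[FakeTarget]) -> dict[str, TargetInfo]:
    # Step 1: one entry per target instead of rejecting multi-target input.
    return {t.name: (t.is_coding, t.transcript_accession) for t in targets}


def resolve_target_info(
    hgvs_nt: Optional[str],
    hgvs_pro: Optional[str],
    target_info: dict[str, TargetInfo],
) -> Optional[TargetInfo]:
    """Return the coding info for a variant's target, or None to skip it."""
    if len(target_info) == 1:
        # Single-target: use the sole target, no prefix parsing needed.
        return next(iter(target_info.values()))
    # Multi-target: prefer hgvs_nt, fall back to hgvs_pro.
    hgvs = hgvs_nt if hgvs_nt is not None else hgvs_pro
    if hgvs is None:
        return None  # edge case: no target inferable
    prefix, sep, _ = hgvs.rpartition(":")
    if not sep:
        return None  # no prefix on a multi-target variant
    return target_info.get(prefix)  # None if the prefix names no known target


info = coding_info_by_target([
    FakeTarget("TARGET_A", True, "NM_000000.0"),  # placeholder accession
    FakeTarget("TARGET_B", False, None),
])
print(resolve_target_info("TARGET_A:c.1A>G", None, info))
```

In the real job, the caller would presumably distinguish the skip reasons (no HGVS string versus unknown target name) so it can record the appropriate annotation status; this sketch collapses both into None for brevity.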

Files to modify

  • src/mavedb/lib/target_genes.py: get_target_coding_info() signature and implementation
  • src/mavedb/worker/jobs/external_services/hgvs.py: job query, loop, and target resolution logic
  • tests/worker/jobs/conftest.py: add multi-target fixtures (score set with 2+ targets, variants with prefixed HGVS)
  • tests/worker/jobs/external_services/test_hgvs.py: add multi-target test cases

Relevant code references

  • src/mavedb/lib/validation/dataframe/column.py: validate_variant_formatting() shows the target:hgvs parsing pattern
  • src/mavedb/lib/validation/dataframe/variant.py: validate_transgenic_variant() splits on : for fully-qualified variants
  • tests/helpers/constants.py: TEST_MINIMAL_MULTI_TARGET_SCORESET for multi-target test data patterns

Metadata

Labels

  • app: backend (Task implementation touches the backend)
  • app: worker (Task implementation touches the worker)
  • type: enhancement (Enhancement to an existing feature)
