Skip to content

refactor: pipeline catalog with 3-digit prefix + 010-013 method-as-pipeline entries#39

Open
thc1006 wants to merge 8 commits intoQuantumNoLab:mainfrom
thc1006:refactor/pipeline-catalog
Open

refactor: pipeline catalog with 3-digit prefix + 010-013 method-as-pipeline entries#39
thc1006 wants to merge 8 commits intoQuantumNoLab:mainfrom
thc1006:refactor/pipeline-catalog

Conversation

@thc1006
Copy link
Copy Markdown
Member

@thc1006 thc1006 commented Apr 7, 2026

Summary

Restructures experiments/pipelines/ into a standardized benchmark catalog to support running every method × variant combination as a first-class pipeline entry. Two-tier layout:

  • 001-009 (renumbered from 01-09): ablation groups
  • 010-013 (new): each NQS method from src/qvartools/methods/nqs/ exposed as a runnable pipeline folder, with variants as separate scripts

Result: 33 pipelines in 13 folders, up from 26 pipelines in 9 folders. Room to grow (010-099 for more methods, 100-199 for cross-method sweeps, 200+ for HP sweeps).

Commits (6)

# SHA Description
1 73ab3c2 docs(adr-003): cross-reference #38 and correct AMD-HPC/amd-sbd 95x claim (it's on MI250X, not GB200; GB200 is 2.64x per arXiv:2601.16169)
2 9234326 fix: cherry-picked _IBM_SQD_AVAILABLE test patch from feat/e-pt2-correction so this branch has a clean test baseline on main
3 4ba5f00 refactor(pipelines): rename 9 folders + 9 YAMLs to 3-digit prefix, update run_all_pipelines.py hardcoded checks (line 326, 328), --only help text, and tests/test_run_all_pipelines.py
4 ae14c75 refactor(methods/nqs): extract build_autoregressive_nqs, extract_orbital_counts, validate_initial_basis into new _shared.py; add METHODS_REGISTRY to __init__.py
5 4654a7f feat(pipelines): add 010-013 wrapper scripts (7 total), 4 multi-section YAMLs, extend PIPELINES list to 33 entries. Bonus fix: nqs_sqd.py/nqs_skqd.py were calling mol_info["n_orbitals"] directly but get_molecule() doesn't populate that key — routed through extract_orbital_counts()
6 651beb4 docs(pipelines): update AGENTS.md, README.md, all RST/MD docs for 3-digit catalog + add Tier 2 documentation for 010-013

New pipeline catalog

Tier 1 — Ablation groups (001-009, unchanged in content)

All 26 existing scripts preserved via git mv (100% similarity retained). Only YAML header comments and the run_all_pipelines.py PIPELINES list strings changed.

Tier 2 — Method-as-pipeline catalog (010-013, new)

Folder Variants Backing method
010_hi_nqs_sqd/ default.py, pt2.py, ibm_off.py run_hi_nqs_sqd
011_hi_nqs_skqd/ default.py, ibm_on.py run_hi_nqs_skqd
012_nqs_sqd/ default.py run_nqs_sqd
013_nqs_skqd/ default.py run_nqs_skqd

Each folder has one multi-section YAML (configs/01X_<method>.yaml) where each top-level section corresponds to one variant script. Wrapper scripts follow a consistent ~80 LoC template and print output in the format that run_all_pipelines.py's regex parsers expect.

New public API: qvartools.methods.nqs.METHODS_REGISTRY — dict keyed by method id mapping to {run_fn, config_cls, iterative, has_krylov_expansion, has_ibm_solver, has_pt2_selection, supports_initial_basis, description, pipeline_folder}.

Test plan

  • ruff check src/ tests/ experiments/ — all checks passed
  • ruff format --check on all touched files — 46 files already formatted
  • pytest tests/test_methods/ tests/test_run_all_pipelines.py tests/test_nqs/test_adapters.py -m "not slow and not gpu and not pyscf"62 passed
  • Smoke python experiments/pipelines/run_all_pipelines.py h2 --only 001 005 012 013 --skip-quantum6/6 OK, 5/6 hit chemical accuracy; NQS-SQD/default stays at HF (20.5 mHa) as expected for the pure NQS ablation case
  • Direct smoke tests: 010_hi_nqs_sqd/default.py h2, 011_hi_nqs_skqd/default.py h2, 012_nqs_sqd/default.py h2, 013_nqs_skqd/default.py h2 — all run end-to-end, all hit exact FCI energy on H₂ except 012 (20.5 mHa = HF)
  • grep '0[1-9]_(dci|nf_dci|nf_only|hf_only|iterative_nqs|vqe)' across live tree (excluding .claude/worktrees/) — zero matches
  • (Reviewer) Build Sphinx docs locally: cd docs && make html
  • (Reviewer) Run full test suite with pyscf extras

Out of scope / deferred

Breaking changes

  • --only CLI argument on run_all_pipelines.py: now requires 3-digit prefixes. --only 01 02 no longer works; use --only 001 002. Updated in help text, docstring examples, and all docs.
  • Pipeline folder paths: any external scripts or CI workflows that hardcode 01_dci/... paths need updating. The repo's own CI doesn't reference these paths.

Related

thc1006 added 6 commits April 7, 2026 23:07
…sbd speedup

- Added "Update 2026-04-07" section noting that AMD-HPC/amd-sbd's 95x is
  measured on MI250X (AMD GPU), not GB200 (which is only 2.64x in the same
  paper). This significantly weakens the case for the C++/sbd path on NVIDIA
  hardware.
- Added cross-references to Issue QuantumNoLab#38 (CuPy factored-space Davidson, parallel
  path), and to memory/project_phase2b_davidson_commitment.md.
- Documented that Phase 2b proper Davidson is non-negotiable for multireference
  chemistry (Cr2, dissociation, open-shell) regardless of which backend wins.
Renames the 9 existing pipeline group folders and their YAML configs from
2-digit to 3-digit prefix to leave room for the upcoming 010+ method-as-pipeline
catalog (Issue: standardized open-source benchmark catalog).

Changes:
- git mv all 9 pipeline folders: 01_dci → 001_dci, ..., 09_vqe → 009_vqe
- git mv all 9 YAML configs to match
- Update first-line comment in each YAML to use new prefix
- Update PIPELINES list in run_all_pipelines.py (26 entries)
- Update --skip-iterative check from ("06","07","08") to ("006","007","008")
- Update --skip-quantum check from "09_vqe" to "009_vqe"
- Update --only help text and docstring example to 3-digit
- Update tests/test_run_all_pipelines.py (4 references to 09_vqe)

This is a pure rename + reference update. No behaviour change. All 3 tests in
test_run_all_pipelines.py still pass.

Doc updates (AGENTS.md, README.md, RST files) are scheduled for Phase 4 of
this refactor in a separate commit.
Tests use mock hamiltonian without integrals; IBM solver auto-detect
tries to access hamiltonian.integrals.h1e causing AttributeError.
Light additive refactor of the four NQS methods to remove duplication and
expose a registry that the upcoming 010+ pipeline wrappers (and any future
benchmark catalog tooling) can use to dispatch by method id.

Changes:
- Add src/qvartools/methods/nqs/_shared.py with three helpers extracted
  from the four method runners (build_autoregressive_nqs,
  extract_orbital_counts, validate_initial_basis). Bit-equivalent to
  the inlined code that was there before.
- Update nqs_sqd.py, nqs_skqd.py, hi_nqs_sqd.py, hi_nqs_skqd.py to use
  the new helpers. No behaviour change, no API change, no signature
  change.
- Add METHODS_REGISTRY dict to methods/nqs/__init__.py mapping
  {nqs_sqd, nqs_skqd, hi_nqs_sqd, hi_nqs_skqd} -> run_fn, config_cls,
  metadata flags (iterative/has_krylov_expansion/has_ibm_solver/
  has_pt2_selection/supports_initial_basis), description, and
  pipeline_folder.

Verified:
- All 49 tests in tests/test_methods/ pass (was 46 + 3 cherry-picked
  failures from main; cherry-pick of 3f39fe2 fixed the 3).
- 62 tests across test_run_all_pipelines + test_methods + test_nqs/
  test_adapters all pass.
- Zero changes to algorithmic logic, exception types, return shapes,
  or function signatures.
Adds the four NQS methods from src/qvartools/methods/nqs/ as first-class
runnable pipeline catalog entries, alongside the existing 001-009 ablation
groups. Follows the convention "one method = one folder, variants = multiple
.py inside the folder, one multi-section YAML per method".

New pipelines (7 scripts in 4 folders):

- 010_hi_nqs_sqd/  (HI+NQS+SQD, iterative, optional PT2/IBM)
  * default.py    — baseline, auto IBM detect
  * pt2.py        — use_pt2_selection=True + temperature annealing
  * ibm_off.py    — use_ibm_solver=False (force GPU fallback)

- 011_hi_nqs_skqd/ (HI+NQS+SKQD, iterative + Krylov expansion)
  * default.py    — baseline (use_ibm_solver=False)
  * ibm_on.py     — use_ibm_solver=True + S-CORE recovery

- 012_nqs_sqd/    (two-stage NQS+SQD, no iteration)
  * default.py

- 013_nqs_skqd/   (two-stage NQS+SKQD with Krylov expansion)
  * default.py

Each YAML has one section per variant. Each wrapper script is ~80 LoC, uses
the existing experiments/config_loader.py for CLI+YAML, calls the
qvartools.methods.nqs.run_*() function, and prints results in the format
that run_all_pipelines.py's regex parsers expect (Best energy, Final energy,
Wall time).

run_all_pipelines.py PIPELINES list extended from 26 to 33 entries.

Bonus fix discovered during smoke testing: nqs_sqd.py and nqs_skqd.py used
mol_info["n_orbitals"] directly, but get_molecule() does NOT populate that
key — it must come from hamiltonian.integrals. The HI methods avoided this
via extract_orbital_counts(); the simple methods did not. Fixed by routing
both simple methods through extract_orbital_counts() too. This is a
pre-existing bug that nothing was exercising before; now fixed and verified.

Smoke tests on H2 (CPU):
- 010 hi_nqs_sqd/default:   E=-1.1372838345 Ha (0.0 mHa error)  10.5s
- 011 hi_nqs_skqd/default:  E=-1.1372838345 Ha (0.0 mHa error)  10.6s
- 012 nqs_sqd/default:      E=-1.1167593074 Ha (20.5 mHa error) 74.6s  (HF, expected — pure NQS ablation)
- 013 nqs_skqd/default:     E=-1.1372838345 Ha (0.0 mHa error)  44.4s

All 52 tests in tests/test_methods/ + tests/test_run_all_pipelines.py pass.
ruff check + ruff format clean.

Note: 010_hi_nqs_sqd/correction.py (compute_pt2_correction=True variant) is
intentionally deferred — that config field only exists on the
feat/e-pt2-correction branch (PR QuantumNoLab#37). Will add as a follow-up commit after
PR QuantumNoLab#37 merges to main and this branch rebases.
…tion

Phase 4 of the pipeline catalog refactor: update every doc file that
references the old 0X_ pipeline paths to use the new 3-digit names, and
add a new section documenting the 010-013 method-as-pipeline catalog.

Files updated:
- AGENTS.md
  * directory tree: list all 13 pipeline folders (001-009 + 010-013)
  * config_loader pattern examples use 002_nf_dci paths and 010 wrappers
  * "24 Pipeline Variants" → "33 Pipeline Variants"
- README.md
  * Quick start examples show both 001_dci and 010_hi_nqs_sqd usage
  * --only example uses 3-digit prefixes
- docs/experiments_guide.md
  * All Group 01-09 headers renumbered to 001-009
  * All 24 ablation script paths updated to new prefixes
  * --only example: 01 02 04 → 001 002 004
  * "all 24 pipelines" → "all 33 pipelines"
  * NEW Tier 2 section documenting 010-013 method-as-pipeline catalog
    with per-group variant tables and the numbering convention
- docs/source/user_guide/pipelines.rst
  * Restructured to show "Tier 1: ablation groups (001-009)" and
    "Tier 2: method-as-pipeline catalog (010-013)"
  * Added Tier 2 table with method ids, variants, and METHODS_REGISTRY note
  * Added "Numbering convention" subsection (001-009 / 010-099 / 100-199 / 200+)
  * Updated iterative pipeline code example (mol_info workaround removed —
    extract_orbital_counts handles it now)
  * CLI examples updated to 3-digit
- docs/source/user_guide/yaml_configs.rst
  * Tier 1 / Tier 2 split for the YAML file table
  * Added 009_vqe.yaml + the 4 new 010-013 multi-section YAMLs
  * CLI examples updated to 3-digit
- docs/source/tutorials/h2_pipeline.rst
- docs/source/tutorials/beh2_pipeline.rst
  * CLI examples updated to 3-digit
  * h2 tutorial: added a 010-013 example
- docs/decisions/002-eliminate-torch-numpy-roundtrips.md
  * Updated the validation script paths and "24/24" → "33/33"

Verified: grep for 0[1-9]_(dci|nf_dci|nf_only|hf_only|iterative_nqs|vqe)
in the live tree (excluding .claude/worktrees) returns zero matches.
Copilot AI review requested due to automatic review settings April 7, 2026 16:43
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures experiments/pipelines/ into a 3-digit, standardized benchmark catalog and adds a second “method-as-pipeline” tier (010–013) so each NQS method/variant is runnable as a first-class pipeline entry. It also refactors shared NQS runner helpers into a new internal module and updates docs/tests to match the new catalog.

Changes:

  • Renumber pipeline groups to 3-digit prefixes and extend run_all_pipelines.py to include 010–013 entries (33 total).
  • Add 010–013 method wrapper scripts + multi-section YAML configs, and introduce qvartools.methods.nqs.METHODS_REGISTRY.
  • Extract shared NQS runner utilities into src/qvartools/methods/nqs/_shared.py and adjust NQS runners accordingly.

Reviewed changes

Copilot reviewed 38 out of 64 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tests/test_run_all_pipelines.py Updates VQE pipeline paths to 3-digit 009_vqe/....
tests/test_methods/test_initial_basis.py Adds patches to disable IBM SQD availability during initial-basis tests.
src/qvartools/methods/nqs/nqs_sqd.py Uses shared helpers for orbital counts + NQS construction.
src/qvartools/methods/nqs/nqs_skqd.py Uses shared helpers for orbital counts + NQS construction.
src/qvartools/methods/nqs/hi_nqs_sqd.py Uses shared helpers for orbital counts, NQS build, and initial-basis validation.
src/qvartools/methods/nqs/hi_nqs_skqd.py Uses shared helpers for orbital counts, NQS build, and initial-basis validation.
src/qvartools/methods/nqs/_shared.py New internal helper module for NQS runner common logic.
src/qvartools/methods/nqs/init.py Adds METHODS_REGISTRY and expands module docstring/exports.
experiments/pipelines/run_all_pipelines.py Renames groups to 3-digit prefixes and registers new 010–013 pipelines.
experiments/pipelines/configs/001_dci.yaml Updates YAML header to 3-digit group prefix.
experiments/pipelines/configs/002_nf_dci.yaml Updates YAML header to 3-digit group prefix.
experiments/pipelines/configs/003_nf_dci_pt2.yaml Updates YAML header to 3-digit group prefix.
experiments/pipelines/configs/004_nf_only.yaml Updates YAML header to 3-digit group prefix.
experiments/pipelines/configs/005_hf_only.yaml Updates YAML header to 3-digit group prefix.
experiments/pipelines/configs/006_iterative_nqs.yaml Updates YAML header to 3-digit group prefix.
experiments/pipelines/configs/007_iterative_nqs_dci.yaml Updates YAML header to 3-digit group prefix.
experiments/pipelines/configs/008_iterative_nqs_dci_pt2.yaml Updates YAML header to 3-digit group prefix.
experiments/pipelines/configs/009_vqe.yaml Updates YAML header to 3-digit group prefix.
experiments/pipelines/configs/010_hi_nqs_sqd.yaml New multi-section YAML for 010 HI+NQS+SQD variants.
experiments/pipelines/configs/011_hi_nqs_skqd.yaml New multi-section YAML for 011 HI+NQS+SKQD variants.
experiments/pipelines/configs/012_nqs_sqd.yaml New YAML for 012 NQS+SQD wrapper.
experiments/pipelines/configs/013_nqs_skqd.yaml New YAML for 013 NQS+SKQD wrapper.
experiments/pipelines/010_hi_nqs_sqd/default.py New 010 wrapper script (default variant).
experiments/pipelines/010_hi_nqs_sqd/pt2.py New 010 wrapper script (pt2 variant).
experiments/pipelines/010_hi_nqs_sqd/ibm_off.py New 010 wrapper script (force IBM off variant).
experiments/pipelines/011_hi_nqs_skqd/default.py New 011 wrapper script (default variant).
experiments/pipelines/011_hi_nqs_skqd/ibm_on.py New 011 wrapper script (force IBM on variant).
experiments/pipelines/012_nqs_sqd/default.py New 012 wrapper script (default variant).
experiments/pipelines/013_nqs_skqd/default.py New 013 wrapper script (default variant).
experiments/pipelines/009_vqe/vqe_uccsd.py Adds runnable VQE-UCCSD pipeline script under 009_vqe/.
experiments/pipelines/009_vqe/vqe_adapt.py Adds runnable ADAPT-VQE pipeline script under 009_vqe/.
experiments/pipelines/001_dci/dci_krylov_classical.py 3-digit catalog path for existing DCI classical pipeline.
experiments/pipelines/001_dci/dci_krylov_quantum.py 3-digit catalog path for existing DCI quantum pipeline.
experiments/pipelines/001_dci/dci_sqd.py 3-digit catalog path for existing DCI SQD pipeline.
experiments/pipelines/002_nf_dci/nf_dci_krylov_classical.py 3-digit catalog path for existing NF+DCI classical pipeline.
experiments/pipelines/002_nf_dci/nf_dci_krylov_quantum.py 3-digit catalog path for existing NF+DCI quantum pipeline.
experiments/pipelines/004_nf_only/nf_krylov_classical.py 3-digit catalog path for existing NF-only classical pipeline.
experiments/pipelines/004_nf_only/nf_sqd.py 3-digit catalog path for existing NF-only SQD pipeline.
experiments/pipelines/005_hf_only/hf_krylov_classical.py 3-digit catalog path for existing HF-only classical pipeline.
experiments/pipelines/005_hf_only/hf_krylov_quantum.py 3-digit catalog path for existing HF-only quantum pipeline.
experiments/pipelines/005_hf_only/hf_sqd.py 3-digit catalog path for existing HF-only SQD pipeline.
experiments/pipelines/006_iterative_nqs/iter_nqs_krylov_classical.py 3-digit catalog path for existing iterative NQS classical pipeline.
experiments/pipelines/006_iterative_nqs/iter_nqs_krylov_quantum.py 3-digit catalog path for existing iterative NQS quantum pipeline.
experiments/pipelines/006_iterative_nqs/iter_nqs_sqd.py 3-digit catalog path for existing iterative NQS SQD pipeline.
experiments/pipelines/007_iterative_nqs_dci/iter_nqs_dci_krylov_classical.py 3-digit catalog path for existing NF+DCI→iter NQS classical pipeline.
experiments/pipelines/007_iterative_nqs_dci/iter_nqs_dci_krylov_quantum.py 3-digit catalog path for existing NF+DCI→iter NQS quantum pipeline.
experiments/pipelines/007_iterative_nqs_dci/iter_nqs_dci_sqd.py 3-digit catalog path for existing NF+DCI→iter NQS SQD pipeline.
experiments/pipelines/008_iterative_nqs_dci_pt2/iter_nqs_dci_pt2_krylov_classical.py 3-digit catalog path for existing NF+DCI+PT2→iter NQS classical pipeline.
experiments/pipelines/008_iterative_nqs_dci_pt2/iter_nqs_dci_pt2_krylov_quantum.py 3-digit catalog path for existing NF+DCI+PT2→iter NQS quantum pipeline.
experiments/pipelines/008_iterative_nqs_dci_pt2/iter_nqs_dci_pt2_sqd.py 3-digit catalog path for existing NF+DCI+PT2→iter NQS SQD pipeline.
README.md Updates pipeline paths/examples and documents method-as-pipeline usage.
docs/source/user_guide/yaml_configs.rst Updates config filenames + documents Tier 2 multi-section YAMLs.
docs/source/user_guide/pipelines.rst Updates pipeline catalog docs for 3-digit scheme and Tier 2.
docs/source/tutorials/h2_pipeline.rst Updates tutorial commands to new 3-digit catalog and pipeline counts.
docs/source/tutorials/beh2_pipeline.rst Updates tutorial commands to new 3-digit catalog.
docs/experiments_guide.md Updates experiment guide to 3-digit prefixes and adds Tier 2 documentation.
docs/decisions/003-gpu-native-sbd-integration.md Adds 2026-04-07 update section and Issue #38 cross-reference.
docs/decisions/002-eliminate-torch-numpy-roundtrips.md Updates expected pipeline counts and example paths.
AGENTS.md Updates repository guide to reflect new catalog layout and pipeline counts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Bug 1 — experiments/pipelines/run_all_pipelines.py:376
    --skip-iterative was only skipping groups 006/007/008 but 010_hi_nqs_sqd
    and 011_hi_nqs_skqd are also iterative (HI methods). Added 010/011 to
    the prefix check. Verified: all 5 iterative groups now listed in skipped
    set when --skip-iterative is active.

Bugs 2-8 — 7 wrapper scripts at line 56 (YAML section precedence)
    load_config() mutates args.device and args.molecule with merged
    defaults (line 125 of config_loader.py: setattr for every merged key),
    so the wrapper's `args.device or section.get("device")` short-circuited
    on args.device="auto" and silently ignored the YAML section's device
    value. Fixed by importing _get_explicit_cli_args from config_loader
    and only using the CLI arg when the user explicitly typed --device on
    the command line. Same fix applied to --molecule (positional). Verified:
    running with --device cpu correctly overrides section; running without
    --device correctly picks up the section value (auto→cuda detection).

Nit 9 — src/qvartools/methods/nqs/__init__.py docstring
    Docstring claimed METHODS_REGISTRY was "used by the 010-013 wrapper
    scripts so they can dispatch by name without hard-importing each method
    module", but the wrappers were doing direct imports. Fixed architecturally:
    all 7 wrappers now dispatch via METHODS_REGISTRY[METHOD_KEY] and only
    import METHODS_REGISTRY (not individual config/run pairs). The __init__.py
    itself still imports all four methods because the registry needs
    callable references — that's a Python fundamental, not a design flaw.

Nit 10 — src/qvartools/methods/nqs/_shared.py docstring
    Docstring said "behaviour-preserving extractions only — no API changes",
    but extract_orbital_counts adds the mol_info→hamiltonian.integrals
    fallback to nqs_sqd/nqs_skqd, which fixes a pre-existing bug in those
    runners (they were end-to-end broken because they accessed
    mol_info["n_orbitals"] directly but get_molecule doesn't populate it).
    Updated docstring to distinguish the three scopes: (1) build_autoregressive_nqs
    = pure refactor, (2) validate_initial_basis = pure refactor, (3)
    extract_orbital_counts = pure refactor for HI methods + bug fix for
    simple methods.

Testing:
- 72 tests pass (test_methods + test_run_all_pipelines + test_config_loader)
  — up from 62 in the previous commit (test_config_loader added to the
  test surface this round)
- ruff check + ruff format clean
- Smoke tested 010_hi_nqs_sqd/default.py h2: reaches FCI exactly (0.0 mHa)
- Smoke tested 012_nqs_sqd/default.py h2 both with --device cpu
  (CLI override works) and without --device (section default picked up,
  auto→cuda detection)
- --skip-iterative correctly lists all 5 iterative groups
  (006,007,008,010,011) in the skipped set

Out of scope (pre-existing, not caused by this PR):
- 007_iterative_nqs_dci/iter_nqs_dci_sqd.py and iter_nqs_dci_krylov_classical.py
  fail with "initial_basis must be integer or bool dtype ... got torch.float32".
  Verified these same scripts fail identically on main (stash-checkout-unstash
  test). Pre-existing bug in the 007 group's handling of initial_basis.
  Should be filed as a separate issue; not touched in this PR.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 38 out of 64 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

thc1006 added a commit to thc1006/qvartools that referenced this pull request Apr 7, 2026
`FlowGuidedKrylovPipeline.extract_and_select_basis()` returns a float32
tensor (cloned from the NF trainer's `accumulated_basis`, which lives in
float for gradient tracking). But `run_hi_nqs_sqd(initial_basis=...)` and
`run_hi_nqs_skqd(initial_basis=...)` strictly validate that
`initial_basis` is integer/bool dtype (binary occupation vectors) and
raise ValueError on float.

This caused both `iter_nqs_dci_sqd.py` and `iter_nqs_dci_krylov_classical.py`
to fail end-to-end with:

    ValueError: initial_basis must be integer or bool dtype
    (binary occupations), got torch.float32

at the `run_hi_nqs_sqd`/`run_hi_nqs_skqd` call site.

Interestingly, the parallel scripts in group 08 (iter_nqs_dci_pt2_*) work
because their basis is passed through `expand_basis_via_connections()`
first, which happens to cast internally. The 007 scripts do not have that
intermediate step, so they hit the bug raw.

Fix: add `.long()` cast to the `basis` result in the two affected scripts,
with a short comment explaining why. Both scripts now reach chemical
accuracy on H2:

    07_iterative_nqs_dci/iter_nqs_dci_sqd.py           h2 cpu 11.9s  0.0 mHa
    07_iterative_nqs_dci/iter_nqs_dci_krylov_classical h2 cpu  9.3s  0.0 mHa

Scope:
- Surgical script-level fix (2 files, 2-line changes each including
  comment). Does not change library-level API contracts.
- The 3rd script in the group (`iter_nqs_dci_krylov_quantum.py`) does not
  use `initial_basis` at all — unaffected.
- Related architectural issue (float32 basis at API boundary with int-only
  validator) could eventually be fixed by either (a) making
  `extract_and_select_basis` return long, or (b) making
  `validate_initial_basis` lenient for float {0,1} values. Out of scope
  for this minimal fix.

Discovered via end-to-end smoke test on `refactor/pipeline-catalog` branch
(PR QuantumNoLab#39). Verified that the failure is pre-existing on main before writing
this fix.
@thc1006 thc1006 requested a review from Copilot April 7, 2026 17:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 38 out of 64 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

… items

Option C: fix all 8 items from my own deep review on PR QuantumNoLab#39 plus Copilot's
one remaining round-2 finding, then re-ran the full smoke + lint + test net
and did a second ultrathink code review before commit.

1. docs/source/user_guide/pipelines.rst  (Copilot round 2)
   Tier 2 table rows now use the actual on-disk folder names with
   underscore and monospace formatting (``010_hi_nqs_sqd`` etc.), not
   the human-facing "010 hi_nqs_sqd" form. Added a new
   "Programmatic dispatch via ``METHODS_REGISTRY``" subsection
   documenting all 9 registry fields + an idiomatic code example.
   Rewrote the Iterative Pipelines example to show the registry
   dispatch pattern (matches how the 010-013 wrappers now work) while
   noting that direct imports still work.

2. CHANGELOG.md  (P0: user-visible breaking changes were undocumented)
   Added entries under Changed (2x BREAKING for rename + --only),
   Added (Tier 2 catalog, METHODS_REGISTRY, _shared.py, public
   get_explicit_cli_args), and Fixed (nqs_sqd/nqs_skqd mol_info bug
   rediscovered via smoke test).

3. experiments/pipelines/run_all_pipelines.py  (P0: --only silent failure)
   Detect pre-2026-04-07 2-digit --only format and print a migration
   warning to stderr with the recommended 3-digit form. "--only 01 02"
   now emits "use --only 001 002 instead" instead of silently running
   0 pipelines. Verified at runtime.

4. experiments/config_loader.py  (P1: private symbol imported externally)
   Renamed _get_explicit_cli_args -> get_explicit_cli_args (public name
   is now primary). Added _get_explicit_cli_args as a backward-compat
   alias at module bottom so tests/test_config_loader.py keeps working
   without changes. Docstring updated to explain the alias. Internal
   call in load_config() uses the public name.

5-6. 7 wrapper scripts in 010-013 (P1: multi-section YAML silent fallback)
   Regenerated via template with two new module-level helpers:
   - _resolve_section(cfg, variant): extracts the requested section
     from multi-section YAML, warns when falling back to flat cfg if
     OTHER sections exist (indicating the user picked the wrong script
     or the wrong --config file)
   - _build_config_kwargs(section, config_cls, device): filters section
     keys to dataclass fields, warns on unknown keys (catches YAML
     typos like "n_itrations" silently being dropped)
   Also switched to importing get_explicit_cli_args (public name).
   Verified at runtime: wrong-section fallback warns, typo'd YAML key
   warns, default path still reaches FCI exactly on H2.

7. docs/source/user_guide/pipelines.rst  (P2: METHODS_REGISTRY undocumented)
   Covered by #1 above — new subsection documents the registry.

8. docs/source/user_guide/pipelines.rst  (P2: iterative example inconsistent
   with wrapper pattern)
   Covered by #1 above — iterative example now uses registry dispatch.

Verification after all 8 fixes:
- 61 tests pass (tests/test_methods/ + tests/test_run_all_pipelines.py +
  tests/test_config_loader.py -m "not slow and not gpu and not pyscf")
- ruff check src/ tests/ experiments/ — clean
- ruff format --check src/ tests/ experiments/ — 206 files already formatted
- Runtime smoke tests on H2/CPU:
  * --only 01 02 emits migration warning (verified)
  * multi-section YAML with missing section emits fallback warning (verified)
  * YAML with typo'd key emits unknown-key warning (verified)
  * 010_hi_nqs_sqd/default.py h2 still reaches FCI (0.0 mHa, ~9s)
  * run_all_pipelines.py --only 001 005 012 013 --skip-quantum: 6/6 pass
- All 7 wrappers import cleanly with correct METHOD_KEY/VARIANT constants
- nqs_sqd/nqs_skqd still use extract_orbital_counts (grep confirmed)
- RST heading "Programmatic dispatch via ``METHODS_REGISTRY``" is 46 chars,
  underline is 46 ^ (Sphinx requires >= title length, exact match OK)
- METHODS_REGISTRY 9 fields enumerated in docs match __init__.py definition

Second code review findings (all minor, none blocking):
- _resolve_section + _build_config_kwargs helpers duplicated across 7
  wrapper files (~280 lines). Could extract to shared _wrapper_helpers.py
  module in a follow-up. Not a bug.
- --only warning does not catch single-digit --only 1 2. Low impact.
- _resolve_section's isinstance(v, dict) filter could miss dict-valued
  config fields. No current NQS config has dict fields, so no collision.
- CHANGELOG get_explicit_cli_args entry re-worded to clarify the public
  name is now primary, not the other way around.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants