fix: cast initial_basis to long in 07_iterative_nqs_dci scripts (fixes #40)#41
Open
thc1006 wants to merge 1 commit intoQuantumNoLab:mainfrom
Open
fix: cast initial_basis to long in 07_iterative_nqs_dci scripts (fixes #40)#41thc1006 wants to merge 1 commit intoQuantumNoLab:mainfrom
thc1006 wants to merge 1 commit intoQuantumNoLab:mainfrom
Conversation
`FlowGuidedKrylovPipeline.extract_and_select_basis()` returns a float32
tensor (cloned from the NF trainer's `accumulated_basis`, which lives in
float for gradient tracking). But `run_hi_nqs_sqd(initial_basis=...)` and
`run_hi_nqs_skqd(initial_basis=...)` strictly validate that
`initial_basis` is integer/bool dtype (binary occupation vectors) and
raise ValueError on float.
This caused both `iter_nqs_dci_sqd.py` and `iter_nqs_dci_krylov_classical.py`
to fail end-to-end with:
ValueError: initial_basis must be integer or bool dtype
(binary occupations), got torch.float32
at the `run_hi_nqs_sqd`/`run_hi_nqs_skqd` call site.
Interestingly, the parallel scripts in group 08 (iter_nqs_dci_pt2_*) work
because their basis is passed through `expand_basis_via_connections()`
first, which happens to cast internally. The 007 scripts do not have that
intermediate step, so they hit the bug raw.
Fix: add `.long()` cast to the `basis` result in the two affected scripts,
with a short comment explaining why. Both scripts now reach chemical
accuracy on H2:
07_iterative_nqs_dci/iter_nqs_dci_sqd.py h2 cpu 11.9s 0.0 mHa
07_iterative_nqs_dci/iter_nqs_dci_krylov_classical h2 cpu 9.3s 0.0 mHa
Scope:
- Surgical script-level fix (2 files, 2-line changes each including
comment). Does not change library-level API contracts.
- The 3rd script in the group (`iter_nqs_dci_krylov_quantum.py`) does not
use `initial_basis` at all — unaffected.
- Related architectural issue (float32 basis at API boundary with int-only
validator) could eventually be fixed by either (a) making
`extract_and_select_basis` return long, or (b) making
`validate_initial_basis` lenient for float {0,1} values. Out of scope
for this minimal fix.
Discovered via end-to-end smoke test on `refactor/pipeline-catalog` branch
(PR QuantumNoLab#39). Verified that the failure is pre-existing on main before writing
this fix.
There was a problem hiding this comment.
Pull request overview
Fixes an end-to-end runtime failure in the 07_iterative_nqs_dci experiment scripts by ensuring the NF/DCI-selected basis is cast to an integer dtype before being passed as initial_basis to HI+NQS runners that strictly validate integer/bool occupations.
Changes:
- Cast
pipeline.extract_and_select_basis()output totorch.longiniter_nqs_dci_sqd.pybefore callingrun_hi_nqs_sqd(initial_basis=...). - Cast
pipeline.extract_and_select_basis()output totorch.longiniter_nqs_dci_krylov_classical.pybefore callingrun_hi_nqs_skqd(initial_basis=...). - Add short inline comments documenting the dtype mismatch and why the cast is required.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| experiments/pipelines/07_iterative_nqs_dci/iter_nqs_dci_sqd.py | Cast extracted basis to long so run_hi_nqs_sqd(initial_basis=...) passes dtype validation. |
| experiments/pipelines/07_iterative_nqs_dci/iter_nqs_dci_krylov_classical.py | Cast extracted basis to long so run_hi_nqs_skqd(initial_basis=...) passes dtype validation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Minimal surgical fix for #40: add
.long()cast to thebasistensor before passing it asinitial_basisin the two affected 07_iterative_nqs_dci scripts.Closes #40.
Affected files (2)
experiments/pipelines/07_iterative_nqs_dci/iter_nqs_dci_sqd.pyexperiments/pipelines/07_iterative_nqs_dci/iter_nqs_dci_krylov_classical.pyDiff summary
Each file gets one extra
.long()call + a 4-line explanatory comment:Total: 10 insertions, 2 deletions across 2 files. No library changes, no API changes.
Why this scope
See issue #40 for the full root-cause analysis. Short version:
extract_and_select_basis()returns float32 (cloned from NF trainer's accumulated_basis), but the downstreamrun_hi_nqs_sqd/run_hi_nqs_skqdstrictly validate integer/bool dtype. 008's sister scripts work becauseexpand_basis_via_connections()happens to cast internally; 007 doesn't have that intermediate step.Two cleaner long-term architectural fixes (cast at the library source, or relax the validator) are noted in the issue as out-of-scope. This PR is the minimum unblocking patch.
Test plan
Manual end-to-end verification on H2/CPU:
ValueError: initial_basis must be integer or bool dtype ... got torch.float32iter_nqs_dci_sqd.py h2 --device cpu→ Error 0.0 mHa, 11.88 siter_nqs_dci_krylov_classical.py h2 --device cpu→ Error 0.0 mHa, 9.32 sruff check experiments/pipelines/07_iterative_nqs_dci/— cleanruff format --check— cleanextract_and_select_basisor lenient validator)Relationship to PR #39
PR #39 (
refactor/pipeline-catalog) renames07_iterative_nqs_dci/→007_iterative_nqs_dci/viagit mv. This PR modifies file content (not paths) in07_iterative_nqs_dci/. The two PRs touch disjoint aspects of the same files, so git's rename detection should handle the rebase cleanly regardless of merge order.Recommended merge order:
Either order works.