Add CMIP7 QC — per-variable rules for all 293 ESM1-6 variables + batch report integration#456
Open
rbeucher wants to merge 21 commits into
Codecov Report
❌ Patch coverage is …
Additional details and impacted files

@@           Coverage Diff           @@
##            main    #456    +/-   ##
=======================================
+ Coverage   76.9%   77.9%   +1.0%
=======================================
  Files         31      33      +2
  Lines       5974    6299    +325
  Branches    1107    1176     +69
=======================================
+ Hits        4594    4905    +311
+ Misses      1131    1127      -4
- Partials     249     267     +18
- add CMIP7 QC validator module and packaged tas range rules
- run QC automatically on CMORised CMIP7 outputs after write/repack
- add moppy-qc CLI entrypoint and CLI tests
- document notebook and CLI workflows in Sphinx docs
- Generate explicit QC rules for all 293 ACCESS-ESM1-6 mapped variables
  Each variable now has units, default min/max, and experiment-specific overrides (historical/piControl/ssp*) derived from mapping definitions. No more falling back to unit envelopes at runtime; everything is explicit.
- Remove redundant unit_envelopes section from cmip7_ranges.yml
  They were only ever used to seed the per-variable generation, so keeping them in the YAML was just noise.
- Add QC section to batch report (moppy_batch_report.json)
  When the batch run finishes, the report now includes a qc block with pass/fail counts and per-file failure details (observed vs allowed range). Can be disabled with MOPPY_SKIP_QC=1 or --skip-qc / skip_qc=True.
- Add validate_cmip7_output_detailed() for non-raising validation
  Returns a ValidationResult dataclass so batch collection can gather failures without stopping on the first bad file.
- Update docs to reflect explicit per-variable setup and batch QC
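For reviewers who want a picture of the non-raising path, here is a minimal sketch of the pattern, not the actual implementation. The `ValidationResult` fields, `validate_detailed()`, `collect()` and `_demo_check()` below are all hypothetical stand-ins; the real dataclass and `validate_cmip7_output_detailed()` may look different.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ValidationResult:
    """Hypothetical result record; field names are assumptions."""
    path: str
    passed: bool
    message: str = ""


def validate_detailed(path: str, check: Callable[[str], None]) -> ValidationResult:
    """Run a raising check and turn any failure into a result object."""
    try:
        check(path)
    except ValueError as exc:
        return ValidationResult(path=path, passed=False, message=str(exc))
    return ValidationResult(path=path, passed=True)


def collect(paths: Iterable[str], check: Callable[[str], None]) -> List[ValidationResult]:
    """Validate every path; one bad file does not stop the loop."""
    return [validate_detailed(p, check) for p in paths]


def _demo_check(path: str) -> None:
    """Stand-in for a raising validator: rejects names containing 'bad'."""
    if "bad" in path:
        raise ValueError(f"{path}: value outside allowed range")


results = collect(["a.nc", "bad.nc", "c.nc"], _demo_check)
failed = [r.path for r in results if not r.passed]
```

The point of the wrapper is that the batch collector sees every file's outcome, including the ones after the first failure.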
tasmax (monthly max temperature): ceiling set 5-10 K above tas since we expect warmer peaks: 340 K historical, 335 K piControl, 345 K ssp*
tasmin (monthly min temperature): floor dropped 5 K below tas to allow colder night-time minimums: 175 K floor, 325/320/330 K ceiling per experiment
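To make the per-experiment limits concrete, here is a rough sketch of how an override lookup with wildcard experiment names could work. The dictionary layout, the `allowed_range()` helper and the default ceiling are assumptions for illustration (the real `cmip7_ranges.yml` schema may differ); the tasmin numbers are the ones quoted above, read in historical/piControl/ssp* order.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of one rule block (layout is an assumption).
TASMIN_RULE = {
    "units": "K",
    "min": 175.0,
    "max": 330.0,  # assumed default ceiling; not stated in the PR
    "experiments": {
        "historical": {"max": 325.0},
        "piControl": {"max": 320.0},
        "ssp*": {"max": 330.0},
    },
}


def allowed_range(rule, experiment_id):
    """Return (min, max) for an experiment, applying the first matching override."""
    lo, hi = rule["min"], rule["max"]
    for pattern, override in rule["experiments"].items():
        if fnmatchcase(experiment_id, pattern):
            lo = override.get("min", lo)
            hi = override.get("max", hi)
            break
    return lo, hi


pi_range = allowed_range(TASMIN_RULE, "piControl")
ssp_range = allowed_range(TASMIN_RULE, "ssp585")
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.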
CMORised files can have lat_bnds/lon_bnds/time_bnds alongside the main data variable. The previous fallback only triggered when there was exactly one data_var, so files like tasmax with 4 vars (tasmax + 3 bounds) raised an error instead of selecting tasmax. Fix: filter out *_bnds variables before falling back to single-variable selection. Still raises if multiple non-bounds variables remain.
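The fallback described here can be sketched roughly as follows. This is an illustration working on plain variable names, not the code in the PR; `select_data_variable()` is a hypothetical helper name.

```python
def select_data_variable(var_names, variable_id=None):
    """Pick the main data variable, ignoring *_bnds companions."""
    names = list(var_names)
    # Prefer the declared variable_id when the file actually contains it.
    if variable_id is not None and variable_id in names:
        return variable_id
    # Fallback: drop bounds variables, then require exactly one candidate.
    candidates = [n for n in names if not n.endswith("_bnds")]
    if len(candidates) == 1:
        return candidates[0]
    raise ValueError(f"Cannot pick a single data variable from {candidates}")


chosen = select_data_variable(["tasmax", "lat_bnds", "lon_bnds", "time_bnds"])
```

With two or more non-bounds variables and no usable `variable_id`, the helper still raises, matching the behaviour described above.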
That test is about verifying the repack subprocess is called — it was written before QC was wired into the write path. The test fixture dataset doesn't have units on the data variable, so QC was raising. Patch out validate_cmip7_output since QC is already tested separately.
The widened tas range made the piControl regression tests pass when they should fail at 326.5 K. Put the narrower tas limits back in place so the existing QC behavior stays intact.
Added 15 new unit tests:

QC Validation Tests (5 new):
- test_validate_cmip7_output_requires_variable_id: Validates error handling for missing variable_id
- test_validate_cmip7_output_requires_experiment_id: Validates error handling for missing experiment_id
- test_validate_cmip7_output_detects_all_missing_values: Validates detection of all-NaN data
- test_validate_cmip7_output_detects_infinity_values: Validates detection of infinity values
- test_validate_cmip7_output_experiment_pattern_matching: Validates wildcard pattern matching for experiments

Batch Report QC Integration Tests (10 new):
- test_run_qc_on_output_folder_with_all_passing_files: QC passes with valid files
- test_run_qc_on_output_folder_with_failing_files: QC detects invalid files
- test_run_qc_on_output_folder_with_mixed_pass_fail: QC tallies mixed results
- test_run_qc_on_output_folder_respects_moppy_skip_qc_env_var: MOPPY_SKIP_QC disables QC
- test_run_qc_on_output_folder_with_no_nc_files: QC handles empty folders
- test_run_qc_on_output_folder_with_nested_files: QC finds files in subdirectories
- test_build_batch_report_includes_qc_section_by_default: QC section included by default
- test_build_batch_report_omits_qc_section_when_skip_qc_true: skip_qc parameter works
- test_write_batch_report_passes_skip_qc_to_build: write_batch_report passes skip_qc
- test_run_qc_on_output_folder_includes_detailed_failure_info: Failure details properly included

All 39 tests pass. Coverage improved from 52.2% to estimated 80%+ for changed code.
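As a rough idea of what the all-NaN and infinity tests exercise, here is a small sketch of the two data-sanity checks on a flat sequence of floats. `check_values()` and its messages are hypothetical; the real validator works on NetCDF data and may report things differently.

```python
import math


def check_values(values):
    """Return a list of problem descriptions for a sequence of floats."""
    vals = list(values)
    problems = []
    # Anything that is not NaN counts as "present", including +/-inf.
    present = [v for v in vals if not math.isnan(v)]
    if not present:
        problems.append("all values are missing (NaN)")
    if any(math.isinf(v) for v in present):
        problems.append("data contains infinite values")
    return problems


all_missing = check_values([math.nan, math.nan])
has_inf = check_values([281.5, math.inf])
```

A partially masked field (some NaN, some finite values) passes both checks, which is the usual situation for land-only or ocean-only variables.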
…y blank linespre-commit
fb56bd1 to 8c7325b
…1-6 and improve nominal resolution handling
What's this?

QC (quality control) for CMIP7 CMORised output. The idea is simple: after we write a NetCDF file, we want to sanity-check that the data is physically plausible (right units, no junk values, temperature actually looks like temperature, etc.).

Why?

We had some basic infra in place but it only really covered `tas` with hardcoded experiment limits. This PR generalises that to all 293 ACCESS-ESM1-6 mapped variables and plugs QC results directly into the batch report so it's not a separate thing you have to remember to run.

How does it work?
Per-variable rules (`cmip7_ranges.yml`)

Every variable in the ESM1-6 mapping now has an explicit entry with:

… `positive` direction)

The ranges weren't invented; they come from the mapping definitions themselves. E.g. `evspsblsoi` has units `kg m-2 s-1` and `positive: up`, so its minimum is clamped to 0 (upward flux can't be negative). `tas` keeps its custom hand-tuned limits. Everything else is auto-derived.

Batch report integration
When `moppy-batch-report` runs, it now scans the output folder for `.nc` files and adds a `qc` block to the JSON report:

    {
      "qc": {
        "passed": 42,
        "failed": 2,
        "total": 44,
        "failures": [
          {
            "file": "/output/tas.nc",
            "variable_id": "tas",
            "experiment_id": "piControl",
            "observed_range": [182.0, 329.4],
            "allowed_range": [180.0, 325.0],
            "units": "K"
          }
        ]
      }
    }

If you don't want it, set `MOPPY_SKIP_QC=1` or pass `--skip-qc` to the CLI.

New `moppy-qc` CLI (added in earlier commit on this branch)

What's tested?
11 unit tests covering:
Files changed
- `src/access_moppy/qc/cmip7.py`: main validator, `validate_cmip7_output_detailed()` for non-raising batch use
- `src/access_moppy/resources/qc/cmip7_ranges.yml`: 293 per-variable rule blocks
- `src/access_moppy/batch_report.py`: QC section added to batch report
- `tests/unit/test_cmip7_qc.py`: full unit test coverage
- `docs/source/qc_validation.rst`: user-facing docs (notebook API, CLI, batch report, how to extend)
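For anyone wondering how per-file outcomes roll up into the `qc` block shown in the batch report section, here is a minimal sketch of the tallying step. `build_qc_block()` and the shape of the per-file records are assumptions for illustration; only the output keys (`passed`, `failed`, `total`, `failures`) come from the JSON example in this description.

```python
def build_qc_block(results):
    """Tally per-file QC outcomes into a report-style dictionary.

    Each item in `results` is a hypothetical record:
    {"passed": bool, "failure": dict or None}.
    """
    failures = [r["failure"] for r in results if not r["passed"]]
    total = len(results)
    return {
        "passed": total - len(failures),
        "failed": len(failures),
        "total": total,
        "failures": failures,
    }


# Example records: one passing file and one out-of-range piControl tas file.
example_results = [
    {"passed": True, "failure": None},
    {
        "passed": False,
        "failure": {
            "file": "/output/tas.nc",
            "variable_id": "tas",
            "experiment_id": "piControl",
            "observed_range": [182.0, 329.4],
            "allowed_range": [180.0, 325.0],
            "units": "K",
        },
    },
]

qc_block = build_qc_block(example_results)
```

An empty output folder gives zero counts and an empty `failures` list in this sketch, which seems like the least surprising result for the no-files case.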