Updating data_config.json with HF links and hashes #8699
e-mny wants to merge 8 commits into Project-MONAI:dev from
Conversation
Signed-off-by: enoch <65853622+e-mny@users.noreply.github.com>
Signed-off-by: Enoch Mok <65853622+e-mny@users.noreply.github.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your review settings. Use the following commands to manage reviews:
📝 Walkthrough
Updated tests/testing_data/data_config.json: replaced
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/testing_data/data_config.json (1)
88-92: Migrate `nrrd_example` to HuggingFace.
The file exists on HuggingFace but the URL still points to GitHub releases. Update to
`https://huggingface.co/datasets/MONAI/testing_data/resolve/main/CT_IMAGE_cropped.nrrd` for consistency with other entries.
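If that migration is applied, the updated entry might look like the sketch below. This is a hedged illustration only: it assumes a url/hash_type/hash_val layout matching the other migrated entries, and the digest value is a placeholder, not the real sha256 of CT_IMAGE_cropped.nrrd.

```json
{
  "nrrd_example": {
    "url": "https://huggingface.co/datasets/MONAI/testing_data/resolve/main/CT_IMAGE_cropped.nrrd",
    "hash_type": "sha256",
    "hash_val": "<sha256 hex digest of the downloaded file>"
  }
}
```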
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
tests/testing_data/data_config.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: quick-py3 (windows-latest)
- GitHub Check: quick-py3 (macOS-latest)
- GitHub Check: flake8-py3 (pytype)
- GitHub Check: flake8-py3 (codeformat)
- GitHub Check: packaging
- GitHub Check: quick-py3 (ubuntu-latest)
- GitHub Check: flake8-py3 (mypy)
- GitHub Check: min-dep-pytorch (2.8.0)
- GitHub Check: build-docs
- GitHub Check: min-dep-pytorch (2.6.0)
- GitHub Check: min-dep-pytorch (2.5.1)
- GitHub Check: min-dep-py3 (3.10)
- GitHub Check: min-dep-py3 (3.12)
- GitHub Check: min-dep-os (ubuntu-latest)
- GitHub Check: min-dep-pytorch (2.7.1)
- GitHub Check: min-dep-os (macOS-latest)
- GitHub Check: min-dep-py3 (3.11)
- GitHub Check: min-dep-py3 (3.9)
- GitHub Check: min-dep-os (windows-latest)
🔇 Additional comments (3)
tests/testing_data/data_config.json (3)
94-105: LGTM! Videos section fully migrated with consistent sha256 hashes.
106-162: LGTM! Models section fully migrated with consistent URL format and sha256 hashes.
163-169: `test_meta_file` uses GitHub while model weights use HuggingFace.
This is a JSON schema file (not test data) used for metadata validation. The inconsistency may be intentional: model weights were migrated to HuggingFace with sha256 hashes, while schema/infrastructure files remain on GitHub with md5. If migration is desired, note that the same URL is also referenced in `tests/testing_data/metadata.json` and documentation, requiring coordinated updates.
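Since each migrated entry pairs a URL with a content digest, a small helper like the sketch below can be used to compute the md5 or sha256 value for a downloaded asset. This is a hypothetical utility for contributors, not part of MONAI itself.

```python
# Sketch: compute the hex digest used in data_config.json entries for a local file.
# The chunked read keeps memory bounded for large test assets.
import hashlib


def file_digest(path: str, hash_type: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of the file at `path` using `hash_type` (e.g. "md5", "sha256")."""
    h = hashlib.new(hash_type)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Running it on a downloaded test asset and pasting the output into the entry's hash_val field keeps the recorded digest tied to the file contents rather than to anything repository-specific.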
Hi @e-mny thanks for the help on this one! I think we're in the right direction, however the hash values mentioned in the file are for the file contents themselves and not the repository hashes on Huggingface. These shouldn't change then, since the file contents shouldn't be different once downloaded. I think there isn't validation going on since the tests currently pass; specifically, I think here and elsewhere the failing hash checks are being ignored (they produce an exception which gets tossed out).

I would suggest changing the hashes back to their original values in the file but keeping the new Huggingface URLs that you've got, and then checking that the files can be downloaded and hash checked correctly. This may need some digging around in the code to see how the file contents are used; the file is loaded in `testing_data_config` for example, which is called in various places. You may want to add a test for hash download checking that uses this function.

One thing we may want to change later is to raise a different exception when the hash fails here. It should maybe raise something specific to this case so it isn't suppressed.
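The "different exception" idea at the end of the comment above could be sketched roughly as follows. The names `HashMismatchError` and `check_hash` are illustrative inventions, not existing MONAI APIs:

```python
# Hypothetical sketch: raise a dedicated exception on hash failure so that
# helpers which suppress generic download errors can still let it propagate.
import hashlib


class HashMismatchError(RuntimeError):
    """Raised when a file's content digest does not match the expected value."""


def check_hash(path: str, expected: str, hash_type: str = "sha256") -> None:
    """Verify the digest of the file at `path`, raising HashMismatchError on mismatch."""
    h = hashlib.new(hash_type)
    with open(path, "rb") as f:
        h.update(f.read())
    actual = h.hexdigest()
    if actual != expected.lower():
        raise HashMismatchError(f"{hash_type} check of {path} failed: {actual} != {expected}")
```

A download wrapper that currently swallows any RuntimeError could then re-raise HashMismatchError specifically, so hash failures are never silently ignored.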
Updated hash types and values for various datasets in data_config.json. Signed-off-by: Enoch Mok <65853622+e-mny@users.noreply.github.com>
Add tests for downloading a sample TIFF file with hash validation. Signed-off-by: Enoch Mok <65853622+e-mny@users.noreply.github.com>
for more information, see https://pre-commit.ci
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_utils.py`:
- Around line 187-212: The test has multiple issues: quote SAMPLE_TIFF to make
it a string literal, convert the module-level function test_download_url into a
unittest.TestCase method (e.g., class TestUtils(unittest.TestCase) with def
test_download_url(self):) so self.assertRaises is defined and test discovery
works, add a Google-style docstring to that test method, wrap the successful
download call with skip_if_downloading_fails() (or call that helper per existing
pattern) to avoid transient network failures, and keep the hash-mismatch
assertion using self.assertRaises(RuntimeError) while referencing the existing
download_url and SAMPLE_TIFF/SAMPLE_TIFF_HASH/SAMPLE_TIFF_HASH_TYPE symbols.
Ensure necessary imports (tempfile, os, skip_if_downloading_fails) are present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 55631a61-878a-4de1-a45d-cd5949a0e8ec
📒 Files selected for processing (1)
tests/test_utils.py
♻️ Duplicate comments (1)
tests/test_utils.py (1)
187-211: ⚠️ Potential issue | 🔴 Critical
Test still broken: unquoted URL, stray `self`, not a TestCase, no docstring.
No changes since last round; the same four blockers remain:
- Line 188: URL is a bare token → `SyntaxError` (Ruff confirms).
- Line 205: `self.assertRaises` in a module-level function → `NameError`.
- Not inside a `unittest.TestCase`, so not discovered.
- Missing Google-style docstring (per coding guidelines).
Also per PR discussion: `skip_if_downloading_fails` swallows `RuntimeError` containing "md5 check"; that's fine here for sha256, but wrap only the success path so the mismatch assertion stays authoritative. And `"wrong_hash"` isn't valid hex for sha256; prefer `"0" * 64`.
🛠️ Proposed fix
```diff
-SAMPLE_TIFF = (
-    https://huggingface.co/datasets/MONAI/testing_data/resolve/main/CMU-1.tiff
-)
-
-SAMPLE_TIFF_HASH = "73a7e89bc15576587c3d68e55d9bf92f09690280166240b48ff4b48230b13bcd"
-SAMPLE_TIFF_HASH_TYPE = "sha256"
-
-def test_download_url():
-    with tempfile.TemporaryDirectory() as tempdir:
-        # successful download with correct hash
-        download_url(
-            url=SAMPLE_TIFF,
-            filepath=os.path.join(tempdir, "model.tiff"),
-            hash_val=SAMPLE_TIFF_HASH,
-            hash_type=SAMPLE_TIFF_HASH_TYPE,
-        )
-
-        # fails with wrong hash
-        with self.assertRaises(RuntimeError):
-            download_url(
-                url=SAMPLE_TIFF,
-                filepath=os.path.join(tempdir, "model_bad.tiff"),
-                hash_val="wrong_hash",
-                hash_type=SAMPLE_TIFF_HASH_TYPE,
-            )
+SAMPLE_TIFF = "https://huggingface.co/datasets/MONAI/testing_data/resolve/main/CMU-1.tiff"
+SAMPLE_TIFF_HASH = "73a7e89bc15576587c3d68e55d9bf92f09690280166240b48ff4b48230b13bcd"
+SAMPLE_TIFF_HASH_TYPE = "sha256"
+
+
+class TestDownloadUrl(unittest.TestCase):
+    """Exercise ``download_url`` success and hash-mismatch paths."""
+
+    def test_download_url(self):
+        """Download a sample TIFF and validate hash handling.
+
+        Raises:
+            RuntimeError: When the downloaded file's hash does not match.
+        """
+        with tempfile.TemporaryDirectory() as tempdir:
+            with skip_if_downloading_fails():
+                download_url(
+                    url=SAMPLE_TIFF,
+                    filepath=os.path.join(tempdir, "model.tiff"),
+                    hash_val=SAMPLE_TIFF_HASH,
+                    hash_type=SAMPLE_TIFF_HASH_TYPE,
+                )
+            with self.assertRaises(RuntimeError):
+                download_url(
+                    url=SAMPLE_TIFF,
+                    filepath=os.path.join(tempdir, "model_bad.tiff"),
+                    hash_val="0" * 64,
+                    hash_type=SAMPLE_TIFF_HASH_TYPE,
+                )
```
As per coding guidelines: "Docstrings should be present for all definition..." and "Ensure new or modified definitions will be covered by existing or new unit tests."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils.py` around lines 187 - 211, The URL token SAMPLE_TIFF must be a quoted string and the test must be a TestCase method with a docstring; wrap only the successful download call with skip_if_downloading_fails so download problems don't hide the hash-mismatch assertion. Change SAMPLE_TIFF to a quoted string, create a unittest.TestCase subclass (e.g., class TestUtils(unittest.TestCase)) with a Google-style docstring, move test_download_url into it as def test_download_url(self):, inside use with skip_if_downloading_fails(): call download_url(url=SAMPLE_TIFF, filepath=..., hash_val=SAMPLE_TIFF_HASH, hash_type=SAMPLE_TIFF_HASH_TYPE) and then assert the bad-hash case with self.assertRaises(RuntimeError): download_url(..., hash_val="0"*64, hash_type=SAMPLE_TIFF_HASH_TYPE). Ensure you reference SAMPLE_TIFF, SAMPLE_TIFF_HASH, SAMPLE_TIFF_HASH_TYPE, test_download_url, skip_if_downloading_fails, and download_url when making the edits.
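On the `"0" * 64` suggestion: a sha256 hexdigest is always exactly 64 hexadecimal characters, so a string of 64 zeros is shaped like a real digest yet will not match any actual file in practice, while "wrong_hash" is not even valid hex. A quick sketch of the distinction:

```python
import hashlib

# Any sha256 hexdigest is 64 lowercase hex characters.
digest = hashlib.sha256(b"sample payload").hexdigest()
hex_chars = set("0123456789abcdef")

assert len(digest) == 64 and set(digest) <= hex_chars  # well-formed hexdigest
assert digest != "0" * 64  # shaped like a digest, but effectively never matches
assert not set("wrong_hash") <= hex_chars  # contains non-hex characters
```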
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 55570493-9ec6-46ab-85e1-95af310eeb5f
📒 Files selected for processing (1)
tests/test_utils.py
Signed-off-by: Enoch Mok <65853622+e-mny@users.noreply.github.com>
♻️ Duplicate comments (2)
tests/test_utils.py (2)
187-189: ⚠️ Potential issue | 🔴 Critical
Quote the HF URL string; current code is invalid Python.
Line 188 is a bare token, so this file cannot be parsed.
Proposed fix
```diff
-SAMPLE_TIFF = (
-    https://huggingface.co/datasets/MONAI/testing_data/resolve/main/CMU-1.tiff
-)
+SAMPLE_TIFF = "https://huggingface.co/datasets/MONAI/testing_data/resolve/main/CMU-1.tiff"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils.py` around lines 187 - 189, The SAMPLE_TIFF constant is assigned an unquoted bare token which makes the module invalid; wrap the HF URL string in quotes (e.g., assign SAMPLE_TIFF = "https://huggingface.co/datasets/MONAI/testing_data/resolve/main/CMU-1.tiff") so SAMPLE_TIFF is a valid Python string literal and the file can be parsed.
194-211: ⚠️ Potential issue | 🔴 Critical
Fix test API usage and harden networked test behavior.
Line 205 uses `unittest.assertRaises`, which is incorrect here; this test will fail at runtime. Also add a docstring and wrap the download path with `skip_if_downloading_fails()` to avoid flaky CI.
Proposed fix
```diff
-def test_download_url():
-    with tempfile.TemporaryDirectory() as tempdir:
-        # successful download with correct hash
-        download_url(
-            url=SAMPLE_TIFF,
-            filepath=os.path.join(tempdir, "model.tiff"),
-            hash_val=SAMPLE_TIFF_HASH,
-            hash_type=SAMPLE_TIFF_HASH_TYPE,
-        )
-
-        # fails with wrong hash
-        with unittest.assertRaises(RuntimeError):
-            download_url(
-                url=SAMPLE_TIFF,
-                filepath=os.path.join(tempdir, "model_bad.tiff"),
-                hash_val="wrong_hash",
-                hash_type=SAMPLE_TIFF_HASH_TYPE,
-            )
+class TestDownloadUrl(unittest.TestCase):
+    """Exercise download_url success and hash mismatch paths."""
+
+    def test_download_url(self):
+        """Validate correct hash acceptance and mismatch failure.
+
+        Raises:
+            RuntimeError: If downloaded file hash validation fails.
+        """
+        with skip_if_downloading_fails(), tempfile.TemporaryDirectory() as tempdir:
+            download_url(
+                url=SAMPLE_TIFF,
+                filepath=os.path.join(tempdir, "model.tiff"),
+                hash_val=SAMPLE_TIFF_HASH,
+                hash_type=SAMPLE_TIFF_HASH_TYPE,
+            )
+
+            with self.assertRaises(RuntimeError):
+                download_url(
+                    url=SAMPLE_TIFF,
+                    filepath=os.path.join(tempdir, "model_bad.tiff"),
+                    hash_val="0" * 64,
+                    hash_type=SAMPLE_TIFF_HASH_TYPE,
+                )
```
As per coding guidelines, "Docstrings should be present for all definition..." and "Ensure new or modified definitions will be covered by existing or new unit tests."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils.py` around lines 194 - 211, The test_download_url function incorrectly uses unittest.assertRaises and lacks a docstring and flaky-network protection: add a one-line docstring to test_download_url, replace the incorrect unittest.assertRaises(...) call with the context-manager form from pytest (pytest.raises(RuntimeError):) for the "fails with wrong hash" case, and wrap the networked download_url calls (both the successful and failing attempts that reference SAMPLE_TIFF / SAMPLE_TIFF_HASH / SAMPLE_TIFF_HASH_TYPE and filepaths like "model.tiff"/"model_bad.tiff") in the skip_if_downloading_fails() context manager (import it if necessary) so CI skips the test if downloads fail.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e879430d-500e-439b-8bbd-e08fe424e8c8
📒 Files selected for processing (1)
tests/test_utils.py
Refactor test for download_url to use a class-based structure and improve hash validation. Signed-off-by: Enoch Mok <65853622+e-mny@users.noreply.github.com>
Fixes #8510
Description
The data_config.json needs to be updated with HF links in view of migrating to HF.
Types of changes