Validation_goldens modfication by niveditasing · Pull Request #2043 · datacommonsorg/data

niveditasing · 2026-05-28T05:51:46Z

Summary
This PR addresses data filtering bugs, malformed CSV output formatting, path resolution errors, and parsing crashes within the validation pipeline.

Analysis and Resolution

Data Filtering: Namespace Mismatches
Issue: The "must-include" filter file (top_100k_places.csv) loaded IDs without namespaces (e.g., "Earth"), whereas the input dataset (WorldBank.csv) utilized fully qualified names (e.g., "dcid:Earth"). This exact-match failure resulted in all nodes being filtered out (0 golden records generated).

Fix: Integrated strip_namespace() during the comparison step to normalize IDs, allowing the nodes to pass the filter correctly.

Output Formatting: Redundant Serialization in Single-Property CSVs
Issue: The utility function file_write_csv_dict serialized single-property data structures into a cluttered format containing stringified dictionaries (e.g., "{'Column name ': 'Column value'}") instead of a standard single-column layout.

Fix: Replaced the utility call with manual CSV writing to explicitly define the header (Output Column) and write clean, single-column values.

Path Resolution: Silent Mismatches and Execution Context Failures
Issue: Relative paths in validation_config.json resolved against the terminal's Current Working Directory (CWD). When executed from the repository root or via automated batch jobs (which run from a nested temporary folder in output/), the runner failed to locate the golden_data directory. This caused the system to silently load 0 expected records and report a false-positive PASSED status.

Fix (runner.py): Stored self.validation_config_path and self.stats_summary in the constructor. Implemented a dynamic path-climbing resolver in run_validations that traverses parent directories relative to the config path, summary path, or CWD to reliably locate the golden_data folder across both direct and batch execution contexts.

CSV Parsing: Whitespace Insensitivity and Malformed Value Crashes
Issue: The golden CSV file contained leading spaces in headers (e.g., ' StatVar'), causing mismatches against input files. Additionally, unquoted commas within list values (e.g., [InternationalDollar, USDollar]) caused the CSV reader to generate an extra column with a key of None. Sorting keys containing None triggered a Python TypeError crash.

Fix (validator_goldens.py): Introduced a dictionary comprehension sanitizer within load_nodes_from_file:

Python
cleaned_node = {k.strip(): v for k, v in node.items() if k is not None and isinstance(k, str) and k.strip() != ''}
This strips whitespace from headers and drops None keys to prevent parsing crashes.

Verification and Testing Performed:

gemini-code-assist

Code Review

This pull request updates validator_goldens.py to support namespace-stripped matching when generating goldens and replaces the helper function for writing CSVs with custom csv.DictWriter logic. A review comment points out that extracting CSV column headers from only the first node in golden_nodes assumes all nodes share the same keys, which can lead to errors or missing columns if keys vary. It suggests collecting the union of all keys across all nodes instead.

balit-raibot · 2026-05-28T06:35:26Z

@gemini-code-assist review

gemini-code-assist

Code Review

This pull request updates tools/import_validation/validator_goldens.py to import the csv module, strip namespaces when matching nodes against must_include_values, and replace the helper file_write_csv_dict with a custom csv.DictWriter implementation. The feedback suggests adding newline='' when opening the CSV file to prevent platform-specific carriage return issues and simplifying the logic for extracting and sorting unique keys.

niveditasing · 2026-06-05T12:41:36Z

@gemini-code-assist review

gemini-code-assist

Code Review

This pull request introduces path resolution logic for summary report rules in the validation runner, cleans up CSV node keys during loading, and updates golden file generation to use Python's standard csv.DictWriter. Feedback focuses on adhering to PEP 8 line length limits, replacing debug print statements with proper logging.debug calls, and opening CSV files with newline='' to ensure cross-platform compatibility on Windows.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

modified few lines

5f363e3

gemini-code-assist Bot reviewed May 28, 2026

View reviewed changes

Comment thread tools/import_validation/validator_goldens.py

niveditasing added 2 commits May 28, 2026 06:01

modified few lines

399873e

fixed test

bd3f90d

gemini-code-assist Bot reviewed May 28, 2026

View reviewed changes

Comment thread tools/import_validation/validator_goldens.py Outdated

niveditasing added 5 commits May 28, 2026 06:44

fixed test

4f14df9

fixed test

83b5647

fixed test

d88c10f

modified code

416c58b

Merge branch 'master' into validation_golden_fix

e9ed65a

gemini-code-assist Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread tools/import_validation/runner.py

Comment thread tools/import_validation/validator_goldens.py

Comment thread tools/import_validation/validator_goldens.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validation_goldens modfication#2043

Validation_goldens modfication#2043
niveditasing wants to merge 8 commits into
datacommonsorg:masterfrom
niveditasing:validation_golden_fix

niveditasing commented May 28, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

balit-raibot commented May 28, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

niveditasing commented Jun 5, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

niveditasing commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

balit-raibot commented May 28, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

niveditasing commented Jun 5, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

niveditasing commented May 28, 2026 •

edited

Loading