Update tools for agent efficacy wrapper#2059
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
There was a problem hiding this comment.
Code Review
This pull request introduces an agent efficacy evaluation tool, including an HTML dashboard, a calculation script (calculate_efficacy.py), and documentation. It also adds support for resolved places in the pvmap_generator and stat_var_processor, and introduces a test runner for statvar imports. The review feedback highlights several critical runtime issues, including module-level functions in stat_var_test_runner.py that incorrectly include a self parameter, missing imports and typoed function calls in pvmap_generator_metrics.py, and a hardcoded PROJECT_ROOT path. Additionally, storing a mutable dictionary by reference in stat_var_processor.py should be replaced with a shallow copy to avoid shared state bugs, and static methods should be properly decorated.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| return return_status | ||
|
|
||
|
|
||
| def merge_args(self, args: list[dict]) -> dict: |
There was a problem hiding this comment.
The function merge_args is defined at the module level (outside of any class), but it includes self as its first parameter. When called as merge_args([...]) with a single argument, Python will bind the argument to self and raise a TypeError because the args parameter is missing. Please remove the unused self parameter.
| def merge_args(self, args: list[dict]) -> dict: | |
| def merge_args(args: list[dict]) -> dict: |
| return merged_args | ||
|
|
||
|
|
||
| def get_args_list(self, args: dict) -> list: |
There was a problem hiding this comment.
The function get_args_list is defined at the module level, but it includes self as its first parameter. This will cause a TypeError when called as get_args_list(args). Please remove the unused self parameter.
| def get_args_list(self, args: dict) -> list: | |
| def get_args_list(args: dict) -> list: |
| return cmd_args | ||
|
|
||
|
|
||
| def get_arg(self, args_dict: dict, arg: str, default=None) -> str: |
There was a problem hiding this comment.
The function get_arg is defined at the module level, but it includes self as its first parameter. This will cause a TypeError when called as get_arg(args_dict, arg). Please remove the unused self parameter.
| def get_arg(self, args_dict: dict, arg: str, default=None) -> str: | |
| def get_arg(args_dict: dict, arg: str, default=None) -> str: |
| from counters import Counters | ||
| from mcf_diff import diff_mcf_files | ||
| from mcf_file_util import get_value_list | ||
| from stat_var_test_runner import StatVarProcessorTestRunner, run_script |
There was a problem hiding this comment.
The functions merge_args, get_arg, and get_env_dict are used in this file but are not imported, which will cause a NameError at runtime. Please import them from stat_var_test_runner.
| from stat_var_test_runner import StatVarProcessorTestRunner, run_script | |
| from stat_var_test_runner import StatVarProcessorTestRunner, run_script, merge_args, get_arg, get_env_dict |
| cmd_args = dict() | ||
| # Add default arguments for output | ||
|
|
||
| cmd_args = merged_args([ |
There was a problem hiding this comment.
|
|
||
| # Get commandline args for stavtar processor | ||
| # to use generated pvmap files. | ||
| cmd_args = merged_args([ |
There was a problem hiding this comment.
| import datetime | ||
|
|
||
| # Set up paths to import tools | ||
| PROJECT_ROOT = '/usr/local/google/home/nehil/datacommons/import/git/data' |
There was a problem hiding this comment.
The PROJECT_ROOT path is hardcoded to a local directory, which makes the script non-portable. To ensure portability and avoid relying on specific directory structures, use a more robust method to locate the repository root, such as searching for marker files like .git or WORKSPACE.
| PROJECT_ROOT = '/usr/local/google/home/nehil/datacommons/import/git/data' | |
| def find_repo_root(path): | |
| if os.path.exists(os.path.join(path, '.git')) or os.path.exists(os.path.join(path, 'WORKSPACE')): | |
| return path | |
| parent = os.path.dirname(path) | |
| return find_repo_root(parent) if parent != path else path | |
| PROJECT_ROOT = find_repo_root(os.path.abspath(os.path.dirname(__file__))) |
References
- Avoid hardcoding paths that rely on specific directory structures. Instead, use more robust methods to locate the repository root, such as searching for marker files (e.g., .git or WORKSPACE).
| merged_col_pvs, row_index, col_index): | ||
| row_svobs += 1 | ||
| if self._data_pvs is not None: | ||
| self._data_pvs[self._file_context] = merged_col_pvs |
There was a problem hiding this comment.
The dictionary merged_col_pvs is stored directly in self._data_pvs by reference. However, merged_col_pvs is passed to process_stat_var_obs_pvs on line 2274, which mutates it in-place (e.g., via add_default_pvs). Storing mutable dictionaries by reference can lead to unexpected side effects or shared state bugs. Please store a shallow copy of the dictionary instead.
| self._data_pvs[self._file_context] = merged_col_pvs | |
| self._data_pvs[self._file_context] = dict(merged_col_pvs) |
| def get_stats_from_diff_counters(self, | ||
| diff_stats: dict, | ||
| stats: dict = None) -> dict: |
There was a problem hiding this comment.
The method get_stats_from_diff_counters does not use self and is called statically with None as the first argument in calculate_efficacy.py. Please define it as a @staticmethod and remove the unused self parameter to improve code clarity and maintainability.
| def get_stats_from_diff_counters(self, | |
| diff_stats: dict, | |
| stats: dict = None) -> dict: | |
| @staticmethod | |
| def get_stats_from_diff_counters(diff_stats: dict, | |
| stats: dict = None) -> dict: |
| def get_metrics_from_counters(counters_obj): | ||
| # Use metrics formula directly from pvmap_generator_metrics.py | ||
| diff_stats = {'counters': counters_obj.get_counters()} | ||
| stats = PVMapGeneratorMetricsRunner.get_stats_from_diff_counters(None, diff_stats) |
There was a problem hiding this comment.
Once get_stats_from_diff_counters is refactored to a @staticmethod, you can call it directly without passing None as the first argument.
| stats = PVMapGeneratorMetricsRunner.get_stats_from_diff_counters(None, diff_stats) | |
| stats = PVMapGeneratorMetricsRunner.get_stats_from_diff_counters(diff_stats) |
Overview
This PR introduces the Agent Efficacy Wrapper, a new tooling suite designed to evaluate and calculate the quality of the autoschematization agent. It systematically compares the agent's output predictions against human-reviewed "gold standard" directories to calculate key performance metrics (Precision, Recall, and F1 Score).
Key Additions
calculate_efficacy.py): A comprehensive evaluation script that assesses four primary output artifacts (output_pvmap.csv,output_stat_vars.mcf,output_stat_vars_schema.mcf,output.tmcf) generated by the agent against their reviewed counterparts. It relies on advanced semantic graph comparisons (mcf_diff.py) and fingerprinting to ensure accurate evaluation regardless of node ID string variations.Agent_Efficacy_Board.html): An HTML-based reporting dashboard that visualizes the Precision, Recall, F1 scores, True Positives (TP), False Positives (FP), and False Negatives (FN) for the evaluated datasets.EFFICACY_GUIDE.md&USAGE_EXAMPLES.md): Comprehensive guides on understanding the evaluation heuristics and executing the efficacy pipeline in both single-dataset and bulk-evaluation modes.pvmap_generator_metrics.py): New tooling to gather and calculate metrics specific to PV Map generation.stat_var_test_runner.py): A dedicated test runner for executing, validating, and comparing StatVar processing pipelines programmatically.Enhancements
pvmap_generator.py,run_statvar_processor.sh, andstat_var_processor.pyto seamlessly integrate with the new agentic import metric tracking and validation workflows.mcf_diff.pyto bolster graph comparison robustness.(Note: All changes in this PR are strictly isolated to the
tools/directory.)