Skip to content

Update tools for agent efficacy wrapper#2059

Open
nehil-gif wants to merge 1 commit into
datacommonsorg:masterfrom
nehil-gif:feature/nehil/agent_efficacy_wrapper
Open

Update tools for agent efficacy wrapper#2059
nehil-gif wants to merge 1 commit into
datacommonsorg:masterfrom
nehil-gif:feature/nehil/agent_efficacy_wrapper

Conversation

@nehil-gif
Copy link
Copy Markdown

@nehil-gif nehil-gif commented Jun 4, 2026

Overview

This PR introduces the Agent Efficacy Wrapper, a new tooling suite designed to evaluate and calculate the quality of the autoschematization agent. It systematically compares the agent's output predictions against human-reviewed "gold standard" directories to calculate key performance metrics (Precision, Recall, and F1 Score).

Key Additions

  • Efficacy Calculator (calculate_efficacy.py): A comprehensive evaluation script that assesses four primary output artifacts (output_pvmap.csv, output_stat_vars.mcf, output_stat_vars_schema.mcf, output.tmcf) generated by the agent against their reviewed counterparts. It relies on advanced semantic graph comparisons (mcf_diff.py) and fingerprinting to ensure accurate evaluation regardless of node ID string variations.
  • Efficacy Dashboard (Agent_Efficacy_Board.html): An HTML-based reporting dashboard that visualizes the Precision, Recall, F1 scores, True Positives (TP), False Positives (FP), and False Negatives (FN) for the evaluated datasets.
  • Documentation (EFFICACY_GUIDE.md & USAGE_EXAMPLES.md): Comprehensive guides on understanding the evaluation heuristics and executing the efficacy pipeline in both single-dataset and bulk-evaluation modes.
  • Metrics Generator (pvmap_generator_metrics.py): New tooling to gather and calculate metrics specific to PV Map generation.
  • StatVar Test Runner (stat_var_test_runner.py): A dedicated test runner for executing, validating, and comparing StatVar processing pipelines programmatically.

Enhancements

  • Minor architectural updates to pvmap_generator.py, run_statvar_processor.sh, and stat_var_processor.py to seamlessly integrate with the new agentic import metric tracking and validation workflows.
  • Modifications to mcf_diff.py to bolster graph comparison robustness.

(Note: All changes in this PR are strictly isolated to the tools/ directory.)

@google-cla
Copy link
Copy Markdown

google-cla Bot commented Jun 4, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an agent efficacy evaluation tool, including an HTML dashboard, a calculation script (calculate_efficacy.py), and documentation. It also adds support for resolved places in the pvmap_generator and stat_var_processor, and introduces a test runner for statvar imports. The review feedback highlights several critical runtime issues, including module-level functions in stat_var_test_runner.py that incorrectly include a self parameter, missing imports and typoed function calls in pvmap_generator_metrics.py, and a hardcoded PROJECT_ROOT path. Additionally, storing a mutable dictionary by reference in stat_var_processor.py should be replaced with a shallow copy to avoid shared state bugs, and static methods should be properly decorated.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

return return_status


def merge_args(self, args: list[dict]) -> dict:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The function merge_args is defined at the module level (outside of any class), but it includes self as its first parameter. When called as merge_args([...]) with a single argument, Python will bind the argument to self and raise a TypeError because the args parameter is missing. Please remove the unused self parameter.

Suggested change
def merge_args(self, args: list[dict]) -> dict:
def merge_args(args: list[dict]) -> dict:

return merged_args


def get_args_list(self, args: dict) -> list:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The function get_args_list is defined at the module level, but it includes self as its first parameter. This will cause a TypeError when called as get_args_list(args). Please remove the unused self parameter.

Suggested change
def get_args_list(self, args: dict) -> list:
def get_args_list(args: dict) -> list:

return cmd_args


def get_arg(self, args_dict: dict, arg: str, default=None) -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The function get_arg is defined at the module level, but it includes self as its first parameter. This will cause a TypeError when called as get_arg(args_dict, arg). Please remove the unused self parameter.

Suggested change
def get_arg(self, args_dict: dict, arg: str, default=None) -> str:
def get_arg(args_dict: dict, arg: str, default=None) -> str:

from counters import Counters
from mcf_diff import diff_mcf_files
from mcf_file_util import get_value_list
from stat_var_test_runner import StatVarProcessorTestRunner, run_script
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The functions merge_args, get_arg, and get_env_dict are used in this file but are not imported, which will cause a NameError at runtime. Please import them from stat_var_test_runner.

Suggested change
from stat_var_test_runner import StatVarProcessorTestRunner, run_script
from stat_var_test_runner import StatVarProcessorTestRunner, run_script, merge_args, get_arg, get_env_dict

cmd_args = dict()
# Add default arguments for output

cmd_args = merged_args([
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The function merged_args is called here, but the actual function defined in stat_var_test_runner.py is named merge_args. This typo will cause a NameError at runtime. Please correct it to merge_args.

Suggested change
cmd_args = merged_args([
cmd_args = merge_args([


# Get commandline args for stavtar processor
# to use generated pvmap files.
cmd_args = merged_args([
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The function merged_args is called here, but the actual function defined in stat_var_test_runner.py is named merge_args. This typo will cause a NameError at runtime. Please correct it to merge_args.

Suggested change
cmd_args = merged_args([
cmd_args = merge_args([

import datetime

# Set up paths to import tools
PROJECT_ROOT = '/usr/local/google/home/nehil/datacommons/import/git/data'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The PROJECT_ROOT path is hardcoded to a local directory, which makes the script non-portable. To ensure portability and avoid relying on specific directory structures, use a more robust method to locate the repository root, such as searching for marker files like .git or WORKSPACE.

Suggested change
PROJECT_ROOT = '/usr/local/google/home/nehil/datacommons/import/git/data'
def find_repo_root(path):
if os.path.exists(os.path.join(path, '.git')) or os.path.exists(os.path.join(path, 'WORKSPACE')):
return path
parent = os.path.dirname(path)
return find_repo_root(parent) if parent != path else path
PROJECT_ROOT = find_repo_root(os.path.abspath(os.path.dirname(__file__)))
References
  1. Avoid hardcoding paths that rely on specific directory structures. Instead, use more robust methods to locate the repository root, such as searching for marker files (e.g., .git or WORKSPACE).

merged_col_pvs, row_index, col_index):
row_svobs += 1
if self._data_pvs is not None:
self._data_pvs[self._file_context] = merged_col_pvs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The dictionary merged_col_pvs is stored directly in self._data_pvs by reference. However, merged_col_pvs is passed to process_stat_var_obs_pvs on line 2274, which mutates it in-place (e.g., via add_default_pvs). Storing mutable dictionaries by reference can lead to unexpected side effects or shared state bugs. Please store a shallow copy of the dictionary instead.

Suggested change
self._data_pvs[self._file_context] = merged_col_pvs
self._data_pvs[self._file_context] = dict(merged_col_pvs)

Comment on lines +172 to +174
def get_stats_from_diff_counters(self,
diff_stats: dict,
stats: dict = None) -> dict:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The method get_stats_from_diff_counters does not use self and is called statically with None as the first argument in calculate_efficacy.py. Please define it as a @staticmethod and remove the unused self parameter to improve code clarity and maintainability.

Suggested change
def get_stats_from_diff_counters(self,
diff_stats: dict,
stats: dict = None) -> dict:
@staticmethod
def get_stats_from_diff_counters(diff_stats: dict,
stats: dict = None) -> dict:

def get_metrics_from_counters(counters_obj):
# Use metrics formula directly from pvmap_generator_metrics.py
diff_stats = {'counters': counters_obj.get_counters()}
stats = PVMapGeneratorMetricsRunner.get_stats_from_diff_counters(None, diff_stats)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Once get_stats_from_diff_counters is refactored to a @staticmethod, you can call it directly without passing None as the first argument.

Suggested change
stats = PVMapGeneratorMetricsRunner.get_stats_from_diff_counters(None, diff_stats)
stats = PVMapGeneratorMetricsRunner.get_stats_from_diff_counters(diff_stats)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant