Agent Skills - Discoverability #1688

Merged
melo-gonzo merged 4 commits into NVIDIA:main from melo-gonzo:discoverability-skill on May 29, 2026

@melo-gonzo
Collaborator

PhysicsNeMo Pull Request

Description

This PR sets up the canonical agent-skills layout and adds a validated skill for discovery of the PhysicsNeMo codebase. The skill is aimed at new users: it helps them understand how PhysicsNeMo can address a particular problem and points them to the best places to start for their specific use case. It does not write, modify, or distribute code; its sole purpose is efficient information sharing, surfacing details that may otherwise be hidden in docs, code snippets, or example folders.

The NVIDIA-validated signature, benchmark results, and evaluation prompts are included per process guidelines.

Note: linters and pre-commit hooks were updated to exclude these peripheral files from formatting, since reformatting them would invalidate the signatures.

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is it an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@greptile-apps
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR establishes the canonical agent-skills directory layout for the PhysicsNeMo repo and introduces the physicsnemo-discover skill — a read-only, live-discovery-oriented agent skill that guides users toward the right model families, datapipes, and examples for SciML/AI4Science tasks without writing or modifying any code.

  • Structure: A new .agents/skills symlink points to the repo-root skills/ directory; markdownlint is excluded from skills/ in both .pre-commit-config.yaml and .markdownlintignore to preserve the cryptographic signature over the skill files.
  • Skill content: SKILL.md defines a "discover, don't remember" philosophy with a structured output template, an abstention path for out-of-scope queries, and companion TAXONOMY.md/RECIPES.md reference files; benchmark results (NVSkills-Eval, 4 tasks, PASS) and an NVIDIA-signed skill.oms.sig are included per process guidelines.
  • Notable benchmark signals: Both evaluation agents show negative effectiveness uplift (-9% claude-code, -5% codex), and the Tier 1 validator raised MEDIUM findings for missing ## Instructions/## Examples schema sections and an outbound git-clone instruction (SDI-2) that has no fallback when network access is unavailable.
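
The missing-section finding is straightforward to check mechanically. The following is a minimal Python sketch, not the NVSkills-Eval validator (whose rules are not shown in this PR): it assumes the check only requires level-2 `## Instructions` and `## Examples` headings to be present, and `missing_sections` is a hypothetical helper.

```python
import re

# Assumption: these are the two level-2 sections the Tier 1 validator reported as missing.
REQUIRED_SECTIONS = ("Instructions", "Examples")

# Matches ATX level-2 headings such as "## Instructions" or "## Examples ##",
# but not level-3 headings like "### Instructions".
H2_PATTERN = re.compile(r"^##[ \t]+(.+?)[ \t]*#*[ \t]*$", flags=re.MULTILINE)


def missing_sections(skill_md: str, required=REQUIRED_SECTIONS) -> list:
    """Return the required level-2 headings that do not appear in a SKILL.md body.

    Headings inside fenced code blocks are not excluded; fine for a quick check.
    """
    present = {m.group(1).strip() for m in H2_PATTERN.finditer(skill_md)}
    return [name for name in required if name not in present]


sample = "# physicsnemo-discover\n\n## Scope\nRead-only discovery.\n\n## Examples\n...\n"
print(missing_sections(sample))  # ['Instructions']
```

Running a check like this before signing would catch the gap earlier, since any later edit to SKILL.md would require re-signing anyway.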

Important Files Changed

  • skills/physicsnemo-discover/SKILL.md: Core skill definition; well-structured, with clear scope, output format, abstention template, and live-discovery philosophy. Contains a git-clone instruction for headless environments that the benchmark validator flagged as a medium security concern, with no fallback if the network call fails.
  • skills/physicsnemo-discover/BENCHMARK.md: Evaluation report showing an overall PASS on 4 tasks; both agents show negative effectiveness uplift (-9% claude-code, -5% codex), and several MEDIUM findings remain unresolved (missing schema sections, long description, git-clone SDI-2).
  • skills/physicsnemo-discover/references/RECIPES.md: Concrete Glob/Grep/Read discovery patterns; well-organized, with 11 recipe sections covering all major discovery axes. No issues found.
  • skills/physicsnemo-discover/references/TAXONOMY.md: Navigation scaffold with data-shape routing tables, domain maps, and stability tiers; explicitly warns against using it as a static inventory. No issues found.
  • skills/physicsnemo-discover/evals/evals.json: Four evaluation tasks (2 positive, 2 negative) with ground-truth and expected-behavior checks; covers the core abstention case and a clear-match case.
  • .pre-commit-config.yaml: Adds exclude: ^skills/ to the markdownlint hook; consistent with the .markdownlintignore additions, since git tracks the skill files under skills/ (the symlink target), not under .agents/skills/.
  • .agents/skills: New symlink .agents/skills → ../skills, following the canonical agent-skills layout convention. No issues.
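
To see why a single `exclude: ^skills/` entry is enough, here is a small Python sketch of the filtering step. It assumes pre-commit matches `exclude` with a regex search against repo-relative paths and, as a simplification, that the hook only receives `.md` files; `files_for_markdownlint` is a hypothetical name and the sample paths are illustrative.

```python
import re

# The pattern this PR adds to the markdownlint hook in .pre-commit-config.yaml.
EXCLUDE = re.compile(r"^skills/")


def files_for_markdownlint(tracked_paths):
    """Keep Markdown files whose repo-relative path does not match the exclude regex.

    Simplification: only ".md" files are considered, standing in for whatever
    file-type filter the real hook uses.
    """
    return [p for p in tracked_paths if p.endswith(".md") and not EXCLUDE.search(p)]


# Illustrative `git ls-files` output: the symlink is tracked as one entry,
# so nothing under .agents/skills/ is ever listed separately.
tracked = [
    "README.md",
    ".agents/skills",
    "skills/physicsnemo-discover/SKILL.md",
    "skills/physicsnemo-discover/references/TAXONOMY.md",
    "docs/skills-overview.md",
]
print(files_for_markdownlint(tracked))  # ['README.md', 'docs/skills-overview.md']
```

Because `^` anchors the match to the start of the path, a file such as `docs/skills/notes.md` would still be linted; only the signed tree under `skills/` is skipped.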

Comments Outside Diff (3)

  1. skills/physicsnemo-discover/BENCHMARK.md, lines 107-108

    P2 Negative effectiveness uplift on both agents

    The benchmark shows the skill reduces effectiveness compared to the no-skill baseline for both claude-code (-9%) and codex (-5%). This means agents following the skill complete the evaluated tasks at a lower rate than they would without it. The effect may be an artifact of a 4-task dataset (low statistical power), but it is the most visible signal suggesting the workflow overhead introduced by the skill's discovery constraints outweighs the guidance benefit in at least some cases. It would be worth clarifying whether this dimension is expected to be negative for a discovery-only skill, or whether the evaluation tasks include scenarios where the skill is mistakenly activated and penalized.

  2. skills/physicsnemo-discover/SKILL.md, line 178

    P2 Shallow-clone instruction flagged by the benchmark validator

    The benchmark's own Tier 1 static validation flagged this line as MEDIUM SECURITY/SDI-2 because it instructs agents to shallow-clone an external Git repository. The mitigations are present (read-only intent, URL is hardcoded, no execution of cloned code), but as written the instruction tells a generally-capable agent to run an outbound network call (git clone https://github.com/NVIDIA/physicsnemo). In sandboxed or network-restricted agent environments this will silently fail, leaving the agent without a repo to search — and the skill has no fallback behavior for that case. A note directing agents to skip the clone and ask the user for the repo path when the network call fails would improve robustness.

  3. skills/physicsnemo-discover/BENCHMARK.md, lines 118-122

    P2 Missing recommended sections acknowledged but not resolved

    The Tier 1 validator surfaced MEDIUM findings for the absent ## Instructions and ## Examples sections in SKILL.md. These sections are typically required for agents to understand how to invoke the skill correctly and to illustrate expected output. The benchmark passed the overall bar despite these gaps, but leaving them unaddressed risks agents misloading or misusing the skill in edge-case activation scenarios.

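
The fallback suggested in comment 2 can be sketched as a lookup order: local checkout first, then the hardcoded shallow clone, then a signal to ask the user. This is a minimal Python sketch, assuming a checkout is recognizable by a top-level `physicsnemo/` package folder; `resolve_repo` and its parameters are hypothetical and not part of the skill, and the `run` parameter exists only so the network call can be stubbed out in tests.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

# Hardcoded upstream URL, as referenced in the review comment.
REPO_URL = "https://github.com/NVIDIA/physicsnemo"


def resolve_repo(
    local_candidates: Sequence[Path],
    clone_dest: Path,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout_s: float = 60.0,
) -> Optional[Path]:
    """Return a PhysicsNeMo checkout to search, or None if the agent should ask the user."""
    # 1. Prefer an existing local checkout so no network call is needed.
    #    Assumption: a checkout is any directory with a top-level physicsnemo/ package.
    for path in local_candidates:
        if (path / "physicsnemo").is_dir():
            return path

    # 2. Otherwise attempt a read-only shallow clone of the hardcoded URL.
    try:
        run(
            ["git", "clone", "--depth", "1", REPO_URL, str(clone_dest)],
            check=True,
            capture_output=True,
            timeout=timeout_s,
        )
    except (subprocess.SubprocessError, OSError):
        # 3. Clone failed (no network, sandbox policy, timeout, git not installed):
        #    return None so the agent asks the user for a local repo path instead.
        return None
    return clone_dest
```

Returning `None` rather than an empty directory keeps the failure explicit: the agent stops and asks for a path instead of searching nothing and reporting that no relevant models exist.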

Reviews (1): Last reviewed commit: "docs: changelog"

Comment thread .pre-commit-config.yaml
Collaborator

@NickGeneva NickGeneva left a comment

LGTM

@NickGeneva
Collaborator

/ok to test 8a69fb5

@NickGeneva
Collaborator

/blossom-ci

@melo-gonzo melo-gonzo enabled auto-merge May 29, 2026 19:57
@melo-gonzo melo-gonzo added this pull request to the merge queue May 29, 2026
Merged via the queue into NVIDIA:main with commit f103a41 May 29, 2026
6 checks passed


2 participants