fix(static-patterns): filter false positives from documentation and code examples by mimran-khan · Pull Request #140 · NVIDIA/SkillSpector

mimran-khan · 2026-06-22T17:40:54Z

Summary

Static pattern analyzers fire on markdown documentation, fenced code blocks, and inline examples (e.g. curl -k in a README usage guide, rm -rf in a Dockerfile reference) even though these are not executable skill logic. This inflates finding counts and the overall risk score with noise that obscures genuine threats.

Changes

Centralized is_code_example() filter in static_runner.run_static_patterns(): any finding whose surrounding context matches documentation indicators (fenced code blocks, "example:", "for example", "e.g.", etc.) is dropped before conversion to Finding. All 12 static pattern analyzers benefit without per-module changes.
Documentation-path confidence reduction: findings from markdown files in known documentation directories (docs/, procedures/, references/, examples/, guides/) receive a 0.3× confidence multiplier, so they still appear in results but contribute proportionally less to the risk score.
SKILL.md exemption: the primary skill instruction file is explicitly excluded from documentation-path reduction since it contains the agent's executable instructions.

Behavioral Impact

Scenario	Before	After
`curl -k` in `docs/usage.md` inside a code block	TM1 HIGH (confidence 0.6)	Filtered out (code example)
`rm -rf /var/lib/apt` in `docs/docker.md` prose	TM1 HIGH (confidence 0.85)	TM1 with confidence 0.255 (0.85 × 0.3)
`subprocess.run(cmd, shell=True)` in `deploy.py`	TM1 HIGH (confidence 0.9)	TM1 HIGH (confidence 0.9) — unchanged
Pattern in `SKILL.md` code block	TM1 HIGH	Filtered out (code example context)

Test Plan

20 new unit tests covering code-example filtering, documentation-path detection, SKILL.md exemption, and non-documentation paths
Full test suite: 696 passed, 0 failures, 0 regressions
ruff check and ruff format clean
No new mypy errors introduced (pre-existing type widening in state.get() calls unchanged)

Fixes #135

mimran-khan · 2026-06-22T19:14:58Z

PS: These issues were found while running bunch of skill evals using SkillSpector and hence raising all the issues so that i can use these with correct reference to Upstream code

rng1995

Review: filter false positives from documentation and code examples

The documentation-path confidence reduction is reasonable and the SKILL.md exemption is the right instinct. However, I'm requesting changes because of the code-example filter, which I believe is a detection-bypass risk for a security scanner.

Blocking: dropping findings via a broad, spoofable heuristic across all file types

The new check drops any finding whose context matches is_code_example(af.context):

for af in raw:
    if af.context and is_code_example(af.context):
        continue  # finding dropped entirely

is_code_example is a case-insensitive substring match over a ~3-line context window against very generic indicators (```, example:, for example, e.g., such as, documentation, # note:, # warning:, …). Two problems:

It runs for every file type, including executable code (.py, .sh). There is no file-type guard, so a finding in real executable code is dropped if any of those tokens appears within a few lines.
The tokens are trivially attacker-controlled. A malicious skill can suppress a genuine static finding just by adding a nearby comment such as # e.g. usage, # Note: ..., or wrapping code in a fence. The finding then never reaches the report or the LLM meta-analyzer — so this lowers recall even when the LLM is enabled.

For a security tool, silently dropping real detections based on surrounding prose is a regression (missing a true positive is worse than surfacing a false one).

Recommendations

Prefer downweighting (reduce confidence) over dropping — exactly what this PR already does for documentation-path markdown. That preserves recall and lets the meta-analyzer adjudicate.
If dropping is retained, restrict it to non-executable / markdown / doc contexts; never drop a finding in an executable file based on a generic substring in its context window.
Tighten the indicator set — such as, e.g., for example, and documentation are too broad to gate a hard drop.

Tests

Coverage of the happy paths is good. Please add a test asserting that a finding in an executable file is not dropped merely because its context contains a generic indicator word, to lock in the intended scope.

Inter-PR note

PR #143 applies the same is_code_example drop in the --no-llm / meta fallback, so with both merged the same spoofable filter drops findings at two stages, and in offline mode there is no LLM safety net. Also, the 0.3x confidence reduction here compounds with #139's confidence-weighted scoring and #143's 0.4 confidence threshold — a documentation finding at 0.9 * 0.3 = 0.27 would subsequently be dropped by #143. Worth validating the combined effect.

mimran-khan · 2026-06-22T19:55:54Z

Thanks for catching this — you're right that hard-dropping findings in executable files based on generic indicators is a detection-bypass risk.

Changes made:

is_code_example() now only hard-drops findings in non-executable file types (markdown, text, json, yaml, toml)
For executable files (.py, .sh, .js, .ts, etc.), findings with code-example context are downweighted (0.5x confidence) instead of dropped — this preserves recall while the LLM meta-analyzer can still adjudicate
Added _NON_EXECUTABLE_FILE_TYPES frozenset to gate the behavior
Added test test_finding_in_executable_not_dropped_by_generic_indicator that verifies an attacker cannot suppress a genuine finding in a .py file by salting nearby code with # Note: ...

The doc-path 0.3x confidence reduction remains unchanged (only applies to markdown in documentation subdirectories).

…ode examples Static pattern analyzers fire on markdown documentation, fenced code blocks, and inline examples even though these are not executable skill logic. This inflates findings counts and the overall risk score with noise that obscures genuine threats. This commit applies the existing is_code_example() helper centrally in static_runner.run_static_patterns() so all 12 pattern analyzers benefit without per-module changes. Additionally, findings in documentation subdirectories (docs/, procedures/, references/, examples/, guides/) receive a 0.3x confidence reduction since they are unlikely to represent real attack surface. SKILL.md is explicitly excluded from the documentation-path heuristic since it contains the primary skill instructions that agents execute. Fixes NVIDIA#135 Signed-off-by: Mohammed Imran Khan <mohammed_imran.khan@outlook.com>

…le files The is_code_example filter now only hard-drops findings in non-executable file types (markdown, text, json, yaml, toml). For executable files (.py, .sh, .js, .ts, etc.), findings with code-example context indicators are downweighted (0.5x confidence) instead of dropped entirely. This prevents detection bypass where an attacker salts malicious code with nearby comments like '# e.g. usage' or '# Note: ...' to suppress findings. The LLM meta-analyzer can still adjudicate downweighted findings. Adds test asserting that executable file findings survive generic indicator words in their context.

rng1995

Re-review: filter false positives from documentation and code examples

The detection-bypass risk is addressed. Approving.

Verified

The is_code_example hard-drop is now gated to non-executable file types via _NON_EXECUTABLE_FILE_TYPES = frozenset({"markdown","text","json","yaml","toml","other"}). For executable files the finding is downweighted (af.confidence *= _CODE_EXAMPLE_CONFIDENCE_FACTOR, 0.5x) rather than dropped — so a real detection in .py/.sh can't be silently suppressed by salting nearby prose.
Anti-spoofing regression test added: test_finding_in_executable_not_dropped_by_generic_indicator confirms a genuine subprocess.run(..., shell=True) finding in deploy.py survives a # Note: comment.
SKILL.md exemption and the doc-path 0.3x reduction are retained and covered by tests.

Non-blocking nit

_NON_EXECUTABLE_FILE_TYPES includes "other". If _infer_file_type() ever returns "other" for an unknown-but-executable extension, a finding there in code-example context would be hard-dropped. Worth confirming the inference can't misclassify executable content as "other".

rng1995

Re-review: filter false positives from documentation and code examples — approving

The detection-bypass concern is resolved — the broad code-example hard-drop no longer applies to executable files. Approving.

The hard-drop is now gated to non-executable types, and executable files are downweighted instead of dropped (static_runner.py, ~L55-64):

_NON_EXECUTABLE_FILE_TYPES = frozenset({"markdown", "text", "json", "yaml", "toml", "other"})
...
if af.context and is_code_example(af.context):
    if is_non_executable:
        continue                                       # markdown/text/etc. -> drop
    af.confidence *= _CODE_EXAMPLE_CONFIDENCE_FACTOR   # .py/.sh/... -> 0.5x, kept

So a finding in a .py/.sh file can no longer be suppressed by salting generic indicators into nearby context; it's preserved (at reduced confidence) for the meta-analyzer to adjudicate.

Confirmed:

Anti-spoofing regression test test_finding_in_executable_not_dropped_by_generic_indicator — a .py finding survives a salted # Note: ... comment.
SKILL.md exemption and the doc-path 0.3x reduction are retained and covered by tests.

Non-blocking nit: _NON_EXECUTABLE_FILE_TYPES includes "other". Since "other" is the catch-all for unknown extensions, a file whose type is inferred as "other" would still be hard-dropped on code-example context. Worth confirming _infer_file_type can't classify an unknown-but-executable script (e.g. an extensionless shell script) as "other" — otherwise the same bypass could narrowly re-open for that case. Not blocking.

Thanks for addressing the bypass risk.

rng1995 mentioned this pull request Jun 22, 2026

fix(scoring): prevent risk score saturation via per-rule diminishing returns #139

Merged

5 tasks

rng1995 requested changes Jun 22, 2026

View reviewed changes

rng1995 mentioned this pull request Jun 22, 2026

fix(meta-analyzer): add heuristic fallback filter for --no-llm mode #143

Merged

3 tasks

mimran-khan added 2 commits June 23, 2026 01:30

mimran-khan force-pushed the fix/static-patterns-documentation-135 branch from 87f6e2a to 47e4942 Compare June 22, 2026 20:01

mimran-khan mentioned this pull request Jun 22, 2026

[Bug] Static pattern analyzers fire on markdown documentation and code blocks, not just executable skill logic #135

Closed

rng1995 approved these changes Jun 23, 2026

View reviewed changes

rng1995 merged commit b1586b7 into NVIDIA:main Jun 23, 2026

rng1995 mentioned this pull request Jun 23, 2026

fix(static-runner): skip binary/PDF files and filter PE3 .env doc references #157

Open

mimran-khan mentioned this pull request Jun 23, 2026

fix: address non-blocking reviewer nits from #140, #141, #143 #172

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(static-patterns): filter false positives from documentation and code examples#140

fix(static-patterns): filter false positives from documentation and code examples#140
rng1995 merged 2 commits into
NVIDIA:mainfrom
mimran-khan:fix/static-patterns-documentation-135

mimran-khan commented Jun 22, 2026

Uh oh!

mimran-khan commented Jun 22, 2026

Uh oh!

rng1995 left a comment

Uh oh!

mimran-khan commented Jun 22, 2026

Uh oh!

rng1995 left a comment

Uh oh!

rng1995 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mimran-khan commented Jun 22, 2026

Summary

Changes

Behavioral Impact

Test Plan

Uh oh!

mimran-khan commented Jun 22, 2026

Uh oh!

rng1995 left a comment

Choose a reason for hiding this comment

Review: filter false positives from documentation and code examples

Blocking: dropping findings via a broad, spoofable heuristic across all file types

Recommendations

Tests

Inter-PR note

Uh oh!

mimran-khan commented Jun 22, 2026

Uh oh!

rng1995 left a comment

Choose a reason for hiding this comment

Re-review: filter false positives from documentation and code examples

Verified

Non-blocking nit

Uh oh!

rng1995 left a comment

Choose a reason for hiding this comment

Re-review: filter false positives from documentation and code examples — approving

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants