Skip to content

Merge updates from upstream and integrate local security report optimizations#163

Open
hbui290 wants to merge 8 commits into
NVIDIA:mainfrom
hbui290:main
Open

Merge updates from upstream and integrate local security report optimizations#163
hbui290 wants to merge 8 commits into
NVIDIA:mainfrom
hbui290:main

Conversation

@hbui290

@hbui290 hbui290 commented Jun 23, 2026

Copy link
Copy Markdown

This Pull Request merges the upstream changes from NVIDIA/SkillSpector into the codebase and integrates custom enhancements:

  1. Resolved Merge Conflicts: Handled conflicts across core files like llm_analyzer_base.py, report.py, meta_analyzer.py, build_context.py, and test suites.
  2. Custom Scoring Multiplier: Ensured the 1.3x risk scoring multiplier is correctly applied only to findings detected within executable components (rather than scaling the entire report score), combined with the upstream diminishing returns logic.
  3. Optimizations: Addressed edge-case issues in mock initializations during unit tests.

All 725+ tests pass successfully.

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: Request changes (strong) — a 19-file upstream-merge grab-bag with some genuinely good ideas (SQLite cache, block-aware chunking, retry/backoff), but several blocking problems including a fail-open detection regression.

Blocking

  1. Detection-coverage regression (fail-open). _SKIP_DIRS in src/skillspector/nodes/build_context.py is expanded to include docs, doc, tests, test, spec, specs, build, dist, out, target, images, media, brand (~L775-808), now matched against the relative path (~L884). A malicious skill can hide a payload in e.g. tests/ or docs/ and evade scanning entirely. For a security scanner this is a serious regression — these must not be skipped by default. (The bundled BAO_CAO_TOI_UU_HOA.md even frames this as fixing a "file omission" bug, but the net effect is more directories silently skipped.)
  2. Committed personal/local artifacts. BAO_CAO_TOI_UU_HOA.md and HUONG_DAN_SU_DUNG.md are non-project docs containing dev-machine absolute paths (file:///Users/winston/.gemini/antigravity-ide/scratch/SkillSpector/...). These should not land in the repo.
  3. load_dotenv() at package import in src/skillspector/__init__.py (~L135-136): an import-time side effect that auto-loads .env from CWD. Risky for a tool run against untrusted skill directories — it could ingest a scanned skill's .env. Move it behind the CLI entrypoint, not package import.
  4. Production code detecting test mocks. LLMAnalyzerBase._original_run_batches = LLMAnalyzerBase.run_batches (llm_analyzer_base.py ~L673) plus the is_patched mock-introspection branch in semantic_security_discovery.py (~L730-756) couple prod behavior to the test harness — a code smell that can mask real regressions.
  5. Scope / conflicts. Bundles unrelated changes and overlaps/conflicts with #157 (binary skip), #159/#164/#116 (zip-slip/ingest, _download_file/_extract_zip), and #142/#153/#122 (_compute_risk_score / scoring + report.py). Please split into focused PRs and rebase.

The good ideas (persistent SQLite cache, smarter chunk_file_by_lines, deterministic finding ordering, retry/backoff) are worth landing — just on their own focused PRs and without the _SKIP_DIRS expansion, the committed local docs, and the import-time load_dotenv().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants