Skip to content

fix(input-handler): validate URLs against SSRF and add zip-slip protection#159

Open
mimran-khan wants to merge 1 commit into
NVIDIA:mainfrom
mimran-khan:fix/input-handler-ssrf-validation
Open

fix(input-handler): validate URLs against SSRF and add zip-slip protection#159
mimran-khan wants to merge 1 commit into
NVIDIA:mainfrom
mimran-khan:fix/input-handler-ssrf-validation

Conversation

@mimran-khan

Copy link
Copy Markdown
Contributor

Summary

The input handler accepts arbitrary URLs for git clone and file download without validating the target host. This enables Server-Side Request Forgery (SSRF) — an attacker could craft a skill referencing internal endpoints (cloud metadata at 169.254.169.254, internal services at 10.x.x.x, etc.) to exfiltrate data or probe the network.

Additionally, _extract_zip uses extractall() without path traversal checks, enabling zip-slip attacks where crafted archives write files outside the extraction directory.

Changes

SSRF Protection

  • Added _is_private_ip() to detect loopback, private, link-local, and reserved addresses
  • Added _validate_url_host() method enforcing allowlist + private IP check
  • Both _clone_git() and _download_file() validate before network access
  • Configurable module-level ALLOWED_GIT_HOSTS and ALLOWED_DOWNLOAD_HOSTS frozensets

Zip-Slip Prevention

  • _extract_zip() iterates all zip members and validates resolved paths stay within extraction directory
  • Raises ValueError with clear message on path traversal attempt

Allowlist Design

  • Git: github.com, gitlab.com, bitbucket.org
  • Download: above + raw.githubusercontent.com, huggingface.co
  • Enterprise deployments can extend these constants

Testing

23 new tests covering:

  • Private IP detection (127.0.0.1, ::1, 10.x, 172.16.x, 192.168.x, 169.254.x)
  • Public IP passes
  • Unresolvable hostnames blocked (fail-closed)
  • Internal git URLs rejected
  • Cloud metadata endpoints rejected
  • Allowed hosts succeed (with mocked subprocess/httpx)
  • Zip-slip attack blocked
  • Normal/nested zips extract correctly
  • Allowlist configuration assertions

Existing 6 input handler tests continue to pass.

Fixes #147

…ip-slip protection

Three security hardening changes to the input handler:

1. SSRF protection: Both git clone and file download now validate the
   target hostname against an allowlist of known-safe hosts (GitHub,
   GitLab, Bitbucket, HuggingFace). URLs resolving to private/loopback
   IPs are blocked regardless of hostname.

2. Zip-slip prevention: Archive extraction now validates every member
   path against the extraction directory to prevent path traversal
   attacks (e.g., ../../etc/passwd entries).

3. Configurable allowlists: ALLOWED_GIT_HOSTS and ALLOWED_DOWNLOAD_HOSTS
   are module-level constants that can be extended for enterprise
   deployments with internal forges.

Fixes NVIDIA#147
@wernerkasselman-au

Copy link
Copy Markdown
Contributor

Thanks for picking this up. I read the post-#159 input_handler.py and the SSRF gating does what it says on the two paths that matter: _validate_url_host is the first statement of both _clone_git and _download_file, and the path I was worried about (a bare http://169.254.169.254/latest/meta-data/... with no .git, which _is_file_url routes into _download_file) now raises on the allowlist branch before it touches the network, since 169.254.169.254 is not in ALLOWED_DOWNLOAD_HOSTS. Good.

One gap remains, and I think it is load-bearing. _download_file still builds httpx.Client(follow_redirects=True, timeout=30) (unchanged by the PR), and _validate_url_host only ever sees the initial url, so the allowlist is a first-hop check, not a per-hop one. A request to an allowlisted host (a github.com release or blob link legitimately 302s out to an object-store host, and any open-redirect on github.com / raw.githubusercontent.com / huggingface.co gives the same primitive) is followed by httpx to wherever the 3xx points with no second validation, so a redirect landing on http://169.254.169.254/... still reaches the metadata service and the body comes back to the scanner. I'd set follow_redirects=False on that client (we control the URLs we expect, so a redirect is already a smell), or keep redirects and rerun _validate_url_host on each hop through an httpx event hook. The first is simpler and I'd lean that way.

On the zip side I'd leave extractall as it is, and I would not treat it as a security gap: on CPython (I checked 3.13) zipfile.extractall strips . and .. components before writing and does not recreate symlinks (a symlink member comes out as a regular file holding the link text), so neither a ../ traversal nor a symlink entry escapes extract_dir. The added namelist() prefix check is belt-and-suspenders rather than load-bearing, which is fine to keep. One small note, zipfile.extractall has no filter= argument (that one is tarfile), so filter="data" would throw if anyone reaches for it here.

Happy to be wrong on the redirect point, push back with the hop you think revalidates if I missed one. Otherwise the allowlist and private-IP work is good and I'd like to see it land, with follow_redirects=False (or per-hop revalidation) folded in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Security] Zip Slip path traversal and SSRF via permissive Git URL validation in InputHandler

2 participants