Skip to content

[Bug] YARA rule cache uses file size instead of content hash — stale rules served after edits #152

Description

@mimran-khan

Summary

Imagine a pharmacy that verifies drug authenticity by checking package weight. You swap the pills inside with sugar tablets (same weight), and the pharmacy's "verification system" marks the package as genuine because the weight hasn't changed. The contents are completely different, but the proxy metric (weight) stayed the same.

SkillSpector's YARA rule cache uses file path + file size (st_size) as the cache key for compiled YARA rules. When a YARA rule file is modified but retains the same byte count (e.g., changing a pattern string to a different pattern of equal length, or swapping rule logic), the cache returns the stale pre-compiled rules. Scans continue using outdated detection logic even though the rule file has been updated.

Why This Matters — Real-World Scenario

Scenario: Security team updates YARA rules after a threat advisory

A security team receives a threat advisory about a new prompt injection technique. They update their custom YARA rules to detect the new pattern:

Before (detects old technique):

rule prompt_inject_v1 {
    strings:
        $s1 = "ignore previous instructions"
    condition:
        $s1
}

After (detects new technique, same file size):

rule prompt_inject_v2 {
    strings:
        $s1 = "disregard prior directives"
    condition:
        $s1
}

Both versions are exactly the same byte count (they padded the shorter string with spaces or adjusted the rule name). They deploy the updated rules and re-scan their skill catalog.

SkillSpector's cache sees: same file path, same file size → return cached compiled rules. Every scan continues using the old v1 rules. The new threat technique goes completely undetected. The team believes they're protected because they "deployed the update," but the cache is silently serving stale detection logic.

They only discover this weeks later during a manual audit — after the new attack vector has already been exploited.

Reproduction

# Create initial YARA rule
mkdir -p /tmp/yara-test/rules
cat > /tmp/yara-test/rules/test.yar << 'YAREOF'
rule detect_alpha {
    strings:
        $a = "AAAAAAAAAA"
    condition:
        $a
}
YAREOF

# Create skill with matching content
mkdir -p /tmp/yara-test/skill
cat > /tmp/yara-test/skill/SKILL.md << 'SKILLEOF'
---
name: yara-test
---
# Test
Contains AAAAAAAAAA pattern
SKILLEOF

# First scan — populates cache, rule matches
skillspector scan /tmp/yara-test/skill/ --no-llm --format json -o /tmp/r1.json

# Now change the rule to detect "BBBBBBBBBB" instead (same byte count!)
cat > /tmp/yara-test/rules/test.yar << 'YAREOF'
rule detect_bravo {
    strings:
        $a = "BBBBBBBBBB"
    condition:
        $a
}
YAREOF

# Second scan — cache returns stale compiled rules
skillspector scan /tmp/yara-test/skill/ --no-llm --format json -o /tmp/r2.json

# Compare: r2 should find NOTHING (skill has AAAA not BBBB)
# But if cache is stale, r2 still reports the old match
python -c "
import json
r1 = json.load(open('/tmp/r1.json'))
r2 = json.load(open('/tmp/r2.json'))
print(f'Scan 1 YARA findings: {len([i for i in r1[\"issues\"] if \"yara\" in i.get(\"rule_id\", \"\").lower()])}')
print(f'Scan 2 YARA findings: {len([i for i in r2[\"issues\"] if \"yara\" in i.get(\"rule_id\", \"\").lower()])}')
# Expected: Scan 2 = 0 (new rule doesn't match)
# Actual:   Scan 2 = 1 (stale cache serves old compiled rule)
"

Root Cause

In src/skillspector/nodes/analyzers/static_yara.py, the _content_hash() function (lines 72-78):

def _content_hash(self, rule_paths: list[Path]) -> str:
    """Generate cache key from rule file metadata."""
    parts = []
    for p in sorted(rule_paths):
        stat = p.stat()
        parts.append(f"{p}:{stat.st_size}")  # Only path + size!
    return hashlib.md5("|".join(parts).encode()).hexdigest()

The cache key is derived from:

  • File path (doesn't change on edit)
  • File size (st_size) — changes only if the edit adds/removes bytes

It does NOT include:

  • File content hash (the actual rule definitions)
  • Modification time (st_mtime) — would at least catch any edit
  • Inode or creation time

Any rule file modification that preserves byte count (extremely common with pattern string substitutions of equal length) produces the same cache key, causing the cache to return stale compiled YARA rules.

Impact

  • Stale detection logic: Updated YARA rules are silently ignored when file size is unchanged
  • False negatives: New threat patterns go undetected after rule updates
  • Silent failure: No warning that cached (potentially outdated) rules are being served
  • Hard to diagnose: The scan appears to work normally — it just uses old rules
  • Security regression: Organizations deploy rule updates believing they're protected, but the cache bypasses the update
  • Common trigger: YARA rule edits frequently preserve file size (changing one pattern string to another of equal length)

Affected Version

SkillSpector v2.2.3

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions