Summary
Imagine a pharmacy that verifies drug authenticity by checking package weight. You swap the pills inside with sugar tablets (same weight), and the pharmacy's "verification system" marks the package as genuine because the weight hasn't changed. The contents are completely different, but the proxy metric (weight) stayed the same.
SkillSpector's YARA rule cache uses file path + file size (st_size) as the cache key for compiled YARA rules. When a YARA rule file is modified but retains the same byte count (e.g., changing a pattern string to a different pattern of equal length, or swapping rule logic), the cache returns the stale pre-compiled rules. Scans continue using outdated detection logic even though the rule file has been updated.
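The failure mode can be simulated without SkillSpector or yara-python. The sketch below is a hypothetical stand-in, not SkillSpector's code. `metadata_key` mirrors the path + `st_size` key described above, `MetadataKeyedCache` is an invented cache wrapper, and a fake "compiler" that keeps the rule source text stands in for real YARA compilation.

```python
import hashlib
import tempfile
from pathlib import Path


def metadata_key(rule_paths: list[Path]) -> str:
    """Key from path + size only, mirroring the reported cache-key logic."""
    parts = [f"{p}:{p.stat().st_size}" for p in sorted(rule_paths)]
    return hashlib.md5("|".join(parts).encode()).hexdigest()


class MetadataKeyedCache:
    """Hypothetical cache that 'compiles' rules at most once per key."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.compile_calls = 0

    def get(self, rule_paths: list[Path]) -> str:
        key = metadata_key(rule_paths)
        if key not in self._store:
            self.compile_calls += 1
            # Stand-in for real YARA compilation: keep the rule source text.
            self._store[key] = "".join(p.read_text() for p in sorted(rule_paths))
        return self._store[key]


cache = MetadataKeyedCache()
with tempfile.TemporaryDirectory() as tmp:
    rule = Path(tmp) / "test.yar"
    rule.write_text('rule r { strings: $a = "AAAAAAAAAA" condition: $a }\n')
    first = cache.get([rule])
    # Same byte count, different pattern: the key does not change.
    rule.write_text('rule r { strings: $a = "BBBBBBBBBB" condition: $a }\n')
    second = cache.get([rule])

print(cache.compile_calls)     # 1: the edit never triggered a recompile
print("BBBBBBBBBB" in second)  # False: the pre-edit rule text was served
```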
Why This Matters — Real-World Scenario
Scenario: Security team updates YARA rules after a threat advisory
A security team receives a threat advisory about a new prompt injection technique. They update their custom YARA rules to detect the new pattern:
Before (detects old technique):
rule prompt_inject_v1 {
    strings:
        $s1 = "ignore previous instructions"
    condition:
        $s1
}
After (detects new technique, same file size):
rule prompt_inject_v2 {
    strings:
        $s1 = "disregard prior directives"
    condition:
        $s1
}
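As written, the new pattern string is two bytes shorter than the old one; in this scenario the team made up the difference elsewhere in the file (for example, with extra whitespace outside the string or a longer rule name), so both versions have exactly the same byte count. They deploy the updated rules and re-scan their skill catalog.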
SkillSpector's cache sees: same file path, same file size → return cached compiled rules. Every scan continues using the old v1 rules. The new threat technique goes completely undetected. The team believes they're protected because they "deployed the update," but the cache is silently serving stale detection logic.
They only discover this weeks later during a manual audit — after the new attack vector has already been exploited.
Reproduction
# Create initial YARA rule
mkdir -p /tmp/yara-test/rules
cat > /tmp/yara-test/rules/test.yar << 'YAREOF'
rule detect_alpha {
strings:
$a = "AAAAAAAAAA"
condition:
$a
}
YAREOF
# Create skill with matching content
mkdir -p /tmp/yara-test/skill
cat > /tmp/yara-test/skill/SKILL.md << 'SKILLEOF'
---
name: yara-test
---
# Test
Contains AAAAAAAAAA pattern
SKILLEOF
# First scan — populates cache, rule matches
skillspector scan /tmp/yara-test/skill/ --no-llm --format json -o /tmp/r1.json
# Now change the rule to detect "BBBBBBBBBB" instead (same byte count!)
cat > /tmp/yara-test/rules/test.yar << 'YAREOF'
rule detect_bravo {
strings:
$a = "BBBBBBBBBB"
condition:
$a
}
YAREOF
# Second scan — cache returns stale compiled rules
skillspector scan /tmp/yara-test/skill/ --no-llm --format json -o /tmp/r2.json
# Compare: r2 should find NOTHING (skill has AAAA not BBBB)
# But if cache is stale, r2 still reports the old match
python3 - << 'PYEOF'
import json

def yara_count(path):
    # Count reported issues whose rule_id mentions "yara"
    with open(path) as f:
        report = json.load(f)
    return sum(1 for i in report["issues"] if "yara" in i.get("rule_id", "").lower())

print(f"Scan 1 YARA findings: {yara_count('/tmp/r1.json')}")
print(f"Scan 2 YARA findings: {yara_count('/tmp/r2.json')}")
# Expected: Scan 2 = 0 (new rule doesn't match)
# Actual:   Scan 2 = 1 (stale cache serves old compiled rule)
PYEOF
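The reproduction depends on both versions of test.yar having the same byte count but different contents. That precondition can be checked with the standard library alone. `rule_text` below is a hypothetical helper that renders both heredoc bodies in the same unindented layout the script writes.

```python
import hashlib


def rule_text(name: str, pattern: str) -> bytes:
    """Render a rule in the same layout as the repro's test.yar (hypothetical helper)."""
    return (
        f"rule {name} {{\n"
        "strings:\n"
        f'$a = "{pattern}"\n'
        "condition:\n"
        "$a\n"
        "}\n"
    ).encode()


before = rule_text("detect_alpha", "AAAAAAAAAA")
after = rule_text("detect_bravo", "BBBBBBBBBB")

print(len(before) == len(after))  # True: equal sizes, hence equal st_size on disk
print(hashlib.sha256(before).hexdigest() == hashlib.sha256(after).hexdigest())  # False
```

Equal lengths mean the path + size key is identical across the two scans, while any digest of the contents differs.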
Root Cause
In src/skillspector/nodes/analyzers/static_yara.py, the _content_hash() method (lines 72-78) builds the cache key:
def _content_hash(self, rule_paths: list[Path]) -> str:
    """Generate cache key from rule file metadata."""
    parts = []
    for p in sorted(rule_paths):
        stat = p.stat()
        parts.append(f"{p}:{stat.st_size}")  # Only path + size!
    return hashlib.md5("|".join(parts).encode()).hexdigest()
The cache key is derived from:
- File path (doesn't change on edit)
- File size (st_size), which changes only if the edit adds or removes bytes
It does NOT include:
- A hash of the file content (the actual rule definitions)
- Modification time (st_mtime), which would catch most edits (though not two writes within the filesystem's timestamp resolution)
- Inode or creation time
Any rule file modification that preserves the byte count (routine when one pattern string is replaced by another of equal length) produces the same cache key, so the cache returns stale compiled YARA rules.
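One way to close the gap, sketched here under the assumption that rule files are small enough to read on every scan, is to key the cache on file contents rather than metadata. `content_key` is a hypothetical replacement for `_content_hash()`, not SkillSpector's code. It feeds each path and that file's bytes into SHA-256, length-prefixing each field so that bytes shifting between adjacent files cannot produce the same hashed stream.

```python
import hashlib
import tempfile
from pathlib import Path


def content_key(rule_paths: list[Path]) -> str:
    """Cache key over each rule file's path and full contents (sketch of a fix)."""
    digest = hashlib.sha256()
    for p in sorted(rule_paths):
        # Length-prefix every field so file boundaries stay unambiguous.
        for chunk in (str(p).encode(), p.read_bytes()):
            digest.update(len(chunk).to_bytes(8, "big"))
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    rule = Path(tmp) / "test.yar"
    rule.write_bytes(b'rule r { strings: $a = "AAAAAAAAAA" condition: $a }\n')
    key_before = content_key([rule])
    # Same byte count, different pattern.
    rule.write_bytes(b'rule r { strings: $a = "BBBBBBBBBB" condition: $a }\n')
    key_after = content_key([rule])

print(key_before != key_after)  # True: same-size edits now change the key
```

Stat metadata such as st_mtime_ns and st_size could still serve as a fast pre-check, but under this design only the content digest decides whether cached compiled rules are reused.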
Impact
- Stale detection logic: Updated YARA rules are silently ignored when file size is unchanged
- False negatives: New threat patterns go undetected after rule updates
- Silent failure: No warning that cached (potentially outdated) rules are being served
- Hard to diagnose: The scan appears to work normally — it just uses old rules
- Security regression: Organizations deploy rule updates believing they're protected, but the cache bypasses the update
- Common trigger: YARA rule edits frequently preserve file size (changing one pattern string to another of equal length)
Affected Version
SkillSpector v2.2.3