
fix(yara): use content hash for rule cache invalidation #156

Open
mimran-khan wants to merge 2 commits into NVIDIA:main from mimran-khan:fix/yara-cache-content-hash


@mimran-khan

Contributor

Summary

The YARA rule cache key (_content_hash) was derived from file paths and sizes only. If a rule file is edited without changing its byte count (e.g., replacing a pattern string with a same-length alternative), the key stays the same and the module keeps serving stale compiled rules from the previous version.

Changes

  • static_yara.py: _content_hash() now hashes the actual file content (read_bytes()) instead of just st_size.
  • test_static_yara.py: 3 new tests in TestContentHashInvalidation. They verify that same-size edits produce different hashes, that identical content produces stable hashes, and that _load_rules recompiles after a same-size edit.
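The reload behaviour covered by the third test can be sketched as follows. This is a minimal illustration, not the code in static_yara.py: the `RuleCache` class, the `fake_compile` stand-in for the real YARA compiler, the `*.yar` glob, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Optional


class RuleCache:
    """Hypothetical cache that recompiles only when the content hash changes."""

    def __init__(self, rules_dir: Path, compile_fn: Callable[[Path], object]):
        self.rules_dir = rules_dir
        self.compile_fn = compile_fn
        self._key: Optional[str] = None
        self._rules: Optional[object] = None
        self.compile_count = 0  # exposed so the demo can observe recompiles

    def _content_hash(self) -> str:
        # Hash file names and file bytes, in a stable (sorted) order.
        h = hashlib.sha256()
        for p in sorted(self.rules_dir.glob("*.yar")):
            h.update(p.name.encode())
            h.update(p.read_bytes())
        return h.hexdigest()

    def load_rules(self) -> object:
        key = self._content_hash()
        if key != self._key:
            # Cache miss: content changed (or first load), so recompile.
            self._rules = self.compile_fn(self.rules_dir)
            self._key = key
            self.compile_count += 1
        return self._rules


def fake_compile(rules_dir: Path) -> dict:
    """Stand-in for YARA compilation (assumption): just reads the rule text."""
    return {p.name: p.read_text() for p in sorted(rules_dir.glob("*.yar"))}


def demo_recompile_count() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        rules_dir = Path(tmp)
        rule = rules_dir / "demo.yar"
        rule.write_text('rule demo { strings: $a = "abc" condition: $a }')
        cache = RuleCache(rules_dir, fake_compile)
        cache.load_rules()  # first load compiles
        cache.load_rules()  # unchanged content, served from cache
        # Same byte count, different content.
        rule.write_text('rule demo { strings: $a = "xyz" condition: $a }')
        cache.load_rules()  # content changed, compiles again
        return cache.compile_count
```

With a size-based key the third call would have been a cache hit; with a content-based key it triggers a second compile.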

Before / After

# Before — size-only, misses same-length edits
h.update(str(p.stat().st_size).encode())

# After — content-based, catches all edits
h.update(p.read_bytes())
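To make the difference concrete, here is a minimal sketch of the two strategies side by side. The function names `size_hash` and `content_hash`, the `*.yar` glob, and the use of SHA-256 are assumptions for illustration; the actual _content_hash() in static_yara.py may differ in those details.

```python
import hashlib
import tempfile
from pathlib import Path


def size_hash(rules_dir: Path) -> str:
    """Size-based key: hashes each rule file's path and byte count."""
    h = hashlib.sha256()
    for p in sorted(rules_dir.glob("*.yar")):
        h.update(str(p).encode())
        h.update(str(p.stat().st_size).encode())
    return h.hexdigest()


def content_hash(rules_dir: Path) -> str:
    """Content-based key: hashes each rule file's path and actual bytes."""
    h = hashlib.sha256()
    for p in sorted(rules_dir.glob("*.yar")):
        h.update(str(p).encode())
        h.update(p.read_bytes())
    return h.hexdigest()


def demo_same_size_edit() -> dict:
    """Edit a rule without changing its length and report which keys changed."""
    with tempfile.TemporaryDirectory() as tmp:
        rules_dir = Path(tmp)
        rule = rules_dir / "demo.yar"

        rule.write_text('rule demo { strings: $a = "abc" condition: $a }')
        size_before = size_hash(rules_dir)
        content_before = content_hash(rules_dir)

        # Same number of bytes, different pattern string.
        rule.write_text('rule demo { strings: $a = "xyz" condition: $a }')
        size_after = size_hash(rules_dir)
        content_after = content_hash(rules_dir)

        return {
            "size_key_changed": size_before != size_after,
            "content_key_changed": content_before != content_after,
        }
```

In this sketch the size-based key is identical before and after the edit, while the content-based key changes, which is the case the PR targets.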

Trade-off

Reading file content is slightly slower than a stat-only check (roughly microseconds per rule file). Given that YARA rule directories typically contain fewer than 50 small files and that compilation itself is the expensive step, the extra cost is negligible.
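Anyone who wants to check this trade-off on their own machine could use a rough timing sketch like the one below. The helper name `time_stat_vs_read`, the file count, and the synthetic file contents are assumptions; absolute numbers depend heavily on the filesystem and OS cache, so no expected figures are given.

```python
import tempfile
import time
from pathlib import Path
from typing import Tuple


def time_stat_vs_read(num_files: int = 50, repeats: int = 100) -> Tuple[float, float]:
    """Return average seconds per pass for stat-only vs. full read."""
    with tempfile.TemporaryDirectory() as tmp:
        rules_dir = Path(tmp)
        for i in range(num_files):
            # Small synthetic rule files; the padding only adds bytes to read.
            text = f"rule r{i} {{ condition: true }}\n" + "// padding\n" * 20
            (rules_dir / f"rule_{i}.yar").write_text(text)
        files = sorted(rules_dir.glob("*.yar"))

        start = time.perf_counter()
        for _ in range(repeats):
            for p in files:
                p.stat().st_size
        stat_seconds = (time.perf_counter() - start) / repeats

        start = time.perf_counter()
        for _ in range(repeats):
            for p in files:
                p.read_bytes()
        read_seconds = (time.perf_counter() - start) / repeats

    return stat_seconds, read_seconds


if __name__ == "__main__":
    stat_s, read_s = time_stat_vs_read()
    print(f"stat-only pass: {stat_s * 1e6:.1f} us, read pass: {read_s * 1e6:.1f} us")
```

Comparing either figure against the time a real rule compilation takes gives a sense of whether the read cost matters in practice.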

Testing

  • 3 new tests pass
  • Full existing YARA test suite continues to pass

Fixes #152

_content_hash() previously hashed file paths and sizes only. If a rule
file was edited without changing its byte count, the cache would serve
stale compiled rules. Switch to hashing actual file content via
read_bytes() so any edit invalidates the cache.

Fixes NVIDIA#152

@keshprad left a comment

Member


looks good to me. thanks!

I'll merge later today after double-checking the unit tests, linting issues, and static analysis reports on our internal CI.

@keshprad

Member

just added one minor linting fix.



Development

Successfully merging this pull request may close these issues.

[Bug] YARA rule cache uses file size instead of content hash — stale rules served after edits

2 participants