feat: assertion traits + Patterns engine incl. context-emit (codeguard:analyze) by henryavila · Pull Request #1 · henryavila/codeguard

henryavila · 2026-06-03T15:34:27Z

Summary

Picks CodeGuard back up after the 0.2.0 release stall. Ships the package-side replan without touching the Arch consumer (read-only reference only): the assertion-trait bug fix, the Patterns engine, and the subscription-friendly context-emit review flow.

1. Fase 1 — assertion traits (`4c662a0`)

The shipped TestQualityAssertions + ParallelSafetyAssertions threw RuntimeException('Not yet implemented') in all 7 methods while the docs marketed them as working. Now implemented via a framework-free AntiPatternScanner (Symfony Finder + localized regex, ported from Arch's inline reference). Dropped 2 now-stale trait.unused baseline entries.

2. Fase 2 — Patterns engine, `codeguard:analyze` (`0dfb953`)

Thin Adjudicator design (docs/specs/2026-06-03-patterns-engine-design.md): src/Analyze/* (Severity, Pattern/DetectionSignal, YamlPatternLoader skipping the 2 non-pattern outliers, FileScopeResolver, PatternMatcher, FindingSchema SSOT, PatternMatch::fromArray anti-hallucination trust boundary, AnalyzeResult, AnalyzeRunner, LlmClient seam + NullLlmClient). --fail-on gating; reuses the reserved analyze.ended telemetry slot. No driver → honest degradation notice (never fakes a clean repo).

3. Increment D — context-emit (`18c4492`)

Subscription-friendly, no metered API (the LLM judgement runs inside a Claude Code session):

--emit writes a work order JSON; --ingest=<file> validates out-of-band findings through the same trust boundary and gates the exit code.
New codeguard-review skill orchestrates emit → batched subagents (N files each) → merge → ingest. Published via the codeguard-skills tag.
Removed the 3 stale Node-era skills. Closes R11.

4. Docs/memory

Rewrote PROJECT-STATUS.md (was wrong on 4 load-bearing facts after a 30-day pre-release freeze) + added the patterns-engine design doc.

Test plan

vendor/bin/pest — 377 → 435 passed / 1053 assertions
vendor/bin/phpstan analyse — level 5, No errors (baseline shrank by 2)
vendor/bin/pint --test — clean
composer test:coverage --min=80 — runs in CI (no local coverage driver)
Field validation: run /codeguard-review on a real project

Out of scope (deferred)

coverage_percent hardcoded -1; dead ai_rules/prepare config cleanup
Fase 3 (schema dump + AI-rules generator); any Arch integration (user constraint)

…aits The two shipped traits previously threw RuntimeException('Not yet implemented') in all 7 methods while CHANGELOG/README marketed them as working — a correctness bug wired into the published TestQualityTest stub. Implement them via a framework-free AntiPatternScanner (Symfony Finder + localized regex) that walks tests/ + database/factories/ honouring an allowlist, ported from Arch's inline grep-based reference. Traits become thin adapters that fail the test (PHPUnit Assert) on violations. - AntiPatternScanner: 7 checks (tautological assertions, Eloquent alias mocking, bare assertNotNull, truncate/forceDelete in tests, DB queries + eager create() in Factory::definition with comment/string stripping) - ResolvesAntiPatternScanner concern shared by both traits - 21 unit tests against hermetic temp fixtures (no Laravel boot) - Drop 2 now-stale trait.unused baseline entries (traits are live again) Suite 377 -> 398 green. PHPStan level 5 clean.

PROJECT-STATUS.md was frozen 30 days in a pre-release snapshot and wrong on 4 load-bearing facts (audited against git+FS via multi-agent review): - "release: nenhum ainda" -> 0.2.0 IS on Packagist (tag pushed) - "68 commits ahead" -> origin/main == HEAD (0 ahead) - Arch "path repo/dev-main" -> Arch consumes via vcs + ^0.2.0 - Arch "usa codeguard:check produtivamente" -> Arch bypasses the CLI, uses package as a one-shot stub-seeder, no config/codeguard.php Rewrite to truth + new plan: package-side first (Patterns engine), Arch integration deferred by user constraint. Corrected scorecard, risks (R9/R10/R11), 28->30 pattern count. Add docs/specs/2026-06-03-patterns-engine-design.md (Thin Adjudicator, chosen by judge-panel): pattern data contract, src/Analyze/* component map, LlmClient seam, TDD increments (MVP A-C ~4.5d network-free, real driver D +1.5d), telemetry reuse, honesty check.

…LLM seam Implements the Thin Adjudicator design (docs/specs/2026-06-03-patterns-engine-design.md), package-side only and network-free for the MVP (Increments A–C). src/Analyze/*: - Severity enum (+ weight/meets for --fail-on threshold) - Pattern + DetectionSignal DTOs; YamlPatternLoader (implements PatternRepository) loads the package's own resources/patterns/{core,php,php-laravel}, applies the verification+examples discriminator (skips preset.yaml/module.yaml outliers) - FileScopeResolver: --changed-only (git via CommandExecutor), --path, --all - PatternMatcher: detection.signals → AnalysisUnit (glob/{}/** + dir + import PSR-4) - LlmClient seam + NullLlmClient default; FindingSchema as request/validation SSOT - PatternMatch::fromArray trust boundary (drops hallucinated/file-mismatched/ out-of-range findings); AnalyzeResult (passed/failed per threshold) - AnalyzeRunner: one LLM call per file, emits analyze.ended (reused telemetry slot) CodeguardAnalyzeCommand: --changed-only|--path|--all, --fail-on=critical|...|never, --context. With no driver it prints an honest degradation notice + telemetry status Skip — it never fakes a clean repo. Tests: 33 new (unit per class + FakeLlmClient + command feature with temp telemetry). Suite 398 -> 433 green. PHPStan level 5 clean. Real driver (Anthropic/subscription) is Increment D, deferred pending the transport decision.

…ew skill Increment D in the subscription-friendly shape the user chose: no metered API. The LLM judgement runs inside a Claude Code session; the package stays the deterministic harness. - AnalyzeRunner: extract units() core; add buildWorkOrder() (serialize units + prompt-ready patterns + finding schema) and ingest() (validate out-of-band findings through the SAME PatternMatch trust boundary, then gate + telemeter). - Pattern::toPromptArray() — LLM-facing projection (no metadata/signals). - codeguard:analyze --emit[/--out] writes a work order JSON (no LLM call); --ingest=<file> validates findings + gates exit by --fail-on. - New codeguard-review skill: emit → fan out batched subagents (one per N files) to judge each file against its patterns on the subscription → merge → ingest. Published via the codeguard-skills tag to .claude/skills. - Remove the 3 stale Node-era skills (codeguard-setup/run/health) — they referenced npx/node_modules/codeguard.yaml/hook-runner.js that the PHP package never produces; codeguard:install owns setup now. Closes R11. Suite 433 -> 435 green (+ --emit/--ingest feature tests). PHPStan level 5 clean.

…sing, baseline Tier 0+1 of the completeness roadmap: cross from "scaffolding" to a reviewer worth gating a contractor PR on. All package-side, subscription-only, no metered API. (Corrections from the adversarial design critic incorporated.) T5 — exact-path attribution: AnalyzeRunner::findUnit matches the exact absolute path first, basename only when unambiguous — two User.php no longer silently cross-attribute. T2 — real `use`-statement parsing (PhpFileInspector, zero new dep): `import` detection signals now match a file's actual imports (namespace-glob), so service-layer/dto/policies/form-requests fire on files that USE them, not just the artifact folder. Catch-all `import: **/*` (the 3 cross-file architectural patterns) excluded from per-file selection until the namespace graph (R3). Class-structure patterns gated to files that declare a class. T4 — accepted-finding baseline (AnalyzeBaseline): `--accept` records findings; future runs suppress + always show "N suppressed". Reverses v0 never-baseline (deliberate). Fingerprint = sha1(pattern_key + relative_file): message-free and line-free so LLM rephrasing / edits-above don't defeat it. Scope-coverage test + docs/patterns-recall.md for the manual recall ritual. Suite 435 -> 452 green. PHPStan level 5 clean.

…Tier 2 plan

Tier 2 R1. Adds k-sample voting to the context-emit review path so a finding survives only when >=2/3 of independent passes agree on it, and its confidence becomes the calibrated vote-share rather than the model's self-reported (and easily inflated) number. - FindingVoter: tally(list<list<PatternMatch>>, minVotes) — identity by pattern_key|file|line; a finding repeated within one sample is one vote; survivors carry confidence = votes/k. - PatternMatch::withConfidence() immutable copy for the vote-share overwrite. - AnalyzeRunner::ingestSamples() validates each sample through the same trust boundary BEFORE voting (hallucinations can't accrue votes); extracted shared validate()/checkCount() helpers, ingest() unchanged. - buildWorkOrder() carries a 'samples' count; --samples=k on emit (capped 1-9); ingest auto-detects a {samples:[[...]]} envelope vs the legacy {findings:[...]} single-sample path (backward compatible). - codeguard-review skill documents the k-pass independent fan-out + envelope. 467 tests (+15), PHPStan level 5 clean, Pint clean.

Tier 2 R2. A second-opinion re-scoring pass lets a fresh subagent re-judge each finding 0-10; the package drops any finding scored 0, keeping the rest with the score surfaced. Composes with R1 voting (vote first, then drop 0-scored survivors) at one extra pass instead of k. - FindingSchema: optional verified_score (integer 0-10), not required. - PatternMatch: nullable verifiedScore parsed in the trust boundary (absent or out-of-range => null/uncritiqued; a clean 0-10 is kept; withConfidence preserves it). The trust boundary parses 0 but does NOT drop it. - AnalyzeRunner::surviveCritique() drops verifiedScore===0 in finish(), so all three paths (synchronous, ingest, voted ingest) get it uniformly, before baseline suppression + gating. - buildWorkOrder carries 'critique'; --critique flag; renderFindings shows [score N/10]. - codeguard-review skill documents the Step 5b critique fan-out. 476 tests (+9), PHPStan level 5 clean, Pint clean.

Tier 2 R3. The catch-all-import patterns (layer-dependency-direction, bounded-contexts, no-circular-dependencies) judge relationships BETWEEN files, so a per-file pass never saw them. This builds the cross-file context they need. - NamespaceGraph: parses scoped files' top-level use edges into a first-party adjacency map (vendor imports dropped) and pre-computes dependency cycles (DFS back-edge detection). Pure fromSources() + disk-reading build(). - PhpFileInspector::fqcn(): namespace + declared type => node id. - PatternMatcher::isGraphLevel()/graphLevel() partition the catch-all-import patterns; workingDirectory() getter; catch-all list deduped into a constant. - buildWorkOrder emits 'graph' {nodes,edges,cycles} + 'architecture.patterns' (the graph-level patterns, listed once — NOT duplicated into per-file units). - ingest/ingestSamples augment attribution with one architectural unit per scoped class file that matched no per-file pattern, so an architectural finding attributes through the same trust boundary (a non-class file in scope is still rejected). Per-file check count is unchanged. - FindingSchema/PatternMatch: optional related_file (the other end of a bad dependency); shown as ' -> FQCN'. withConfidence preserves it. - codeguard-review skill: Step 4b single-subagent architectural review over the graph; scope caveat (use --all for a complete graph). 492 tests (+16), PHPStan level 5 clean, Pint clean.

Tier 2 R4. Six AST-invisible patterns at the center of the goal — policing a contractor dev who does NOT use AI. PHPStan/Deptrac cannot see any of these; they need semantic judgement, which is exactly what the analyze engine adds. - mass-assignment (critical): Model::create($request->all()) / $guarded = [] - raw-sql-injection (critical): input interpolated into whereRaw/DB::raw/statement - missing-authorization (critical): store/update/destroy with no authorize()/policy - eloquent-n-plus-one (warning): relation queried inside a loop, no eager load - missing-database-transaction (warning): related writes not wrapped in transaction - unbounded-query (warning): Model::all()/->get() on a growing table, no paginate/chunk Security smells gate by default (critical); perf/integrity smells report (warning) to limit false-positive blocking. Each carries verification.rules + correct/violation few-shot examples and file/directory detection signals (never catch-all import, so they stay per-file, not graph-level). Corpus: 28 -> 34 patterns. Selection coverage asserted in PatternSelectionCoverageTest; recall is the manual ritual (docs/patterns-recall.md, new R4 priority table). 493 tests (+1 selection), PHPStan level 5 clean, Pint clean.

Canonical status: HEAD ddc8d61, suite 493/1175, Tier 2 phase row added, Analyze layer (15 classes incl FindingVoter+NamespaceGraph), corpus 28->34, scorecard pattern-review ~80% (capped pending field recall). Handoff: dedicated Tier 2 section (R1 voting / R2 critique / R3 graph / R4 corpus) + next-action = field validation. patterns-recall.md already carries the R4 priority table.

…view) Cross-model adversarial review (codex, gpt-5-codex) of PR #1 surfaced three correctness gaps; all fixed with TDD (RED->GREEN), suite 496 passing. - F-001 PatternMatch::fromArray accepted ANY known corpus key via PatternRepository::has(), letting a subagent return a finding for a pattern the file was never scoped to review. Third param is now an explicit list<string> extraAllowedKeys (the run's graph-level keys); AnalyzeRunner threads graphLevelKeys() into run/ingest/ingestSamples. - F-002 AnalyzeRunner::findUnit dropped architectural findings on basename collision: NamespaceGraph emits working-dir-relative paths while AnalysisUnit holds absolute paths, so the exact match never hit and the basename fallback was ambiguous. findUnit now resolves the relative path against the working directory before falling back. - F-003 AntiPatternScanner::bareAssertNotNull flagged every assertNotNull($x); even when followed by a behavioural assertion, contradicting its own contract. Rewritten with look-ahead: flags only when the value is not referenced by a later assert/expect. Review artifact + briefings persisted under .atomic-skills/reviews/.

)

PHPStan on PHP 8.5 reported deadCode.unreachable at the two statements following `expect(fn () => $scope->measure(callable: fn () => throw ...))`: the throwing callable infers Closure(): never, which binds measure()'s template TReturn to never and poisons the control-flow of every following statement. PHP 8.3/8.4 did not infer this, so the committed baseline entry (added from an 8.5 dev box) was unmatched on the CI matrix -> reportUnmatchedIgnoredErrors made CI fail on a non-error. Rewrites the throw assertion as an explicit try/catch (the code after a try-with-never body is reachable via the catch path), which removes the false positive at the source on all PHP versions. Drops the now-unnecessary phpstan-baseline.neon entry (shrinks the baseline, no masking). Verified: phpstan No errors on 8.5 with the entry removed; 496 tests green.

BREAKING CHANGE: minimum PHP is now 8.5 (composer.json php: ^8.3 -> ^8.5). CI matrix reduced to 8.5 only. README/CLAUDE.md/memory updated. composer.lock content-hash is unaffected (platform requirements are excluded from it), so `composer install` stays valid. Goes out in the next minor.

The analyze/phpstan gate stopped masking the coverage step (it used to abort at phpstan first); coverage was 79.8% < 80%. assertNoDbQueriesInFactoryDefinition and assertNoEagerCreateInFactoryDefinition were the largest untested public-API gap (ParallelSafetyAssertions at 50%). Adds two fixture-backed tests exercising both via the same anonymous-subject pattern as the existing trait tests. Total coverage 79.8% -> 80.1%; ParallelSafetyAssertions 50% -> 100%.

…covered)

…/ 0.3.0

henryavila added 6 commits June 3, 2026 09:59

docs(memory): status — Fase 1 traits + Fase 2 Patterns MVP shipped

b73be84

docs(memory): status — Increment D context-emit shipped; R7+R11 closed

0af6a25

henryavila changed the title ~~feat: assertion traits + Patterns engine MVP (codeguard:analyze)~~ feat: assertion traits + Patterns engine incl. context-emit (codeguard:analyze) Jun 3, 2026

henryavila added 15 commits June 3, 2026 14:45

docs(memory): trust threshold (Tier 0+1) shipped + session handoff + …

e258e58

…Tier 2 plan

docs(memory): sync status — codex review of PR #1 + 3 TDD fixes (104d867

cd1c3be

)

docs(memory): mark PR #1 branch pushed/synced (cd1c3be)

484e7f7

docs(memory): sync status — CI fix (PHPStan 8.5 FP) + PHP 8.5+ bump

2ec571a

docs(memory): sync status — coverage 80.1% (ParallelSafetyAssertions …

3b1623e

…covered)

henryavila merged commit ff26dd9 into main Jun 4, 2026
3 checks passed

henryavila added a commit that referenced this pull request Jun 4, 2026

docs(memory): PR #1 merged → main (ff26dd9); next = field validation …

78ad466

…/ 0.3.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: assertion traits + Patterns engine incl. context-emit (codeguard:analyze)#1

feat: assertion traits + Patterns engine incl. context-emit (codeguard:analyze)#1
henryavila merged 21 commits into
mainfrom
feat/patterns-engine-foundation

henryavila commented Jun 3, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

henryavila commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Fase 1 — assertion traits (4c662a0)

2. Fase 2 — Patterns engine, codeguard:analyze (0dfb953)

3. Increment D — context-emit (18c4492)

4. Docs/memory

Test plan

Out of scope (deferred)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

henryavila commented Jun 3, 2026 •

edited

Loading

1. Fase 1 — assertion traits (`4c662a0`)

2. Fase 2 — Patterns engine, `codeguard:analyze` (`0dfb953`)

3. Increment D — context-emit (`18c4492`)