feat: assertion traits + Patterns engine incl. context-emit (codeguard:analyze)#1
Merged
Merged
Conversation
…aits
The two shipped traits previously threw RuntimeException('Not yet
implemented') in all 7 methods while CHANGELOG/README marketed them as
working — a correctness bug wired into the published TestQualityTest stub.
Implement them via a framework-free AntiPatternScanner (Symfony Finder +
localized regex) that walks tests/ + database/factories/ honouring an
allowlist, ported from Arch's inline grep-based reference. Traits become
thin adapters that fail the test (PHPUnit Assert) on violations.
- AntiPatternScanner: 7 checks (tautological assertions, Eloquent alias
mocking, bare assertNotNull, truncate/forceDelete in tests, DB queries
+ eager create() in Factory::definition with comment/string stripping)
- ResolvesAntiPatternScanner concern shared by both traits
- 21 unit tests against hermetic temp fixtures (no Laravel boot)
- Drop 2 now-stale trait.unused baseline entries (traits are live again)
Suite 377 -> 398 green. PHPStan level 5 clean.
PROJECT-STATUS.md was frozen 30 days in a pre-release snapshot and wrong on 4 load-bearing facts (audited against git+FS via multi-agent review): - "release: nenhum ainda" -> 0.2.0 IS on Packagist (tag pushed) - "68 commits ahead" -> origin/main == HEAD (0 ahead) - Arch "path repo/dev-main" -> Arch consumes via vcs + ^0.2.0 - Arch "usa codeguard:check produtivamente" -> Arch bypasses the CLI, uses package as a one-shot stub-seeder, no config/codeguard.php Rewrite to truth + new plan: package-side first (Patterns engine), Arch integration deferred by user constraint. Corrected scorecard, risks (R9/R10/R11), 28->30 pattern count. Add docs/specs/2026-06-03-patterns-engine-design.md (Thin Adjudicator, chosen by judge-panel): pattern data contract, src/Analyze/* component map, LlmClient seam, TDD increments (MVP A-C ~4.5d network-free, real driver D +1.5d), telemetry reuse, honesty check.
…LLM seam
Implements the Thin Adjudicator design (docs/specs/2026-06-03-patterns-engine-design.md),
package-side only and network-free for the MVP (Increments A–C).
src/Analyze/*:
- Severity enum (+ weight/meets for --fail-on threshold)
- Pattern + DetectionSignal DTOs; YamlPatternLoader (implements PatternRepository)
loads the package's own resources/patterns/{core,php,php-laravel}, applies the
verification+examples discriminator (skips preset.yaml/module.yaml outliers)
- FileScopeResolver: --changed-only (git via CommandExecutor), --path, --all
- PatternMatcher: detection.signals → AnalysisUnit (glob/{}/** + dir + import PSR-4)
- LlmClient seam + NullLlmClient default; FindingSchema as request/validation SSOT
- PatternMatch::fromArray trust boundary (drops hallucinated/file-mismatched/
out-of-range findings); AnalyzeResult (passed/failed per threshold)
- AnalyzeRunner: one LLM call per file, emits analyze.ended (reused telemetry slot)
CodeguardAnalyzeCommand: --changed-only|--path|--all, --fail-on=critical|...|never,
--context. With no driver it prints an honest degradation notice + telemetry status
Skip — it never fakes a clean repo.
Tests: 33 new (unit per class + FakeLlmClient + command feature with temp telemetry).
Suite 398 -> 433 green. PHPStan level 5 clean. Real driver (Anthropic/subscription)
is Increment D, deferred pending the transport decision.
…ew skill Increment D in the subscription-friendly shape the user chose: no metered API. The LLM judgement runs inside a Claude Code session; the package stays the deterministic harness. - AnalyzeRunner: extract units() core; add buildWorkOrder() (serialize units + prompt-ready patterns + finding schema) and ingest() (validate out-of-band findings through the SAME PatternMatch trust boundary, then gate + telemeter). - Pattern::toPromptArray() — LLM-facing projection (no metadata/signals). - codeguard:analyze --emit[/--out] writes a work order JSON (no LLM call); --ingest=<file> validates findings + gates exit by --fail-on. - New codeguard-review skill: emit → fan out batched subagents (one per N files) to judge each file against its patterns on the subscription → merge → ingest. Published via the codeguard-skills tag to .claude/skills. - Remove the 3 stale Node-era skills (codeguard-setup/run/health) — they referenced npx/node_modules/codeguard.yaml/hook-runner.js that the PHP package never produces; codeguard:install owns setup now. Closes R11. Suite 433 -> 435 green (+ --emit/--ingest feature tests). PHPStan level 5 clean.
…sing, baseline Tier 0+1 of the completeness roadmap: cross from "scaffolding" to a reviewer worth gating a contractor PR on. All package-side, subscription-only, no metered API. (Corrections from the adversarial design critic incorporated.) T5 — exact-path attribution: AnalyzeRunner::findUnit matches the exact absolute path first, basename only when unambiguous — two User.php no longer silently cross-attribute. T2 — real `use`-statement parsing (PhpFileInspector, zero new dep): `import` detection signals now match a file's actual imports (namespace-glob), so service-layer/dto/policies/form-requests fire on files that USE them, not just the artifact folder. Catch-all `import: **/*` (the 3 cross-file architectural patterns) excluded from per-file selection until the namespace graph (R3). Class-structure patterns gated to files that declare a class. T4 — accepted-finding baseline (AnalyzeBaseline): `--accept` records findings; future runs suppress + always show "N suppressed". Reverses v0 never-baseline (deliberate). Fingerprint = sha1(pattern_key + relative_file): message-free and line-free so LLM rephrasing / edits-above don't defeat it. Scope-coverage test + docs/patterns-recall.md for the manual recall ritual. Suite 435 -> 452 green. PHPStan level 5 clean.
Tier 2 R1. Adds k-sample voting to the context-emit review path so a finding
survives only when >=2/3 of independent passes agree on it, and its confidence
becomes the calibrated vote-share rather than the model's self-reported (and
easily inflated) number.
- FindingVoter: tally(list<list<PatternMatch>>, minVotes) — identity by
pattern_key|file|line; a finding repeated within one sample is one vote;
survivors carry confidence = votes/k.
- PatternMatch::withConfidence() immutable copy for the vote-share overwrite.
- AnalyzeRunner::ingestSamples() validates each sample through the same trust
boundary BEFORE voting (hallucinations can't accrue votes); extracted shared
validate()/checkCount() helpers, ingest() unchanged.
- buildWorkOrder() carries a 'samples' count; --samples=k on emit (capped 1-9);
ingest auto-detects a {samples:[[...]]} envelope vs the legacy {findings:[...]}
single-sample path (backward compatible).
- codeguard-review skill documents the k-pass independent fan-out + envelope.
467 tests (+15), PHPStan level 5 clean, Pint clean.
Tier 2 R2. A second-opinion re-scoring pass lets a fresh subagent re-judge each finding 0-10; the package drops any finding scored 0, keeping the rest with the score surfaced. Composes with R1 voting (vote first, then drop 0-scored survivors) at one extra pass instead of k. - FindingSchema: optional verified_score (integer 0-10), not required. - PatternMatch: nullable verifiedScore parsed in the trust boundary (absent or out-of-range => null/uncritiqued; a clean 0-10 is kept; withConfidence preserves it). The trust boundary parses 0 but does NOT drop it. - AnalyzeRunner::surviveCritique() drops verifiedScore===0 in finish(), so all three paths (synchronous, ingest, voted ingest) get it uniformly, before baseline suppression + gating. - buildWorkOrder carries 'critique'; --critique flag; renderFindings shows [score N/10]. - codeguard-review skill documents the Step 5b critique fan-out. 476 tests (+9), PHPStan level 5 clean, Pint clean.
Tier 2 R3. The catch-all-import patterns (layer-dependency-direction,
bounded-contexts, no-circular-dependencies) judge relationships BETWEEN files,
so a per-file pass never saw them. This builds the cross-file context they need.
- NamespaceGraph: parses scoped files' top-level use edges into a first-party
adjacency map (vendor imports dropped) and pre-computes dependency cycles
(DFS back-edge detection). Pure fromSources() + disk-reading build().
- PhpFileInspector::fqcn(): namespace + declared type => node id.
- PatternMatcher::isGraphLevel()/graphLevel() partition the catch-all-import
patterns; workingDirectory() getter; catch-all list deduped into a constant.
- buildWorkOrder emits 'graph' {nodes,edges,cycles} + 'architecture.patterns'
(the graph-level patterns, listed once — NOT duplicated into per-file units).
- ingest/ingestSamples augment attribution with one architectural unit per
scoped class file that matched no per-file pattern, so an architectural
finding attributes through the same trust boundary (a non-class file in scope
is still rejected). Per-file check count is unchanged.
- FindingSchema/PatternMatch: optional related_file (the other end of a bad
dependency); shown as ' -> FQCN'. withConfidence preserves it.
- codeguard-review skill: Step 4b single-subagent architectural review over the
graph; scope caveat (use --all for a complete graph).
492 tests (+16), PHPStan level 5 clean, Pint clean.
Tier 2 R4. Six AST-invisible patterns at the center of the goal — policing a contractor dev who does NOT use AI. PHPStan/Deptrac cannot see any of these; they need semantic judgement, which is exactly what the analyze engine adds. - mass-assignment (critical): Model::create($request->all()) / $guarded = [] - raw-sql-injection (critical): input interpolated into whereRaw/DB::raw/statement - missing-authorization (critical): store/update/destroy with no authorize()/policy - eloquent-n-plus-one (warning): relation queried inside a loop, no eager load - missing-database-transaction (warning): related writes not wrapped in transaction - unbounded-query (warning): Model::all()/->get() on a growing table, no paginate/chunk Security smells gate by default (critical); perf/integrity smells report (warning) to limit false-positive blocking. Each carries verification.rules + correct/violation few-shot examples and file/directory detection signals (never catch-all import, so they stay per-file, not graph-level). Corpus: 28 -> 34 patterns. Selection coverage asserted in PatternSelectionCoverageTest; recall is the manual ritual (docs/patterns-recall.md, new R4 priority table). 493 tests (+1 selection), PHPStan level 5 clean, Pint clean.
Canonical status: HEAD ddc8d61, suite 493/1175, Tier 2 phase row added, Analyze layer (15 classes incl FindingVoter+NamespaceGraph), corpus 28->34, scorecard pattern-review ~80% (capped pending field recall). Handoff: dedicated Tier 2 section (R1 voting / R2 critique / R3 graph / R4 corpus) + next-action = field validation. patterns-recall.md already carries the R4 priority table.
…view) Cross-model adversarial review (codex, gpt-5-codex) of PR #1 surfaced three correctness gaps; all fixed with TDD (RED->GREEN), suite 496 passing. - F-001 PatternMatch::fromArray accepted ANY known corpus key via PatternRepository::has(), letting a subagent return a finding for a pattern the file was never scoped to review. Third param is now an explicit list<string> extraAllowedKeys (the run's graph-level keys); AnalyzeRunner threads graphLevelKeys() into run/ingest/ingestSamples. - F-002 AnalyzeRunner::findUnit dropped architectural findings on basename collision: NamespaceGraph emits working-dir-relative paths while AnalysisUnit holds absolute paths, so the exact match never hit and the basename fallback was ambiguous. findUnit now resolves the relative path against the working directory before falling back. - F-003 AntiPatternScanner::bareAssertNotNull flagged every assertNotNull($x); even when followed by a behavioural assertion, contradicting its own contract. Rewritten with look-ahead: flags only when the value is not referenced by a later assert/expect. Review artifact + briefings persisted under .atomic-skills/reviews/.
PHPStan on PHP 8.5 reported deadCode.unreachable at the two statements following `expect(fn () => $scope->measure(callable: fn () => throw ...))`: the throwing callable infers Closure(): never, which binds measure()'s template TReturn to never and poisons the control-flow of every following statement. PHP 8.3/8.4 did not infer this, so the committed baseline entry (added from an 8.5 dev box) was unmatched on the CI matrix -> reportUnmatchedIgnoredErrors made CI fail on a non-error. Rewrites the throw assertion as an explicit try/catch (the code after a try-with-never body is reachable via the catch path), which removes the false positive at the source on all PHP versions. Drops the now-unnecessary phpstan-baseline.neon entry (shrinks the baseline, no masking). Verified: phpstan No errors on 8.5 with the entry removed; 496 tests green.
BREAKING CHANGE: minimum PHP is now 8.5 (composer.json php: ^8.3 -> ^8.5). CI matrix reduced to 8.5 only. README/CLAUDE.md/memory updated. composer.lock content-hash is unaffected (platform requirements are excluded from it), so `composer install` stays valid. Goes out in the next minor.
The analyze/phpstan gate stopped masking the coverage step (it used to abort at phpstan first); coverage was 79.8% < 80%. assertNoDbQueriesInFactoryDefinition and assertNoEagerCreateInFactoryDefinition were the largest untested public-API gap (ParallelSafetyAssertions at 50%). Adds two fixture-backed tests exercising both via the same anonymous-subject pattern as the existing trait tests. Total coverage 79.8% -> 80.1%; ParallelSafetyAssertions 50% -> 100%.
henryavila
added a commit
that referenced
this pull request
Jun 4, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Picks CodeGuard back up after the 0.2.0 release stall. Ships the package-side replan without touching the Arch consumer (read-only reference only): the assertion-trait bug fix, the Patterns engine, and the subscription-friendly context-emit review flow.
1. Fase 1 — assertion traits (
4c662a0)The shipped
TestQualityAssertions+ParallelSafetyAssertionsthrewRuntimeException('Not yet implemented')in all 7 methods while the docs marketed them as working. Now implemented via a framework-freeAntiPatternScanner(Symfony Finder + localized regex, ported from Arch's inline reference). Dropped 2 now-staletrait.unusedbaseline entries.2. Fase 2 — Patterns engine,
codeguard:analyze(0dfb953)Thin Adjudicator design (
docs/specs/2026-06-03-patterns-engine-design.md):src/Analyze/*(Severity, Pattern/DetectionSignal, YamlPatternLoader skipping the 2 non-pattern outliers, FileScopeResolver, PatternMatcher, FindingSchema SSOT,PatternMatch::fromArrayanti-hallucination trust boundary, AnalyzeResult, AnalyzeRunner, LlmClient seam + NullLlmClient).--fail-ongating; reuses the reservedanalyze.endedtelemetry slot. No driver → honest degradation notice (never fakes a clean repo).3. Increment D — context-emit (
18c4492)Subscription-friendly, no metered API (the LLM judgement runs inside a Claude Code session):
--emitwrites a work order JSON;--ingest=<file>validates out-of-band findings through the same trust boundary and gates the exit code.codeguard-reviewskill orchestrates emit → batched subagents (N files each) → merge → ingest. Published via thecodeguard-skillstag.4. Docs/memory
Rewrote
PROJECT-STATUS.md(was wrong on 4 load-bearing facts after a 30-day pre-release freeze) + added the patterns-engine design doc.Test plan
vendor/bin/pest— 377 → 435 passed / 1053 assertionsvendor/bin/phpstan analyse— level 5, No errors (baseline shrank by 2)vendor/bin/pint --test— cleancomposer test:coverage --min=80— runs in CI (no local coverage driver)/codeguard-reviewon a real projectOut of scope (deferred)
coverage_percenthardcoded-1; deadai_rules/prepareconfig cleanup