Skip to content

feat: assertion traits + Patterns engine incl. context-emit (codeguard:analyze)#1

Merged
henryavila merged 21 commits into
mainfrom
feat/patterns-engine-foundation
Jun 4, 2026
Merged

feat: assertion traits + Patterns engine incl. context-emit (codeguard:analyze)#1
henryavila merged 21 commits into
mainfrom
feat/patterns-engine-foundation

Conversation

@henryavila
Copy link
Copy Markdown
Owner

@henryavila henryavila commented Jun 3, 2026

Summary

Picks CodeGuard back up after the 0.2.0 release stall. Ships the package-side replan without touching the Arch consumer (read-only reference only): the assertion-trait bug fix, the Patterns engine, and the subscription-friendly context-emit review flow.

1. Fase 1 — assertion traits (4c662a0)

The shipped TestQualityAssertions + ParallelSafetyAssertions threw RuntimeException('Not yet implemented') in all 7 methods while the docs marketed them as working. Now implemented via a framework-free AntiPatternScanner (Symfony Finder + localized regex, ported from Arch's inline reference). Dropped 2 now-stale trait.unused baseline entries.

2. Fase 2 — Patterns engine, codeguard:analyze (0dfb953)

Thin Adjudicator design (docs/specs/2026-06-03-patterns-engine-design.md): src/Analyze/* (Severity, Pattern/DetectionSignal, YamlPatternLoader skipping the 2 non-pattern outliers, FileScopeResolver, PatternMatcher, FindingSchema SSOT, PatternMatch::fromArray anti-hallucination trust boundary, AnalyzeResult, AnalyzeRunner, LlmClient seam + NullLlmClient). --fail-on gating; reuses the reserved analyze.ended telemetry slot. No driver → honest degradation notice (never fakes a clean repo).

3. Increment D — context-emit (18c4492)

Subscription-friendly, no metered API (the LLM judgement runs inside a Claude Code session):

  • --emit writes a work order JSON; --ingest=<file> validates out-of-band findings through the same trust boundary and gates the exit code.
  • New codeguard-review skill orchestrates emit → batched subagents (N files each) → merge → ingest. Published via the codeguard-skills tag.
  • Removed the 3 stale Node-era skills. Closes R11.

4. Docs/memory

Rewrote PROJECT-STATUS.md (was wrong on 4 load-bearing facts after a 30-day pre-release freeze) + added the patterns-engine design doc.

Test plan

  • vendor/bin/pest — 377 → 435 passed / 1053 assertions
  • vendor/bin/phpstan analyse — level 5, No errors (baseline shrank by 2)
  • vendor/bin/pint --test — clean
  • composer test:coverage --min=80 — runs in CI (no local coverage driver)
  • Field validation: run /codeguard-review on a real project

Out of scope (deferred)

  • coverage_percent hardcoded -1; dead ai_rules/prepare config cleanup
  • Fase 3 (schema dump + AI-rules generator); any Arch integration (user constraint)

…aits

The two shipped traits previously threw RuntimeException('Not yet
implemented') in all 7 methods while CHANGELOG/README marketed them as
working — a correctness bug wired into the published TestQualityTest stub.

Implement them via a framework-free AntiPatternScanner (Symfony Finder +
localized regex) that walks tests/ + database/factories/ honouring an
allowlist, ported from Arch's inline grep-based reference. Traits become
thin adapters that fail the test (PHPUnit Assert) on violations.

- AntiPatternScanner: 7 checks (tautological assertions, Eloquent alias
  mocking, bare assertNotNull, truncate/forceDelete in tests, DB queries
  + eager create() in Factory::definition with comment/string stripping)
- ResolvesAntiPatternScanner concern shared by both traits
- 21 unit tests against hermetic temp fixtures (no Laravel boot)
- Drop 2 now-stale trait.unused baseline entries (traits are live again)

Suite 377 -> 398 green. PHPStan level 5 clean.
PROJECT-STATUS.md was frozen 30 days in a pre-release snapshot and wrong
on 4 load-bearing facts (audited against git+FS via multi-agent review):
- "release: nenhum ainda" -> 0.2.0 IS on Packagist (tag pushed)
- "68 commits ahead" -> origin/main == HEAD (0 ahead)
- Arch "path repo/dev-main" -> Arch consumes via vcs + ^0.2.0
- Arch "usa codeguard:check produtivamente" -> Arch bypasses the CLI,
  uses package as a one-shot stub-seeder, no config/codeguard.php

Rewrite to truth + new plan: package-side first (Patterns engine),
Arch integration deferred by user constraint. Corrected scorecard,
risks (R9/R10/R11), 28->30 pattern count.

Add docs/specs/2026-06-03-patterns-engine-design.md (Thin Adjudicator,
chosen by judge-panel): pattern data contract, src/Analyze/* component
map, LlmClient seam, TDD increments (MVP A-C ~4.5d network-free,
real driver D +1.5d), telemetry reuse, honesty check.
…LLM seam

Implements the Thin Adjudicator design (docs/specs/2026-06-03-patterns-engine-design.md),
package-side only and network-free for the MVP (Increments A–C).

src/Analyze/*:
- Severity enum (+ weight/meets for --fail-on threshold)
- Pattern + DetectionSignal DTOs; YamlPatternLoader (implements PatternRepository)
  loads the package's own resources/patterns/{core,php,php-laravel}, applies the
  verification+examples discriminator (skips preset.yaml/module.yaml outliers)
- FileScopeResolver: --changed-only (git via CommandExecutor), --path, --all
- PatternMatcher: detection.signals → AnalysisUnit (glob/{}/** + dir + import PSR-4)
- LlmClient seam + NullLlmClient default; FindingSchema as request/validation SSOT
- PatternMatch::fromArray trust boundary (drops hallucinated/file-mismatched/
  out-of-range findings); AnalyzeResult (passed/failed per threshold)
- AnalyzeRunner: one LLM call per file, emits analyze.ended (reused telemetry slot)

CodeguardAnalyzeCommand: --changed-only|--path|--all, --fail-on=critical|...|never,
--context. With no driver it prints an honest degradation notice + telemetry status
Skip — it never fakes a clean repo.

Tests: 33 new (unit per class + FakeLlmClient + command feature with temp telemetry).
Suite 398 -> 433 green. PHPStan level 5 clean. Real driver (Anthropic/subscription)
is Increment D, deferred pending the transport decision.
…ew skill

Increment D in the subscription-friendly shape the user chose: no metered API.
The LLM judgement runs inside a Claude Code session; the package stays the
deterministic harness.

- AnalyzeRunner: extract units() core; add buildWorkOrder() (serialize units +
  prompt-ready patterns + finding schema) and ingest() (validate out-of-band
  findings through the SAME PatternMatch trust boundary, then gate + telemeter).
- Pattern::toPromptArray() — LLM-facing projection (no metadata/signals).
- codeguard:analyze --emit[/--out] writes a work order JSON (no LLM call);
  --ingest=<file> validates findings + gates exit by --fail-on.
- New codeguard-review skill: emit → fan out batched subagents (one per N files)
  to judge each file against its patterns on the subscription → merge → ingest.
  Published via the codeguard-skills tag to .claude/skills.
- Remove the 3 stale Node-era skills (codeguard-setup/run/health) — they
  referenced npx/node_modules/codeguard.yaml/hook-runner.js that the PHP package
  never produces; codeguard:install owns setup now. Closes R11.

Suite 433 -> 435 green (+ --emit/--ingest feature tests). PHPStan level 5 clean.
@henryavila henryavila changed the title feat: assertion traits + Patterns engine MVP (codeguard:analyze) feat: assertion traits + Patterns engine incl. context-emit (codeguard:analyze) Jun 3, 2026
henryavila added 15 commits June 3, 2026 14:45
…sing, baseline

Tier 0+1 of the completeness roadmap: cross from "scaffolding" to a reviewer
worth gating a contractor PR on. All package-side, subscription-only, no
metered API. (Corrections from the adversarial design critic incorporated.)

T5 — exact-path attribution: AnalyzeRunner::findUnit matches the exact absolute
path first, basename only when unambiguous — two User.php no longer silently
cross-attribute.

T2 — real `use`-statement parsing (PhpFileInspector, zero new dep): `import`
detection signals now match a file's actual imports (namespace-glob), so
service-layer/dto/policies/form-requests fire on files that USE them, not just
the artifact folder. Catch-all `import: **/*` (the 3 cross-file architectural
patterns) excluded from per-file selection until the namespace graph (R3).
Class-structure patterns gated to files that declare a class.

T4 — accepted-finding baseline (AnalyzeBaseline): `--accept` records findings;
future runs suppress + always show "N suppressed". Reverses v0 never-baseline
(deliberate). Fingerprint = sha1(pattern_key + relative_file): message-free and
line-free so LLM rephrasing / edits-above don't defeat it.

Scope-coverage test + docs/patterns-recall.md for the manual recall ritual.

Suite 435 -> 452 green. PHPStan level 5 clean.
Tier 2 R1. Adds k-sample voting to the context-emit review path so a finding
survives only when >=2/3 of independent passes agree on it, and its confidence
becomes the calibrated vote-share rather than the model's self-reported (and
easily inflated) number.

- FindingVoter: tally(list<list<PatternMatch>>, minVotes) — identity by
  pattern_key|file|line; a finding repeated within one sample is one vote;
  survivors carry confidence = votes/k.
- PatternMatch::withConfidence() immutable copy for the vote-share overwrite.
- AnalyzeRunner::ingestSamples() validates each sample through the same trust
  boundary BEFORE voting (hallucinations can't accrue votes); extracted shared
  validate()/checkCount() helpers, ingest() unchanged.
- buildWorkOrder() carries a 'samples' count; --samples=k on emit (capped 1-9);
  ingest auto-detects a {samples:[[...]]} envelope vs the legacy {findings:[...]}
  single-sample path (backward compatible).
- codeguard-review skill documents the k-pass independent fan-out + envelope.

467 tests (+15), PHPStan level 5 clean, Pint clean.
Tier 2 R2. A second-opinion re-scoring pass lets a fresh subagent re-judge each
finding 0-10; the package drops any finding scored 0, keeping the rest with the
score surfaced. Composes with R1 voting (vote first, then drop 0-scored
survivors) at one extra pass instead of k.

- FindingSchema: optional verified_score (integer 0-10), not required.
- PatternMatch: nullable verifiedScore parsed in the trust boundary (absent or
  out-of-range => null/uncritiqued; a clean 0-10 is kept; withConfidence
  preserves it). The trust boundary parses 0 but does NOT drop it.
- AnalyzeRunner::surviveCritique() drops verifiedScore===0 in finish(), so all
  three paths (synchronous, ingest, voted ingest) get it uniformly, before
  baseline suppression + gating.
- buildWorkOrder carries 'critique'; --critique flag; renderFindings shows
  [score N/10].
- codeguard-review skill documents the Step 5b critique fan-out.

476 tests (+9), PHPStan level 5 clean, Pint clean.
Tier 2 R3. The catch-all-import patterns (layer-dependency-direction,
bounded-contexts, no-circular-dependencies) judge relationships BETWEEN files,
so a per-file pass never saw them. This builds the cross-file context they need.

- NamespaceGraph: parses scoped files' top-level use edges into a first-party
  adjacency map (vendor imports dropped) and pre-computes dependency cycles
  (DFS back-edge detection). Pure fromSources() + disk-reading build().
- PhpFileInspector::fqcn(): namespace + declared type => node id.
- PatternMatcher::isGraphLevel()/graphLevel() partition the catch-all-import
  patterns; workingDirectory() getter; catch-all list deduped into a constant.
- buildWorkOrder emits 'graph' {nodes,edges,cycles} + 'architecture.patterns'
  (the graph-level patterns, listed once — NOT duplicated into per-file units).
- ingest/ingestSamples augment attribution with one architectural unit per
  scoped class file that matched no per-file pattern, so an architectural
  finding attributes through the same trust boundary (a non-class file in scope
  is still rejected). Per-file check count is unchanged.
- FindingSchema/PatternMatch: optional related_file (the other end of a bad
  dependency); shown as ' -> FQCN'. withConfidence preserves it.
- codeguard-review skill: Step 4b single-subagent architectural review over the
  graph; scope caveat (use --all for a complete graph).

492 tests (+16), PHPStan level 5 clean, Pint clean.
Tier 2 R4. Six AST-invisible patterns at the center of the goal — policing a
contractor dev who does NOT use AI. PHPStan/Deptrac cannot see any of these;
they need semantic judgement, which is exactly what the analyze engine adds.

- mass-assignment (critical): Model::create($request->all()) / $guarded = []
- raw-sql-injection (critical): input interpolated into whereRaw/DB::raw/statement
- missing-authorization (critical): store/update/destroy with no authorize()/policy
- eloquent-n-plus-one (warning): relation queried inside a loop, no eager load
- missing-database-transaction (warning): related writes not wrapped in transaction
- unbounded-query (warning): Model::all()/->get() on a growing table, no paginate/chunk

Security smells gate by default (critical); perf/integrity smells report
(warning) to limit false-positive blocking. Each carries verification.rules +
correct/violation few-shot examples and file/directory detection signals (never
catch-all import, so they stay per-file, not graph-level). Corpus: 28 -> 34
patterns. Selection coverage asserted in PatternSelectionCoverageTest; recall is
the manual ritual (docs/patterns-recall.md, new R4 priority table).

493 tests (+1 selection), PHPStan level 5 clean, Pint clean.
Canonical status: HEAD ddc8d61, suite 493/1175, Tier 2 phase row added, Analyze
layer (15 classes incl FindingVoter+NamespaceGraph), corpus 28->34, scorecard
pattern-review ~80% (capped pending field recall). Handoff: dedicated Tier 2
section (R1 voting / R2 critique / R3 graph / R4 corpus) + next-action = field
validation. patterns-recall.md already carries the R4 priority table.
…view)

Cross-model adversarial review (codex, gpt-5-codex) of PR #1 surfaced three
correctness gaps; all fixed with TDD (RED->GREEN), suite 496 passing.

- F-001 PatternMatch::fromArray accepted ANY known corpus key via
  PatternRepository::has(), letting a subagent return a finding for a pattern
  the file was never scoped to review. Third param is now an explicit
  list<string> extraAllowedKeys (the run's graph-level keys); AnalyzeRunner
  threads graphLevelKeys() into run/ingest/ingestSamples.
- F-002 AnalyzeRunner::findUnit dropped architectural findings on basename
  collision: NamespaceGraph emits working-dir-relative paths while AnalysisUnit
  holds absolute paths, so the exact match never hit and the basename fallback
  was ambiguous. findUnit now resolves the relative path against the working
  directory before falling back.
- F-003 AntiPatternScanner::bareAssertNotNull flagged every assertNotNull($x);
  even when followed by a behavioural assertion, contradicting its own
  contract. Rewritten with look-ahead: flags only when the value is not
  referenced by a later assert/expect.

Review artifact + briefings persisted under .atomic-skills/reviews/.
PHPStan on PHP 8.5 reported deadCode.unreachable at the two statements
following `expect(fn () => $scope->measure(callable: fn () => throw ...))`:
the throwing callable infers Closure(): never, which binds measure()'s template
TReturn to never and poisons the control-flow of every following statement.
PHP 8.3/8.4 did not infer this, so the committed baseline entry (added from an
8.5 dev box) was unmatched on the CI matrix -> reportUnmatchedIgnoredErrors made
CI fail on a non-error.

Rewrites the throw assertion as an explicit try/catch (the code after a
try-with-never body is reachable via the catch path), which removes the false
positive at the source on all PHP versions. Drops the now-unnecessary
phpstan-baseline.neon entry (shrinks the baseline, no masking). Verified: phpstan
No errors on 8.5 with the entry removed; 496 tests green.
BREAKING CHANGE: minimum PHP is now 8.5 (composer.json php: ^8.3 -> ^8.5).
CI matrix reduced to 8.5 only. README/CLAUDE.md/memory updated. composer.lock
content-hash is unaffected (platform requirements are excluded from it), so
`composer install` stays valid. Goes out in the next minor.
The analyze/phpstan gate stopped masking the coverage step (it used to abort
at phpstan first); coverage was 79.8% < 80%. assertNoDbQueriesInFactoryDefinition
and assertNoEagerCreateInFactoryDefinition were the largest untested public-API
gap (ParallelSafetyAssertions at 50%). Adds two fixture-backed tests exercising
both via the same anonymous-subject pattern as the existing trait tests. Total
coverage 79.8% -> 80.1%; ParallelSafetyAssertions 50% -> 100%.
@henryavila henryavila merged commit ff26dd9 into main Jun 4, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant