Skip to content

fix(plugin): kill roster/disposition/skill-count drift (independent review)#86

Merged
avrabe merged 1 commit into
mainfrom
fix/plugin-roster-disposition-drift
Jun 10, 2026
Merged

fix(plugin): kill roster/disposition/skill-count drift (independent review)#86
avrabe merged 1 commit into
mainfrom
fix/plugin-roster-disposition-drift

Conversation

@avrabe

@avrabe avrabe commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Five drift findings from an independent review — all the same root cause (a fact copied to N places, N-1 drifted), fixed by single-source + reference:

  1. "four skills" injected every session — the memory-inject hook preamble + philosophy + toolchain memory said "four"; there are thirteen. De-enumerated all three (the skills/ dir is the source). Highest priority — misinformation on a hot path.
  2. Tool roster had no single source and disagreed across 5 files. The toolchain memory is now the complete roster — added gale, scry (load-bearing in proof-synthesis), thrum (real per CLAUDE.md + the thrum dashboard — not an orphan to remove), temper, mcp. report-tool-friction + the situational-awareness hook now reference it / are synced.
  3. Disposition miscategorization (real routing cost) — release-planning / release-artifact-pipeline were "drivers" but are judgment-heavy; routing them to a low-effort biddable model weakens triage/scoping. Reclassified.
  4. "Two kinds" wasn't exhaustive (4 skills unclassified). Now three classes — driver {release-execution} · hybrids {release-planning, issue-hunt, release-artifact-pipeline} · explorers {6} · cross-cutting {oracle-gate, report-tool-friction, capture-session-learnings} — every skill in exactly one.
  5. Verified arXiv 2605.26457 — "Verus-SpecGym", the LLM-judge-misses-26% result is real. Citation correct, kept.

Patch → 0.8.1 (drift/consistency, not new capability).

🤖 Generated with Claude Code

…eview)

Five drift findings, all the same root cause (a fact copied to N places, N-1
drifted) — fixed by picking the single source and making the rest reference it:

1. "four skills" injected every session — the inject-pulseengine-memory hook
   preamble enumerated four skills, and philosophy + toolchain memory said "four";
   there are thirteen. De-enumerated all three (the skills/ dir is the source).
2. Tool roster had no single source and disagreed across 5 files. Made the
   toolchain memory the complete roster: added gale, scry, thrum, temper, mcp
   (gale + scry are load-bearing in proof-synthesis; thrum is real per CLAUDE.md +
   the thrum dashboard, not an orphan). report-tool-friction + the
   situational-awareness hook now reference the roster / are synced (+thrum).
3. Disposition miscategorization (real routing cost). release-planning and
   release-artifact-pipeline were "drivers" but are judgment-heavy; routing them to
   a low-effort biddable model weakens triage/scoping. Reclassified.
4. "Two kinds" wasn't exhaustive (4 skills unclassified). Now three classes —
   driver {release-execution}, hybrids {release-planning, issue-hunt,
   release-artifact-pipeline}, explorers {6}, cross-cutting {oracle-gate,
   report-tool-friction, capture-session-learnings} — every skill in exactly one.
5. Verified arXiv 2605.26457 (Verus-SpecGym; the LLM-judge-misses-26% result is
   real) — citation correct, kept.

Patch bump to 0.8.1 (drift/consistency, not new capability).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@avrabe avrabe merged commit 1312c0a into main Jun 10, 2026
1 check passed
@avrabe avrabe deleted the fix/plugin-roster-disposition-drift branch June 10, 2026 19:17
avrabe added a commit that referenced this pull request Jun 11, 2026
…urce

Review of the field-report pass surfaced that the reachability fix
(restating gate rules inline in release-execution and the feature-loop
land step) silently contradicts the de-dup principle #86 established —
and an unstated principle is the thing that drifts next: a future
drift-sweep would "correctly" re-consolidate the inline copies back into
the contract, reintroducing the exact bug the field report caught.

- operating-contract: add "Single-source by default — restate inline
  only where absence is unsafe", making the exception explicit policy.
  The test is unsafe-action vs worse-output; execution-critical safety
  rules are intentionally redundant and labeled, everything else stays
  single-source; a drift-sweep must not consolidate the labeled copies.
- Label both inline copies (release-execution, feature-loop step 8) as
  the deliberate reachability-redundancy exception, pointing back to it.
- Generalize the exit-code-over-summary lesson: it's any grepped verifier
  (pytest, kani, verus, CI log), with cargo as the example, not the rule.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 11, 2026
* feat(plugin): harden skills from three field reports (v0.9.0)

Apply concrete fixes from three real-use field reports (v1.35→v1.55 relay,
~8-pass issue-hunt, v0.22→v0.27 gale) of the v0.8.x skills. The throughline:
de-dup discipline left safety-critical gate rules reachable only in the
operating-contract memory — unreachable at execution time when that file
isn't loaded — and left the merge-wait unowned.

release-execution:
- Restate the universal gate rules as a self-contained, reachable checklist
  in the skill (the contract keeps the rationale; the skill carries the
  operational asserts) instead of deferring entirely to the memory.
- Add merge-wait teeth: a PR that merged in seconds didn't wait for checks —
  that's a red flag, not a convenience.

pulseengine-feature-loop:
- Step 8 now owns the merge-wait instead of delegating and assuming.
- Recurring N/A on steps 5/6 (witness/sigil) is a backlog item, not an
  exemption — escalate after three features running (this is how a real
  MC/DC gap and a missing attestation chain stayed hidden).
- Make the self-verify cadence concrete: before every tag, every ~2 features.

issue-hunt:
- Self-echo filter: drop items whose only post-watermark activity is the
  loop's own comments (filter by author + comment id), so it stops chasing
  its own tail.
- Structured pending_gates state (pr, watcher, action_on_green) so a PR
  opened-but-unlanded gets owned and merged on a later pass; resolve gates
  at the top of each pass.
- Re-pull from watermark −60s and dedup (exact-boundary clock skew misses).
- Verify .claude/ state file is actually git-ignored before writing — it
  got committed once.

operating-contract:
- Ground-claims: trust a test exit code, not a grepped "all green" tail;
  run one cargo at a time.
- Campaign invariants as a numbered hard checklist with cadence + the
  second-order gate-hardening traps (paths-ignore × required checks,
  strict:true merge train).
- New block: degraded infrastructure is not failure — queued ≠ failed,
  never clear queued main CI as stale, diagnose before re-dispatching.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(plugin): name the reachability-redundancy exception to single-source

Review of the field-report pass surfaced that the reachability fix
(restating gate rules inline in release-execution and the feature-loop
land step) silently contradicts the de-dup principle #86 established —
and an unstated principle is the thing that drifts next: a future
drift-sweep would "correctly" re-consolidate the inline copies back into
the contract, reintroducing the exact bug the field report caught.

- operating-contract: add "Single-source by default — restate inline
  only where absence is unsafe", making the exception explicit policy.
  The test is unsafe-action vs worse-output; execution-critical safety
  rules are intentionally redundant and labeled, everything else stays
  single-source; a drift-sweep must not consolidate the labeled copies.
- Label both inline copies (release-execution, feature-loop step 8) as
  the deliberate reachability-redundancy exception, pointing back to it.
- Generalize the exit-code-over-summary lesson: it's any grepped verifier
  (pytest, kani, verus, CI log), with cargo as the example, not the rule.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant