Skip to content

feat: Add assertion verification and scope lock red lines (v1.7.0) #11

@qozle

Description

@qozle

Problem

Two behavioral failure patterns emerge repeatedly from algorithm runs that the current red lines don't cover:

1. Unverified factual assertions

The algorithm produces VERIFY phase claims with evidence, but makes unverified factual claims earlier in the run (during OBSERVE, PLAN, or EXECUTE) that users treat as true. Examples: stating market hours without checking, claiming software is free without verifying licensing, asserting a deployment succeeded without confirming. These erode trust independently of whether VERIFY phase evidence is solid.

The existing "No rubber-stamp verification" red line covers VERIFY phase evidence. It doesn't address confident factual claims made earlier in the run as context-setting statements.

2. Scope creep during EXECUTE

The algorithm defines ISC in OBSERVE and refines it through PLAN, but nothing stops new work from being added during EXECUTE without the user's awareness. Common pattern: a research or explore task reveals something buildable, the algorithm starts building it, the user discovers they didn't ask for that. Or: a bug fix also "helpfully" adds cleanup, aliases, or adjacent features.

The ISC Quality Gate enforces criteria completeness before BUILD. There's no symmetric gate enforcing scope containment during EXECUTE.

Evidence

Analyzed 51 failure captures and 78 algorithm reflections over 30 days:

  • Unverified assertions appeared in 7+ distinct failure instances (market hours, licensing, prices, deployment status, API behavior, credential validity, GitHub UI state)
  • Scope expansion without approval appeared in 6+ instances (build from explore, unplanned data pulls, alias additions, adjacent features)
  • Both patterns produced the lowest sentiment ratings in the dataset (avg 3.0–3.1/10 vs overall 5.1/10)

Proposed Fix

Add two red lines to the existing "5 red lines" section:

- **No unverified factual assertions (v1.7.0).** Before stating ANY current-state fact — prices,
  service status, API behavior, software licensing, deployment state, what's visible in a UI —
  verify it with a tool call first. Stating a guess as fact is a trust violation equivalent to
  rubber-stamp verification. If you cannot verify, say "I haven't verified this." This applies
  in every phase, not just VERIFY.

- **No scope expansion without approval (v1.7.0).** The ISC defined at the end of PLAN is the
  complete scope of work. If you discover adjacent work during EXECUTE (cleanup, bonus features,
  extra data pulls, convenience additions), STOP and ask before doing it. "Explore X" = find
  and report. "Fix X" = fix that specific thing. Neither authorizes building Y. Test before any
  unplanned action: "Did the user explicitly request this?" If no → ask.

These follow the same structure as the existing red lines and slot naturally after the v1.6.0 orphaned PASS claims rule.

References

Related: #8 (Phase-Transition Gates for ISC traceability) — that issue covers ISC state tracking; this covers behavioral guardrails during execution.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions