Skip to content

ci(claude): bump max-turns 24→48 + sharpen anti-pattern in Tools section#452

Merged
heskew merged 1 commit into
mainfrom
workflow/bump-max-turns-and-search-discipline
May 1, 2026
Merged

ci(claude): bump max-turns 24→48 + sharpen anti-pattern in Tools section#452
heskew merged 1 commit into
mainfrom
workflow/bump-max-turns-and-search-discipline

Conversation

@heskew
Copy link
Copy Markdown
Member

@heskew heskew commented May 1, 2026

Summary

Fixes the immediate symptom from harper PR #450's failed review (run 25234303343): agent burned ~20 turns on `Bash(grep -rn …)` repo-wide searches, hit `--max-turns 24`, posted no review comment, log step skipped. Diff was 5 files / +83/−8 — exploration pattern, not diff size, drove the cost.

What changes

  1. `--max-turns 24 → 48` — band-aid. Doubles worst-case budget; ends the immediate dropouts on PRs with non-trivial cross-symbol exploration.
  2. `## Tools` section rewrite:
    • Adds a turn budget hint (1–2 turns context, 1 turn diff, ~1 per file, rest for searches + posting).
    • Names repo-wide search as THE biggest turn-burner with a concrete failure-mode citation ("recent run that timed out before posting anything").
    • Points at `Grep` tool with `path` parameter scoping as the right tool for the job.

Open question (followup)

In the failed run, the agent's `Bash(grep -rn …)` calls all came back with `is_error: false` — they ran. But `Bash(grep …)` shouldn't match any of the existing allowlist patterns (`Bash(git diff:)`, `Bash(git log:)`, `Bash(git blame:)`, `Bash(git show:)`, `Bash(gh ...:*)`).

Either:

  • claude-code-action permits any `Bash(...)` once the tool is enabled at all, or
  • there's a matching gotcha in our patterns.

Worth digging into so the prompt instruction is backed by an actual deny at the action level. Not blocking this PR.

Test plan

  • Push to a PR after this lands → confirm review completes and posts a marker'd comment.
  • Confirm the agent's tool-call pattern: prefers `Grep` over `Bash(grep)`, doesn't do repo-wide recursive searches.
  • Re-trigger PR Read-only mode #450's review (or wait for next push) → confirm review now lands.

Followup

oauth mirror once this merges.

🤖 Generated with Claude Code

Diagnosis from harper PR #450 review failure (run 25234303343):
agent burned ~20 turns on `Bash(grep -rn …)` searches across the
whole repo before hitting `--max-turns 24`, posted no review
comment, and the log step skipped (no marker'd comment to log).
Diff was small (5 files, +83/−8) — exploration pattern, not diff
size, was the cost driver.

Two changes:

1. **--max-turns 24 → 48.** Cheap band-aid that ends the immediate
   dropouts on PRs with non-trivial cross-symbol exploration. Per
   the existing comment, this is the real cost ceiling — bumping
   doubles the worst-case budget and moves us out of the 'one
   exploratory grep too many = no review at all' regime.

2. **Tools section: explicit turn-budget hint + sharpened
   anti-pattern.** Names "repo-wide search" as THE biggest
   turn-burner, points at the `Grep` tool with `path` scoping, and
   cites the failure mode ("recent run that timed out before
   posting anything") so the agent has concrete signal that
   matches its observed behavior.

Out of scope: investigating whether `Bash(grep …)` is supposed to
be denied by the existing `--allowedTools` list (current entries
scope to `Bash(git diff:*)`, `Bash(git log:*)`, `Bash(git blame:*)`,
`Bash(git show:*)`, `Bash(gh ...:*)` — `Bash(grep …)` shouldn't
match any pattern but ran cleanly in #450's failed run, with
`is_error: false` on every grep tool call). Either claude-code-
action permits any `Bash(...)` once the tool is enabled, or there's
a matching gotcha. Tracked as a follow-up; this PR addresses the
immediate symptom.

oauth needs the same change — will mirror after this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@heskew heskew requested review from a team as code owners May 1, 2026 22:57
@heskew heskew merged commit 0118c1c into main May 1, 2026
22 of 25 checks passed
@heskew heskew deleted the workflow/bump-max-turns-and-search-discipline branch May 1, 2026 23:01
heskew added a commit to HarperFast/oauth that referenced this pull request May 1, 2026
Mirror of HarperFast/harper#452. Same fix applied to the oauth
review workflow: max-turns doubled, turn-budget hint added,
repo-wide search called out as the single biggest turn-burner.

The triggering harper run is referenced as the concrete failure
mode rather than re-running the same diagnosis on oauth (oauth
hasn't seen the same failure yet, but the prompt and budget are
identical, so the same exposure exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cb1kenobi pushed a commit that referenced this pull request May 4, 2026
Symptom: every PR was failing `Auth gate invariants / validate`
with `authorize job has no step setting USERS_TO_CHECK env var`,
even on workflows that clearly set it. Reported on PR #452 and
others.

Cause: the check `USERS_TO_CHECK presence` (added in #417 review-
fixup) was using a jq-flavored expression — `// empty` in
particular — but yq on ubuntu-latest is `mikefarah/yq` (Go), not
jq. yq's lexer rejects `empty`. The script had `2>/dev/null` on
the yq invocation, so the lexer error was eaten and the variable
came back as the empty string, tripping the existence check on
every workflow.

Fix: rewrite the expression in idiomatic yq:
- `.jobs.authorize.steps[].env.USERS_TO_CHECK | select(. != null)`
  — emits a stream of one value per step that sets the env var,
  skipping steps that don't.
- `head -1` collapses the stream to a single value (or empty)
  for the `[ -n ]` check.

Verified locally with mikefarah/yq v4.53.2 against all three
harper claude-*.yml workflows — all pass.

Out of scope but worth following up:
- The `2>/dev/null` swallowed a real yq error. Worth either
  removing the suppression or capturing stderr for diagnostics
  when the expression returns empty.
- oauth's mirror (#68) carries the same bug; a copy of this
  fix needs to land there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kriszyp pushed a commit that referenced this pull request May 4, 2026
Symptom: every PR was failing `Auth gate invariants / validate`
with `authorize job has no step setting USERS_TO_CHECK env var`,
even on workflows that clearly set it. Reported on PR #452 and
others.

Cause: the check `USERS_TO_CHECK presence` (added in #417 review-
fixup) was using a jq-flavored expression — `// empty` in
particular — but yq on ubuntu-latest is `mikefarah/yq` (Go), not
jq. yq's lexer rejects `empty`. The script had `2>/dev/null` on
the yq invocation, so the lexer error was eaten and the variable
came back as the empty string, tripping the existence check on
every workflow.

Fix: rewrite the expression in idiomatic yq:
- `.jobs.authorize.steps[].env.USERS_TO_CHECK | select(. != null)`
  — emits a stream of one value per step that sets the env var,
  skipping steps that don't.
- `head -1` collapses the stream to a single value (or empty)
  for the `[ -n ]` check.

Verified locally with mikefarah/yq v4.53.2 against all three
harper claude-*.yml workflows — all pass.

Out of scope but worth following up:
- The `2>/dev/null` swallowed a real yq error. Worth either
  removing the suppression or capturing stderr for diagnostics
  when the expression returns empty.
- oauth's mirror (#68) carries the same bug; a copy of this
  fix needs to land there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kriszyp pushed a commit that referenced this pull request May 5, 2026
Symptom: every PR was failing `Auth gate invariants / validate`
with `authorize job has no step setting USERS_TO_CHECK env var`,
even on workflows that clearly set it. Reported on PR #452 and
others.

Cause: the check `USERS_TO_CHECK presence` (added in #417 review-
fixup) was using a jq-flavored expression — `// empty` in
particular — but yq on ubuntu-latest is `mikefarah/yq` (Go), not
jq. yq's lexer rejects `empty`. The script had `2>/dev/null` on
the yq invocation, so the lexer error was eaten and the variable
came back as the empty string, tripping the existence check on
every workflow.

Fix: rewrite the expression in idiomatic yq:
- `.jobs.authorize.steps[].env.USERS_TO_CHECK | select(. != null)`
  — emits a stream of one value per step that sets the env var,
  skipping steps that don't.
- `head -1` collapses the stream to a single value (or empty)
  for the `[ -n ]` check.

Verified locally with mikefarah/yq v4.53.2 against all three
harper claude-*.yml workflows — all pass.

Out of scope but worth following up:
- The `2>/dev/null` swallowed a real yq error. Worth either
  removing the suppression or capturing stderr for diagnostics
  when the expression returns empty.
- oauth's mirror (#68) carries the same bug; a copy of this
  fix needs to land there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BboyAkers pushed a commit to BboyAkers/harper that referenced this pull request May 26, 2026
Symptom: every PR was failing `Auth gate invariants / validate`
with `authorize job has no step setting USERS_TO_CHECK env var`,
even on workflows that clearly set it. Reported on PR HarperFast#452 and
others.

Cause: the check `USERS_TO_CHECK presence` (added in HarperFast#417 review-
fixup) was using a jq-flavored expression — `// empty` in
particular — but yq on ubuntu-latest is `mikefarah/yq` (Go), not
jq. yq's lexer rejects `empty`. The script had `2>/dev/null` on
the yq invocation, so the lexer error was eaten and the variable
came back as the empty string, tripping the existence check on
every workflow.

Fix: rewrite the expression in idiomatic yq:
- `.jobs.authorize.steps[].env.USERS_TO_CHECK | select(. != null)`
  — emits a stream of one value per step that sets the env var,
  skipping steps that don't.
- `head -1` collapses the stream to a single value (or empty)
  for the `[ -n ]` check.

Verified locally with mikefarah/yq v4.53.2 against all three
harper claude-*.yml workflows — all pass.

Out of scope but worth following up:
- The `2>/dev/null` swallowed a real yq error. Worth either
  removing the suppression or capturing stderr for diagnostics
  when the expression returns empty.
- oauth's mirror (HarperFast#68) carries the same bug; a copy of this
  fix needs to land there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BboyAkers pushed a commit to BboyAkers/harper that referenced this pull request May 26, 2026
Spike on PR HarperFast#452's failed run found that `Bash(grep -rn …)` calls
all executed (`permission_denials: []`) despite not matching any
listed `Bash(...:*)` pattern. Per claude-code-action's own
`docs/configuration.md` (v1.0.110): "The base GitHub tools are
always included. Use `--allowedTools` to add additional tools,
and `--disallowedTools` to prevent specific tools from being
used." `--allowedTools` is ADDITIVE, not exclusive. In CI's
non-interactive mode (no `canUseTool` callback), the SDK's
`default` permission mode falls through to "execute" rather
than prompting.

So the comment block in claude-mention.yml claiming "Tool
allowlist is a security boundary — every entry is a potential
RCE primitive if a prompt injection succeeds" is overstated.
The allowlist doesn't gate; the actual containment is elsewhere.
The "deliberately absent / tightened" entries (`Bash(npx:*)`
absent, `Bash(npm install)` no-arg) are documentation of intent,
not enforcement — an injected instruction that ran them would
still execute.

This commit replaces those comment blocks in both
claude-mention.yml and claude-issue-to-pr.yml with honest
accounts of where containment actually lives:

  1. Token scope (repo-scoped, no cross-repo reach).
  2. Branch protection on protected refs.
  3. Auth gate (CODEOWNERS-driven team check) — narrow trigger
     surface.
  4. Allowed labels list (issue-to-pr only).
  5. Runner ephemerality.
  6. Prompt-injection guards in the prompt itself.

The notable-absences notes are kept but reframed as prompt-level
discipline rather than allowlist enforcement, with explicit
notes that real mitigation (e.g., `ignore-scripts=true` in
`.npmrc`, deferring installs to a separate CI job) is a future
concern.

Comments-only. No behavior change. Paired with
`HarperFast/ai-review-prompts#8` which fixes the same misleading
comment in the reusable `_claude-review.yml`. claude-review.yml
in this repo will be replaced by the caller pattern when HarperFast#8 +
the harper migration land — fixing it here is unnecessary churn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BboyAkers pushed a commit to BboyAkers/harper that referenced this pull request May 26, 2026
…ew-prompts reusable

Day 1 of revised Plan A — replaces harper's inline 463-line
`claude-review.yml` with a ~75-line caller of the reusable
workflow shipped in `HarperFast/ai-review-prompts#8`. The single
`uses:` ref pin (currently `0a5ccbc6...` = ai-review-prompts main
2026-05-05) controls everything that needs to move together:

  * The workflow logic itself (the reusable's authorize +
    review jobs, including the marker-based comment edit, the
    learnings-channel via $RUNNER_TEMP, the post-HarperFast#444/HarperFast#447/HarperFast#452
    calibration shape).
  * The layer files referenced by `review-layers` (universal,
    harper/common, harper/v5).
  * The bash scripts the reusable invokes
    (compose-review-scope, find-prior-review-comment,
    log-review-to-ai-review-log, authorize-claude-workflow).
  * The fix-up commit baking in the honest-allowlist comment
    block (no longer claiming "Tool allowlist is a security
    boundary").

What lands

  * `.github/workflows/claude-review.yml` — replaced with the
    thin caller. Inputs: `review-layers` (universal +
    harper/common + harper/v5), `repo-specific-checks` (the
    Harper-core block, including the "this repo IS Harper core,
    so 'defer to Harper docs' applies to PLUGIN docs, not
    in-repo docs" caveat that was previously inline). Secrets
    passed through.
  * Removes three scripts the reusable now owns:
    `.github/scripts/compose-review-scope.sh`,
    `find-prior-review-comment.sh`,
    `log-review-to-ai-review-log.sh`.
  * Keeps three scripts still used by the inline mention /
    issue-to-pr workflows:
    `.github/scripts/authorize-claude-workflow.sh` (used by
    mention + issue-to-pr's authorize jobs; same script as the
    reusable, intentional duplication until those workflows
    migrate too), `parse-claude-mention.sh`,
    `validate-auth-gate-invariants.sh`.
  * Updates `validate-auth-gate-invariants.sh` to handle
    caller-pattern workflows: when a `claude-*.yml` has no
    inline authorize job, the validator now confirms the
    workflow invokes a `HarperFast/...` reusable via `uses:`
    pinned to a 40-char SHA (mutable refs would let an
    attacker silently repoint to a weakened version of the
    reusable). The reusable's structural invariants are
    validated separately by ai-review-prompts' own
    auth-gate-invariants.yml.

What's NOT in this PR

  * `claude-mention.yml` and `claude-issue-to-pr.yml` stay
    inline. Reusables for those will land later as Day 2 of
    the revised Plan A; same caller-migration pattern.
  * oauth migration to caller — separate PR after this lands.

Verified locally

  * YAML parses for both the new caller and the updated
    validator.
  * `bash .github/scripts/validate-auth-gate-invariants.sh`
    against all three current workflows: passes.
    `claude-review.yml` is recognized as caller-pattern and the
    SHA pin is enforced. The two inline workflows still pass
    the structural check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant