Merged
4 changes: 2 additions & 2 deletions .claude-plugin/marketplace.json
@@ -5,13 +5,13 @@
},
"metadata": {
"description": "Polycli host adapters for Claude Code and related agent CLIs",
- "version": "0.6.24"
+ "version": "0.6.25"
},
"plugins": [
{
"name": "polycli",
"description": "Claude Code adapter for the shared polycli companion",
- "version": "0.6.24",
+ "version": "0.6.25",
"source": "./plugins/polycli"
}
]
4 changes: 2 additions & 2 deletions .github/plugin/marketplace.json
@@ -5,13 +5,13 @@
},
"metadata": {
"description": "Polycli marketplace for GitHub Copilot CLI",
- "version": "0.6.24"
+ "version": "0.6.25"
},
"plugins": [
{
"name": "polycli-copilot",
"description": "Run the shared polycli companion from GitHub Copilot CLI",
- "version": "0.6.24",
+ "version": "0.6.25",
"source": "./plugins/polycli-copilot"
}
]
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,24 @@ Separate from `docs/release.md` (release-focused) and `docs/archive/session-memo

---

## 2026-06-19 — Claude — docs: cc-X domestic-model endpoint recipes (Path-B docs + reference data)

- Added `docs/cc-x-endpoints.md` (human reference) + `docs/cc-x-recipes.json` (machine-readable source of truth) encoding the cc-X pattern: point the EXISTING `claude` runtime (BYOK) or `opencode` (OpenAI-compatible) at a domestic vendor's Anthropic-compatible endpoint via `ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN` + `ANTHROPIC_MODEL`. Covers 9 entries: 7 PRC core labs (MiniMax, Moonshot Kimi, Zhipu GLM, Alibaba Qwen, DeepSeek, ByteDance Doubao, StepFun) plus 2 marketplace endpoints (Baidu Qianfan, Tencent), with per-vendor base URL, model-id family, native-CLI grouping, context-window (`autoCompactWindow`), caching note, and a `source` URL+date per entry.
- Encoded the operational gotchas: silent prompt-cache degradation on shim endpoints (dual cache-breakpoint; DeepSeek is the auto-prefix-caching exception), pin a known-good Claude Code version + `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1`, size `CLAUDE_CODE_AUTO_COMPACT_WINDOW` to the model's context, marketplace (Baidu/Tencent) model-identity instability, and the PRC data-sovereignty/Entity-List gate as SEPARATE from harness choice.
- Honest-default: marketplace/resale endpoints carry `status: "marketplace-unstable"` and `autoCompactWindow: null` (no fabricated model/version pin), mirroring the gemini attempted-vs-used-model caveat in `docs/model-fallback-policy.md`. Enforced by `scripts/validate-cc-x-recipes.mjs` (a pure validator modeled on `validate-fixture-metadata.mjs`) + `scripts/tests/validate-cc-x-recipes.test.mjs` (auto-joined by the npm-test glob); added `npm run validate:cc-x-recipes` for standalone use.
- Documented that cc-X is NOT a polycli provider/adapter/runtime — it rides the existing runtimes via standard env vars; the `claude -p` path forwards them via full `process.env` inheritance, while the tmux allowlist (`CLAUDE_TMUX_ENV_EXACT`) forwards the `ANTHROPIC_*` trio but NOT the `CLAUDE_CODE_*` knobs (documented, not fixed). Clarified that the polycli `minimax`/`mmx-cli` provider is a stateless text/media call, not the MiniMax cc-X coding path.
- Cross-linked from `docs/provider-paths.md` (new subsection + Official-references bullet) and `docs/polycli-v1-public-surface.md` (one out-of-contract sentence). Recorded the no-adapter decision in `docs/roadmap.md` as closed Q10 + an Explicit-non-goals bullet. Zero runtime/production-path code change; `claude.js` env behavior left untouched by design. Verification: `node scripts/validate-cc-x-recipes.mjs` ok (9 entries), `node --test scripts/tests/validate-cc-x-recipes.test.mjs` 5/5, `npm test` + `npm run release:check` green. Snapshot facts are 2026-06-19; the validator guards structure + source-anchoring, not current-truth.

## 2026-06-19 — Claude — adversarial re-verification of the workflow-review remediation

- Independently re-verified the committed remediation sweep (d272042 + 03ae92d) with a Workflow fan-out (9 adversarial auditors -> double-refutation -> completeness critic). 18 raw findings -> 7 confirmed + 1 critic-confirmed; 11 refuted. Confirmed the prior fixes are sound and re-ran full validation (the prior round's open residual #1): `npm test` 544/544, `npm run release:check` exit 0.
- Closed residual #3 (state-root permissions) with real-filesystem evidence: under permissive umask 000, stateRoot/stateDir/jobsDir resolve to 0700 and state.json/job-config to 0600, enforced by explicit chmod (not umask). Characterized residual #4 (orphan `<jobId>.json` result files leak after MAX_JOBS pruning) as a PRE-EXISTING latent issue — `removeJobFile` is a dead export and the old code pruned identically — so it is out of scope for this remediation and left flagged, not fixed.
- Fixed 2 confirmed regressions introduced by the remediation: (1) the opencode host adapter threw on exit code 2, but 2 is the companion's documented soft signal (`health` with no healthy provider, `status --wait` timeout) that still emits a valid JSON envelope on stdout — extracted `isHardCompanionFailure(status)` so exit 2 returns the envelope while exit 1/4/5/crash still reject; (2) `cancelJob` ran `cleanupRuntimePaths` (which deletes a review job's live cwd via cleanupPaths) BEFORE killing the worker — reordered to kill first, then clean up, and skip the runtime-path deletion entirely when the kill fails (worker may still be alive).
- Fixed 2 confirmed incomplete fixes: (1) Grok `SUCCESS_STOP_REASONS` omitted `MaxTokens`, so a truncated-but-visible answer was wrongly marked ok=false — added maxtokens/max_tokens/length (grok-build's real StopReason enum is {EndTurn, MaxTokens, MaxTurnRequests, Refusal, ToolUse, Cancelled}, verified against the installed binary); refusal/cancelled/tool_use/max_turn_requests stay non-success; (2) the run-ledger append path created `~/.polycli/state/<slug>` world-traversable (0o755) via the mode-less ensureParentDir on the run_started event that fires before any other state write — `appendRunLedgerEvent` now calls `ensureStateDir` first to land it 0o700.
- Closed 4 confirmed test gaps (all mutation/RED-proven): pre-existing-0755 dir hardening test for `ensureStateDir` (state-1); state-dir-0700-after-append-only test for the run-ledger path (pwp-2, RED-proven); Grok non-success-stopReason-ALONE failure tests for both parseGrokJsonResult and runGrokPromptStreaming plus a MaxTokens-success test (test-1 + grok-1, RED-proven); sync `runProviderPrompt` explicit-model-before-default fallback test mirroring the streaming case (qwen-model-1); new `scripts/tests/opencode-host.test.mjs` pinning the exit-2 soft-signal contract (oc-status-1).
- All changes respect the Path B architecture boundary: no shared runtime base class, no provider parser promotion into polycli-utils, timing four-state untouched, cleanupPaths still sourced only from internal review temp dirs.
- Verification: focused RED/GREEN proofs for grok-1 and pwp-1 (reverting each fix turns its new test red); focused suite 66/66; `npm test` 544/544 (535 + 9 new tests); `npm run release:check` exit 0 (plugin bundles 5, fixture metadata 17, codex adapter 5; one tmux.jsonl ENOENT flake on the first run was the known full-suite-parallel-load flake — claude.test.js passes 28/28 in isolation, and the re-run was clean). Not published; current unreleased workspace work after v0.6.24.

## 2026-06-16 — Codex — Grok fixture residual cleanup

- Closed the remaining workflow-review residual risk by capturing a real Grok streaming fixture with `grok 0.2.51 (f4f85a6492e) [stable]`: `grok -p 'Reply with exactly HELLO_GROK_FIXTURE and nothing else.' --output-format streaming-json -m grok-build --permission-mode plan --disable-web-search --max-turns 1`.
54 changes: 54 additions & 0 deletions docs/cc-x-endpoints.md
@@ -0,0 +1,54 @@
# cc-X endpoint recipes (no native CLI cluster)

Snapshot: 2026-06-19. Reference only — not a routing oracle, and **not a polycli runtime**. Review monthly, before release, and whenever a vendor endpoint changes. The machine-readable source of truth is [`cc-x-recipes.json`](./cc-x-recipes.json); this page is its human narration. Re-verify any row against its `source` URL before relying on it.

## What cc-X is

"cc-X" is the pattern of pointing a top-tier agentic-coding harness at a domestic LLM vendor's **Anthropic-compatible** endpoint with three standard environment variables:

```bash
export ANTHROPIC_BASE_URL="https://api.<vendor>/anthropic"
export ANTHROPIC_AUTH_TOKEN="<your vendor key>" # BYOK
export ANTHROPIC_MODEL="<vendor model id>"
```

The harness is **Claude Code** for vendors with no competitive native coding CLI, or **opencode** when the target is an OpenAI-compatible model. cc-X wins for the no-native-CLI cluster because it is the best-AVAILABLE, co-designed, and 5-18x-cheaper scaffold, **not** because Claude Code is the highest-scoring harness. Controlled ablations show other open models score higher under other harnesses; that nuance lives in the vendor system cards, not here, because the `docs/roadmap.md` Q7 source discipline forbids citing un-sourced benchmark scores.

Provider grouping:

- **No competitive native coding CLI → cc-X is the path:** MiniMax, DeepSeek, Zhipu/GLM, StepFun.
- **Has a native CLI → cc-X is a choice, not a default:** Moonshot (Kimi Code), Alibaba (Qwen Code), ByteDance (Trae / trae-agent), Baidu (Comate Zulu-CLI), Tencent (CodeBuddy Code), Xiaomi (MiMo Code).

## How this rides existing polycli runtimes

cc-X is **not** a polycli provider, adapter, or runtime, and this PR adds none. The recipe runs through the EXISTING `claude` runtime (BYOK env, no vendor CLI) or `opencode` (OpenAI-compatible models). polycli already forwards `ANTHROPIC_BASE_URL` / `ANTHROPIC_AUTH_TOKEN` / `ANTHROPIC_MODEL`:

- On the default headless `claude -p` path, the runtime inherits the full `process.env`, so all three (and the `CLAUDE_CODE_*` knobs below) pass through unchanged.
- On the explicit/internal tmux TUI path, the runtime forwards only an `ANTHROPIC_*` allowlist (`CLAUDE_TMUX_ENV_EXACT` in `packages/polycli-runtime/src/claude.js`). The three `ANTHROPIC_*` vars pass through there too, **but `CLAUDE_CODE_AUTO_COMPACT_WINDOW` / `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS` are NOT in that allowlist and will not reach a tmux session.** Set those two knobs on the default `claude -p` path, or export them inside the tmux session itself.
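
The difference between the two paths can be illustrated with a small exact-name filter. This is only a sketch of the behavior described above, not the code in `claude.js`: `ALLOWLIST_SKETCH` and `forwardedToTmux` are hypothetical names, and the real `CLAUDE_TMUX_ENV_EXACT` list may contain more entries than the three this page mentions.

```javascript
// Hypothetical stand-in for CLAUDE_TMUX_ENV_EXACT. The real list lives in
// packages/polycli-runtime/src/claude.js and may hold additional names.
const ALLOWLIST_SKETCH = new Set([
  "ANTHROPIC_BASE_URL",
  "ANTHROPIC_AUTH_TOKEN",
  "ANTHROPIC_MODEL",
]);

// Keep only variables whose name is an exact match in the allowlist.
function forwardedToTmux(env, allowlist = ALLOWLIST_SKETCH) {
  return Object.fromEntries(
    Object.entries(env).filter(([name]) => allowlist.has(name)),
  );
}

const example = {
  ANTHROPIC_BASE_URL: "https://api.<vendor>/anthropic",
  ANTHROPIC_AUTH_TOKEN: "<your vendor key>",
  ANTHROPIC_MODEL: "<vendor model id>",
  CLAUDE_CODE_AUTO_COMPACT_WINDOW: "128000",
  CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1",
};

// Logs only the three ANTHROPIC_* names; both CLAUDE_CODE_* knobs are dropped.
console.log(Object.keys(forwardedToTmux(example)));
```

Under this model the headless `claude -p` path corresponds to passing `example` through untouched, which is why the two knobs only take effect there unless they are exported inside the tmux session.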

There is no code to add: set the env vars and run `claude` (or polycli's `claude` provider) normally.

## Operational gotchas (durable)

These are the hard-won knobs the recipes encode. Per-entry specifics (base URLs, model-id families, per-vendor context window) live in `cc-x-recipes.json`.

1. **Prompt caching is silently degraded on shim endpoints.** Claude Code's single cache-breakpoint produces a near-zero hit rate against MiniMax / Kimi shims, so the system prompt + tool schemas get re-billed every turn. Mitigation: use a dual cache-breakpoint and verify the gateway does not gate caching on whether the model is literally named `claude`. **DeepSeek is the exception** — it does automatic server-side prefix caching, so no client mitigation is needed.
2. **Pin a known-good Claude Code version and set `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1`.** Claude Code auto-attaches experimental `anthropic-beta` headers, and after an upgrade these periodically cause third-party endpoints to reject requests with HTTP 400.
3. **Set `CLAUDE_CODE_AUTO_COMPACT_WINDOW` to the model's real context** or Claude Code compacts prematurely. Per-model values are in `cc-x-recipes.json` (`autoCompactWindow`): e.g. DeepSeek 128000, Kimi 262144, MiniMax-M3 512000. `null` means we deliberately did not pin one.
4. **Marketplace endpoints have no stable model identity.** See the next section.
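
As a rough illustration of gotchas 2 and 3, the environment for a headless run could be assembled from one recipe entry as below. This is a sketch under assumptions: `buildCcxEnv` is a hypothetical helper, and the `baseUrl` and `model` field names are invented for the example (only `autoCompactWindow` is a field name taken from this page).

```javascript
// Hypothetical helper: merge a recipe entry into an environment object.
// `recipe.baseUrl` and `recipe.model` are assumed field names.
function buildCcxEnv(recipe, authToken, baseEnv = {}) {
  const env = {
    ...baseEnv,
    ANTHROPIC_BASE_URL: recipe.baseUrl,
    ANTHROPIC_AUTH_TOKEN: authToken,
    ANTHROPIC_MODEL: recipe.model,
    // Gotcha 2: suppress the experimental anthropic-beta headers.
    CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1",
  };
  // Gotcha 3: only set the window when the recipe pins one.
  // A null value means "deliberately not pinned", so this helper adds nothing.
  if (recipe.autoCompactWindow !== null && recipe.autoCompactWindow !== undefined) {
    env.CLAUDE_CODE_AUTO_COMPACT_WINDOW = String(recipe.autoCompactWindow);
  }
  return env;
}

// Placeholder values; take the real base URL and model id from cc-x-recipes.json.
const pinned = buildCcxEnv(
  { baseUrl: "https://api.<vendor>/anthropic", model: "<vendor model id>", autoCompactWindow: 128000 },
  "<your vendor key>",
  { PATH: "/usr/bin" },
);
console.log(pinned.CLAUDE_CODE_AUTO_COMPACT_WINDOW);
```

The resulting object is the kind of value one would hand to a child process as its environment on the `claude -p` path; values are stringified because environment variables are strings.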

## Marketplace endpoints: honest-default refusal to pin

Baidu Qianfan and Tencent's coding gateway are **resale/marketplace** endpoints (`marketplace: true`, `status: "marketplace-unstable"`). One `ANTHROPIC_MODEL` string can silently resolve to a different vendor or version, there is no client-side version pinning, and 2026 price hikes mean model identity is not stable over time. The recipe file deliberately leaves `autoCompactWindow: null` and ships no pinned model id for these entries — fabricating a stable pin would repeat exactly the "attempted vs used model" dishonesty already documented for gemini in [`docs/model-fallback-policy.md`](./model-fallback-policy.md). Treat the model string you send as a *request*, not a guarantee.
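
The honest-default rule can be expressed as a small structural check. The sketch below is not `scripts/validate-cc-x-recipes.mjs`, whose implementation is not reproduced on this page; `checkRecipeEntry` and the `id` field are hypothetical, and the `source.url` / `source.date` shape is an assumption based on the "URL + date per entry" description.

```javascript
// Hypothetical structural check for one recipe entry.
// Returns a list of human-readable problems; an empty list means the entry passes.
function checkRecipeEntry(entry) {
  const errors = [];
  if (entry.marketplace === true) {
    // Marketplace entries must be flagged unstable and must not pin a window.
    if (entry.status !== "marketplace-unstable") {
      errors.push(`${entry.id}: marketplace entry must have status "marketplace-unstable"`);
    }
    if (entry.autoCompactWindow !== null) {
      errors.push(`${entry.id}: marketplace entry must leave autoCompactWindow null`);
    }
  }
  // Every entry is expected to be anchored to a source (assumed shape: url + date).
  if (!entry.source || !entry.source.url || !entry.source.date) {
    errors.push(`${entry.id}: missing source url or date`);
  }
  return errors;
}

const honest = {
  id: "example-marketplace",
  marketplace: true,
  status: "marketplace-unstable",
  autoCompactWindow: null,
  source: { url: "https://example.invalid/docs", date: "2026-06-19" },
};
const fabricated = { ...honest, autoCompactWindow: 128000 };

console.log(checkRecipeEntry(honest).length, checkRecipeEntry(fabricated).length);
```

A check like this guards structure and source-anchoring only; it cannot tell whether a base URL or model id is still correct today.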

## Data sovereignty is a separate gate

PRC data-residency and Entity-List exposure are a **separate** decision from harness choice. The levers are intl endpoints, zero-retention terms, or self-hosted open weights (GLM-5.x MIT, Kimi mod-MIT, Qwen Apache-2.0) — not anything polycli does. China ToS does **not** make cc-X fragile: BYOK + a non-Anthropic base URL is documented and supported by Anthropic. The residual risk is indirect (export-screening could kill the native-Claude fallback; Anthropic could later gate the client), not a ToS trap.

## Not the same as the polycli `minimax` provider

polycli already has a `minimax` provider that calls official `mmx-cli` (`mmx text chat --output json --non-interactive`). That is a **stateless text/media call**, not the MiniMax cc-X coding path. If you want MiniMax-M2/M3 as a coding agent, use the cc-X recipe above (Claude Code against `ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic`), not the `minimax` provider. The `MiniMax text / multimodal` row in [`provider-paths.md`](./provider-paths.md) is the stateless-call path; this page is the coding-agent path.

## Official references checked

Each recipe entry in `cc-x-recipes.json` carries its own `source` URL + date. Re-verify there before relying on a base URL or model id.