From 336534ab9f5e41564d0f29540dcd9a2f56eef0e1 Mon Sep 17 00:00:00 2001
From: Hanwen Cheng
Date: Thu, 11 Jun 2026 09:50:57 +0800
Subject: [PATCH 01/17] =?UTF-8?q?feat:=20#216=20in-sandbox=20authorized=20?=
 =?UTF-8?q?cred-fetch=20harness=20=E2=80=94=20the=20sandbox=20agent=20pull?=
 =?UTF-8?q?s=20the=20vaulted=20LLM=20key=20via=20its=20granted=20scope?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The #216 gap: cred-fetch-demo/cred-wire-demo prove the chain only
master-self (operator==actor skips the scope check, #195); nothing
fetched from INSIDE the sandbox as the agent. This closes it across
three layers:

- sandbox-agent-isolation.sh: adds the #216 cred half, i.e. the positive
  (agentkeys cred fetch of the P.3-granted service, in-sandbox identity
  from ~/.agentkeys/harness-env) and the scope-denial negative (an
  un-granted probe MUST fail with service_not_in_scope; any other
  outcome fails loud).
- phase1-wire-demo.sh: 1.4b stages ~/.agentkeys/harness-env (0600) in
  the sandbox, which also fixes the bare-shell env contract the
  isolation script silently lacked (it pointed at the stale :8088 MCP
  default); 1.4c uploads the proof script (absolute path + verified:
  the bare-filename upload was rejected by the aiosandbox file API and
  curl exited 0 anyway); Phase 4.0 now fetches + plants the LLM key
  IN-SANDBOX as the agent (plaintext never leaves the sandbox; host-CLI
  fetch is the compat fallback, operator env stays the labelled
  DEV-only fallback).
- v2-stage3-demo.sh step 18: the #216 cred-side scope triad on the
  granted agent: cred-fetch cap for the granted service (200),
  un-granted probe (ServiceNotInScope), and the CI/mock-only live
  REVOKE transition (setScope drops the service → the same mint is
  denied → restore), enforcing the '#216 revoke cuts the agent off'
  acceptance in CI.

Also fixes upload_sandbox_isolation_test, which silently no-op'd
(relative upload path + unchecked API body).
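The negative's pass/fail rule is the subtle part. Below is a minimal sketch of its three-way triage. It assumes the caller has already captured the probe fetch's exit status, stdout and stderr. `classify_probe` is a hypothetical helper for illustration only. sandbox-agent-isolation.sh inlines the equivalent checks and uses the same grep pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): triage the outcome of
# `agentkeys cred fetch <un-granted probe>`.
#   $1 = exit status, $2 = captured stdout, $3 = captured stderr
# Only a scope denial passes; the grep pattern mirrors the one
# sandbox-agent-isolation.sh uses for the broker's denial.
classify_probe() {
  local rc="$1" out="$2" err="$3"
  if [ "$rc" -eq 0 ] && [ -n "$out" ]; then
    # A secret came back: the scope gate let an unauthorized service through.
    echo "FAIL:granted"
  elif printf '%s' "$err" | grep -qiE 'not.*in.*scope|NotInScope|service_not_in_scope'; then
    # Denied for the right reason: the broker's scope check stood.
    echo "OK:denied"
  else
    # Rejected (or empty) for some other reason: proves nothing about the gate.
    echo "FAIL:wrong-reason"
  fi
}

# Usage inside the sandbox (commented out; needs agentkeys + harness-env):
# out="$(agentkeys cred fetch cred-ungranted-probe 2>/tmp/probe.err)"; rc=$?
# classify_probe "$rc" "$out" "$(cat /tmp/probe.err)"
```

Only a denial for the right reason counts as a pass. A returned secret means the gate leaked. An unrelated error (network, auth) says nothing about the gate, so it fails too.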
Verified live: prod broker /v1/cap/cred-fetch layered errors; the 1.4b
staging command (0600, 11 keys), the new script + the exact Phase 4.0
one-liner against a real aiosandbox; the fixed stage-3 upload. The
chain-gated positives run in CI (software master + mock agent) and on
the operator's next v2-demo.sh run (Touch ID register + pairing).

Docs synced in the same change: harness/CLAUDE.md (inventory rows,
sandbox role) + docs/operator-runbook-harness.md (On Sandbox, phase 3/5
proofs, two new Q&A entries).
---
 docs/operator-runbook-harness.md           | 33 +++++++-
 harness/CLAUDE.md                          | 10 ++-
 harness/phase1-wire-demo.sh                | 86 ++++++++++++++------
 harness/scripts/sandbox-agent-isolation.sh | 88 ++++++++++++++++++++-
 harness/v2-stage3-demo.sh                  | 92 +++++++++++++++++++++-
 5 files changed, 274 insertions(+), 35 deletions(-)

diff --git a/docs/operator-runbook-harness.md b/docs/operator-runbook-harness.md
index e059597f..ecc4a883 100644
--- a/docs/operator-runbook-harness.md
+++ b/docs/operator-runbook-harness.md
@@ -53,6 +53,14 @@ and run the real-agent proof (it signs with the agent's **sandbox-held** key): bash "$HOME/sandbox-agent-isolation.sh" # the REAL agent: the deferred roundtrip (steps 11-12 / 14-15), sandbox-held key ``` +The proof covers BOTH halves of the real agent: the **memory roundtrip** (cap-mint → STS +signed as the agent → worker → S3) AND the **#216 cred fetch** — the agent pulls its +authorized LLM key from the master's vault (`agentkeys cred fetch`, gated by the +`cred:` scope granted at pairing) and an **un-granted probe service is denied +with `service_not_in_scope`** (the permission gate, tested from both sides). It reads its +coordinates from `~/.agentkeys/harness-env`, which the wire phase stages (step 1.4b) — +no hand-exported env needed. + (If `v2-demo.sh` reported the wire phase **skipped — no aiosandbox**, the agent wasn't paired: set the sandbox up and re-pair with `bash harness/v2-demo.sh --from 5`.
The wire demo is `--real`-only now — the in-memory `--light` path was removed, #207.) @@ -104,7 +112,10 @@ is the only one CI can't do — no aiosandbox). isolation layers, and the **scope triad** — step 16 master-self cap (operator==actor, no scope → 200), step 17 cross-actor un-granted → ServiceNotInScope, step 18 granted agent (operator≠actor, master granted the scope → 200; the positive delegation proof) — the - #195/#196 gate. **Steps 19–21 (#201)** prove the **Config** data-class isolation (master-only + #195/#196 gate. **Step 18 also runs the #216 cred-side triad**: a **cred-fetch cap** for + the granted service → 200, an un-granted probe → ServiceNotInScope, and (CI/mock only) + the **live revoke transition** — setScope drops the service, the same mint is denied, + then the scope is restored ("revoking the cred scope cuts the agent off", enforced in CI). **Steps 19–21 (#201)** prove the **Config** data-class isolation (master-only taxonomy): step 19 config creds write own `config/` prefix (200) but AccessDenied at the memory/vault buckets (+ memory creds → config bucket AccessDenied); steps 20–21 the cap data-class-mismatch (config cap ↔ memory/cred workers). These are **master-self → run on @@ -123,9 +134,12 @@ is the only one CI can't do — no aiosandbox). - **phase 5 — wire** (`phase1-wire-demo.sh --real --webauthn`): the agent inside the sandbox reads + writes its real memory through `agentkeys wire` — cap-mint → STS relay (`X-Aws-*`) → `memory.litentry.org` → S3 `bots//memory/`, passively injected each turn by the `pre_llm_call` hook. **Pairs the - §10.2 agent AND the master grants its `memory:` scope via Touch ID** (`--webauthn`; the agent's - cap service is `memory:`, so without the grant `memory.get` → `service_not_in_scope`). The - **only** real-memory proof (real-only — the in-memory `--light` path was removed, #207). 
+ §10.2 agent AND the master grants its `memory:` + `cred:` scopes via Touch ID** + (`--webauthn`; without the grant `memory.get` / `cred fetch` → `service_not_in_scope`). **Phase 4.0 + (#216): the agent fetches its LLM key from the master's vault IN-SANDBOX** (`agentkeys cred fetch` + via the granted cred scope, coordinates from the 1.4b-staged `~/.agentkeys/harness-env`) and plants + it into Hermes without the plaintext leaving the sandbox — the operator env key is only a labelled + DEV fallback. The **only** real-memory proof (real-only — the in-memory `--light` path was removed, #207). - **phase 6 — web↔agent parity** (`web-parity-demo.sh`): boots `agentkeys-daemon --ui-bridge` (seeded with the master's J1 + device via the `--ui-bridge-seed-*` seam, so it skips re-onboarding) and plants a dedicated `webparity` probe namespace through the **web** endpoint @@ -297,6 +311,17 @@ Touch-ID phases 1-2; use `--from 3.16` to jump straight to step 16 if the sessio **Q. `DeviceNotActive` / cap-mint device mismatch?** The master isn't registered. Re-run stage 1, or `bash harness/scripts/erc4337-register-master.sh`. +**Q. `sandbox-agent-isolation.sh` says `SKIP: cred coordinates not staged`?** +The sandbox's `~/.agentkeys/harness-env` predates the #216 staging (or the wire phase didn't +finish). Re-run the wire on the operator host — `bash harness/v2-demo.sh --from 5` (or +`bash harness/phase1-wire-demo.sh --real --webauthn`) — step 1.4b rewrites the env file with +the cred coordinates + a fresh session bearer, then re-run the proof in the sandbox. + +**Q. The in-sandbox cred fetch fails with `service_not_in_scope` for the GRANTED service?** +The pairing ran without `--webauthn`, so P.3 never granted the agent's scopes (the grant is a +Touch ID ceremony). Re-run `bash harness/phase1-wire-demo.sh --real --webauthn` and approve the +prompt — the fresh pairing grants `memory:` + the cred service together. + **Q. 
`MalformedPolicyDocument` / empty AWS results?** Wrong profile/region: `awsp `; always pass `--region "$REGION"` (per [`../CLAUDE.md`](../CLAUDE.md)). diff --git a/harness/CLAUDE.md b/harness/CLAUDE.md index 1f285b75..dac5773f 100644 --- a/harness/CLAUDE.md +++ b/harness/CLAUDE.md @@ -223,8 +223,10 @@ Every orchestrator + the operator runbook MUST keep this split exact: **only the operator/master-side tests**; the **agent-side** steps (stage 3 11-12, signed AS the agent) **`defer`** to the sandbox — never a failure, never mocked on the operator. - **SANDBOX — the real §10.2 agent.** The agent's K10 lives in the sandbox, so the agent-side - roundtrip runs THERE (`phase1-wire-demo.sh --real` pairs it; `sandbox-agent-isolation.sh` - runs the deferred roundtrip with the sandbox-held key via `sbx_exec`). The master never + roundtrip runs THERE (`phase1-wire-demo.sh --real` pairs it + stages `~/.agentkeys/harness-env`; + `sandbox-agent-isolation.sh` runs the deferred roundtrip with the sandbox-held key via + `sbx_exec`: the memory roundtrip PLUS the #216 cred half — `agentkeys cred fetch` of the + authorized service AND the scope-denial negative for an un-granted probe). The master never signs for the agent. This is the real agent-side coverage. - **CI (`--ci`) — headless, no biometric, no sandbox.** Software register (no Touch ID), stub K11 (`WEBAUTHN_MODE=0`), the **mock agent** for the agent-side steps (the sole @@ -251,8 +253,8 @@ sandbox) is **GREEN**, never fail/incomplete. | **`v2-demo.sh`** | **THE single entry point — no flags = phases 1→2→3→4 (memory plant)→5 (wire)→6 (web↔agent parity); wire auto-runs when the aiosandbox is up, else reports INCOMPLETE + exits non-zero (an unexecuted proof is never green — pass `--wire none` to intentionally skip); fail-fast. `PHASE.STEP` addressing (`--from 4.1`, `--only 3.11`). 
Flags are CI/scoping only.** | (no flags) / `--ci` / `--stage N` / `--from P.S` / `--only P.S` / `--wire real\|light\|none` | | `v2-stage1-demo.sh` | M1 foundation demo | `--only-step N` | | `v2-stage2-demo.sh` | hardening demo | `--only-step N` | -| `v2-stage3-demo.sh` | OIDC + per-actor/data-class isolation proof (23 steps; 16–17 = #196 master-self + cross-actor scope; **19–21 = #201 Config data-class isolation** — master-self layer-3/4 + cap data-class-mismatch, run on the operator, `skip` until config infra is provisioned/deployed; **22 = #207 classifier-worker isolation** — master-self `cap_op_mismatch` (storage cap → classify worker) + `cap_data_class_mismatch` (cross-data-class Classify cap), compute-gate so NO STS, `skip` until the worker is deployed; **23 = cleanup + summary**). **Steps 11-12 / 14-15 sign STS creds AS the agent: on the operator they `defer` to the sandbox (the §10.2 agent key lives in the sandbox) — GREEN, never fail. `--mock-agent` (CI-only, auto-on under `--ci`) provisions a master-held DEV agent so headless CI can prove the roundtrip; a real §10.2 agent proves it in-sandbox via `phase1-wire-demo.sh --real`. When 11-12 run they ALSO assert the **#229 durable-audit receipt**: fetch response `audit_envelope_hash` → envelope fetchable from `AGENTKEYS_WORKER_AUDIT_URL`, hash = keccak256(cbor) (the appendV2/appendRootV2 anchor commitment), no plaintext — skip reasons `audit-receipt-missing` / `audit-url-unset`.** | `--from/--to/--only-step` / `--mock-agent` | -| `phase1-wire-demo.sh` | agent-side `agentkeys wire` demo (real memory only — the in-memory `--light` path was removed, #207); **phase 5 of `v2-demo.sh`** — pairs the §10.2 agent in the sandbox so the On-Sandbox proof (`sandbox-agent-isolation.sh`) can run. **v2-demo runs it `--real --webauthn`** so the master grants the agent's `memory:` scope (Touch ID); the agent's cap service is `memory:`, so without the grant `memory.get` → `service_not_in_scope`. 
| `--real` (default) / `--webauthn` | +| `v2-stage3-demo.sh` | OIDC + per-actor/data-class isolation proof (23 steps; 16–17 = #196 master-self + cross-actor scope; **18 = granted-agent positives — the memory cap AND the #216 cred-fetch cap for the granted service (200), an un-granted cred probe → ServiceNotInScope, and (CI/mock only) the live #216 REVOKE transition: setScope drops the service → the same cred-fetch mint is denied → restore**; **19–21 = #201 Config data-class isolation** — master-self layer-3/4 + cap data-class-mismatch, run on the operator, `skip` until config infra is provisioned/deployed; **22 = #207 classifier-worker isolation** — master-self `cap_op_mismatch` (storage cap → classify worker) + `cap_data_class_mismatch` (cross-data-class Classify cap), compute-gate so NO STS, `skip` until the worker is deployed; **23 = cleanup + summary**). **Steps 11-12 / 14-15 sign STS creds AS the agent: on the operator they `defer` to the sandbox (the §10.2 agent key lives in the sandbox) — GREEN, never fail. `--mock-agent` (CI-only, auto-on under `--ci`) provisions a master-held DEV agent so headless CI can prove the roundtrip; a real §10.2 agent proves it in-sandbox via `phase1-wire-demo.sh --real`. When 11-12 run they ALSO assert the **#229 durable-audit receipt**: fetch response `audit_envelope_hash` → envelope fetchable from `AGENTKEYS_WORKER_AUDIT_URL`, hash = keccak256(cbor) (the appendV2/appendRootV2 anchor commitment), no plaintext — skip reasons `audit-receipt-missing` / `audit-url-unset`.** | `--from/--to/--only-step` / `--mock-agent` | +| `phase1-wire-demo.sh` | agent-side `agentkeys wire` demo (real memory only — the in-memory `--light` path was removed, #207); **phase 5 of `v2-demo.sh`** — pairs the §10.2 agent in the sandbox so the On-Sandbox proof (`sandbox-agent-isolation.sh`) can run. 
**v2-demo runs it `--real --webauthn`** so the master grants the agent's `memory:` + `cred:$SERVICE` scopes (Touch ID); without the grant `memory.get` / `cred fetch` → `service_not_in_scope`. **Step 1.4b stages `~/.agentkeys/harness-env` (0600) in the sandbox** (MCP/broker/cred coordinates + the operator session bearer) so the in-sandbox proofs run from a bare shell, and **1.4c uploads `sandbox-agent-isolation.sh`**. **Phase 4.0 (#216) fetches the LLM key IN-SANDBOX, as the agent** (`agentkeys cred fetch` via its granted cred scope) and plants it into `~/.hermes/.env` without the plaintext leaving the sandbox; host-CLI fetch is the compat fallback, operator env the labelled DEV-only fallback. | `--real` (default) / `--webauthn` | | `web-memory-bootstrap.sh` | issue #196 web-memory pre-flight + proof; runbook [`../docs/operator-runbook-web-memory.md`](../docs/operator-runbook-web-memory.md) | `--from/--to/--only-step` | | `memory-plant-demo.sh` | plant a proof memory archive through the REAL chain + read-back (the CLI/CI proof of the plant flow the web "⊕ plant prepared memory" button drives); **phase 4 of `v2-demo.sh`**. Plants into **dedicated `demo-*` namespaces** (never the real travel/personal/family) and **always deletes them on exit** (success OR failure, EXIT trap; `KEEP_DEMO_MEMORY=1` keeps), so test memory never leaks into the master's real store — the real prepared archive is planted ONLY by the user (the button), never by a demo or onboarding. Re-testable; idempotent (`--from 4.1`). | `--from-step/--only-step N` / `--ci` | | `web-parity-demo.sh` | **phase 6 of `v2-demo.sh`** (NOT a standalone front door) — boots `agentkeys-daemon --ui-bridge` SEEDED with the master's J1 + device via the `--ui-bridge-seed-*` daemon seam (skips re-onboarding) + plants a **dedicated `webparity` probe ns** through the **web** endpoint `POST /v1/master/memory/plant`, **deleted on exit** (success or failure). 
A 200 proves the daemon's chain (cap-mint → STS → worker → S3) == the agent/harness chain — the web↔harness drift gate. **Step 4 (#214)** additionally polls `GET /v1/agent/pairing/pending` and asserts a well-formed `{requests:[…]}` — the master-side web-pairing route reaches the real broker rendezvous (the full claim→register e2e needs a live §10.2 agent request, exercised agent-side). Reuses phases 1-2's build/chain/broker/master (one daemon boot, no re-bootstrap); real-only. | `--from-step/--only-step N` / `--ci` | diff --git a/harness/phase1-wire-demo.sh b/harness/phase1-wire-demo.sh index 85f735e3..fa886bf2 100755 --- a/harness/phase1-wire-demo.sh +++ b/harness/phase1-wire-demo.sh @@ -767,6 +767,33 @@ phase1_sandbox() { || fail "1.4 mcp server" "did not come up — see /tmp/agentkeys-mcp.log in the sandbox" fi + # 1.4b stage the in-sandbox harness coordinates (#216). ~/.agentkeys/harness-env + # (0600) carries the MCP/broker/cred coordinates + the OPERATOR session bearer so + # a bare in-sandbox shell can run the deferred proofs (sandbox-agent-isolation.sh: + # memory via the MCP, the #216 cred fetch via the broker) and Phase 4.0 can run + # `agentkeys cred fetch` AS THE AGENT inside the sandbox. Rewritten each run (the + # session bearer is freshly minted at 0.7); the bearer already rides every + # env_pfx sbx_exec command, so a 0600 file is the stricter staging, not a new + # exposure. All values are single-quote-safe (JWT/URL/ARN/hex/lowercase names). 
+ local henv="$SBX_HOME/.agentkeys/harness-env" + sbx_exec "umask 077; mkdir -p ~/.agentkeys; printf '%s\n' 'AGENTKEYS_MCP_URL=$MCP_URL_IN_SANDBOX' 'AGENTKEYS_MCP_VENDOR_TOKEN=$VENDOR_TOKEN' 'AGENTKEYS_ACTOR_OMNI=$ACTOR_OMNI' 'AGENTKEYS_OPERATOR_OMNI=$OPERATOR_OMNI' 'AGENTKEYS_DEVICE_KEY_HASH=$DEVICE_KEY_HASH' 'AGENTKEYS_SESSION_BEARER=$SESSION_BEARER' 'AGENTKEYS_BROKER_URL=${BROKER_URL%/}' 'AGENTKEYS_WORKER_CRED_URL=${AGENTKEYS_WORKER_CRED_URL:-}' 'VAULT_ROLE_ARN=${VAULT_ROLE_ARN:-}' 'REGION=${REGION:-us-east-1}' 'CRED_SERVICE=$SERVICE' > '$henv'" >/dev/null + if [[ "$(sbx_rc "grep -q '^AGENTKEYS_SESSION_BEARER=.' '$henv'")" == "0" ]]; then + ok "1.4b harness env" "staged $henv (0600) — MCP/broker/cred coordinates for the in-sandbox proofs" + else + fail "1.4b harness env" "could not stage $henv in the sandbox — the in-sandbox cred fetch (4.0) + bare sandbox-agent-isolation.sh runs need it" + fi + # 1.4c upload the deferred-proof script so a wire-only run (no stage 3) still + # leaves the sandbox self-testable: bash $HOME/sandbox-agent-isolation.sh + # The upload `path` MUST be absolute — the aiosandbox file API rejects a bare + # filename with `[Errno 2] ... ''` (verified live on v1.0.0.152) — and sbx_put's + # returned file_path is the only success signal (curl exits 0 on API failure). + local iso_dst="$SBX_HOME/sandbox-agent-isolation.sh" + if [[ "$(sbx_put "$REPO_ROOT/harness/scripts/sandbox-agent-isolation.sh" "$iso_dst" 2>/dev/null)" == "$iso_dst" ]]; then + ok "1.4c isolation test" "uploaded sandbox-agent-isolation.sh — run IN the sandbox: bash \$HOME/sandbox-agent-isolation.sh" + else + skip "1.4c isolation test" "upload failed (non-fatal) — stage 3 also uploads it" + fi + # 1.5 seed the real memory worker. The # agent reads this back in Act 1. Idempotent + scope-aware + --webauthn-gated: # a. 
namespace already has content → skip everything (no Touch ID); @@ -1058,32 +1085,43 @@ phase4_surprise() { log "Phase 4 — the surprise (real Hermes session in the sandbox)" # 4.0 #216: the agent's LLM key comes from the MASTER'S VAULT (cred-fetch via its - # authorized cred scope), NOT an ambient operator env. Resolve VAULT-FIRST; the - # $OPENROUTER_API_KEY/$LLM_API_KEY env is a DEV-ONLY fallback (clearly labelled). - # The full vault chain is proven headless (master-self) by harness/cred-wire-demo.sh; - # the agent-identity fetch here additionally needs (a) the cred scope granted at - # pairing (P.3 SEED_SCOPE_SERVICES, --webauthn) and (b) the key already vaulted. - local WIRE_KEY="" WIRE_KEY_SRC="" _host_cli="" - if [[ -x "$REPO_ROOT/target/release/agentkeys" ]]; then _host_cli="$REPO_ROOT/target/release/agentkeys" - elif [[ -x "$REPO_ROOT/target/debug/agentkeys" ]]; then _host_cli="$REPO_ROOT/target/debug/agentkeys" - else _host_cli="$(command -v agentkeys 2>/dev/null || true)"; fi - if [[ -n "$_host_cli" && -n "${AGENTKEYS_WORKER_CRED_URL:-}" && -n "${VAULT_ROLE_ARN:-}" \ - && -n "$SESSION_BEARER" && -n "$ACTOR_OMNI" && -n "$OPERATOR_OMNI" && -n "$DEVICE_KEY_HASH" ]]; then - local _fetched - if _fetched="$("$_host_cli" cred fetch "$SERVICE" \ - --operator-omni "$OPERATOR_OMNI" --actor-omni "$ACTOR_OMNI" \ - --device-key-hash "$DEVICE_KEY_HASH" --session-bearer "$SESSION_BEARER" \ - --broker-url "${BROKER_URL%/}" --cred-url "${AGENTKEYS_WORKER_CRED_URL}" \ - --vault-role-arn "${VAULT_ROLE_ARN}" --region "${REGION:-us-east-1}" 2>/dev/null)" \ - && [[ -n "$_fetched" ]]; then - WIRE_KEY="$_fetched"; WIRE_KEY_SRC="the master's VAULT (cred:$SERVICE — #216, the agent's authorized key)" + # authorized cred scope), NOT an ambient operator env — and the fetch runs IN THE + # SANDBOX, by the agent itself: `agentkeys cred fetch` with the 1.4b-staged + # coordinates (the broker checks the cred:$SERVICE scope the master granted at + # P.3), planted straight into 
~/.hermes/.env so the plaintext never leaves the + # sandbox (only its sha returns for the log). Fallbacks, clearly labelled: + # (b) host-CLI agent-identity fetch (compat — a stale sandbox binary without the + # `cred` subcommand), (c) $OPENROUTER_API_KEY/$LLM_API_KEY env (DEV-ONLY). The + # headless complement is harness/cred-wire-demo.sh; the standalone in-sandbox + # proof (incl. the scope-denial negative) is sandbox-agent-isolation.sh. + local WIRE_KEY="" WIRE_KEY_SRC="" WIRE_PLANTED=false _host_cli="" _insbx + _insbx="$(sbx_exec "set -a; . ~/.agentkeys/harness-env 2>/dev/null; set +a; export PATH=\$HOME/.local/bin:\$PATH; k=\$(agentkeys cred fetch '$SERVICE' 2>/tmp/cred-fetch.err); if [ -n \"\$k\" ]; then mkdir -p \$HOME/.hermes; ENV=\$HOME/.hermes/.env; touch \"\$ENV\"; grep -v '^OPENROUTER_API_KEY=' \"\$ENV\" > \"\$ENV.tmp\" 2>/dev/null; printf 'OPENROUTER_API_KEY=%s\n' \"\$k\" >> \"\$ENV.tmp\"; mv \"\$ENV.tmp\" \"\$ENV\"; printf 'PLANTED sha=%s len=%s' \"\$(printf %s \"\$k\" | sha256sum 2>/dev/null | cut -c1-12)\" \"\${#k}\"; else tail -c 200 /tmp/cred-fetch.err 2>/dev/null; fi")" + if [[ "$_insbx" == PLANTED* ]]; then + WIRE_PLANTED=true + WIRE_KEY_SRC="the master's VAULT — fetched + planted IN-SANDBOX by the agent (cred:$SERVICE, #216; ${_insbx#PLANTED })" + else + [[ -n "$_insbx" ]] && log " 4.0 in-sandbox vault fetch unavailable ($(echo "$_insbx" | tr '\n' ' ' | cut -c1-160)) — trying the host-CLI fetch" + if [[ -x "$REPO_ROOT/target/release/agentkeys" ]]; then _host_cli="$REPO_ROOT/target/release/agentkeys" + elif [[ -x "$REPO_ROOT/target/debug/agentkeys" ]]; then _host_cli="$REPO_ROOT/target/debug/agentkeys" + else _host_cli="$(command -v agentkeys 2>/dev/null || true)"; fi + if [[ -n "$_host_cli" && -n "${AGENTKEYS_WORKER_CRED_URL:-}" && -n "${VAULT_ROLE_ARN:-}" \ + && -n "$SESSION_BEARER" && -n "$ACTOR_OMNI" && -n "$OPERATOR_OMNI" && -n "$DEVICE_KEY_HASH" ]]; then + local _fetched + if _fetched="$("$_host_cli" cred fetch "$SERVICE" \ + 
--operator-omni "$OPERATOR_OMNI" --actor-omni "$ACTOR_OMNI" \ + --device-key-hash "$DEVICE_KEY_HASH" --session-bearer "$SESSION_BEARER" \ + --broker-url "${BROKER_URL%/}" --cred-url "${AGENTKEYS_WORKER_CRED_URL}" \ + --vault-role-arn "${VAULT_ROLE_ARN}" --region "${REGION:-us-east-1}" 2>/dev/null)" \ + && [[ -n "$_fetched" ]]; then + WIRE_KEY="$_fetched"; WIRE_KEY_SRC="the master's VAULT (cred:$SERVICE — #216, the agent's authorized key; host-CLI fetch — update the sandbox binary for the in-sandbox path)" + fi fi fi - if [[ -z "$WIRE_KEY" && -n "$LLM_API_KEY" ]]; then + if [[ -z "$WIRE_KEY" && "$WIRE_PLANTED" != true && -n "$LLM_API_KEY" ]]; then WIRE_KEY="$LLM_API_KEY" WIRE_KEY_SRC="operator env \$OPENROUTER_API_KEY (DEV fallback — vault cred:$SERVICE unavailable; #216 wants the vault: grant the cred scope + vault the key, see harness/cred-wire-demo.sh)" fi - if [[ -z "$WIRE_KEY" ]]; then + if [[ -z "$WIRE_KEY" && "$WIRE_PLANTED" != true ]]; then skip "4.0 hermes llm" "no LLM key — neither a vaulted cred:$SERVICE (the #216 path; proven by harness/cred-wire-demo.sh) nor \$OPENROUTER_API_KEY (dev fallback). Skipping the surprise." return fi @@ -1106,7 +1144,11 @@ phase4_surprise() { # be single-line: the sandbox /v1/shell/exec rejects multi-line payloads with # a silent ErrorObservation. Verified (not masked with || true). local env_path='$HOME/.hermes/.env' - sbx_exec "ENV=$env_path; grep -v '^OPENROUTER_API_KEY=' \"\$ENV\" > \"\$ENV.tmp\" 2>/dev/null; printf 'OPENROUTER_API_KEY=%s\n' $(printf '%q' "$WIRE_KEY") >> \"\$ENV.tmp\"; mv \"\$ENV.tmp\" \"\$ENV\"" >/dev/null + # In-sandbox path (4.0a) already planted the key without it leaving the sandbox; + # only the host-fetched / dev-fallback key needs writing from here. 
+ if [[ "$WIRE_PLANTED" != true ]]; then + sbx_exec "ENV=$env_path; grep -v '^OPENROUTER_API_KEY=' \"\$ENV\" > \"\$ENV.tmp\" 2>/dev/null; printf 'OPENROUTER_API_KEY=%s\n' $(printf '%q' "$WIRE_KEY") >> \"\$ENV.tmp\"; mv \"\$ENV.tmp\" \"\$ENV\"" >/dev/null + fi if [[ "$(sbx_rc "grep -q '^OPENROUTER_API_KEY=' $env_path")" != "0" ]]; then fail "4.0 hermes llm" "could not write OPENROUTER_API_KEY to ~/.hermes/.env"; return fi diff --git a/harness/scripts/sandbox-agent-isolation.sh b/harness/scripts/sandbox-agent-isolation.sh index cb2bed45..4344d4f6 100755 --- a/harness/scripts/sandbox-agent-isolation.sh +++ b/harness/scripts/sandbox-agent-isolation.sh @@ -7,18 +7,37 @@ # → memory worker → S3 bots//memory/. The stage-3 MOCK uses a master-held # key (worker plumbing only); THIS uses the genuine sandbox-held key (the real agent). # +# PLUS the #216 cred half: the agent fetches its AUTHORIZED LLM credential from the +# master's vault (`agentkeys cred fetch` → cap-mint cred-fetch → STS → cred worker → +# decrypt), gated by the `cred:` scope the master granted at pairing (P.3). +# • POSITIVE — the granted service round-trips (non-empty secret; only its +# length + sha prefix are logged, never the value). +# • NEGATIVE — an UN-granted probe service MUST be denied with +# service_not_in_scope: the broker's isServiceInScope gate stands between the +# sandbox agent and the vault. Any other outcome (success, or a different +# error) is a FAIL — the permission gate is the thing under test. +# # Prereqs — set up by `bash harness/phase1-wire-demo.sh --real` (run on the operator # host first): the `agentkeys` binary + the §10.2-paired agent device-session + the -# MCP server, all inside the sandbox. The stage-3 script UPLOADS this file to the -# sandbox automatically (to $HOME/sandbox-agent-isolation.sh); you just run it here. 
+# MCP server + the 0600 env file `~/.agentkeys/harness-env` (step 1.4b stages the +# MCP/broker/cred coordinates + the session bearer), all inside the sandbox. Both +# phase1 and the stage-3 script UPLOAD this file to the sandbox automatically (to +# $HOME/sandbox-agent-isolation.sh); you just run it here. # # bash "$HOME/sandbox-agent-isolation.sh" [namespace] set -uo pipefail +# Coordinates staged by phase1-wire-demo.sh step 1.4b (0600). Without it the +# CLI falls back to whatever AGENTKEYS_* the shell already exports. +HARNESS_ENV="${HARNESS_ENV:-$HOME/.agentkeys/harness-env}" +if [ -f "$HARNESS_ENV" ]; then set -a; . "$HARNESS_ENV"; set +a; fi + NS="${1:-${MEMORY_NS:-travel}}" AGENT_BIN="${AGENT_BIN:-$(command -v agentkeys 2>/dev/null || echo "$HOME/.local/bin/agentkeys")}" [ -x "$AGENT_BIN" ] || { echo "FAIL: no agentkeys binary in the sandbox ($AGENT_BIN) — run 'phase1-wire-demo.sh --real' on the operator host first." >&2; exit 1; } +sha_hex() { { command -v sha256sum >/dev/null 2>&1 && printf '%s' "$1" | sha256sum || printf '%s' "$1" | shasum -a 256; } | awk '{print $1}'; } + content="sandbox-isolation-proof-$$-$(date +%s 2>/dev/null || echo n)" echo "== §10.2 agent isolation — the agent signs with its SANDBOX-held key (not the master) ==" >&2 @@ -42,4 +61,67 @@ fi # enforced at the IAM layer and is already proven by stage-3 steps 4-9. The agent CLI # has no way to target another actor's prefix, so this script proves the POSITIVE path # (the real agent works) — the negative path is the master/mock IAM test in stage 3. -echo "== PASS: tested against the sandbox (the real agent), not the master-held mock. 
==" >&2 + +# ─── #216 cred half — the sandbox agent fetches its AUTHORIZED LLM key ─────── +CRED_SERVICE="${CRED_SERVICE:-openrouter}" +CRED_NEGATIVE_SERVICE="${CRED_NEGATIVE_SERVICE:-cred-ungranted-probe}" +echo "== #216 cred fetch — the sandbox agent pulls cred:$CRED_SERVICE from the master's vault ==" >&2 + +if [ -z "${AGENTKEYS_WORKER_CRED_URL:-}" ] || [ -z "${VAULT_ROLE_ARN:-}" ] || [ -z "${AGENTKEYS_SESSION_BEARER:-}" ]; then + echo "SKIP: cred coordinates not staged (need AGENTKEYS_WORKER_CRED_URL + VAULT_ROLE_ARN + AGENTKEYS_SESSION_BEARER in $HARNESS_ENV)." >&2 + echo " Re-run 'bash harness/phase1-wire-demo.sh --real --webauthn' on the operator host — step 1.4b stages them." >&2 + echo "== PARTIAL PASS: memory proven; #216 cred half SKIPPED (stale wire staging — re-stage and re-run). ==" >&2 + exit 0 +fi + +# POSITIVE: fetch the service the master authorized at pairing (P.3 grants the bare +# cred service alongside memory:). Identity/session come from harness-env via +# the CLI's clap env fallbacks; the secret never hits argv or the log. +cred_err="$(mktemp 2>/dev/null || echo "/tmp/cred-fetch.$$.err")" +if fetched="$("$AGENT_BIN" cred fetch "$CRED_SERVICE" 2>"$cred_err")" && [ -n "$fetched" ]; then + echo "OK: agent fetched cred:$CRED_SERVICE from the vault IN-SANDBOX (len=${#fetched}, sha $(sha_hex "$fetched" | cut -c1-12)…) — authorized scope honoured." >&2 + if [ -n "${EXPECTED_CRED_SHA256:-}" ]; then + if [ "$(sha_hex "$fetched")" = "$EXPECTED_CRED_SHA256" ]; then + echo "OK: fetched secret sha == EXPECTED_CRED_SHA256 — the exact master-vaulted value round-tripped." >&2 + else + echo "FAIL: fetched secret sha ($(sha_hex "$fetched")) != EXPECTED_CRED_SHA256 — vault round-trip mismatch." >&2 + rm -f "$cred_err"; exit 1 + fi + fi + # Informational: compare against the key the wire planted into Hermes. 
Match ⇒ + # Hermes runs on exactly what this agent can independently fetch (vault-wired); + # mismatch is NOT a failure (the wire may have used its labelled dev fallback). + planted="$(grep '^OPENROUTER_API_KEY=' "$HOME/.hermes/.env" 2>/dev/null | head -1 | sed 's/^OPENROUTER_API_KEY=//')" + if [ -n "$planted" ]; then + [ "$(sha_hex "$planted")" = "$(sha_hex "$fetched")" ] \ + && echo "OK: the key wired into Hermes == this fetch — Hermes runs on the vault key." >&2 \ + || echo "note: Hermes' planted key differs from this fetch — the wire likely used the dev-fallback env key (see phase1-wire-demo Phase 4.0)." >&2 + fi +else + err="$(tr '\n' ' ' <"$cred_err" | cut -c1-300)" + rm -f "$cred_err" + if printf '%s' "$err" | grep -qiE 'not.*in.*scope|NotInScope|service_not_in_scope'; then + echo "FAIL: cred:$CRED_SERVICE is NOT granted to this agent — the master must authorize it at pairing. Re-run 'bash harness/phase1-wire-demo.sh --real --webauthn' (P.3 grants it, Touch ID). broker: $err" >&2 + else + echo "FAIL: agent cred fetch errored (not a scope denial): $err" >&2 + fi + exit 1 +fi + +# NEGATIVE: an un-granted probe service MUST be DENIED by the broker scope gate +# (isServiceInScope(operator, agent, probe) == false → service_not_in_scope at +# cap-mint). A success here means the permission gate is broken — fail loud. +if neg_out="$("$AGENT_BIN" cred fetch "$CRED_NEGATIVE_SERVICE" 2>"$cred_err")" && [ -n "$neg_out" ]; then + echo "FAIL: agent fetched UN-granted cred:$CRED_NEGATIVE_SERVICE — the scope gate did not deny an unauthorized service!" >&2 + rm -f "$cred_err"; exit 1 +fi +neg_err="$(tr '\n' ' ' <"$cred_err" | cut -c1-300)" +rm -f "$cred_err" +if printf '%s' "$neg_err" | grep -qiE 'not.*in.*scope|NotInScope|service_not_in_scope'; then + echo "OK: un-granted cred:$CRED_NEGATIVE_SERVICE denied with service_not_in_scope — the authorization gate stands between the agent and the vault." 
>&2 +else + echo "FAIL: un-granted fetch was rejected for the WRONG reason (want service_not_in_scope): $neg_err" >&2 + exit 1 +fi + +echo "== PASS: tested against the sandbox (the real agent), not the master-held mock — memory roundtrip + #216 authorized cred fetch + scope-denial negative. ==" >&2 diff --git a/harness/v2-stage3-demo.sh b/harness/v2-stage3-demo.sh index 81fae580..fed21840 100755 --- a/harness/v2-stage3-demo.sh +++ b/harness/v2-stage3-demo.sh @@ -604,9 +604,21 @@ upload_sandbox_isolation_test() { local script="$REPO_ROOT/harness/scripts/sandbox-agent-isolation.sh" [ -f "$script" ] || return 0 curl -fsS --max-time 8 "$sbx/healthz" >/dev/null 2>&1 || curl -fsS --max-time 8 "$sbx/v1/sandbox" >/dev/null 2>&1 || return 0 - if curl -sS --max-time 30 -X POST "$sbx/v1/file/upload" -F "file=@$script" -F "path=sandbox-agent-isolation.sh" >/dev/null 2>&1; then + # The upload `path` MUST be absolute (a bare filename gets `[Errno 2] ... ''` + # from the aiosandbox file API — verified live on v1.0.0.152, where this + # function silently no-op'd for months because curl exits 0 on an API-level + # failure). Resolve the sandbox $HOME via the shell API and CHECK the body. + local sbx_home resp + sbx_home=$(curl -sS --max-time 15 -X POST "$sbx/v1/shell/exec" -H 'content-type: application/json' \ + -d '{"command":"printf %s \"$HOME\""}' 2>/dev/null | jq -r '.data.output // empty') || sbx_home="" + [ -n "$sbx_home" ] || return 0 + resp=$(curl -sS --max-time 30 -X POST "$sbx/v1/file/upload" -F "file=@$script" \ + -F "path=$sbx_home/sandbox-agent-isolation.sh" 2>/dev/null) || resp="" + if echo "$resp" | jq -e '(.success // .data.success) == true' >/dev/null 2>&1; then SANDBOX_TEST_UPLOADED=1 info "uploaded sandbox-agent-isolation.sh → the sandbox ($sbx). 
REAL agent test (sandbox-held key) runs THERE: bash \$HOME/sandbox-agent-isolation.sh" + else + info "sandbox-agent-isolation.sh upload to $sbx did NOT confirm ($(echo "$resp" | tr '\n' ' ' | cut -c1-120)) — run the wire phase (its 1.4c uploads it) or copy it in manually" fi } @@ -1250,8 +1262,12 @@ fi # is honoured (delegation works). Stands ALONE (no STS/worker roundtrip): the cap-mint # is operator-authenticated (mint_cap sends session.jwt), so it needs NO agent key — # only the agent's on-chain device + the grant. +# PLUS the #216 cred-side triad on the same identity: a CRED-FETCH cap for the +# granted service → 200, an un-granted probe → ServiceNotInScope, and (CI/mock +# only) the live REVOKE transition (setScope drops the service → the same mint is +# denied → restore) — "revoking the cred scope cuts the agent off", enforced. if should_run_step 18; then - step "POSITIVE: granted agent (operator!=actor) mints memory cap for the GRANTED service → 200" + step "POSITIVE: granted agent (operator!=actor) mints memory + #216 cred-fetch caps for the GRANTED service → 200 (+ un-granted/revoked denials)" [ -f "$STATE_DIR/session.jwt" ] || die "no session.jwt — re-run step 1" # CI mocks the §10.2 agent with a master-held, scope-granted dev agent; the operator's # real agent carries its device + grant on chain (stage-1 / sandbox pairing). 
@@ -1280,6 +1296,78 @@ if should_run_step 18; then if [ "$rc" = "200" ]; then ok "granted agent (actor $pg_actor != operator 0x$OWN_ACTOR_OMNI) minted a memory cap for delegated service '$SMOKE_SERVICE' — isServiceInScope honoured" record_ok "granted-agent positive: memory cap minted for delegated service '$SMOKE_SERVICE' (operator!=actor, HTTP 200)" + # ── #216 cred-side scope triad — same agent identity, the CRED-FETCH cap + # route (the cap `agentkeys cred fetch` mints so the agent can pull its + # vaulted LLM key): + # (i) granted service → 200 (the broker authorizes the delegated fetch) + # (ii) un-granted probe → ServiceNotInScope (an agent can't reach vault + # entries the master never authorized) + # (iii) CI-only REVOKE transition — setScope WITHOUT the service → the + # same mint now denied (no stale scope caching), then restore. + # The operator path runs (i)+(ii) only: mutating scope is a Touch ID + # ceremony, and the live denial predicate is covered in-sandbox by + # sandbox-agent-isolation.sh's scope-denial negative. 
+ rc=$(mint_cap cred-fetch "$pg_body") + body=$(cat /tmp/cap.$$.json 2>/dev/null || true); rm -f /tmp/cap.$$.json + if [ "$rc" = "200" ]; then + ok "#216 cred-fetch cap minted for GRANTED service '$SMOKE_SERVICE' (operator!=actor) — the agent may fetch its authorized cred" + record_ok "#216 cred-fetch cap positive (granted service, HTTP 200)" + else + die "#216 cred-fetch cap for the GRANTED service '$SMOKE_SERVICE' returned HTTP $rc — body: $body" + fi + neg_svc="${CRED_NEGATIVE_SERVICE:-cred-ungranted-probe}" + neg_body=$(jq -n --arg op "0x$OWN_ACTOR_OMNI" --arg actor "$pg_actor" \ + --arg svc "$neg_svc" --arg dkh "$pg_dkh" \ + '{operator_omni:$op, actor_omni:$actor, service:$svc, device_key_hash:$dkh}') + rc=$(mint_cap cred-fetch "$neg_body") + body=$(cat /tmp/cap.$$.json 2>/dev/null || true); rm -f /tmp/cap.$$.json + if [ "$rc" = "200" ]; then + die "#216 REGRESSION: cred-fetch cap minted for UN-granted service '$neg_svc' — the scope gate did not deny an unauthorized cred" + elif echo "$body" | grep -qiE "not.*scope|NotInScope|service_not_in_scope"; then + ok "#216 cred-fetch cap for un-granted '$neg_svc' denied with ServiceNotInScope" + record_ok "#216 cred-fetch cap negative (un-granted service rejected)" + else + die "#216 cred-fetch negative returned unexpected HTTP $rc (want ServiceNotInScope) — body: $body" + fi + if [ "$MOCK_AGENT" = 1 ]; then + # (iii) #216 acceptance: "revoke the cred scope → the agent loses the + # key". setScope REPLACES the services list (set-replace), so revoke = + # re-set without the service; restore = re-set with it. heima-scope-set + # is idempotent + master-model-aware (#250: account masters route via + # erc4337-master-exec — the CI software passkey signs headlessly). + # Self-healing: if the restore is interrupted, the next run's + # ensure_mock_agent re-grants (its getScope pre-check sees the drift). 
+ pg_label=$(jq -r '.label // "demo-agent-dev"' "$pg_file") + profile_uc=$(printf '%s' "${AGENTKEYS_CHAIN:-heima}" | tr 'a-z-' 'A-Z_') + scope_addr=$(eval "echo \${SCOPE_CONTRACT_ADDRESS_${profile_uc}:-}") + if [ -z "$scope_addr" ] || [ "$scope_addr" = 0x0 ]; then + prereq_missing scope-not-set "no AgentKeysScope address in env — cannot run the #216 revoke transition" || true + else + revoke_json=$(bash "$REPO_ROOT/scripts/heima-scope-set.sh" --agent "$pg_label" \ + --services "stage3-cred-revoked-placeholder" --scope-address "$scope_addr" | tail -1) || revoke_json="" + if ! echo "$revoke_json" | jq -e '.ok==true and ((.skipped // "") == "" or .skipped=="already-set")' >/dev/null 2>&1; then + prereq_missing scope-not-set "#216 revoke transition: setScope (revoke) did not land — $(echo "$revoke_json" | tr '\n' ' ' | cut -c1-160)" || true + else + rc=$(mint_cap cred-fetch "$pg_body") + body=$(cat /tmp/cap.$$.json 2>/dev/null || true); rm -f /tmp/cap.$$.json + if [ "$rc" = "200" ]; then + die "#216 REGRESSION: cred-fetch cap STILL minted after the scope was revoked — stale scope state at the broker" + elif echo "$body" | grep -qiE "not.*scope|NotInScope|service_not_in_scope"; then + ok "#216 revoke transition: after setScope dropped '$SMOKE_SERVICE', the same cred-fetch mint is denied (ServiceNotInScope) — revoke cuts the agent off" + record_ok "#216 revoke transition (revoked service rejected live)" + else + die "#216 post-revoke cred-fetch returned unexpected HTTP $rc — body: $body" + fi + restore_json=$(bash "$REPO_ROOT/scripts/heima-scope-set.sh" --agent "$pg_label" \ + --services "$SMOKE_SERVICE" --scope-address "$scope_addr" | tail -1) || restore_json="" + echo "$restore_json" | jq -e '.ok==true' >/dev/null 2>&1 \ + && info "#216 revoke transition: scope restored to '$SMOKE_SERVICE' for '$pg_label'" \ + || info "#216 revoke transition: restore did not confirm (next run's ensure_mock_agent self-heals) — $(echo "$restore_json" | tr '\n' ' ' | cut -c1-120)" + fi + 
fi + else + info "#216 revoke transition: operator run — skipped (scope mutation is a Touch ID ceremony); the denial predicate is covered by (ii) here and by the in-sandbox negative" + fi elif echo "$body" | grep -qiE "not.*scope|NotInScope|service_not_in_scope"; then prereq_missing scope-not-set "agent scope for '$SMOKE_SERVICE' not granted on chain — run \`bash harness/v2-stage1-demo.sh --webauthn\` (step 13 setScope) first. body: $body" || true elif echo "$body" | grep -qiE "DeviceNotActive|device.*not.*active|DeviceBindingMismatch|binding.*mismatch|DeviceRoleMissing|role_missing"; then From 130fc725fd7872c75c3476ee2cb500a03fd56312 Mon Sep 17 00:00:00 2001 From: Hanwen Cheng Date: Thu, 11 Jun 2026 12:10:39 +0800 Subject: [PATCH 02/17] =?UTF-8?q?feat:=20#216=20v2-demo=20phase=205=20runs?= =?UTF-8?q?=20in=20CI=20=E2=80=94=20mock-wire-demo.sh=20emulates=20the=20a?= =?UTF-8?q?iosandbox=20side=20on=20the=20runner?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CI used to set --wire none, so the entire post-wire agent runtime (the MCP server + agentkeys cred fetch + the hook path) ran nowhere headless — stage-3 steps 11-12 cover raw worker curls, not the runtime the sandbox agent actually uses. - NEW harness/mock-wire-demo.sh (phase 5 under --ci / --wire mock): ensure the sanctioned mock agent + the canonical scope grant → mint operator + agent sessions headless (wallet_sig SIWE) → boot the REAL agentkeys-mcp-server on localhost (http backend + per-actor STS relay, the phase1 1.4 shape) → master-self vault a probe cred under the DEDICATED mock-wire-llm service (never openrouter — can't clobber a real vault entry) → run THE SAME sandbox-agent-isolation.sh the sandbox runs, with EXPECTED_CRED_SHA256: memory roundtrip through the MCP + the #216 authorized cred fetch (sha-exact) + the un-granted scope denial. Key custody stays operator-only by design. 
- v2-demo.sh: --wire gains 'mock' (CI default; 'none' stays the explicit off), WIRE_RESULT=mocked in the summary. - v2-stage3-demo.sh: ONE canonical mock-agent grant MOCK_SCOPE_SERVICES (openrouter + memory:ci-wire-proof + mock-wire-llm), used by ensure_mock_agent AND the step-18 revoke restore — phases 3 and 5 no longer flip-flop setScope every CI run (set-replace semantics). - _lib.sh: + wallet_sig_mint_jwt (the shared headless SIWE session primitive; temp-file based — jq chokes on multi-line SIWE in vars). - harness-ci.yml: comments/step name now say phases 1-6 with phase 5 mocked (the workflow already passes --ci; behavior switches with the new default). The build step already builds agentkeys-mcp-server. Verified locally: bash -n all, YAML parse, fixture gate green, the MCP server boots with the exact arg shape (healthz ok), and 'v2-demo --stage 5 --wire mock' dispatches preflight → mock-wire-demo → fails LOUD at the no-master chain gate (this laptop's registry has no master — correct; CI's software master is registered). Docs synced: harness/CLAUDE.md (rule 5, CI role, inventory rows) + operator-runbook-harness.md (On CI, flag table, phase-5 bullet, role mapping). 
--- .github/workflows/harness-ci.yml | 23 ++-- docs/operator-runbook-harness.md | 26 ++-- harness/CLAUDE.md | 21 ++- harness/mock-wire-demo.sh | 216 +++++++++++++++++++++++++++++++ harness/scripts/_lib.sh | 37 ++++++ harness/v2-demo.sh | 31 +++-- harness/v2-stage3-demo.sh | 14 +- 7 files changed, 332 insertions(+), 36 deletions(-) create mode 100755 harness/mock-wire-demo.sh diff --git a/.github/workflows/harness-ci.yml b/.github/workflows/harness-ci.yml index 08cf4ec2..5e21465a 100644 --- a/.github/workflows/harness-ci.yml +++ b/.github/workflows/harness-ci.yml @@ -1,9 +1,11 @@ name: harness CI (no LLM) # Issue #66: deterministic, no-LLM, no-WebAuthn CI that runs the SAME -# production harness orchestrator (harness/v2-demo.sh --ci → phases 1-4 + 6; -# phase 5/wire is the only phase CI can't run — no aiosandbox — so --ci sets -# --wire none) against a parallel TEST instance of the production environment. +# production harness orchestrator (harness/v2-demo.sh --ci → phases 1-6; +# phase 5/wire has no aiosandbox here, so --ci sets --wire mock — +# mock-wire-demo.sh emulates the sandbox side ON the runner: the real MCP +# server + the #216 cred fetch via the master-held mock agent) against a +# parallel TEST instance of the production environment. 
# (Was three separate v2-stage{1,2,3}-demo.sh steps; switched to the whole # orchestrator so CI also covers phase 4 (memory-plant) + phase 6 (web-parity — # the daemon web-chain runtime proof the #200 restructure added but never wired @@ -920,21 +922,26 @@ jobs: AGENTKEYS_HEALTH_OPTIONAL="config classify" \ bash scripts/wait-stack-healthy.sh - - name: v2-demo on Heima mainnet — phases 1-4 + 6 (wire/phase-5 skipped, no sandbox) + - name: v2-demo on Heima mainnet — phases 1-6 (phase 5 = mock-sandbox wire on the runner) # Run the WHOLE orchestrator (harness/v2-demo.sh) rather than the three # v2-stage{1,2,3}-demo.sh in isolation, so CI also covers the phases the # #200 v2-demo restructure added but never wired into CI: # - phase 4 (memory-plant-demo.sh): the master plants its own memory # through the real chain + read-back. + # - phase 5 (mock-wire-demo.sh via `--wire mock`, auto under --ci): the + # aiosandbox side EMULATED on the runner — boots the real + # agentkeys-mcp-server with the master-held mock agent's identity, + # master-self vaults a probe cred, then runs the SAME + # sandbox-agent-isolation.sh the sandbox runs (memory via MCP + the + # #216 authorized cred fetch + the un-granted scope denial). Proves + # the post-wire agent RUNTIME headless; the real §10.2 sandbox key + # custody remains operator-only. # - phase 6 (web-parity-demo.sh): the daemon's WEB endpoint # POST /v1/master/memory/plant → real chain (cap-mint → STS → worker # → S3). This is the ONLY runtime proof of the parent-control app's # path — stage 3 exercises the CLI/curl path, not the daemon ui-bridge. # (The #203 check-web-api-drift.sh gate in rust-checks covers its # SHAPE at compile/fixture time; THIS step covers runtime reachability.) - # Phase 5 (wire) is the §10.2 agent inside the aiosandbox — CI is headless - # with no sandbox, so `--ci` auto-sets `--wire none` (the one phase that - # genuinely can't run here). 
# # `--ci` threads to every phase (v2-demo run_phase): stage 1 auto-skips # deploy/email/provision (PRE-PROVISIONED infra — contracts pinned in @@ -966,7 +973,7 @@ jobs: ARGS=(--ci --allow-skip=scope-not-set,config-role-missing,config-worker-unreachable,classify-not-configured,classify-worker-unavailable) case "${STAGE:-}" in 1|2|3) ARGS+=(--stage "$STAGE") ;; - *) ;; # all / empty → full phases 1-4 + 6 (phase 5/wire auto-skipped) + *) ;; # all / empty → full phases 1-6 (phase 5 = the mock-sandbox wire) esac AGENTKEYS_CHAIN=heima bash harness/v2-demo.sh "${ARGS[@]}" diff --git a/docs/operator-runbook-harness.md b/docs/operator-runbook-harness.md index ecc4a883..db64608f 100644 --- a/docs/operator-runbook-harness.md +++ b/docs/operator-runbook-harness.md @@ -71,16 +71,21 @@ CI has **no Touch ID and no sandbox**, so one flag switches to the software regi a **mock** agent + tolerate-skips: ```bash -bash harness/v2-demo.sh --ci # software register, mock agent, tolerate prereq skips; wire OFF (no sandbox in CI) +bash harness/v2-demo.sh --ci # software register, mock agent, tolerate prereq skips; wire MOCKED on the runner ``` `--ci` (or the runner's `$CI`) ⇒ `--signer software` + `--mock-agent` + `--allow-skip` semantics + **stage-1 auto-skips deploy/email/provision** (CI runs against pre-provisioned infra — contracts pinned in secrets, identity via wallet_sig, the -vault/memory buckets+roles an operator one-shot). The mock agent tests the worker -**plumbing only** — not the real §10.2 agent (that's the sandbox run above). This is -exactly what `harness-ci.yml` runs: `v2-demo.sh --ci` → phases 1–4 + 6 (phase 5/wire -is the only one CI can't do — no aiosandbox). +vault/memory buckets+roles an operator one-shot). 
This is exactly what +`harness-ci.yml` runs: `v2-demo.sh --ci` → **phases 1–6, with phase 5 MOCKED**: +`mock-wire-demo.sh` emulates the aiosandbox side ON the runner — it boots the real +`agentkeys-mcp-server` with the mock agent's identity, master-self vaults a probe +cred, and runs the **same `sandbox-agent-isolation.sh`** the sandbox runs (memory +roundtrip through the MCP + the #216 authorized cred fetch, sha-exact, + the +un-granted scope denial). The mock proves the post-wire agent **runtime**, not the +real §10.2 **key custody** — the agent key is master-held; the sandbox run above +remains the only proof that the key never leaves the sandbox. --- @@ -140,6 +145,9 @@ is the only one CI can't do — no aiosandbox). via the granted cred scope, coordinates from the 1.4b-staged `~/.agentkeys/harness-env`) and plants it into Hermes without the plaintext leaving the sandbox — the operator env key is only a labelled DEV fallback. The **only** real-memory proof (real-only — the in-memory `--light` path was removed, #207). + **On CI** phase 5 runs as `mock-wire-demo.sh` instead (`--wire mock`, auto under `--ci`): the runner + emulates the sandbox side with the master-held mock agent and runs the same proof script — the + post-wire runtime headless, every PR. - **phase 6 — web↔agent parity** (`web-parity-demo.sh`): boots `agentkeys-daemon --ui-bridge` (seeded with the master's J1 + device via the `--ui-bridge-seed-*` seam, so it skips re-onboarding) and plants a dedicated `webparity` probe namespace through the **web** endpoint @@ -228,7 +236,8 @@ on-chain anchor itself is exercised by `scripts/heima-worker-smoke.sh` (stage-2 | Flag | Effect | |---|---| -| `--ci` | software register + auto `--mock-agent` + tolerate prereq `skip`s. Sets `AGENTKEYS_CI=1` (the runner's `$CI` also triggers it). | +| `--ci` | software register + auto `--mock-agent` + tolerate prereq `skip`s + **phase 5 → `--wire mock`** (the runner-local mock-sandbox proof). 
Sets `AGENTKEYS_CI=1` (the runner's `$CI` also triggers it). | +| `--wire real\|mock\|none` | (v2-demo) force the wire phase: `real` = the sandbox wire (`phase1-wire-demo.sh`), `mock` = the CI mock-sandbox proof (`mock-wire-demo.sh`, auto under `--ci`), `none` = intentionally off. | | `--mock-agent` | (stage 3) mock the sandbox agent with a master-held DEV agent. Auto-applied by `v2-demo.sh`; only needed when running `v2-stage3-demo.sh` directly. | | `--allow-skip=` | (stage 3) opt a prereq into `skip` not `fail`. NOT a release gate. | | `--signer software` | force the file-key register signer directly. | @@ -338,8 +347,9 @@ Rules for any agent (human or AI) working **on** the harness: real sandbox). Flags are CI/dev only. Don't add operator-facing flags — prefer auto-detect (Cargo's own incremental build, sandbox auto-detect, idempotent skips). - **Run-mode mapping (the three roles above):** operator = no flag; `--ci` = software register - + mock agent + tolerate skips; **sandbox** = the agent-side tests run *in* the sandbox - (the master never signs for a sandbox-held key). The mock is plumbing-only, never the real agent. + + mock agent + tolerate skips + the phase-5 **mock-sandbox wire** (`mock-wire-demo.sh` on the + runner); **sandbox** = the agent-side tests run *in* the sandbox (the master never signs for a + sandbox-held key). The mock proves runtime plumbing, never the real agent's key custody. - **Keep the docs in sync — every time a harness script changes** (new flag, new step, renamed script, changed default), update **this runbook AND [`../harness/CLAUDE.md`](../harness/CLAUDE.md) in the same change.** A script change without the doc update is incomplete. 
diff --git a/harness/CLAUDE.md b/harness/CLAUDE.md index dac5773f..dcf3cf94 100644 --- a/harness/CLAUDE.md +++ b/harness/CLAUDE.md @@ -57,8 +57,12 @@ CLAUDE.md runbook-fix-fold-back policy, applied to every harness edit, not just from env / flags / `operator-workstation.env` / `agentkeys chain show`, never baked in. Temporary exceptions go in [`../hardcoded.md`](../hardcoded.md). **The one sanctioned synthetic agent:** CI may provision a **mock agent** (a master-held DEV - agent, `demo-agent-dev`) for the agent-side wiring steps (stage 3 11-12) — CI has no - sandbox, so the real §10.2 agent can't sign. That mock is **CI-only** (`--ci` / + agent, `demo-agent-dev`) for the agent-side wiring steps (stage 3 11-12) AND the + phase-5 CI mock-sandbox wire proof (`mock-wire-demo.sh`) — CI has no sandbox, so + the real §10.2 agent can't sign. Its grant is ONE canonical list, + `MOCK_SCOPE_SERVICES` (= `openrouter,memory:ci-wire-proof,mock-wire-llm` by + default), defined identically in `v2-stage3-demo.sh` + `mock-wire-demo.sh` so the + two phases never flip-flop setScope. That mock is **CI-only** (`--ci` / `--mock-agent`); operators **never** mock — they `defer` those steps to the sandbox. 6. **Deployer key via `_lib.sh`.** Source `harness/scripts/_lib.sh` and use `resolve_master_key` (raw-hex / mnemonic / `~/.agentkeys/heima-deployer.key`) @@ -233,9 +237,13 @@ Every orchestrator + the operator runbook MUST keep this split exact: sanctioned synthetic agent, contract rule 5), and **stage-1 auto-skips deploy/email/provision** (CI runs against pre-provisioned infra — contracts pinned, wallet_sig identity, buckets/roles an operator one-shot). Tolerates prereq skips. - `harness-ci.yml` runs the WHOLE orchestrator — **`v2-demo.sh --ci` → phases 1–4 + 6** - (phase 5/wire auto-skips: no aiosandbox). So phase 6 (the daemon web-chain runtime - proof) IS exercised in CI; the only phase CI can't run is the sandbox-bound wire. 
+ `harness-ci.yml` runs the WHOLE orchestrator — **`v2-demo.sh --ci` → phases 1–6, + with phase 5 MOCKED**: `mock-wire-demo.sh` emulates the aiosandbox side ON the + runner (boots the real `agentkeys-mcp-server` with the mock agent's identity + + per-actor STS relay, master-self vaults a probe cred, then runs the SAME + `sandbox-agent-isolation.sh` the sandbox runs — memory-via-MCP + the #216 cred + fetch + the scope-denial negative). What the mock cannot prove is sandbox key + custody — that stays the operator's On-Sandbox run. **Fresh-ceremony / re-testable rule:** an operator run must EXERCISE the ceremony (Touch ID), not silently skip it — never let a re-run look "tested" while the biometric never fired. @@ -250,7 +258,7 @@ sandbox) is **GREEN**, never fail/incomplete. | Script | Goal | Entry | |---|---|---| -| **`v2-demo.sh`** | **THE single entry point — no flags = phases 1→2→3→4 (memory plant)→5 (wire)→6 (web↔agent parity); wire auto-runs when the aiosandbox is up, else reports INCOMPLETE + exits non-zero (an unexecuted proof is never green — pass `--wire none` to intentionally skip); fail-fast. `PHASE.STEP` addressing (`--from 4.1`, `--only 3.11`). Flags are CI/scoping only.** | (no flags) / `--ci` / `--stage N` / `--from P.S` / `--only P.S` / `--wire real\|light\|none` | +| **`v2-demo.sh`** | **THE single entry point — no flags = phases 1→2→3→4 (memory plant)→5 (wire)→6 (web↔agent parity); wire auto-runs when the aiosandbox is up, else reports INCOMPLETE + exits non-zero (an unexecuted proof is never green — pass `--wire none` to intentionally skip); fail-fast. Under `--ci` phase 5 is MOCKED, not skipped: `mock-wire-demo.sh` runs the post-wire agent runtime on the runner. `PHASE.STEP` addressing (`--from 4.1`, `--only 3.11`). 
Flags are CI/scoping only.** | (no flags) / `--ci` / `--stage N` / `--from P.S` / `--only P.S` / `--wire real\|mock\|none` | | `v2-stage1-demo.sh` | M1 foundation demo | `--only-step N` | | `v2-stage2-demo.sh` | hardening demo | `--only-step N` | | `v2-stage3-demo.sh` | OIDC + per-actor/data-class isolation proof (23 steps; 16–17 = #196 master-self + cross-actor scope; **18 = granted-agent positives — the memory cap AND the #216 cred-fetch cap for the granted service (200), an un-granted cred probe → ServiceNotInScope, and (CI/mock only) the live #216 REVOKE transition: setScope drops the service → the same cred-fetch mint is denied → restore**; **19–21 = #201 Config data-class isolation** — master-self layer-3/4 + cap data-class-mismatch, run on the operator, `skip` until config infra is provisioned/deployed; **22 = #207 classifier-worker isolation** — master-self `cap_op_mismatch` (storage cap → classify worker) + `cap_data_class_mismatch` (cross-data-class Classify cap), compute-gate so NO STS, `skip` until the worker is deployed; **23 = cleanup + summary**). **Steps 11-12 / 14-15 sign STS creds AS the agent: on the operator they `defer` to the sandbox (the §10.2 agent key lives in the sandbox) — GREEN, never fail. `--mock-agent` (CI-only, auto-on under `--ci`) provisions a master-held DEV agent so headless CI can prove the roundtrip; a real §10.2 agent proves it in-sandbox via `phase1-wire-demo.sh --real`. When 11-12 run they ALSO assert the **#229 durable-audit receipt**: fetch response `audit_envelope_hash` → envelope fetchable from `AGENTKEYS_WORKER_AUDIT_URL`, hash = keccak256(cbor) (the appendV2/appendRootV2 anchor commitment), no plaintext — skip reasons `audit-receipt-missing` / `audit-url-unset`.** | `--from/--to/--only-step` / `--mock-agent` | @@ -258,6 +266,7 @@ sandbox) is **GREEN**, never fail/incomplete. 
| `web-memory-bootstrap.sh` | issue #196 web-memory pre-flight + proof; runbook [`../docs/operator-runbook-web-memory.md`](../docs/operator-runbook-web-memory.md) | `--from/--to/--only-step` | | `memory-plant-demo.sh` | plant a proof memory archive through the REAL chain + read-back (the CLI/CI proof of the plant flow the web "⊕ plant prepared memory" button drives); **phase 4 of `v2-demo.sh`**. Plants into **dedicated `demo-*` namespaces** (never the real travel/personal/family) and **always deletes them on exit** (success OR failure, EXIT trap; `KEEP_DEMO_MEMORY=1` keeps), so test memory never leaks into the master's real store — the real prepared archive is planted ONLY by the user (the button), never by a demo or onboarding. Re-testable; idempotent (`--from 4.1`). | `--from-step/--only-step N` / `--ci` | | `web-parity-demo.sh` | **phase 6 of `v2-demo.sh`** (NOT a standalone front door) — boots `agentkeys-daemon --ui-bridge` SEEDED with the master's J1 + device via the `--ui-bridge-seed-*` daemon seam (skips re-onboarding) + plants a **dedicated `webparity` probe ns** through the **web** endpoint `POST /v1/master/memory/plant`, **deleted on exit** (success or failure). A 200 proves the daemon's chain (cap-mint → STS → worker → S3) == the agent/harness chain — the web↔harness drift gate. **Step 4 (#214)** additionally polls `GET /v1/agent/pairing/pending` and asserts a well-formed `{requests:[…]}` — the master-side web-pairing route reaches the real broker rendezvous (the full claim→register e2e needs a live §10.2 agent request, exercised agent-side). Reuses phases 1-2's build/chain/broker/master (one daemon boot, no re-bootstrap); real-only. 
| `--from-step/--only-step N` / `--ci` | +| `mock-wire-demo.sh` | **CI mock-sandbox wire proof (#216) — CI-ONLY.** Emulates the aiosandbox side ON the runner with the sanctioned mock agent: ensure agent + the canonical `MOCK_SCOPE_SERVICES` grant → mint operator + agent sessions (wallet_sig SIWE via `_lib.sh::wallet_sig_mint_jwt`) → boot the REAL `agentkeys-mcp-server` on `127.0.0.1:$MOCK_MCP_PORT` (http backend + per-actor STS relay, the phase1 1.4 shape) → master-self vault a probe cred under the DEDICATED `mock-wire-llm` service (never `openrouter` — can't clobber a real vault entry) → run the SAME `sandbox-agent-isolation.sh` with a staged harness-env + `EXPECTED_CRED_SHA256`. Proves the post-wire agent RUNTIME (MCP server + CLI) headless; sandbox key custody stays operator-only. **Phase 5 of `v2-demo.sh` under `--ci`** (`--wire mock`). Idempotent; MCP + temp files torn down on EXIT. | `--from-step/--only-step N` | | `cred-fetch-demo.sh` | **#216 agent-side vaulted-key fetch, real e2e** (standalone). A master **vaults** a probe credential via the daemon (web path: cap-mint cred-store → STS → cred worker → S3), then the **agent** fetches it back with `agentkeys cred fetch` (CLI path: cap-mint cred-fetch → STS → cred worker → **decrypt**), asserting the EXACT secret round-trips. Proves the cred half of "the agent uses the key the master authorized it to use" (the Hermes wire is phase1-wire #216 Phase 4.0). Routes through the shared `agentkeys-backend-client` (no re-typed shapes, #204). Idempotent (a FIXED `cred-e2e-probe` service is overwritten each run — never accumulates); daemon killed on exit; real-only. | `--from-step/--only-step N` / `--ci` | | `cred-wire-demo.sh` | **#216 agent-side wire, the FULL e2e** (standalone, headless). 
Extends `cred-fetch-demo.sh` through the Hermes wire: master vaults the LLM key → **agent cred-fetches it** → **plant into the sandbox Hermes** (`~/.hermes/.env` + `hermes config set model.*`) → **Hermes runs on the vault key** (real LLM smoke), asserting the planted key == the vaulted key (sha) with **no `OPENROUTER_API_KEY` in the agent env**. The durable, no-Touch-ID complement to `phase1-wire-demo.sh` Phase 4.0b. Needs a reachable aiosandbox (`SANDBOX_URL`, default `:8080`) with Hermes installed. Idempotent (FIXED `openrouter` service; `.env` key-line rewritten not appended); daemon killed on exit; real-only. | `--from-step/--only-step N` / `--ci` | | `sandbox-build-push.sh` | **Path-A binary provisioner (utility, not a stage demo).** Cross-builds the agent binaries (`agentkeys` + `agentkeys-mcp-server` + `agentkeys-daemon`) for the sandbox's aarch64-Linux arch in the cached arm64 builder image (sharing phase1-wire-demo.sh's exact `agentkeys-sandbox-builder` image + `agentkeys-sandbox-*` cargo/target volumes → a warm tree re-pushes in seconds) and uploads them to the sandbox's `~/.local/bin` via the file API. **Build + push ONLY** — it never pairs or wires (that's the master's job in the parent-control web UI). Re-run after any local code change so the in-sandbox agent runs current source. | `SANDBOX_URL` / `RUST_BUILD_IMAGE` / `CROSS_RUST_TOOLCHAIN` | diff --git a/harness/mock-wire-demo.sh b/harness/mock-wire-demo.sh new file mode 100755 index 00000000..765bf1b5 --- /dev/null +++ b/harness/mock-wire-demo.sh @@ -0,0 +1,216 @@ +#!/usr/bin/env bash +# harness/mock-wire-demo.sh — CI mock-sandbox wire proof (#216). CI-ONLY. +# +# CI has no aiosandbox, so phase 5 (the wire) used to be OFF there — the entire +# post-wire AGENT RUNTIME (the in-sandbox MCP server + `agentkeys cred fetch` + +# the hook path) ran nowhere headless. 
This script emulates the aiosandbox side +# ON THE RUNNER with the ONE sanctioned synthetic agent (the master-held mock, +# contract rule 5) and then runs THE SAME proof script the real sandbox runs: +# +# ensure mock agent (registered + canonical scope grant) +# → mint agent + operator sessions (wallet_sig SIWE, headless) +# → boot the REAL agentkeys-mcp-server on localhost (http backend, per-actor +# STS relay — the exact phase1 1.4 shape, minus the sandbox) +# → master-self vault a probe LLM cred (agentkeys cred store) +# → stage a harness-env + run harness/scripts/sandbox-agent-isolation.sh: +# memory roundtrip THROUGH the MCP + #216 cred fetch (positive, sha-exact) +# + the un-granted scope-denial negative +# +# What this proves that stage-3 steps 11-12 (raw curls) don't: the MCP-server + +# CLI runtime layer — the same binaries/wire the sandbox agent uses. What it +# CANNOT prove: sandbox key custody (the mock key is master-held) — that's the +# operator's On-Sandbox run (phase1-wire-demo.sh --real → sandbox-agent- +# isolation.sh in the sandbox). +# +# The canonical mock-agent grant (KEEP IDENTICAL to v2-stage3-demo.sh): +# $SMOKE_SERVICE , memory:$MOCK_WIRE_NS , $MOCK_CRED_SERVICE +# The cred probe uses the DEDICATED $MOCK_CRED_SERVICE (never `openrouter`), so +# a run can never overwrite a real vaulted LLM key. +# +# Idempotent: agent create + scope grant short-circuit on chain state; the vault +# probe service is FIXED (overwritten per run); the memory ns blob is replaced +# per put; the MCP server + temp files are torn down on EXIT. +# +# bash harness/mock-wire-demo.sh # full (CI invokes via v2-demo --ci) +# bash harness/mock-wire-demo.sh --only-step 4 # one step +set -uo pipefail +set +m + +REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" +ENV_FILE="${ENV_FILE:-$REPO_ROOT/scripts/operator-workstation.env}" +[ -f "$ENV_FILE" ] && { set -a; . "$ENV_FILE"; set +a; } +# shellcheck source=/dev/null +.
"$REPO_ROOT/harness/scripts/_lib.sh" +FROM=1; TO=99; STEP_TOTAL=6 +while [ $# -gt 0 ]; do case "$1" in + --ci) : ;; # accepted for orchestrator symmetry — this script IS the CI path + --from-step) shift; FROM="${1:-1}" ;; --from-step=*) FROM="${1#*=}" ;; + --to-step) shift; TO="${1:-99}" ;; --to-step=*) TO="${1#*=}" ;; + --only-step) shift; FROM="${1:-1}"; TO="$FROM" ;; --only-step=*) FROM="${1#*=}"; TO="$FROM" ;; + --help|-h) sed -n '2,35p' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;; +esac; shift; done +should_run() { [ "$1" -ge "$FROM" ] && [ "$1" -le "$TO" ]; } +c() { [ -t 2 ] && printf '\033[%sm%s\033[0m' "$1" "$2" || printf '%s' "$2"; } +step() { printf '\n%s %s\n' "$(c '1;36' "▸ step $1/$STEP_TOTAL")" "$2" >&2; } +ok() { printf ' %s %s\n' "$(c '1;32' ok)" "$1" >&2; } +skip() { printf ' %s %s\n' "$(c '1;33' skip)" "$1" >&2; } +die() { printf ' %s %s\n' "$(c '1;31' fail)" "$1" >&2; exit 1; } +sha_hex() { { command -v sha256sum >/dev/null 2>&1 && printf '%s' "$1" | sha256sum || printf '%s' "$1" | shasum -a 256; } | awk '{print $1}'; } + +BROKER="${OIDC_ISSUER:-${AGENTKEYS_BROKER_URL:-}}" +REGION="${REGION:-us-east-1}" +SMOKE_SERVICE="${SMOKE_TEST_SERVICE:-openrouter}" +MOCK_WIRE_NS="${MOCK_WIRE_NS:-ci-wire-proof}" +MOCK_CRED_SERVICE="${MOCK_CRED_SERVICE:-mock-wire-llm}" +MOCK_SCOPE_SERVICES="${MOCK_SCOPE_SERVICES:-$SMOKE_SERVICE,memory:$MOCK_WIRE_NS,$MOCK_CRED_SERVICE}" +MOCK_AGENT_LABEL="${AGENTKEYS_MOCK_AGENT_LABEL:-demo-agent-dev}" +MOCK_MCP_PORT="${MOCK_MCP_PORT:-18090}" +VENDOR_TOKEN="${AGENTKEYS_MCP_VENDOR_TOKEN:-harness-tok}" + +# Binaries: the v2-demo preflight prepends target/release to PATH; standalone +# runs resolve release → debug → PATH and build on miss (contract rule 7).
+resolve_bin() { + local name="$1" + if [ -x "$REPO_ROOT/target/release/$name" ]; then printf '%s' "$REPO_ROOT/target/release/$name" + elif [ -x "$REPO_ROOT/target/debug/$name" ]; then printf '%s' "$REPO_ROOT/target/debug/$name" + else command -v "$name" 2>/dev/null || true; fi +} + +MPID=""; MLOG=""; ASB_FILE=""; HENV_FILE="" +cleanup() { + [ -n "$MPID" ] && kill "$MPID" 2>/dev/null + rm -f "$MLOG" "$ASB_FILE" "$HENV_FILE" +} +trap cleanup EXIT + +# ─── Step 1: prereqs + master identity ────────────────────────────────────── +if should_run 1; then + step 1 "Prereqs: tools + binaries + broker + the REGISTERED master (signs the mock's chain state)" + for t in cast jq curl; do command -v "$t" >/dev/null 2>&1 || die "missing $t"; done + [ -n "$BROKER" ] || die "no broker URL (OIDC_ISSUER) — the mock wire is real-broker-only" + [ -n "${AGENTKEYS_WORKER_CRED_URL:-}" ] || die "AGENTKEYS_WORKER_CRED_URL unset" + [ -n "${AGENTKEYS_WORKER_MEMORY_URL:-}" ] || die "AGENTKEYS_WORKER_MEMORY_URL unset" + [ -n "${VAULT_ROLE_ARN:-}" ] || die "VAULT_ROLE_ARN unset" + [ -n "${MEMORY_ROLE_ARN:-}" ] || die "MEMORY_ROLE_ARN unset" + CLI_BIN="$(resolve_bin agentkeys)"; MCP_BIN="$(resolve_bin agentkeys-mcp-server)" + if [ -z "$CLI_BIN" ] || [ -z "$MCP_BIN" ]; then + [ -n "${AGENTKEYS_SKIP_CLI_BUILD:-}" ] && die "binaries missing but AGENTKEYS_SKIP_CLI_BUILD set — the preflight should have built them" + ( cd "$REPO_ROOT" && cargo build --release -p agentkeys-cli -p agentkeys-mcp-server ) || die "cargo build failed" + CLI_BIN="$(resolve_bin agentkeys)"; MCP_BIN="$(resolve_bin agentkeys-mcp-server)" + fi + { [ -n "$CLI_BIN" ] && [ -n "$MCP_BIN" ]; } || die "could not resolve agentkeys / agentkeys-mcp-server binaries" + KEY=$(resolve_master_key) || die "no master deployer key" + MASTER_ADDR_LC=$(cast wallet address --private-key "$KEY" | tr 'A-F' 'a-f') + OMNI_RAW=$(printf 'agentkeysevm%s' "$MASTER_ADDR_LC" | shasum -a 256 | awk '{print $1}') + OPERATOR_OMNI="0x$OMNI_RAW" + 
MASTER_DKH=$(resolve_active_master_dkh "$OMNI_RAW" "$MASTER_ADDR_LC" || true) + [ -n "$MASTER_DKH" ] || die "no ACTIVE master device for operator $OPERATOR_OMNI — run stages 1-2 first (the #164 register); the mock agent's create/grant and the master-self vault all need it" + ok "binaries + env ready; operator ${OPERATOR_OMNI:0:14}…, master device ${MASTER_DKH:0:14}…" +fi + +# ─── Step 2: mock agent — register + the CANONICAL scope grant ────────────── +if should_run 2; then + step 2 "Mock agent '$MOCK_AGENT_LABEL': register (idempotent) + grant [$MOCK_SCOPE_SERVICES]" + profile_uc=$(printf '%s' "${AGENTKEYS_CHAIN:-heima}" | tr 'a-z-' 'A-Z_') + registry_addr=$(eval "echo \${SIDECAR_REGISTRY_ADDRESS_${profile_uc}:-}") + scope_addr=$(eval "echo \${SCOPE_CONTRACT_ADDRESS_${profile_uc}:-}") + { [ -n "$registry_addr" ] && [ "$registry_addr" != 0x0 ]; } || die "no SidecarRegistry address in env" + { [ -n "$scope_addr" ] && [ "$scope_addr" != 0x0 ]; } || die "no AgentKeysScope address in env" + bash "$REPO_ROOT/scripts/heima-agent-create.sh" --label "$MOCK_AGENT_LABEL" --registry-address "$registry_addr" >&2 \ + || die "heima-agent-create.sh failed for '$MOCK_AGENT_LABEL'" + grant_json=$(bash "$REPO_ROOT/scripts/heima-scope-set.sh" --agent "$MOCK_AGENT_LABEL" \ + --services "$MOCK_SCOPE_SERVICES" --scope-address "$scope_addr" | tail -1) || grant_json="" + echo "$grant_json" | jq -e '.ok==true' >/dev/null 2>&1 \ + || die "scope grant did not land: $(echo "$grant_json" | tr '\n' ' ' | cut -c1-200)" + AGENT_FILE="$HOME/.agentkeys/agents/${MOCK_AGENT_LABEL}.json" + MOCK_ACTOR=$(jq -r '.actor_omni // empty' "$AGENT_FILE"); MOCK_ACTOR="0x${MOCK_ACTOR#0x}" + MOCK_DKH=$(jq -r '.device_key_hash // empty' "$AGENT_FILE") + [ -n "$MOCK_DKH" ] || MOCK_DKH=$(cast keccak "$(jq -r '.agent_address // .wallet_address' "$AGENT_FILE" | tr '[:upper:]' '[:lower:]')") + [ -n "${MOCK_ACTOR#0x}" ] || die "agent file $AGENT_FILE missing actor_omni" + ok "mock agent on chain — actor 
${MOCK_ACTOR:0:14}…, device ${MOCK_DKH:0:14}…, scope [$MOCK_SCOPE_SERVICES]" +fi + +# ─── Step 3: sessions — operator (cap-mint authz) + agent (per-actor STS) ─── +if should_run 3; then + step 3 "Sessions: operator (deployer SIWE) + agent ('$MOCK_AGENT_LABEL' key SIWE → 0600 file)" + [ -n "${KEY:-}" ] || die "no master key — run step 1" + okey="$(mktemp)"; ( umask 077; printf '%s' "$KEY" > "$okey" ) + OP_JWT=$(wallet_sig_mint_jwt "$okey" "$BROKER"); rc=$?; rm -f "$okey" + [ "$rc" = 0 ] && [ -n "$OP_JWT" ] || die "operator session mint failed (broker wallet_sig)" + agent_pk=$(jq -r '.agent_private_key // empty' "${AGENT_FILE:-$HOME/.agentkeys/agents/$MOCK_AGENT_LABEL.json}") + { [ -n "$agent_pk" ] && [ "$agent_pk" != null ]; } || die "mock agent has no master-held key — heima-agent-create ran in §10.2 mode?" + akey="$(mktemp)"; ( umask 077; printf '%s' "$agent_pk" > "$akey" ) + AGENT_JWT=$(wallet_sig_mint_jwt "$akey" "$BROKER"); rc=$?; rm -f "$akey" + [ "$rc" = 0 ] && [ -n "$AGENT_JWT" ] || die "agent session mint failed (broker wallet_sig)" + ASB_FILE="$(mktemp)"; ( umask 077; printf '%s' "$AGENT_JWT" > "$ASB_FILE" ) + ok "operator session (${#OP_JWT} chars) + agent session (${#AGENT_JWT} chars, omni == actor) ready" +fi + +# ─── Step 4: boot the REAL MCP server on localhost (the mock 'sandbox') ───── +if should_run 4; then + step 4 "agentkeys-mcp-server on 127.0.0.1:$MOCK_MCP_PORT (http backend + per-actor STS relay)" + { [ -n "${MOCK_ACTOR:-}" ] && [ -n "${ASB_FILE:-}" ]; } || die "need steps 2-3 first" + MLOG="$(mktemp -t mock-wire-mcp.XXXX)" + mcp_args=(--backend http --transport http --listen "127.0.0.1:$MOCK_MCP_PORT" + --vendor-tokens "harness:$VENDOR_TOKEN" --broker-url "${BROKER%/}" + --memory-url "${AGENTKEYS_WORKER_MEMORY_URL}" + --default-actor "$MOCK_ACTOR" --default-operator-omni "$OPERATOR_OMNI" + --default-device-key-hash "$MOCK_DKH" + --agent-session-bearer-file "$ASB_FILE" + --memory-role-arn "${MEMORY_ROLE_ARN}" --vault-role-arn 
"${VAULT_ROLE_ARN}" + --aws-region "$REGION") + [ -n "${AGENTKEYS_WORKER_AUDIT_URL:-}" ] && mcp_args+=(--audit-url "$AGENTKEYS_WORKER_AUDIT_URL") + "$MCP_BIN" "${mcp_args[@]}" > "$MLOG" 2>&1 & + MPID=$! + ready=0; for _ in $(seq 1 20); do + curl -fsS "http://127.0.0.1:$MOCK_MCP_PORT/healthz" >/dev/null 2>&1 && { ready=1; break; } + kill -0 "$MPID" 2>/dev/null || break; sleep 0.5 + done + [ "$ready" = 1 ] || die "MCP server not ready: $(tail -3 "$MLOG" | tr '\n' ' ' | cut -c1-200)" + ok "MCP up (pid $MPID) — same runtime layer the sandbox agent talks to" +fi + +# ─── Step 5: master-self vault the probe LLM cred ─────────────────────────── +if should_run 5; then + step 5 "Master vaults '$MOCK_CRED_SERVICE' (probe secret) — what the mock agent is authorized to fetch" + { [ -n "${OP_JWT:-}" ] && [ -n "${MASTER_DKH:-}" ]; } || die "need steps 1+3 first" + PROBE_SECRET="sk-mock-wire-$$-$(date +%s)" + export PROBE_SECRET + s3key=$("$CLI_BIN" cred store "$MOCK_CRED_SERVICE" --secret-env PROBE_SECRET \ + --operator-omni "$OPERATOR_OMNI" --actor-omni "$OPERATOR_OMNI" \ + --device-key-hash "$MASTER_DKH" --session-bearer "$OP_JWT" \ + --broker-url "${BROKER%/}" --cred-url "${AGENTKEYS_WORKER_CRED_URL}" \ + --vault-role-arn "${VAULT_ROLE_ARN}" --region "$REGION" 2>&1) \ + || die "master-self cred store failed: $(echo "$s3key" | tr '\n' ' ' | cut -c1-200)" + PROBE_SHA=$(sha_hex "$PROBE_SECRET") + ok "vaulted (s3: $(echo "$s3key" | tail -1 | cut -c1-60)…) — expected sha ${PROBE_SHA:0:12}…" +fi + +# ─── Step 6: THE PROOF — the sandbox script, on the runner ────────────────── +if should_run 6; then + step 6 "sandbox-agent-isolation.sh on the runner: memory via MCP + #216 cred fetch + scope negative" + { [ -n "${OP_JWT:-}" ] && [ -n "${MOCK_ACTOR:-}" ] && [ -n "${PROBE_SHA:-}" ]; } || die "need steps 1-5 first" + HENV_FILE="$(mktemp -t mock-wire-henv.XXXX)" + ( umask 077; { + printf 'AGENTKEYS_MCP_URL=http://127.0.0.1:%s/mcp\n' "$MOCK_MCP_PORT" + printf 
'AGENTKEYS_MCP_VENDOR_TOKEN=%s\n' "$VENDOR_TOKEN" + printf 'AGENTKEYS_ACTOR_OMNI=%s\n' "$MOCK_ACTOR" + printf 'AGENTKEYS_OPERATOR_OMNI=%s\n' "$OPERATOR_OMNI" + printf 'AGENTKEYS_DEVICE_KEY_HASH=%s\n' "$MOCK_DKH" + printf 'AGENTKEYS_SESSION_BEARER=%s\n' "$OP_JWT" + printf 'AGENTKEYS_BROKER_URL=%s\n' "${BROKER%/}" + printf 'AGENTKEYS_WORKER_CRED_URL=%s\n' "${AGENTKEYS_WORKER_CRED_URL}" + printf 'VAULT_ROLE_ARN=%s\n' "${VAULT_ROLE_ARN}" + printf 'REGION=%s\n' "$REGION" + printf 'CRED_SERVICE=%s\n' "$MOCK_CRED_SERVICE" + } > "$HENV_FILE" ) + if HARNESS_ENV="$HENV_FILE" EXPECTED_CRED_SHA256="$PROBE_SHA" AGENT_BIN="$CLI_BIN" \ + bash "$REPO_ROOT/harness/scripts/sandbox-agent-isolation.sh" "$MOCK_WIRE_NS"; then + ok "the SAME proof the sandbox runs passed on the runner (mock agent): MCP memory roundtrip + authorized cred fetch (sha-exact) + un-granted denial" + else + die "mock-sandbox proof failed — see sandbox-agent-isolation.sh output above" + fi +fi + +printf '\n%s the CI mock sandbox ran the full post-wire agent runtime — MCP server + agentkeys CLI against the live broker/workers. Sandbox key custody is the operator run (phase1-wire-demo.sh --real).\n' "$(c '1;32' 'DONE ·')" >&2 diff --git a/harness/scripts/_lib.sh b/harness/scripts/_lib.sh index 28d98e9e..d0abcdd1 100644 --- a/harness/scripts/_lib.sh +++ b/harness/scripts/_lib.sh @@ -280,3 +280,40 @@ resolve_active_master_dkh() { fi return 1 } + +# wallet_sig_mint_jwt +# Mint a broker session JWT non-interactively via the wallet_sig (SIWE) auth +# plugin: /v1/auth/wallet/start → cast wallet sign → /v1/auth/wallet/verify. +# Echoes the JWT on stdout (logs to stderr); rc 1 on any failure. The JWT's +# agentkeys.omni_account == the signing key's broker omni: pass the DEPLOYER +# key for an OPERATOR session (cap-mint authz) or an AGENT's key for the +# per-actor STS-relay session (omni == actor_omni). Uses temp files, not shell +# pipelines — jq chokes on the multi-line SIWE message in a var round-trip. 
+wallet_sig_mint_jwt() { + local key_file="$1" issuer="${2%/}" + local key addr t1 t2 rid msg sig jwt + key="$(tr -d '[:space:]' < "$key_file")" || return 1 + case "$key" in 0x*) ;; *) key="0x$key" ;; esac + addr="$(cast wallet address --private-key "$key" 2>/dev/null)" \ + || { echo "wallet_sig_mint_jwt: cast wallet address failed (key valid? cast on PATH?)" >&2; return 1; } + t1="$(mktemp)"; t2="$(mktemp)" + if ! curl -sS --max-time 15 -X POST "$issuer/v1/auth/wallet/start" -H 'content-type: application/json' \ + -d "$(jq -n --arg a "$addr" '{address:$a, chain_id:1}')" -o "$t1"; then + echo "wallet_sig_mint_jwt: POST $issuer/v1/auth/wallet/start failed" >&2; rm -f "$t1" "$t2"; return 1 + fi + rid="$(jq -r '.request_id // empty' "$t1")"; msg="$(jq -r '.siwe_message // empty' "$t1")" + if [ -z "$rid" ] || [ -z "$msg" ]; then + echo "wallet_sig_mint_jwt: wallet/start missing request_id/siwe_message: $(head -c 160 "$t1")" >&2 + rm -f "$t1" "$t2"; return 1 + fi + sig="$(cast wallet sign --private-key "$key" "$msg" 2>/dev/null)" \ + || { echo "wallet_sig_mint_jwt: cast wallet sign failed" >&2; rm -f "$t1" "$t2"; return 1; } + if ! 
curl -sS --max-time 15 -X POST "$issuer/v1/auth/wallet/verify" -H 'content-type: application/json' \ + -d "$(jq -n --arg r "$rid" --arg s "$sig" '{request_id:$r, signature:$s}')" -o "$t2"; then + echo "wallet_sig_mint_jwt: POST $issuer/v1/auth/wallet/verify failed" >&2; rm -f "$t1" "$t2"; return 1 + fi + jwt="$(jq -r '.session_jwt // .jwt // empty' "$t2")" + rm -f "$t1" "$t2" + [ -n "$jwt" ] || { echo "wallet_sig_mint_jwt: no session JWT in wallet/verify (broker wallet_sig plugin enabled?)" >&2; return 1; } + printf '%s' "$jwt" +} diff --git a/harness/v2-demo.sh b/harness/v2-demo.sh index 78a5fdb6..f45299b3 100755 --- a/harness/v2-demo.sh +++ b/harness/v2-demo.sh @@ -26,9 +26,9 @@ # bash harness/v2-demo.sh --from 5 # just the wire phase (phase 5 has no sub-steps) # # Other flags are for CI / scoping only — an operator should not need them: -# bash harness/v2-demo.sh --ci # CI: software register + mock agent + tolerate skips; wire OFF (no sandbox) +# bash harness/v2-demo.sh --ci # CI: software register + mock agent + tolerate skips; wire MOCKED on the runner (no sandbox) # bash harness/v2-demo.sh --stage 3 # one phase -# bash harness/v2-demo.sh --wire real|none # force the wire phase on/off (real-data-only; in-memory 'light' removed) +# bash harness/v2-demo.sh --wire real|mock|none # force the wire phase: real sandbox / CI mock-sandbox / off (in-memory 'light' removed) # bash harness/v2-demo.sh --allow-skip=agent-file-invalid # passthrough to stage 3 # # Fail-fast: stops at the first failing phase (a red phase cascades); the wire phase @@ -77,7 +77,10 @@ fi { [ -n "${AGENTKEYS_CI:-}" ] || [ -n "${CI:-}" ] && [ "$CI" != 0 ]; } && CI=1 # Phase 5 (wire) runs only when it's in STAGES; WIRE_MODE controls HOW. Operator → auto -# (real if the sandbox is up, else skip-with-note); CI (no sandbox) → none. --wire wins. 
+# (real if the sandbox is up, else skip-with-note); CI (no sandbox) → MOCK: the runner +# emulates the aiosandbox side (mock-wire-demo.sh boots the real MCP server + runs the +# SAME sandbox-agent-isolation.sh with the master-held mock agent — #216 cred fetch + +# memory-via-MCP, headless). --wire wins (`none` = intentionally off). # "is the aiosandbox (agent container) up?" — probe its HTTP API (the SAME gate # phase1-wire-demo.sh:1.1 uses), NOT a local openviking install. Respects $SANDBOX_URL. sandbox_present() { @@ -85,7 +88,7 @@ sandbox_present() { curl -fsS --max-time 3 "$u/healthz" >/dev/null 2>&1 || curl -fsS --max-time 3 "$u/v1/sandbox" >/dev/null 2>&1 } if [ -z "$WIRE_MODE" ]; then - if [ "$CI" = 1 ]; then WIRE_MODE=none; else WIRE_MODE=auto; fi + if [ "$CI" = 1 ]; then WIRE_MODE=mock; else WIRE_MODE=auto; fi fi c() { [ -t 2 ] && printf '\033[%sm%s\033[0m' "$1" "$2" || printf '%s' "$2"; } @@ -131,15 +134,20 @@ preflight() { # Runs --webauthn so the MASTER grants the agent's memory: scope (one Touch ID, like # phases 1-2); WITHOUT the grant the agent pairs but memory.get → service_not_in_scope. # Sets WIRE_RESULT so the final summary can tell a real PASS apart from an auto-skip: -# wired — the wire actually ran (proof executed) -# disabled — intentionally off (--wire none / CI has no sandbox) → clean +# wired — the REAL wire ran (sandbox proof executed) +# mocked — the CI mock-sandbox proof ran (mock-wire-demo.sh: the runner boots the +# real MCP server + runs sandbox-agent-isolation.sh with the master-held +# mock agent — the post-wire runtime, NOT sandbox key custody) +# disabled — intentionally off (--wire none) → clean # skipped — auto mode but NO aiosandbox → the proof did NOT run (NOT a pass) -# Returns 0 on wired/disabled/auto-skip, non-zero only on a real wire failure. The -# auto-skip is surfaced as DEMO INCOMPLETE (non-zero exit) by the summary, so an +# Returns 0 on wired/mocked/disabled/auto-skip, non-zero only on a real wire failure. 
+# The auto-skip is surfaced as DEMO INCOMPLETE (non-zero exit) by the summary, so an # unexecuted proof can never read as green. run_wire_phase() { case "$WIRE_MODE" in - none) phase "5 — wire (disabled: --wire none / CI has no sandbox)"; WIRE_RESULT=disabled; return 0 ;; + none) phase "5 — wire (disabled: --wire none)"; WIRE_RESULT=disabled; return 0 ;; + mock) phase "5 — mock-wire-demo.sh (CI mock sandbox: master-held dev agent on the runner)" + WIRE_RESULT=mocked; bash "$REPO_ROOT/mock-wire-demo.sh" ;; auto) if sandbox_present; then phase "5 — phase1-wire-demo.sh --real --webauthn"; WIRE_RESULT=wired; bash "$REPO_ROOT/phase1-wire-demo.sh" --real --webauthn @@ -150,8 +158,8 @@ run_wire_phase() { return 0 fi ;; real) phase "5 — phase1-wire-demo.sh --real --webauthn"; WIRE_RESULT=wired; bash "$REPO_ROOT/phase1-wire-demo.sh" --real --webauthn ;; - light) echo "v2-demo: --wire light (in-memory) was removed — real-data-only. Use --wire real|none." >&2; return 1 ;; - *) echo "v2-demo: --wire wants real|none (got '$WIRE_MODE')" >&2; return 1 ;; + light) echo "v2-demo: --wire light (in-memory) was removed — real-data-only. Use --wire real|mock|none." >&2; return 1 ;; + *) echo "v2-demo: --wire wants real|mock|none (got '$WIRE_MODE')" >&2; return 1 ;; esac } @@ -225,6 +233,7 @@ printf '\n%s phases %s — all green.\n' "$(c '1;32' 'v2-demo DONE ·')" "$STAGE if [ "$wire_requested" = 1 ]; then case "${WIRE_RESULT:-}" in wired) say "agent paired in the sandbox → run the agent-side proof THERE: bash \$HOME/sandbox-agent-isolation.sh" ;; + mocked) say "wire proof ran on the CI mock sandbox (master-held dev agent — post-wire runtime proven). The REAL §10.2 sandbox-key proof is the operator run." ;; disabled) say "wire intentionally disabled (--wire none) — no §10.2 agent was paired this run." 
;; esac fi diff --git a/harness/v2-stage3-demo.sh b/harness/v2-stage3-demo.sh index fed21840..db207e27 100755 --- a/harness/v2-stage3-demo.sh +++ b/harness/v2-stage3-demo.sh @@ -570,6 +570,14 @@ fi # never landed stage-1 step 13's setScopeWithWebauthn). SMOKE_SERVICE="${SMOKE_TEST_SERVICE:-openrouter}" SMOKE_PLAINTEXT="${SMOKE_TEST_SECRET:-stage3-roundtrip-secret-$(date +%s)}" +# THE canonical mock-agent grant — keep IDENTICAL to harness/mock-wire-demo.sh +# (which re-grants the same list for the phase-5 CI mock-sandbox proof). One list, +# two call sites here: ensure_mock_agent + the step-18 revoke-transition restore. +# Granting only $SMOKE_SERVICE here would make phases 3 and 5 flip-flop setScope +# every CI run (set-replace semantics) — the same list lets both idempotently skip. +MOCK_WIRE_NS="${MOCK_WIRE_NS:-ci-wire-proof}" +MOCK_CRED_SERVICE="${MOCK_CRED_SERVICE:-mock-wire-llm}" +MOCK_SCOPE_SERVICES="${MOCK_SCOPE_SERVICES:-$SMOKE_SERVICE,memory:$MOCK_WIRE_NS,$MOCK_CRED_SERVICE}" # Resolve the demo agent's actor_omni + device_key_hash. 
Prefer the # agent file (created by stage-1 step 12) so the cap binds to a real @@ -633,7 +641,7 @@ ensure_mock_agent() { bash "$rr/scripts/heima-agent-create.sh" --label "$label" --registry-address "$registry_addr" >&2 \ || { echo "ensure_mock_agent: heima-agent-create.sh failed" >&2; return 1; } if [ -n "$scope_addr" ] && [ "$scope_addr" != 0x0 ]; then - local args=(--agent "$label" --services "$SMOKE_SERVICE" --scope-address "$scope_addr") + local args=(--agent "$label" --services "$MOCK_SCOPE_SERVICES" --scope-address "$scope_addr") [ "${WEBAUTHN_MODE:-0}" = 1 ] && args+=(--webauthn) bash "$rr/scripts/heima-scope-set.sh" "${args[@]}" >&2 \ || { echo "ensure_mock_agent: heima-scope-set.sh failed" >&2; return 1; } @@ -1359,9 +1367,9 @@ if should_run_step 18; then die "#216 post-revoke cred-fetch returned unexpected HTTP $rc — body: $body" fi restore_json=$(bash "$REPO_ROOT/scripts/heima-scope-set.sh" --agent "$pg_label" \ - --services "$SMOKE_SERVICE" --scope-address "$scope_addr" | tail -1) || restore_json="" + --services "$MOCK_SCOPE_SERVICES" --scope-address "$scope_addr" | tail -1) || restore_json="" echo "$restore_json" | jq -e '.ok==true' >/dev/null 2>&1 \ - && info "#216 revoke transition: scope restored to '$SMOKE_SERVICE' for '$pg_label'" \ + && info "#216 revoke transition: scope restored to the canonical [$MOCK_SCOPE_SERVICES] for '$pg_label'" \ || info "#216 revoke transition: restore did not confirm (next run's ensure_mock_agent self-heals) — $(echo "$restore_json" | tr '\n' ' ' | cut -c1-120)" fi fi From 26dd17236fb9c14e74088a19b6ecdb4cb36de6e9 Mon Sep 17 00:00:00 2001 From: Hanwen Cheng Date: Thu, 11 Jun 2026 12:57:06 +0800 Subject: [PATCH 03/17] =?UTF-8?q?fix:=20#216=20delegated=20cred-fetch=20re?= =?UTF-8?q?ads=20the=20master's=20vault=20=E2=80=94=20the=20agent-identity?= =?UTF-8?q?=20fetch=20never=20could?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The mock-wire CI proof (first headless run of the 
post-wire agent runtime) caught a REAL #216 gap, not a harness bug: the cred worker keys S3 strictly by cap.payload.actor_omni, so an agent-identity fetch (actor=agent) read bots//credentials/ — but the #216 flow vaults the key MASTER-SELF into bots//credentials/. 502 s3_get. Every prior #216 proof was master-self (operator==actor → same prefix), and phase1's old host fetch swallowed the failure (2>/dev/null → env fallback), so 'the agent fetches from the master's vault' had never actually worked end-to-end. Fix (worker, fetch only): try the actor's OWN vault first (#228 agent-owned creds — self-stored entries shadow delegated ones), then, for a DELEGATED cap (actor != operator), fall back to the OPERATOR's vault. The envelope AAD is keyed by the vault OWNER (each vault's objects were encrypted with aad(operator, owner, service, epoch) at store time), so decrypt matches either source. No IAM change: the S3 read still runs under the caller-relayed STS — reading the operator prefix requires the operator session the wire context already holds; the (device-bound, isServiceInScope-verified) cap narrows WHICH service that session releases. Store/teardown/list stay strictly actor-keyed. Unit tests: fetch_vault_owners (master-self = one vault, byte-identical prior behavior; delegated = actor-then-operator). arch.md synced (credential_envelope row + the cred-fetch sequence) per the architecture-as-source-of-truth policy. CI self-verifies: crates/agentkeys-worker-*/** trips the paths-filter → the test EC2 auto-redeploys → the harness (incl. phase-5 mock wire step 6) runs against the fixed worker. 
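A minimal, self-contained sketch of the fetch order and the owner-keyed AAD described above, for reviewers. It is illustrative only. The in-memory Vault map (keyed by (owner, service) rather than an S3 path), the string-joined aad() and the Envelope struct are hypothetical stand-ins, not the worker's s3_key, envelope::aad or envelope::decrypt. Only the owner ordering mirrors fetch_vault_owners from this patch.

```rust
use std::collections::HashMap;

/// Vault read order, as in the patch: the actor's own vault, then (for a
/// delegated cap, actor != operator) the operator's vault.
fn fetch_vault_owners(actor: &str, operator: &str) -> Vec<String> {
    if actor.eq_ignore_ascii_case(operator) {
        vec![actor.to_string()]
    } else {
        vec![actor.to_string(), operator.to_string()]
    }
}

/// Hypothetical AAD stand-in: the real worker hashes these fields; this toy
/// just joins them so a binding mismatch is visible.
fn aad(operator: &str, owner: &str, service: &str, epoch: u8) -> String {
    format!("{operator}|{owner}|{service}|{epoch}")
}

/// Toy envelope: the plaintext plus the AAD it was sealed under.
struct Envelope {
    aad: String,
    plaintext: String,
}

/// In-memory stand-in for the vault bucket, keyed by (vault owner, service).
type Vault = HashMap<(String, String), Envelope>;

fn store(v: &mut Vault, operator: &str, owner: &str, service: &str, epoch: u8, secret: &str) {
    let env = Envelope {
        aad: aad(operator, owner, service, epoch),
        plaintext: secret.to_string(),
    };
    v.insert((owner.to_string(), service.to_string()), env);
}

fn fetch(v: &Vault, operator: &str, actor: &str, service: &str, epoch: u8) -> Result<String, String> {
    let mut last_err = None;
    for owner in fetch_vault_owners(actor, operator) {
        match v.get(&(owner.clone(), service.to_string())) {
            // Miss in this vault: remember the error and try the next owner.
            None => last_err = Some(format!("no entry in vault {owner}")),
            // Hit: "decrypt" only if the AAD recomputed for THIS owner matches.
            Some(env) if env.aad == aad(operator, &owner, service, epoch) => {
                return Ok(env.plaintext.clone());
            }
            Some(_) => return Err("aad mismatch".to_string()),
        }
    }
    Err(last_err.unwrap_or_else(|| "no vault candidates".to_string()))
}

fn main() {
    let mut vault = Vault::new();
    // The master vaults the LLM key master-self (owner == operator).
    store(&mut vault, "0xmaster", "0xmaster", "openrouter", 1, "sk-master");
    // Delegated fetch: the agent's own vault misses, the operator's vault hits.
    println!("{:?}", fetch(&vault, "0xmaster", "0xagent", "openrouter", 1));
    // An agent-owned entry now shadows the delegated one.
    store(&mut vault, "0xmaster", "0xagent", "openrouter", 1, "sk-agent");
    println!("{:?}", fetch(&vault, "0xmaster", "0xagent", "openrouter", 1));
}
```

Running it prints Ok("sk-master") for the delegated fetch, then Ok("sk-agent") once an agent-owned entry shadows the delegated one. A miss in every candidate vault surfaces the last vault's error, just as the worker surfaces the last s3_get error.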
--- crates/agentkeys-worker-creds/src/handlers.rs | 100 ++++++++++++++---- crates/agentkeys-worker-creds/src/lib.rs | 7 +- docs/arch.md | 4 +- 3 files changed, 86 insertions(+), 25 deletions(-) diff --git a/crates/agentkeys-worker-creds/src/handlers.rs b/crates/agentkeys-worker-creds/src/handlers.rs index f469482e..57532267 100644 --- a/crates/agentkeys-worker-creds/src/handlers.rs +++ b/crates/agentkeys-worker-creds/src/handlers.rs @@ -269,30 +269,68 @@ async fn cred_fetch_inner( creds: Option<&crate::aws_creds::StsCreds>, req: &FetchRequest, ) -> Result, ApiError> { - let key = s3_key(&req.cap.payload.actor_omni, &req.cap.payload.service); + // A fetch may read TWO vaults, tried in order (see fetch_vault_owners): + // the actor's OWN (#228 agent-owned creds), then — for a DELEGATED cap + // (actor != operator) — the OPERATOR's vault (#216: the master vaulted the + // key master-self; the cap's already-verified on-chain cred: scope + // grant IS the agent's authorization to fetch it). The S3 read still runs + // under the CALLER-relayed STS creds, so layer-3 IAM is untouched: reading + // the operator's prefix requires operator-tagged STS (the wire context's + // operator session) — the cap only narrows WHICH service that session + // releases. The envelope AAD is keyed by the vault OWNER (each vault's + // objects were encrypted with aad(operator, owner, service, epoch) at + // store time), so the owner drives both the key and the decrypt. let s3 = s3_for_request(&state.s3, &state.config.region, creds).await; - let resp = s3 - .get_object() - .bucket(&state.config.vault_bucket) - .key(&key) - .send() - .await - .map_err(|e| err_502(e.to_string(), "s3_get"))?; - let body = resp - .body - .collect() - .await - .map_err(|e| err_502(e.to_string(), "s3_body"))? 
- .into_bytes(); + let owners = fetch_vault_owners(&req.cap.payload.actor_omni, &req.cap.payload.operator_omni); + let mut last_err: Option<ApiError> = None; + for owner in &owners { + let key = s3_key(owner, &req.cap.payload.service); + let resp = match s3 + .get_object() + .bucket(&state.config.vault_bucket) + .key(&key) + .send() + .await + { + Ok(resp) => resp, + Err(e) => { + // NoSuchKey (not stored in this vault) or AccessDenied (the + // relayed STS isn't tagged for this prefix) — try the next + // vault; surface the LAST attempt's error if all miss. + last_err = Some(err_502(e.to_string(), "s3_get")); + continue; + } + }; + let body = resp + .body + .collect() + .await + .map_err(|e| err_502(e.to_string(), "s3_body"))? + .into_bytes(); + + let aad = envelope::aad( + &req.cap.payload.operator_omni, + owner, + &req.cap.payload.service, + req.cap.payload.k3_epoch, + ); + return envelope::decrypt(&state.config.kek_hex_stage1, &body, &aad) + .map_err(|e| err_500(e.to_string(), "envelope_decrypt")); + } + Err(last_err.unwrap_or_else(|| err_502("no vault candidates", "s3_get"))) +} - let aad = envelope::aad( - &req.cap.payload.operator_omni, - &req.cap.payload.actor_omni, - &req.cap.payload.service, - req.cap.payload.k3_epoch, - ); - envelope::decrypt(&state.config.kek_hex_stage1, &body, &aad) - .map_err(|e| err_500(e.to_string(), "envelope_decrypt")) +/// The vaults a fetch may read, in order: the actor's OWN vault first (#228 +/// agent-owned creds — an agent's self-stored entry shadows a same-named +/// delegated one, never the reverse), then the operator's vault when the cap +/// is delegated (actor != operator, the #216 master-provisioned LLM key). +/// Master-self caps (actor == operator) read exactly one vault — unchanged.
+fn fetch_vault_owners(actor_omni: &str, operator_omni: &str) -> Vec<String> { + if actor_omni.eq_ignore_ascii_case(operator_omni) { + vec![actor_omni.to_string()] + } else { + vec![actor_omni.to_string(), operator_omni.to_string()] + } } async fn cred_teardown( @@ -443,6 +481,24 @@ fn s3_prefix(actor_omni: &str) -> String { mod tests { use super::*; + #[test] + fn fetch_owners_master_self_reads_one_vault() { + // operator == actor (case-insensitive) → exactly the actor's own vault, + // byte-identical to the pre-#216 behavior. + assert_eq!(fetch_vault_owners("0xAB", "0xab"), vec!["0xAB".to_string()]); + } + + #[test] + fn fetch_owners_delegated_falls_back_to_operator_vault() { + // actor != operator (#216 delegated fetch) → the agent's own vault + // first (#228 agent-owned shadows), then the operator's (the master- + // vaulted key the cap's scope grant authorizes). + assert_eq!( + fetch_vault_owners("0xagent", "0xmaster"), + vec!["0xagent".to_string(), "0xmaster".to_string()] + ); + } + #[test] fn s3_key_format_matches_arch_md_15_1() { // arch.md §15.1: s3://$VAULT_BUCKET/bots//credentials/.enc diff --git a/crates/agentkeys-worker-creds/src/lib.rs b/crates/agentkeys-worker-creds/src/lib.rs index be2343ab..28a6007a 100644 --- a/crates/agentkeys-worker-creds/src/lib.rs +++ b/crates/agentkeys-worker-creds/src/lib.rs @@ -13,7 +13,12 @@ //! 5. AES-256-GCM encrypt/decrypt with `aad = sha256(operator_omni || //! actor_omni || service || k3_epoch)`. //! 6. S3 PUT/GET at `s3://$VAULT_BUCKET/bots//credentials/ -//! .enc` via the worker's IAM identity. +//! .enc` via the worker's IAM identity. A FETCH with a +//! delegated cap (actor != operator, #216) additionally falls back to +//! the OPERATOR's vault when the actor's own read misses (no entry, or no access) — the master +//! vaulted the key master-self and the cap's verified on-chain +//! `cred:` scope grant is the agent's authorization to fetch +//! it (store/teardown/list stay strictly actor-keyed). //! //!
Stage-1 simplification: KEK is injected via env. Stage 2 (#90) //! replaces with mTLS-derived KEK from the signer enclave. diff --git a/docs/arch.md b/docs/arch.md index c2f4deaf..393043fc 100644 --- a/docs/arch.md +++ b/docs/arch.md @@ -284,7 +284,7 @@ Pinned to disambiguate the same value showing up under different labels across c | `OIDC JWT` (= K7) | Per-mint short-lived JWT signed by K2; consumed by `AssumeRoleWithWebIdentity`. Carries `agentkeys_actor_omni` claim → AWS session tag. | `oidc_jwt`, `JWT_A` / `JWT_B` (demo shell vars). | | `cap-token` | The bearer issued by broker authorizing one specific operation (cred-fetch / cred-store / memory-read / audit-append / payment / etc.). Carries K10 sig + K11 assertion (for master mutations) + broker's K1 co-signature. | `cap`, `capability_token`, `op_cap`. | | `credential_kek` | 32-byte AES-256 key for one operator's credentials. Derived as `HKDF-SHA256(salt="agentkeys.kek-salt.v2", ikm=K3_v[epoch], info="agentkeys.user.v1" \|\| actor_omni)`. | `KEK`, `cred_kek`. | -| `credential_envelope` | Wire format of one stored credential: `1B version (0x04) \|\| 1B k3_epoch \|\| 12B nonce \|\| ciphertext \|\| 16B tag`. Stored at `s3://$VAULT_BUCKET/bots//credentials/.enc`. AAD binds `(actor_omni, service)`. | `envelope`, `AEAD blob`, `.enc` (S3 key suffix). | +| `credential_envelope` | Wire format of one stored credential: `1B version (0x04) \|\| 1B k3_epoch \|\| 12B nonce \|\| ciphertext \|\| 16B tag`. Stored at `s3://$VAULT_BUCKET/bots//credentials/.enc`; AAD binds `(operator_omni, vault_owner_omni, service, k3_epoch)`. Store/teardown/list are strictly actor-keyed (vault_owner = the cap's actor). 
A **fetch with a delegated cap** (actor ≠ operator, #216) tries the actor's own vault first (agent-owned, #228), then **falls back to the operator's vault** — the master vaulted the key master-self and the cap's verified on-chain `cred:` grant is the agent's authorization; the S3 read still runs under the caller-relayed STS, so layer-3 IAM is unchanged (reading the operator prefix needs the operator session the wire context holds). | `envelope`, `AEAD blob`, `.enc` (S3 key suffix). | | `vault_bucket` / `memory_bucket` / `config_bucket` / `audit_bucket` / `email_bucket` / `payment_audit_bucket` | One S3 bucket per data class per §17. Per-actor prefix at `bots//` (config is per-operator + master-only, #201). | `$VAULT_BUCKET`, `$MEMORY_BUCKET`, `$CONFIG_BUCKET`, `$AUDIT_BUCKET`, `$EMAIL_BUCKET`, `$PAYMENT_AUDIT_BUCKET`. | | `policy` / `scope` / `namespace` / `category` / `service` (the authorization vocabulary) | **Distinct pipeline stages, NOT synonyms:** **policy** (human intent, off-chain, `DataClass::Config`) → COMPILE → **scope** (on-chain `(operator, actor, serviceHash)` grant, `AgentKeysScope` §19) over **categories/attributes** (the classifier's tag) → **service** (the signed cap string; for memory `service = memory:`, where **namespace** = the memory category). The unifying unit is the **policy attribute (category)** ([`research/universal-gate-pattern.md`](research/universal-gate-pattern.md) four primitives). Full table + pipeline: [`wiki/policy-scope-namespace.md`](wiki/policy-scope-namespace.md). | Confusions this resolves: "scope" used to mean "namespace" or "policy"; **"tag" = classifier *category*** (≠ the AWS **PrincipalTag** of §17 / [`wiki/tag-based-access.md`](wiki/tag-based-access.md)). 
| @@ -839,7 +839,7 @@ sequenceDiagram Worker->>Chain: re-verify scope + binding + epoch (defense in depth) Worker->>Sig: derive_cred_kek(operator_omni, k3_epoch) [mTLS] Sig-->>Worker: KEK (32 bytes) - Worker->>Worker: GetObject s3://vault_bucket/bots//credentials/.enc + Worker->>Worker: GetObject s3://vault_bucket/bots//credentials/.enc (delegated cap: falls back to bots//… — the master-vaulted key, #216) Worker->>Worker: AES-256-GCM decrypt under KEK Worker-->>Dmn: plaintext credential Dmn->>Dmn: Cache plaintext (TTL 5 min) From 4ae62c4da255dcfecfea8cc575631f54826ff6fd Mon Sep 17 00:00:00 2001 From: Hanwen Cheng Date: Thu, 11 Jun 2026 14:32:25 +0800 Subject: [PATCH 04/17] =?UTF-8?q?fix:=20stage-1=20agent-create=20self-heal?= =?UTF-8?q?s=20a=20=C2=A710.2=20sandbox-paired=20(keyless)=20agent=20file?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit heima-agent-create.sh read .agent_private_key and fed it straight to `cast wallet sign`. When the file is a §10.2 sandbox-paired record (key_custody=sandbox-only, agent_private_key:null — written by the wire phase's in-sandbox device-session), AGENT_KEY became the literal 'null' → 'Error: Failed to decode private key' at stage-1 step 12. The wire phase and stage-1 share the 'demo-agent' label, so any operator who runs the wire then re-runs stage 1 hits this. Now: validate the key shape (0x+64hex or bare 64hex); if unusable (sandbox custody / corrupt / legacy), back up the file to