Skip to content

fix(ci): unblock storyboard runner — poll, MCP outputSchema, ref seller boot#443

Open
bokelley wants to merge 8 commits intomainfrom
bokelley/storyboard-poll-fix
Open

fix(ci): unblock storyboard runner — poll, MCP outputSchema, ref seller boot#443
bokelley wants to merge 8 commits intomainfrom
bokelley/storyboard-poll-fix

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

@bokelley bokelley commented May 3, 2026

Summary

Three interlocked fixes to take both storyboard CI jobs from `overall_status: unreachable` to passing. They have to ship together because each one masks the next.

1. CI poll loop falsely declares "ready" on first iteration (`ci.yml`)

When the agent isn't yet listening, `curl -w "%{http_code}"` writes `000` to stdout and exits non-zero. The shell line:

```bash
HTTP_CODE=$(curl ... || echo "000")
```

routes `||` inside the command substitution, so curl's `000` and the fallback `000` concatenate to `"000000"`. `"000000" != "000"` is true → loop breaks immediately with `Seller agent ready (HTTP 000000, pid …)`. The storyboard runner then can't reach the still-cold agent.

Fix: move `||` onto the assignment so the fallback overwrites instead of concatenating.

2. MCP `outputSchema` missing `type: "object"` for union responses (`mcp_tools.py`)

With the poll fix in place, the seller agent comes up — and the storyboard runner's Zod-validated capability discovery rejects 5 tools with `Invalid input: expected "object"` at `tools[N].outputSchema.type`.

Cause: discriminated-union response types (`CreateMediaBuyResponse`, `AcquireRightsResponse`, …) come back from Pydantic as `{"anyOf": [...]}` with no `type` field. MCP requires root-level `type: "object"` on `outputSchema` because it describes `CallToolResult.structuredContent`, which is always a JSON object.

Fix: `schema.setdefault("type", "object")` in `_generate_pydantic_output_schemas`. Every variant inside the union is itself a `type: "object"` schema, so this is semantically equivalent (must be an object AND match a variant) and MCP-conformant. Existing tests that assert `anyOf` survives still pass.

3. v3 reference seller dies on boot — F12 webhook validator (`v3_reference_seller/src/app.py`)

The platform claims `sales-non-guaranteed` specialism, which exposes `create_media_buy` / `sync_creatives` / `update_media_buy` — all webhook-eligible per F12. Without a wired `WebhookSender`/`WebhookDeliverySupervisor`, `validate_webhook_sender_for_platform` raises `AdcpError[INVALID_REQUEST]` at boot.

Fix: pass `auto_emit_completion_webhooks=False`. The reference platform doesn't emit completion webhooks, which is the supported "I handle webhooks manually" code path the validator documents. Adopters whose platforms need webhook delivery wire a sender/supervisor and drop the kwarg.

Test plan

  • Both storyboard jobs (`seller_agent.py` and `v3_reference_seller`) report a real `Agent ready (HTTP 4xx, …)` and progress past capability discovery.
  • `seller_agent.py` storyboard reaches `overall_status: passing`, `controller_detected: true`.
  • `v3_reference_seller` storyboard reaches `overall_status: passing` and traffic counters `media_buy.create > 0` and `creative.upload > 0`.
  • `pytest tests/test_tools_list_output_schema.py` — 5 passed locally; union assertions still hold.

🤖 Generated with Claude Code

bokelley and others added 3 commits May 2, 2026 23:25
When the agent isn't yet listening, ``curl -w "%{http_code}"`` writes
"000" to stdout AND exits non-zero, so ``... || echo "000"`` appended
a second "000" — making HTTP_CODE the string "000000". The
``"$HTTP_CODE" != "000"`` comparison then succeeded on the first
iteration, falsely declaring the agent ready before it had started.
Both storyboard jobs (seller_agent.py and v3_reference_seller) failed
on every run with overall_status=unreachable as a result.

Move ``||`` onto the command-substitution assignment so the fallback
overwrites instead of concatenates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MCP requires ``outputSchema`` to define top-level ``type: "object"``
because it describes ``CallToolResult.structuredContent``, which is
always a JSON object. Pydantic's ``TypeAdapter.json_schema`` for
discriminated-union response types (``CreateMediaBuyResponse``,
``AcquireRightsResponse``, etc.) emits ``{"anyOf": [...]}`` with no
``type`` field — Zod-validated MCP clients reject these tools at
``tools/list``. The storyboard runner's capability discovery flagged
five such tools as ``invalid_value`` on ``outputSchema.type``, taking
the agent to ``overall_status: unreachable``.

Every variant inside the union is itself a Pydantic model rendered as
``type: "object"``, so adding root-level ``type: "object"`` alongside
``anyOf`` is semantically equivalent (must be an object AND match a
variant) and MCP-spec-conformant. Existing union-shape assertions in
``tests/test_tools_list_output_schema.py`` continue to pass — the
``anyOf`` is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v3 reference seller's platform claims the ``sales-non-guaranteed``
specialism (which exposes ``create_media_buy``, ``sync_creatives``,
``update_media_buy``) but doesn't wire a ``WebhookSender`` or
``WebhookDeliverySupervisor`` — server-boot
``validate_webhook_sender_for_platform`` raises
``AdcpError[INVALID_REQUEST]`` and the process dies before it
listens, taking storyboard CI to ``overall_status: unreachable``.

Pass ``auto_emit_completion_webhooks=False`` to ``serve()``. The
reference platform doesn't emit completion webhooks, which matches
the supported "I handle webhooks manually" code path the validator
calls out. Adopters whose platforms need webhook delivery wire a
``WebhookSender`` (or ``InMemoryWebhookDeliverySupervisor``) and drop
the kwarg — see the webhook_supervisor module for the wiring pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley bokelley changed the title fix(ci): correct broken curl readiness check in storyboard poll loop fix(ci): unblock storyboard runner — poll, MCP outputSchema, ref seller boot May 3, 2026
bokelley and others added 5 commits May 3, 2026 00:07
``asyncio.run(_bootstrap_schema(engine))`` opens asyncpg connections
on a transient event loop that closes when ``asyncio.run`` returns.
The connections stay in the pool, but asyncpg binds connection-
internal Future objects to the loop they were opened on. uvicorn
then runs on its own loop, and the first request raises
``RuntimeError: got Future attached to a different loop`` —
returning HTTP 500 on the readiness GET and taking storyboard CI
to ``overall_status: unreachable``.

Dispose the engine inside the bootstrap coroutine so the pool is
emptied while the bootstrap loop is still alive. uvicorn opens
fresh connections on its own loop on first use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Storyboard fixtures from @adcp/sdk's ``delivery_reporting.yaml`` ship
``channels: ["video"]`` for ``outdoor_video_q2`` (and similar legacy
names elsewhere). ``"video"`` isn't in the canonical
``MediaChannelSchema`` enum from schemas/cache/enums/channels.json
(the enum has ``olv``, ``ctv``, ``linear_tv``, etc. — bare
``"video"`` was never a valid value), so the SDK's strict response
validator rejects ``get_products`` and storyboard CI reports the
agent as ``mcp_error: VALIDATION_ERROR[/products/N/channels/0]``.

Filter incoming fixture ``channels`` against the spec enum in
``seed_product`` and drop the field if no values remain. The static
``PRODUCTS`` block doesn't declare ``channels`` either, so the seller
behaves consistently across static and seeded products.

The upstream fixture is genuinely buggy and should be fixed in
@adcp/sdk; this is a defensive normalization on the demo seller side
so storyboard CI keeps moving while that gets sorted upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
``SubdomainTenantMiddleware`` passes the raw Host header to
``router.resolve()``. RFC 7230 makes the header case-insensitive and
lets the client include ``:port``; the Protocol docstring is explicit
that implementations strip the port suffix as needed. The CI
readiness probe sends ``Host: acme.localhost:3001``, but
``SqlSubdomainTenantRouter.resolve`` ran the string verbatim through
``WHERE host == :host`` and never matched the seeded
``acme.localhost`` row — every request 404'd as ``unknown-host``,
storyboard CI reported the agent as ``unreachable``.

Lower-case and strip the port suffix before the cache lookup AND the
DB query so ``ACME.localhost:3001`` resolves the same row as
``acme.localhost``. Adds a regression test that captures the SQL bind
to prove the literal port-suffixed host doesn't reach the WHERE
clause.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FastMCP's TransportSecurityMiddleware enforces a strict default
``allowed_hosts`` (loopback only — ``127.0.0.1:*``, ``localhost:*``,
``[::1]:*``). Adopters serving multi-tenant subdomain hosts
(``acme.example.com``, ``acme.localhost``) get ``421 Misdirected
Request`` on every MCP request — the storyboard runner reports the
agent as ``unreachable`` because capability discovery never
completes.

Surface the underlying knobs on ``adcp.server.serve.serve`` and
``create_mcp_server``:

* ``allowed_hosts``: extends the FastMCP default (loopback probes
  still work alongside adopter-specified tenant hosts).
* ``allowed_origins``: symmetric, for the Origin header check.
* ``enable_dns_rebinding_protection``: turns the MCP-layer check off
  entirely — the right move for adopters whose outer ASGI middleware
  (e.g. :class:`SubdomainTenantMiddleware`) already validates the
  Host header against a tenant table, so duplicating the check
  against a static allow-list adds operational overhead without a
  security benefit.

Threaded through ``_serve_mcp``, ``_serve_mcp_and_a2a``, and
``_build_mcp_and_a2a_app`` so every transport sees the same wiring.
``adcp.decisioning.serve`` already forwards via ``**serve_kwargs``,
so adopters using the decisioning wrapper pick this up for free.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
``SubdomainTenantMiddleware`` (wired via ``asgi_middleware``) already
validates the Host header against the seeded tenant table — that's
the load-bearing host check for this seller. Without further config,
FastMCP's TransportSecurityMiddleware also rejects any non-loopback
Host (``acme.localhost:3001`` → ``421 Misdirected Request``), and
the storyboard runner reports the agent as ``unreachable`` because
MCP discovery never completes.

Pass ``enable_dns_rebinding_protection=False`` to ``serve()`` so the
MCP-layer check is off and the SubdomainTenantMiddleware stays the
single host-validation point. Adopters that don't run a tenant-aware
ASGI middleware leave the kwarg unset to keep the FastMCP defaults
active.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant