diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2f03e44..8db5361 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,6 +13,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 
 ### Added
 
+- **Chat-with-multimodal example.** `examples/11-chat-with-multimodal/` demonstrates `ChatPrompt` + `PlaceholderSegment` (proposal 0046) end-to-end: a four-turn lunar-mission Q&A conversation with conversation memory threaded through state, one mid-conversation turn attaching a photograph via `ImageURLBlockTemplate`, the agent processing the multimodal turn naturally without changing the chat-history shape. Complementary to example 09 (tool use); chat history threading and tool calling are separate primitives.
+- **`docs/examples/index.md` catalog now lists example 10.** A pre-existing gap (the Langfuse-observability example was missing from the catalog) was caught and fixed alongside the example 11 entry.
 - **PyPI + spec-version shields on the docs homepage.** `docs/index.md` now carries dynamic shields for the published PyPI version and the pinned spec version, sourced from `img.shields.io`. Both auto-update on every publish or spec bump; no maintenance burden. Mirrors the same shield URLs the README already uses.
 - **vLLM production deployment notes.** `docs/model-providers/vllm.md` grows a "Production deployment" section covering the `VLLM_HTTP_TIMEOUT_KEEP_ALIVE` gotcha (vLLM's stock 5s uvicorn keep-alive lapses pooled OA-side httpx connections and surfaces as `ProviderUnavailable`; widen to roughly 300s), a systemd unit skeleton, and the three throughput knobs that interact with OA's shared connection pool (`--max-model-len`, `--max-num-seqs`, `--gpu-memory-utilization`). The existing "Tool calling" section grows a `--tool-call-parser` family table verified against vLLM's docs (Llama 3.x / Llama 4 / Mistral / Hermes / Qwen3 / DeepSeek V3 / GPT-OSS), plus explicit "not supported here" callouts for Anthropic / Gemini (proprietary cloud) and mainstream Gemma (no vLLM parser).
 - **Three new patterns docs.** `docs/patterns/state-migration-on-resume.md`, `docs/patterns/caller-supplied-trace-identifiers.md`, and `docs/patterns/observer-state-reconciliation.md` graduate the corresponding entries from `docs/agent/non-obvious-shapes.md` into full pattern recipes with code snippets and "when this is right / when it isn't" guidance. The programmatic patterns API (`openarmature.patterns.list()` / `get(name)`) grows from 4 to 7 entries.
diff --git a/docs/examples/11-chat-with-multimodal.md b/docs/examples/11-chat-with-multimodal.md
new file mode 100644
index 0000000..ae8eb13
--- /dev/null
+++ b/docs/examples/11-chat-with-multimodal.md
@@ -0,0 +1,160 @@
+# 11 - Chat with multi-turn memory and a multimodal turn
+
+A lunar-mission Q&A assistant that maintains conversation context
+across four turns. One mid-conversation turn includes an attached
+photograph (Apollo 16 Lunar Module "Orion" on the lunar surface):
+the user asks about it, and the agent processes the multimodal turn
+without changing the chat-history shape.
+
+## Overview
+
+The user has a four-turn conversation with the assistant. Turns 1, 2,
+and 4 are text-only; turn 3 attaches a photograph and asks about it.
+The agent maintains memory throughout: turn 2 references "it" from
+turn 1, and turn 4 references "the LM you described" from turn 3. The
+printed transcript counts from 0, so turn 3 appears as `Turn 2`.
+
+The whole thing rides on one `ChatPrompt` template:
+
+- A `ContentSegment(role="system", ...)` holds the assistant's
+  persona and response style.
+- A `PlaceholderSegment(placeholder="history")` is the slot where
+  the caller injects the prior conversation.
+- A trailing `ContentSegment(role="user", ...)` carries the current
+  turn's question. For text-only turns its `content` is a string;
+  for the multimodal turn its `content` is a list of content-block
+  templates (`TextBlockTemplate` + `ImageURLBlockTemplate`).
+
+Chat history lives on state as `Annotated[list[Message], append]`.
+After each turn the `respond` node appends two messages to history
+(the rendered user turn + the assistant response), and the next
+turn's `render()` injects the grown history into the placeholder.
+
+## What it teaches
+
+- [`ChatPrompt`](../concepts/prompts.md) with
+  [`ContentSegment`](../concepts/prompts.md) and
+  [`PlaceholderSegment`](../concepts/prompts.md) (proposal 0046,
+  spec v0.38.0). The placeholder is the slot where multi-turn chat
+  history is injected at render time.
+- The same chat template can carry an
+  [`ImageURLBlockTemplate`](../concepts/prompts.md) when the
+  current user turn includes an image. The `content` field on the
+  user `ContentSegment` switches between a single `str` (text-only)
+  and a `list[ContentBlockTemplate]` (multimodal); the system and
+  placeholder segments are identical across both shapes.
+- [`PromptManager.render(prompt, placeholders={"history":
+  state.history})`](../reference/prompts.md) injects the message
+  list at the placeholder slot. An empty list is valid (first-turn
+  case); the rendered messages become just
+  `[system, current_user_turn]` with no prior history.
+- Multi-turn memory threaded through state via the `append`
+  reducer. Each `respond` call appends `[current_user_message,
+  assistant_response]` to history; reading history on the next turn
+  produces the running conversation.
+- The graph is a single `respond` node with a conditional edge that
+  loops back to itself until the script-supplied user turns are
+  exhausted, then routes to `END`. The cycle is
+  [`respond → respond → respond → … → END`](../concepts/graphs.md).
+- Complementary to [example 09 (tool use)](09-tool-use.md): chat
+  history threading and tool calling are separate primitives.
+  Example 09 shows the LLM emitting tool calls and the framework
+  dispatching them; this example shows how the prompt-management
+  layer composes a multi-turn conversation. A production chat agent
+  often combines both.
+
+## How to run
+
+```bash
+uv sync --group examples --all-extras
+
+# Clean conversation output only (default).
+LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py
+
+# With OTel JSON spans streaming to stderr alongside the chat.
+LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py --traces
+```
+
+`LLM_MODEL` must point at a vision-capable model. The default
+(`gpt-4o-mini`) qualifies. For a different image, set `IMAGE_URL` to
+a publicly reachable URL on a host that allows OpenAI's image fetcher.
+
+The conversation streams to stdout as each turn completes (a small
+visual delay between turns lets the human reader follow along). The
+`--traces` flag opts in to the OTel observer with a console
+exporter; without it the chat runs without any observer attached.
+Example 03 owns the observer-hooks story end-to-end; this example's
+headline is the chat shape, not the observability wiring.
+
+The demo is illustrative only: it runs four pre-scripted user turns
+sequentially in one process. A real chat-server runtime would
+manage one invocation per turn with the chat history persisted
+across sessions (e.g., via a checkpointer keyed on session_id);
+that's [example 08 (checkpointing)](08-checkpointing-and-migration.md)'s
+territory, combined with this one's chat shape.
+
+## The graph
+
+```mermaid
+flowchart TD
+  start([start])
+  respond[respond]
+  stop([end])
+
+  start --> respond
+  respond -->|more user turns scripted| respond
+  respond -->|user turns exhausted| stop
+```
+
+`route_after_respond` returns `"respond"` while
+`state.next_turn_index < len(state.user_turns)` and `END` otherwise.
+Each loop iteration renders the current chat template, calls the
+LLM, and updates state.
+
+## Reading the output
+
+```
+=== openarmature chat-with-multimodal demo ===
+Image URL: https://images-assets.nasa.gov/image/as16-113-18334/...
+Scripted turns: 4
+
+--- Turn 0 ---
+USER:      What was the primary objective of Apollo 11?
+ASSISTANT: The primary objective of Apollo 11 was to perform a
+manned lunar landing and safely return the crew to Earth ...
+
+--- Turn 1 ---
+USER:      And what year did it launch?
+ASSISTANT: Apollo 11 launched on July 16, 1969.
+
+--- Turn 2 [+image] ---
+USER:      I have a photograph of the Lunar Module. What's
+distinctive about its design?
+ASSISTANT: The Apollo Lunar Module had a distinctive two-stage,
+spider-like configuration ...
+
+--- Turn 3 ---
+USER:      Given what you described about the LM, was that design
+reused on later Apollo missions?
+ASSISTANT: Yes, the same basic LM design was used on Apollo 12
+through 17 ...
+
+=== history length: 8 messages (4 user/assistant turns) ===
+```
+
+- **Turn 1 builds on turn 0** without you having to re-mention
+  Apollo 11. The history placeholder injected the prior `[user_0,
+  assistant_0]` pair, so the model sees the question "what year did
+  it launch" in context.
+- **Turn 2 is the multimodal one** (`[+image]` tag in the transcript).
+  The user `ContentSegment` for this turn carries
+  `[TextBlockTemplate(text=...), ImageURLBlockTemplate(url=...)]`
+  instead of a plain string; the model receives both blocks in one
+  user message and answers about the image.
+- **Turn 3 references "the LM you described"** from turn 2. The
+  history at this point contains all six prior messages (system is
+  not in history; it comes from the template every render). The
+  model carries the multimodal context forward without you having
+  to re-attach the image.
+- **History length 8 = 4 (user, assistant) pairs.** No system
+  message in history; the template adds it on every render.
diff --git a/docs/examples/index.md b/docs/examples/index.md
index 51fe262..75f316e 100644
--- a/docs/examples/index.md
+++ b/docs/examples/index.md
@@ -43,6 +43,14 @@ in the repo.
 - [**09 - Tool use**](09-tool-use.md). Lunar-mission assistant that
   calls local Python tools to answer questions mixing fact recall and
   physics arithmetic.
+- [**10 - Langfuse observability**](10-langfuse-observability.md).
+  Send LLM-call observability natively to Langfuse with a
+  prompt-linkage demonstration on a mission-briefing Q&A pipeline.
+- [**11 - Chat with multimodal**](11-chat-with-multimodal.md).
+  Four-turn lunar-mission conversation with conversation memory
+  threaded through `ChatPrompt` + `PlaceholderSegment`. One turn
+  attaches a photograph; the agent processes it without changing the
+  chat shape.
 
 ## Configuration
 
diff --git a/examples/11-chat-with-multimodal/main.py b/examples/11-chat-with-multimodal/main.py
new file mode 100644
index 0000000..8218ea5
--- /dev/null
+++ b/examples/11-chat-with-multimodal/main.py
@@ -0,0 +1,532 @@
+"""openarmature demo: multi-turn chat with conversation memory and a
+multimodal turn, using ChatPrompt + PlaceholderSegment.
+
+**Use case:** Lunar mission Q&A assistant that maintains conversation
+context across four turns. Turn 3 includes an attached photograph
+(e.g., a Lunar Module on the surface): the user asks about it, the
+agent processes the multimodal turn naturally without changing the
+chat-history shape. Turns 1, 2, 4 are text-only.
+
+**Demonstrates:** ChatPrompt + ContentSegment (system + user) +
+PlaceholderSegment for chat-history injection (proposal 0046,
+spec v0.38.0). PromptManager.render with the `placeholders` kwarg.
+Multi-turn message threading through state with the `append`
+reducer; the conversation history grows over turns and feeds back
+into render() on each turn. The same chat template carries an
+optional ImageURLBlockTemplate when the user's current turn includes
+an image (lunar mission photograph), so multimodal turns work
+without bespoke handling. Complementary to example 09 (tool calling)
+which exercises a different LLM-side primitive entirely.
+
+**What's interesting in the implementation:**
+
+- The chat template is built per-turn by `_build_chat_prompt(...)`,
+  which switches the user `ContentSegment.content` between a single
+  text template (text-only turn) and a `[TextBlockTemplate,
+  ImageURLBlockTemplate]` list (multimodal turn). The system segment
+  and the `PlaceholderSegment(placeholder="history")` slot are identical
+  across both shapes; only the trailing user segment changes.
+- Chat history lives on state as `history: Annotated[list[Message],
+  append]`. After each turn the node appends two messages (the new
+  user turn that just rendered + the assistant response) so the
+  next turn's render() sees the full prior conversation.
+- `PromptManager.render(prompt, placeholders={"history": state.history})`
+  injects the message list at the placeholder slot. An empty
+  list is valid (first-turn case): the rendered messages become
+  just `[system, current_user_turn]` with no prior history.
+- The graph is a single `respond` node with a conditional edge that
+  loops back to itself until the script-supplied user turns are
+  exhausted, then routes to END. Each loop iteration renders the
+  current chat template, calls the LLM, and updates state.
+- `LangfusePromptBackend` is intentionally not used here: chat
+  history threading is the headline demonstration, not prompt
+  backend complexity. Example 07 owns the multi-backend prompt
+  story (filesystem primary + fallback); example 10 owns the
+  Langfuse-backend integration.
+- Error handling at the invoke() boundary. `main()` catches
+  `NodeException` (the graph engine's wrapper) and inspects
+  `exc.__cause__` (Python's standard exception chain) for
+  `LlmProviderError` to surface the canonical category
+  (`provider_rate_limit`, `provider_invalid_request`, etc.) in the
+  error message. The image URL failure mode (OpenAI's
+  fetcher hitting a CDN that blocks it) lands here as
+  `provider_invalid_request`. Three legitimate places to handle
+  this in production: caller-side `try / except NodeException`
+  (shown here), `RetryMiddleware` wrapping the respond node for
+  transient categories, or a `try / except LlmProviderError`
+  inside the node body returning a fallback response.
+
+**Configuration** (env vars; OpenAI defaults shown):
+
+- ``LLM_BASE_URL`` defaults to ``https://api.openai.com``. Host root only.
+- ``LLM_MODEL`` defaults to ``gpt-4o-mini`` (a vision-capable model
+  needed for the multimodal turn).
+- ``LLM_API_KEY`` required for OpenAI; leave it empty for local
+  servers that don't authenticate. The model MUST support vision blocks.
+- ``IMAGE_URL`` overrides the default image URL. Default is a
+  public-domain NASA photograph of the Apollo 16 Lunar Module
+  "Orion" on the lunar surface, served from NASA's images-assets
+  archive. OpenAI's vision pipeline downloads the image; some hosts
+  (e.g., upload.wikimedia.org) block its fetcher, which surfaces as
+  ProviderInvalidRequest. images-assets.nasa.gov is known to work.
+
+Run with:
+
+    uv sync --group examples --all-extras
+
+    # Clean conversation output only (default).
+    LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py
+
+    # With OTel JSON spans streaming to stderr alongside the chat.
+    LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py --traces
+
+(``--all-extras`` pulls in ``opentelemetry-sdk`` for the OTel observer.)
+The conversation transcript streams to stdout as each turn closes,
+with a short visual delay between turns (~``_TURN_DELAY_S``).  Pass
+``--traces`` to also see the OTel observer attached and node + LLM
+spans dumped to stderr; the OTel side is optional supporting
+infrastructure, not the headline of this example (example 03 owns
+the observer-hooks story).
+
+The demo is illustrative only: it runs four pre-scripted user turns
+sequentially in one process. A real chat-server runtime would
+manage one invocation per turn with the chat history persisted
+across sessions (e.g., via a checkpointer keyed on session_id);
+that's example 08's territory, combined with this one's chat shape.
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import os
+from dataclasses import dataclass
+from datetime import UTC, datetime
+from typing import Annotated, Any
+
+from opentelemetry.sdk.resources import Resource
+from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
+from pydantic import Field
+
+from openarmature.graph import (
+    END,
+    EndSentinel,
+    GraphBuilder,
+    NodeException,
+    State,
+    append,
+)
+from openarmature.llm import (
+    AssistantMessage,
+    LlmProviderError,
+    Message,
+    OpenAIProvider,
+    RuntimeConfig,
+    UserMessage,
+)
+from openarmature.observability.otel import OTelObserver
+from openarmature.prompts import (
+    ChatPrompt,
+    ContentBlockTemplate,
+    ContentSegment,
+    ImageURLBlockTemplate,
+    PlaceholderSegment,
+    Prompt,
+    PromptManager,
+    TextBlockTemplate,
+)
+
+# ---------------------------------------------------------------------------
+# Defaults
+# ---------------------------------------------------------------------------
+
+# Default image: NASA public-domain photograph of the Apollo 16 Lunar
+# Module "Orion" parked on the lunar surface during the first EVA,
+# served from NASA's official images-assets archive (the canonical
+# NASA media library).
+#
+# Important: OpenAI's vision pipeline downloads the image from this
+# URL during the chat completion call.  Some CDNs (notably
+# ``upload.wikimedia.org``) block OpenAI's image fetcher, which the
+# API surfaces as ``ProviderInvalidRequest``.  ``images-assets.nasa.gov``
+# is known to work; if you override ``IMAGE_URL``, point at a host
+# that allows OpenAI's user agent.
+DEFAULT_IMAGE_URL = "https://images-assets.nasa.gov/image/as16-113-18334/as16-113-18334~orig.jpg"
+
+
+# ---------------------------------------------------------------------------
+# Provider (lazy-init)
+# ---------------------------------------------------------------------------
+
+_provider_instance: OpenAIProvider | None = None
+
+
+def _get_provider() -> OpenAIProvider:
+    global _provider_instance
+    if _provider_instance is None:
+        _provider_instance = OpenAIProvider(
+            base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com"),
+            model=os.environ.get("LLM_MODEL", "gpt-4o-mini"),
+            api_key=os.environ.get("LLM_API_KEY") or None,
+        )
+    return _provider_instance
+
+
+# ---------------------------------------------------------------------------
+# User turn shape (script-driven)
+# ---------------------------------------------------------------------------
+# Each scripted turn is a question with an optional image URL.  The
+# multimodal turn supplies an image_url; text-only turns leave it None.
+
+
+@dataclass(frozen=True)
+class UserTurn:
+    text: str
+    image_url: str | None = None
+
+
+# ---------------------------------------------------------------------------
+# Chat prompt construction
+# ---------------------------------------------------------------------------
+# A small in-process function rather than a backend.  The point of this
+# example is the placeholder + segment shape, not backend wiring (07
+# covers FilesystemPromptBackend; 10 covers Langfuse).  A real
+# deployment would either:
+#   - fetch the chat template from LangfusePromptBackend, or
+#   - load it from a FilesystemPromptBackend chat-prompt sidecar once
+#     the backend grows chat support (the current filesystem backend
+#     only emits TextPrompt).
+
+_SYSTEM_INSTRUCTIONS = (
+    "You are a lunar-mission expert assistant.  Answer questions about "
+    "Apollo and Artemis missions concisely and factually.  When the user "
+    "attaches an image, describe what you see in the image and connect it "
+    "to the mission context the user provided.  Keep responses to "
+    "two or three sentences."
+)
+
+# Stable build-time stamp for the inline-constructed prompt.  ``fetched_at``
+# is meaningful for prompts pulled from a remote backend (when did we last
+# sync); for the inline-built prompt in this demo it's just "process
+# startup" so a constant is more honest than ``datetime.now()`` per turn.
+_PROMPT_BUILT_AT = datetime.now(UTC)
+
+
+def _build_chat_prompt(text: str, image_url: str | None) -> ChatPrompt:
+    """Build the chat template for one turn.
+
+    System and history-placeholder segments are identical across turn
+    shapes; only the trailing user segment changes:
+
+    - Text-only turn: ``ContentSegment(role="user", content=text)``.
+    - Multimodal turn: ``ContentSegment(role="user",
+      content=[TextBlockTemplate, ImageURLBlockTemplate])``.
+
+    Constructing the template per-turn keeps the example self-contained;
+    a production deployment would fetch a versioned template from a
+    PromptBackend and pass the image_url through variables instead.
+    """
+    user_content: str | list[ContentBlockTemplate]
+    if image_url is not None:
+        user_content = [
+            TextBlockTemplate(text=text),
+            ImageURLBlockTemplate(url=image_url),
+        ]
+    else:
+        user_content = text
+    return ChatPrompt(
+        name="lunar-chat",
+        version="v1",
+        label="production",
+        template_hash="sha256:lunar-chat-v1",
+        fetched_at=_PROMPT_BUILT_AT,
+        chat_template=[
+            ContentSegment(role="system", content=_SYSTEM_INSTRUCTIONS),
+            PlaceholderSegment(placeholder="history"),
+            ContentSegment(role="user", content=user_content),
+        ],
+    )
+
+
+# ---------------------------------------------------------------------------
+# Prompt manager
+# ---------------------------------------------------------------------------
+# ``PromptManager.render(prompt, ...)`` accepts a ``Prompt`` directly, so
+# the example calls render() with the inline-built ChatPrompt rather
+# than round-tripping through a backend's fetch().  The manager
+# constructor requires at least one backend, so a no-op stub satisfies
+# the contract without participating in execution.  Production
+# deployments would supply a real backend (LangfusePromptBackend etc.)
+# and call ``manager.fetch(name, label)`` to retrieve the versioned
+# prompt before rendering.
+
+
+class _NoFetchBackend:
+    """Stub backend purely to satisfy PromptManager's constructor.
+
+    The example constructs ChatPrompt objects inline (see
+    ``_build_chat_prompt``) and calls ``manager.render()`` directly, so
+    ``fetch()`` is never invoked.
+    """
+
+    async def fetch(self, name: str, label: str = "production") -> Prompt:
+        raise NotImplementedError("example constructs prompts inline; fetch not used")
+
+
+_PROMPT_MANAGER = PromptManager(_NoFetchBackend())
+
+
+# ---------------------------------------------------------------------------
+# State
+# ---------------------------------------------------------------------------
+# ``history`` is the conversation memory: the running list of user +
+# assistant Message pairs from all prior turns.  Declared with the
+# ``append`` reducer so each respond-node update concatenates the two
+# new messages (current user turn + assistant response) rather than
+# overwriting prior history.
+#
+# ``user_turns`` is the pre-scripted list of turns the demo runs;
+# ``next_turn_index`` advances by one per respond call.  In a real
+# chat server this would not be on state; turns arrive one per
+# invocation rather than as a pre-scripted batch.  Keeping the
+# scripted shape here lets the demo run end-to-end without an
+# interactive prompt.
+
+
+class ChatState(State):
+    user_turns: list[UserTurn]
+    next_turn_index: int = 0
+    history: Annotated[list[Message], append] = Field(default_factory=list[Message])
+
+
+# Visual pacing between turns when printing the transcript.  Tiny
+# delay so the human reader can follow the conversation as it
+# arrives rather than seeing the full thing dump at once; tune via
+# the constant rather than per-turn.
+_TURN_DELAY_S = 0.5
+
+
+# ---------------------------------------------------------------------------
+# Nodes
+# ---------------------------------------------------------------------------
+
+
+async def respond(state: ChatState) -> dict[str, Any]:
+    """Render the chat template for the current turn, call the LLM,
+    append both the new user message and the assistant response to
+    history.
+    """
+    turn = state.user_turns[state.next_turn_index]
+
+    # Build a fresh ChatPrompt per turn (text-only or multimodal) and
+    # render directly through the manager; no fetch round-trip needed
+    # since we have the Prompt in hand.
+    prompt = _build_chat_prompt(turn.text, turn.image_url)
+    rendered = _PROMPT_MANAGER.render(
+        prompt,
+        variables={},
+        placeholders={"history": state.history},
+    )
+
+    response = await _get_provider().complete(
+        rendered.messages,
+        config=RuntimeConfig(temperature=0.0, max_tokens=400),
+    )
+
+    # The rendered messages include [system, *history, current_user]
+    # for THIS chat_template shape.  ``rendered.messages[-1]`` is the
+    # current user turn because the user ContentSegment is the last
+    # segment in ``_build_chat_prompt``'s template; if the template
+    # ever grows a trailing assistant or system segment, this index
+    # has to move.  Append (current_user, assistant_response) to
+    # history so the next turn sees the full conversation.  The system
+    # message is part of the template, not part of history.
+    current_user_message = rendered.messages[-1]
+    assert isinstance(current_user_message, UserMessage), (
+        "expected rendered messages to end with the new user turn"
+    )
+
+    # Print the turn immediately so the conversation streams to the
+    # reader as the graph executes; otherwise the chat would only
+    # appear after invoke() returns.  Side effects inside a node body
+    # are fine; the alternative (a custom observer reacting to
+    # ``completed`` events) would be more "OA-native" but adds
+    # boilerplate that distracts from this example's headline.
+    print(_format_turn(state.next_turn_index, turn, response.message))
+    await asyncio.sleep(_TURN_DELAY_S)
+
+    return {
+        "next_turn_index": state.next_turn_index + 1,
+        "history": [current_user_message, response.message],
+    }
+
+
+# Single cap for both user text and assistant response in the
+# printed transcript.  Keeps the printout scannable without privileging
+# one side; both sides truncate at the same length.
+_TRANSCRIPT_LINE_CAP = 240
+
+
+def _truncate(s: str, cap: int = _TRANSCRIPT_LINE_CAP) -> str:
+    if len(s) <= cap:
+        return s
+    return s[: cap - 3] + "..."
+
+
+def _format_turn(turn_index: int, turn: UserTurn, assistant: AssistantMessage) -> str:
+    image_tag = " [+image]" if turn.image_url is not None else ""
+    user_short = _truncate(turn.text)
+    assistant_short = _truncate(assistant.content or "")
+    return f"\n--- Turn {turn_index}{image_tag} ---\nUSER:      {user_short}\nASSISTANT: {assistant_short}"
+
+
+def route_after_respond(state: ChatState) -> str | EndSentinel:
+    """Loop back for the next turn or exit when the scripted turns run out."""
+    if state.next_turn_index < len(state.user_turns):
+        return "respond"
+    return END
+
+
+# ---------------------------------------------------------------------------
+# Graph
+# ---------------------------------------------------------------------------
+
+
+def build_graph():
+    return (
+        GraphBuilder(ChatState)
+        .add_node("respond", respond)
+        .add_conditional_edge("respond", route_after_respond)
+        .set_entry("respond")
+        .compile()
+    )
+
+
+# ---------------------------------------------------------------------------
+# Observer (console)
+# ---------------------------------------------------------------------------
+# OTel observer with a console exporter emits one span per node
+# boundary.  Inside the respond node, the LLM provider emits the
+# ``openarmature.llm.complete`` span carrying the GenAI semconv
+# attributes (gen_ai.system, model, usage tokens) plus, per turn, the
+# prompt identity if the manager's ``with_active_prompt`` scope is
+# active. The demo runs without that scope wrapping to keep the
+# loop tight.
+
+
+def _build_observer() -> OTelObserver:
+    import sys  # local import; ConsoleSpanExporter writes to stdout by default
+    processor = SimpleSpanProcessor(ConsoleSpanExporter(out=sys.stderr))
+    return OTelObserver(
+        span_processor=processor,
+        resource=Resource.create({"service.name": "openarmature-chat-multimodal"}),
+    )
+
+
+# ---------------------------------------------------------------------------
+# Scripted conversation
+# ---------------------------------------------------------------------------
+# Four turns: a factual opener, a follow-up that depends on the first
+# answer, a multimodal turn with an image, and a closing follow-up.
+# The closing follow-up intentionally refers back to "what you
+# described about the LM" to confirm conversation memory carries the
+# multimodal context across turns.
+
+
+def _scripted_turns(image_url: str) -> list[UserTurn]:
+    return [
+        UserTurn(text="What was the primary objective of Apollo 11?"),
+        UserTurn(text="And what year did it launch?"),
+        UserTurn(
+            text=("I have a photograph of the Lunar Module. What's distinctive about its design?"),
+            image_url=image_url,
+        ),
+        UserTurn(
+            text=("Given what you described about the LM, was that design reused on later Apollo missions?"),
+        ),
+    ]
+
+
+# ---------------------------------------------------------------------------
+# main
+# ---------------------------------------------------------------------------
+
+
+def _parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description=(
+            "Multi-turn chat demo with a multimodal turn. "
+            "Conversation streams to stdout as each turn completes."
+        )
+    )
+    parser.add_argument(
+        "--traces",
+        action="store_true",
+        help=(
+            "Attach the OTel observer with a console exporter so node + LLM spans "
+            "stream to stderr as JSON. Off by default for a cleaner first-read; "
+            "turn on to see the observability shape end-to-end."
+        ),
+    )
+    return parser.parse_args()
+
+
+async def main() -> None:
+    args = _parse_args()
+    image_url = os.environ.get("IMAGE_URL", DEFAULT_IMAGE_URL)
+
+    graph = build_graph()
+    if args.traces:
+        graph.attach_observer(_build_observer())
+
+    initial = ChatState(user_turns=_scripted_turns(image_url))
+
+    print("=== openarmature chat-with-multimodal demo ===")
+    print(f"Image URL: {image_url}")
+    print(f"Scripted turns: {len(initial.user_turns)}")
+    if args.traces:
+        print("OTel traces: ON (spans stream to stderr as each node closes)")
+    print()
+
+    # Catch the engine-level wrapper ``NodeException`` at the
+    # ``invoke()`` boundary.  The underlying error is attached via
+    # Python's standard exception-chaining as ``exc.__cause__``; if
+    # it's an ``LlmProviderError`` we surface the canonical
+    # ``.category`` string (``provider_rate_limit``,
+    # ``provider_invalid_request``, etc.) so the failure mode is
+    # immediately greppable.  This is one of three legitimate places
+    # to handle the error; see the docstring for the other two
+    # (``RetryMiddleware`` wrapping the node, ``try/except`` inside
+    # the node body).
+    final: ChatState | None = None
+    try:
+        final = await graph.invoke(initial)
+    except NodeException as exc:
+        cause = exc.__cause__
+        if isinstance(cause, LlmProviderError):
+            category = cause.category
+        else:
+            category = type(cause).__name__ if cause is not None else "<unknown>"
+        print()
+        print(f"*** node {exc.node_name!r} failed ({category}): {cause} ***")
+        print()
+        print("Three places to handle this in production code:")
+        print("  - Caller-side try/except NodeException (this example).")
+        print("  - RetryMiddleware on the node for transient categories.")
+        print("  - try/except inside the node body returning a fallback.")
+    finally:
+        await graph.drain()
+        await _get_provider().aclose()
+
+    if final is None:
+        return
+
+    print()
+    print(
+        f"=== history length: {len(final.history)} messages "
+        f"({len(final.history) // 2} user/assistant turns) ==="
+    )
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
diff --git a/mkdocs.yml b/mkdocs.yml
index 1432f15..b40ca68 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -141,6 +141,7 @@ nav:
     - Checkpointing and migration: examples/08-checkpointing-and-migration.md
     - Tool use: examples/09-tool-use.md
     - Langfuse observability: examples/10-langfuse-observability.md
+    - Chat with multimodal: examples/11-chat-with-multimodal.md
   - Patterns:
     - patterns/index.md
     - Parameterized entry point: patterns/parameterized-entry-point.md
diff --git a/src/openarmature/AGENTS.md b/src/openarmature/AGENTS.md
index e1211ad..23c3b36 100644
--- a/src/openarmature/AGENTS.md
+++ b/src/openarmature/AGENTS.md
@@ -1465,6 +1465,7 @@ _Runnable example programs shipped in the source tree at `examples/`. The full c
 - **`examples/08-checkpointing-and-migration/main.py`** — openarmature demo: a lunar-mission planning pipeline that checkpoints its progress, then resumes under an upgraded state schema.
 - **`examples/09-tool-use/main.py`** — openarmature demo: a lunar-mission assistant that calls local Python functions as tools to answer fact and physics questions about Apollo / Artemis missions.
 - **`examples/10-langfuse-observability/main.py`** — openarmature demo: Langfuse observer + prompt linkage on a lunar mission Q&A pipeline.
+- **`examples/11-chat-with-multimodal/main.py`** — openarmature demo: multi-turn chat with conversation memory and a multimodal turn, using ChatPrompt + PlaceholderSegment.
 
 ## Discovery cross-references