diff --git a/CHANGELOG.md b/CHANGELOG.md index 2f03e44..8db5361 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -13,6 +13,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The ### Added +- **Chat-with-multimodal example.** `examples/11-chat-with-multimodal/` demonstrates `ChatPrompt` + `PlaceholderSegment` (proposal 0046) end-to-end: a four-turn lunar-mission Q&A conversation with conversation memory threaded through state, one mid-conversation turn attaching a photograph via `ImageURLBlockTemplate`, the agent processing the multimodal turn naturally without changing the chat-history shape. Complementary to example 09 (tool use); chat history threading and tool calling are separate primitives. +- **`docs/examples/index.md` catalog now lists example 10.** A pre-existing gap (the Langfuse-observability example was missing from the catalog) caught and fixed alongside the example 11 entry. - **PyPI + spec-version shields on the docs homepage.** `docs/index.md` now carries dynamic shields for the published PyPI version and the pinned spec version, sourced from `img.shields.io`. Both auto-update on every publish or spec bump; no maintenance burden. Mirrors the same shield URLs the README already uses. - **vLLM production deployment notes.** `docs/model-providers/vllm.md` grows a "Production deployment" section covering the `VLLM_HTTP_TIMEOUT_KEEP_ALIVE` gotcha (vLLM's stock 5s uvicorn keep-alive lapses pooled OA-side httpx connections and surfaces as `ProviderUnavailable`; widen to roughly 300s), a systemd unit skeleton, and the three throughput knobs that interact with OA's shared connection pool (`--max-model-len`, `--max-num-seqs`, `--gpu-memory-utilization`). 
The existing "Tool calling" section grows a `--tool-call-parser` family table verified against vLLM's docs (Llama 3.x / Llama 4 / Mistral / Hermes / Qwen3 / DeepSeek V3 / GPT-OSS), plus explicit "not supported here" callouts for Anthropic / Gemini (proprietary cloud) and mainstream Gemma (no vLLM parser). - **Three new patterns docs.** `docs/patterns/state-migration-on-resume.md`, `docs/patterns/caller-supplied-trace-identifiers.md`, and `docs/patterns/observer-state-reconciliation.md` graduate the corresponding entries from `docs/agent/non-obvious-shapes.md` into full pattern recipes with code snippets and "when this is right / when it isn't" guidance. The programmatic patterns API (`openarmature.patterns.list()` / `get(name)`) grows from 4 to 7 entries. diff --git a/docs/examples/11-chat-with-multimodal.md b/docs/examples/11-chat-with-multimodal.md new file mode 100644 index 0000000..ae8eb13 --- /dev/null +++ b/docs/examples/11-chat-with-multimodal.md @@ -0,0 +1,160 @@ +# 11 - Chat with multi-turn memory and a multimodal turn + +A lunar-mission Q&A assistant that maintains conversation context +across four turns. One mid-conversation turn includes an attached +photograph (Apollo 16 Lunar Module "Orion" on the lunar surface): +the user asks about it, the agent processes the multimodal turn +naturally without changing the chat-history shape. + +## Overview + +The user has a four-turn conversation with the assistant. Turns 1, +2, and 4 are text-only; turn 3 attaches a photograph and asks the +agent to describe it. Throughout the conversation, the agent +maintains memory: turn 2 references "it" from turn 1, turn 4 +references "the LM you described" from turn 3. + +The whole thing rides on one `ChatPrompt` template: + +- A `ContentSegment(role="system", ...)` holds the assistant's + persona and response style. +- A `PlaceholderSegment(placeholder="history")` is the slot where + the caller injects the prior conversation. 
+- A trailing `ContentSegment(role="user", ...)` carries the current + turn's question. For text-only turns its `content` is a string; + for the multimodal turn its `content` is a list of content-block + templates (`TextBlockTemplate` + `ImageURLBlockTemplate`). + +Chat history lives on state as `Annotated[list[Message], append]`. +After each turn the `respond` node appends two messages to history +(the rendered user turn + the assistant response), and the next +turn's `render()` injects the grown history into the placeholder. + +## What it teaches + +- [`ChatPrompt`](../concepts/prompts.md) with + [`ContentSegment`](../concepts/prompts.md) and + [`PlaceholderSegment`](../concepts/prompts.md) (proposal 0046, + spec v0.38.0). The placeholder is how multi-turn chat history + shapes get injected at render time. +- The same chat template can carry an + [`ImageURLBlockTemplate`](../concepts/prompts.md) when the + current user turn includes an image. The `content` field on the + user `ContentSegment` switches between a single `str` (text-only) + and a `list[ContentBlockTemplate]` (multimodal); the system and + placeholder segments are identical across both shapes. +- [`PromptManager.render(prompt, placeholders={"history": + state.history})`](../reference/prompts.md) injects the message + list at the placeholder slot. An empty list is valid (first-turn + case); the rendered messages become just + `[system, current_user_turn]` with no prior history. +- Multi-turn memory threaded through state via the `append` + reducer. Each `respond` call appends `[current_user_message, + assistant_response]` to history; reading history on the next turn + produces the running conversation. +- The graph is a single `respond` node with a conditional edge that + loops back to itself until the script-supplied user turns are + exhausted, then routes to `END`. The cycle is + [`respond → respond → respond → … → END`](../concepts/graphs.md). 
+- Complementary to [example 09 (tool use)](09-tool-use.md): chat + history threading and tool calling are separate primitives. + Example 09 shows the LLM emitting tool calls and the framework + dispatching them; this example shows how the prompt-management + layer composes a multi-turn conversation. A production chat agent + often combines both. + +## How to run + +```bash +uv sync --group examples --all-extras + +# Clean conversation output only (default). +LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py + +# With OTel JSON spans streaming to stderr alongside the chat. +LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py --traces +``` + +`LLM_MODEL` must point at a vision-capable model. The default +(`gpt-4o-mini`) qualifies. For a different image, set `IMAGE_URL` +to any publicly-reachable image URL. + +The conversation streams to stdout as each turn completes (a small +visual delay between turns lets the human reader follow along). The +`--traces` flag opts in to the OTel observer with a console +exporter; without it the chat runs without any observer attached. +Example 03 owns the observer-hooks story end-to-end; this example's +headline is the chat shape, not the observability wiring. + +The demo is illustrative only: it runs four pre-scripted user turns +sequentially in one process. A real chat-server runtime would +manage one invocation per turn with the chat history persisted +across sessions (e.g., via a checkpointer keyed on session_id); +that's [example 08 (checkpointing)](08-checkpointing-and-migration.md)'s +territory, combined with this one's chat shape. + +## The graph + +```mermaid +flowchart TD + start([start]) + respond[respond] + stop([end]) + + start --> respond + respond -->|more user turns scripted| respond + respond -->|user turns exhausted| stop +``` + +`route_after_respond` returns `"respond"` while +`state.next_turn_index < len(state.user_turns)` and `END` otherwise. 
+Each loop iteration renders the current chat template, calls the +LLM, and updates state. + +## Reading the output + +``` +=== openarmature chat-with-multimodal demo === +Image URL: https://images-assets.nasa.gov/image/as16-113-18334/... +Scripted turns: 4 + +--- Turn 0 --- +USER: What was the primary objective of Apollo 11? +ASSISTANT: The primary objective of Apollo 11 was to perform a +manned lunar landing and safely return the crew to Earth ... + +--- Turn 1 --- +USER: And what year did it launch? +ASSISTANT: Apollo 11 launched on July 16, 1969. + +--- Turn 2 [+image] --- +USER: I have a photograph of the Lunar Module. What's +distinctive about its design? +ASSISTANT: The Apollo Lunar Module had a distinctive two-stage, +spider-like configuration ... + +--- Turn 3 --- +USER: Given what you described about the LM, was that design +reused on later Apollo missions? +ASSISTANT: Yes, the same basic LM design was used on Apollo 12 +through 17 ... + +=== history length: 8 messages (4 user/assistant turns) === +``` + +- **Turn 1 builds on turn 0** without you having to re-mention + Apollo 11. The history placeholder injected the prior `[user_0, + assistant_0]` pair, so the model sees the question "what year did + it launch" in context. +- **Turn 2 is the multimodal one** (`[+image]` tag in the trace). + The user `ContentSegment` for this turn carries + `[TextBlockTemplate(text=...), ImageURLBlockTemplate(url=...)]` + instead of a plain string; the model receives both blocks in one + user message and answers about the image. +- **Turn 3 references "the LM you described"** from turn 2. The + history at this point contains all six prior messages (system is + not in history; it comes from the template every render). The + model carries the multimodal context forward without you having + to re-attach the image. +- **History length 8 = 4 (user, assistant) pairs.** No system + message in history; the template adds it on every render. 
diff --git a/docs/examples/index.md b/docs/examples/index.md index 51fe262..75f316e 100644 --- a/docs/examples/index.md +++ b/docs/examples/index.md @@ -43,6 +43,14 @@ in the repo. - [**09 - Tool use**](09-tool-use.md). Lunar-mission assistant that calls local Python tools to answer questions mixing fact recall and physics arithmetic. +- [**10 - Langfuse observability**](10-langfuse-observability.md). + Send LLM-call observability natively to Langfuse with a prompt- + linkage demonstration on a mission-briefing Q&A pipeline. +- [**11 - Chat with multimodal**](11-chat-with-multimodal.md). Four- + turn lunar-mission conversation with conversation memory threaded + through `ChatPrompt` + `PlaceholderSegment`. One turn attaches a + photograph; the agent processes it without changing the chat + shape. ## Configuration diff --git a/examples/11-chat-with-multimodal/main.py b/examples/11-chat-with-multimodal/main.py new file mode 100644 index 0000000..8218ea5 --- /dev/null +++ b/examples/11-chat-with-multimodal/main.py @@ -0,0 +1,532 @@ +"""openarmature demo: multi-turn chat with conversation memory and a +multimodal turn, using ChatPrompt + PlaceholderSegment. + +**Use case:** Lunar mission Q&A assistant that maintains conversation +context across four turns. Turn 3 includes an attached photograph +(e.g., a Lunar Module on the surface): the user asks about it, the +agent processes the multimodal turn naturally without changing the +chat-history shape. Turns 1, 2, 4 are text-only. + +**Demonstrates:** ChatPrompt + ContentSegment (system + user) + +PlaceholderSegment for chat-history injection (proposal 0046, +spec v0.38.0). PromptManager.render with the `placeholders` kwarg. +Multi-turn message threading through state with the `append` +reducer; the conversation history grows over turns and feeds back +into render() on each turn. 
The same chat template carries an +optional ImageURLBlockTemplate when the user's current turn includes +an image (lunar mission photograph), so multimodal turns work +without bespoke handling. Complementary to example 09 (tool calling) +which exercises a different LLM-side primitive entirely. + +**What's interesting in the implementation:** + +- The chat template is built per-turn by `_build_chat_prompt(...)`, + which switches the user `ContentSegment.content` between a single + text template (text-only turn) and a `[TextBlockTemplate, + ImageURLBlockTemplate]` list (multimodal turn). The system segment + and the `PlaceholderSegment(placeholder="history")` slot are identical + across both shapes; only the trailing user segment changes. +- Chat history lives on state as `history: Annotated[list[Message], + append]`. After each turn the node appends two messages (the new + user turn that just rendered + the assistant response) so the + next turn's render() sees the full prior conversation. +- `PromptManager.render(prompt, placeholders={"history": state.history})` + injects the message list at the placeholder slot. An empty + list is valid (first-turn case): the rendered messages become + just `[system, current_user_turn]` with no prior history. +- The graph is a single `respond` node with a conditional edge that + loops back to itself until the script-supplied user turns are + exhausted, then routes to END. Each loop iteration renders the + current chat template, calls the LLM, and updates state. +- `LangfusePromptBackend` is intentionally not used here: chat + history threading is the headline demonstration, not prompt + backend complexity. Example 07 owns the multi-backend prompt + story (filesystem primary + fallback); example 10 owns the + Langfuse-backend integration. +- Error handling at the invoke() boundary. 
`main()` catches + `NodeException` (the graph engine's wrapper) and inspects + `exc.__cause__` (Python's standard exception chain) for + `LlmProviderError` to surface the canonical category + (`provider_rate_limit`, `provider_invalid_request`, etc.) in the + error message. The image URL failure mode (OpenAI's + fetcher hitting a CDN that blocks it) lands here as + `provider_invalid_request`. Three legitimate places to handle + this in production: caller-side `try / except NodeException` + (shown here), `RetryMiddleware` wrapping the respond node for + transient categories, or a `try / except LlmProviderError` + inside the node body returning a fallback response. + +**Configuration** (env vars; OpenAI defaults shown): + +- ``LLM_BASE_URL`` defaults to ``https://api.openai.com``. Host root only. +- ``LLM_MODEL`` defaults to ``gpt-4o-mini`` (a vision-capable model + needed for the multimodal turn). +- ``LLM_API_KEY`` required (empty for local servers that don't + authenticate, but the model MUST support vision blocks). +- ``IMAGE_URL`` overrides the default image URL. Default is a + public-domain NASA photograph of the Apollo 16 Lunar Module + "Orion" on the lunar surface, served from NASA's images-assets + archive. OpenAI's vision pipeline downloads the image; some hosts + (e.g., upload.wikimedia.org) block its fetcher with a + ProviderInvalidRequest. images-assets.nasa.gov is known to work. + +Run with: + + uv sync --group examples --all-extras + + # Clean conversation output only (default). + LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py + + # With OTel JSON spans streaming to stderr alongside the chat. + LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py --traces + +(``--all-extras`` pulls in ``opentelemetry-sdk`` for the OTel observer.) +The conversation transcript streams to stdout as each turn closes, +with a short visual delay between turns (~``_TURN_DELAY_S``). 
Pass +``--traces`` to also see the OTel observer attached and node + LLM +spans dumped to stderr; the OTel side is optional supporting +infrastructure, not the headline of this example (example 03 owns +the observer-hooks story). + +The demo is illustrative only: it runs four pre-scripted user turns +sequentially in one process. A real chat-server runtime would +manage one invocation per turn with the chat history persisted +across sessions (e.g., via a checkpointer keyed on session_id); +that's example 08's territory, combined with this one's chat shape. +""" + +from __future__ import annotations + +import argparse +import asyncio +import os +from dataclasses import dataclass +from datetime import UTC, datetime +from typing import Annotated, Any + +from opentelemetry.sdk.resources import Resource +from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor +from pydantic import Field + +from openarmature.graph import ( + END, + EndSentinel, + GraphBuilder, + NodeException, + State, + append, +) +from openarmature.llm import ( + AssistantMessage, + LlmProviderError, + Message, + OpenAIProvider, + RuntimeConfig, + UserMessage, +) +from openarmature.observability.otel import OTelObserver +from openarmature.prompts import ( + ChatPrompt, + ContentBlockTemplate, + ContentSegment, + ImageURLBlockTemplate, + PlaceholderSegment, + Prompt, + PromptManager, + TextBlockTemplate, +) + +# --------------------------------------------------------------------------- +# Defaults +# --------------------------------------------------------------------------- + +# Default image: NASA public-domain photograph of the Apollo 16 Lunar +# Module "Orion" parked on the lunar surface during the first EVA, +# served from NASA's official images-assets archive (the canonical +# NASA media library). +# +# Important: OpenAI's vision pipeline downloads the image from this +# URL during the chat completion call. 
Some CDNs (notably +# ``upload.wikimedia.org``) block OpenAI's image fetcher and return a +# ``ProviderInvalidRequest`` from the API. ``images-assets.nasa.gov`` +# is known to work; if you override ``IMAGE_URL``, point at a host +# that allows OpenAI's user agent. +DEFAULT_IMAGE_URL = "https://images-assets.nasa.gov/image/as16-113-18334/as16-113-18334~orig.jpg" + + +# --------------------------------------------------------------------------- +# Provider (lazy-init) +# --------------------------------------------------------------------------- + +_provider_instance: OpenAIProvider | None = None + + +def _get_provider() -> OpenAIProvider: + global _provider_instance + if _provider_instance is None: + _provider_instance = OpenAIProvider( + base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com"), + model=os.environ.get("LLM_MODEL", "gpt-4o-mini"), + api_key=os.environ.get("LLM_API_KEY") or None, + ) + return _provider_instance + + +# --------------------------------------------------------------------------- +# User turn shape (script-driven) +# --------------------------------------------------------------------------- +# Each scripted turn is a question with an optional image URL. The +# multimodal turn supplies an image_url; text-only turns leave it None. + + +@dataclass(frozen=True) +class UserTurn: + text: str + image_url: str | None = None + + +# --------------------------------------------------------------------------- +# Chat prompt construction +# --------------------------------------------------------------------------- +# A small in-process function rather than a backend. The point of this +# example is the placeholder + segment shape, not backend wiring (07 +# covers FilesystemPromptBackend; 10 covers Langfuse). 
A real +# deployment would either: +# - fetch the chat template from LangfusePromptBackend, or +# - load it from a FilesystemPromptBackend chat-prompt sidecar once +# the backend grows chat support (the current filesystem backend +# only emits TextPrompt). + +_SYSTEM_INSTRUCTIONS = ( + "You are a lunar-mission expert assistant. Answer questions about " + "Apollo and Artemis missions concisely and factually. When the user " + "attaches an image, describe what you see in the image and connect it " + "to the mission context the user provided. Keep responses to " + "two or three sentences." +) + +# Stable build-time stamp for the inline-constructed prompt. ``fetched_at`` +# is meaningful for prompts pulled from a remote backend (when did we last +# sync); for the inline-built prompt in this demo it's just "process +# startup" so a constant is more honest than ``datetime.now()`` per turn. +_PROMPT_BUILT_AT = datetime.now(UTC) + + +def _build_chat_prompt(text: str, image_url: str | None) -> ChatPrompt: + """Build the chat template for one turn. + + System and history-placeholder segments are identical across turn + shapes; only the trailing user segment changes: + + - Text-only turn: ``ContentSegment(role="user", content=text)``. + - Multimodal turn: ``ContentSegment(role="user", + content=[TextBlockTemplate, ImageURLBlockTemplate])``. + + Constructing the template per-turn keeps the example self-contained; + a production deployment would fetch a versioned template from a + PromptBackend and pass the image_url through variables instead. 
+ """ + user_content: str | list[ContentBlockTemplate] + if image_url is not None: + user_content = [ + TextBlockTemplate(text=text), + ImageURLBlockTemplate(url=image_url), + ] + else: + user_content = text + return ChatPrompt( + name="lunar-chat", + version="v1", + label="production", + template_hash="sha256:lunar-chat-v1", + fetched_at=_PROMPT_BUILT_AT, + chat_template=[ + ContentSegment(role="system", content=_SYSTEM_INSTRUCTIONS), + PlaceholderSegment(placeholder="history"), + ContentSegment(role="user", content=user_content), + ], + ) + + +# --------------------------------------------------------------------------- +# Prompt manager +# --------------------------------------------------------------------------- +# ``PromptManager.render(prompt, ...)`` accepts a ``Prompt`` directly, so +# the example calls render() with the inline-built ChatPrompt rather +# than round-tripping through a backend's fetch(). The manager +# constructor requires at least one backend, so a no-op stub satisfies +# the contract without participating in execution. Production +# deployments would supply a real backend (LangfusePromptBackend etc.) +# and call ``manager.fetch(name, label)`` to retrieve the versioned +# prompt before rendering. + + +class _NoFetchBackend: + """Stub backend purely to satisfy PromptManager's constructor. + + The example constructs ChatPrompt objects inline (see + ``_build_chat_prompt``) and calls ``manager.render()`` directly, so + ``fetch()`` is never invoked. 
+ """ + + async def fetch(self, name: str, label: str = "production") -> Prompt: + raise NotImplementedError("example constructs prompts inline; fetch not used") + + +_PROMPT_MANAGER = PromptManager(_NoFetchBackend()) + + +# --------------------------------------------------------------------------- +# State +# --------------------------------------------------------------------------- +# ``history`` is the conversation memory: the running list of user + +# assistant Message pairs from all prior turns. Declared with the +# ``append`` reducer so each respond-node update concatenates the two +# new messages (current user turn + assistant response) rather than +# overwriting prior history. +# +# ``user_turns`` is the pre-scripted list of turns the demo runs; +# ``next_turn_index`` advances by one per respond call. In a real +# chat server this would not be on state; turns arrive one per +# invocation rather than as a pre-scripted batch. Keeping the +# scripted shape here lets the demo run end-to-end without an +# interactive prompt. + + +class ChatState(State): + user_turns: list[UserTurn] + next_turn_index: int = 0 + history: Annotated[list[Message], append] = Field(default_factory=list[Message]) + + +# Visual pacing between turns when printing the transcript. Tiny +# delay so the human reader can follow the conversation as it +# arrives rather than seeing the full thing dump at once; tune via +# the constant rather than per-turn. +_TURN_DELAY_S = 0.5 + + +# --------------------------------------------------------------------------- +# Nodes +# --------------------------------------------------------------------------- + + +async def respond(state: ChatState) -> dict[str, Any]: + """Render the chat template for the current turn, call the LLM, + append both the new user message and the assistant response to + history. 
+ """ + turn = state.user_turns[state.next_turn_index] + + # Build a fresh ChatPrompt per turn (text-only or multimodal) and + # render directly through the manager; no fetch round-trip needed + # since we have the Prompt in hand. + prompt = _build_chat_prompt(turn.text, turn.image_url) + rendered = _PROMPT_MANAGER.render( + prompt, + variables={}, + placeholders={"history": state.history}, + ) + + response = await _get_provider().complete( + rendered.messages, + config=RuntimeConfig(temperature=0.0, max_tokens=400), + ) + + # The rendered messages include [system, *history, current_user] + # for THIS chat_template shape. ``rendered.messages[-1]`` is the + # current user turn because the user ContentSegment is the last + # segment in ``_build_chat_prompt``'s template; if the template + # ever grows a trailing assistant or system segment, this index + # has to move. Append (current_user, assistant_response) to + # history so the next turn sees the full conversation. The system + # message is part of the template, not part of history. + current_user_message = rendered.messages[-1] + assert isinstance(current_user_message, UserMessage), ( + "expected rendered messages to end with the new user turn" + ) + + # Print the turn immediately so the conversation streams to the + # reader as the graph executes; otherwise the chat would only + # appear after invoke() returns. Side effects inside a node body + # are fine; the alternative (a custom observer reacting to + # ``completed`` events) would be more "OA-native" but adds + # boilerplate that distracts from this example's headline. + print(_format_turn(state.next_turn_index, turn, response.message)) + await asyncio.sleep(_TURN_DELAY_S) + + return { + "next_turn_index": state.next_turn_index + 1, + "history": [current_user_message, response.message], + } + + +# Single cap for both user text and assistant response in the trace +# transcript. 
Keeps the printout scannable without privileging one +# side; either both sides truncate or neither. +_TRANSCRIPT_LINE_CAP = 240 + + +def _truncate(s: str, cap: int = _TRANSCRIPT_LINE_CAP) -> str: + if len(s) <= cap: + return s + return s[: cap - 3] + "..." + + +def _format_turn(turn_index: int, turn: UserTurn, assistant: AssistantMessage) -> str: + image_tag = " [+image]" if turn.image_url is not None else "" + user_short = _truncate(turn.text) + assistant_short = _truncate(assistant.content or "") + return f"\n--- Turn {turn_index}{image_tag} ---\nUSER: {user_short}\nASSISTANT: {assistant_short}" + + +def route_after_respond(state: ChatState) -> str | EndSentinel: + """Loop back for the next turn or exit when the scripted turns run out.""" + if state.next_turn_index < len(state.user_turns): + return "respond" + return END + + +# --------------------------------------------------------------------------- +# Graph +# --------------------------------------------------------------------------- + + +def build_graph(): + return ( + GraphBuilder(ChatState) + .add_node("respond", respond) + .add_conditional_edge("respond", route_after_respond) + .set_entry("respond") + .compile() + ) + + +# --------------------------------------------------------------------------- +# Observer (console) +# --------------------------------------------------------------------------- +# OTel observer with a console exporter emits one span per node +# boundary. Inside the respond node, the LLM provider emits the +# ``openarmature.llm.complete`` span carrying the GenAI semconv +# attributes (gen_ai.system, model, usage tokens) plus, per turn, the +# prompt identity if the manager's ``with_active_prompt`` scope is +# active. The demo runs without that scope wrapping to keep the +# loop tight. 
+ + +def _build_observer() -> OTelObserver: + exporter = ConsoleSpanExporter() + processor = SimpleSpanProcessor(exporter) + return OTelObserver( + span_processor=processor, + resource=Resource.create({"service.name": "openarmature-chat-multimodal"}), + ) + + +# --------------------------------------------------------------------------- +# Scripted conversation +# --------------------------------------------------------------------------- +# Four turns: a factual opener, a follow-up that depends on the first +# answer, a multimodal turn with an image, and a closing follow-up. +# The closing turn intentionally refers back to what the assistant +# described in the multimodal turn, confirming that conversation +# memory carries the multimodal context across turns. + + +def _scripted_turns(image_url: str) -> list[UserTurn]: + return [ + UserTurn(text="What was the primary objective of Apollo 11?"), + UserTurn(text="And what year did it launch?"), + UserTurn( + text=("I have a photograph of the Lunar Module. What's distinctive about its design?"), + image_url=image_url, + ), + UserTurn( + text=("Given what you described about the LM, was that design reused on later Apollo missions?"), + ), + ] + + +# --------------------------------------------------------------------------- +# main +# --------------------------------------------------------------------------- + + +def _parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser( + description=( + "Multi-turn chat demo with a multimodal turn. " + "Conversation streams to stdout as each turn completes." + ) + ) + parser.add_argument( + "--traces", + action="store_true", + help=( + "Attach the OTel observer with a console exporter so node + LLM spans " + "stream to stderr as JSON. Off by default for a cleaner first-read; " + "turn on to see the observability shape end-to-end."
+ ), + ) + return parser.parse_args() + + +async def main() -> None: + args = _parse_args() + image_url = os.environ.get("IMAGE_URL", DEFAULT_IMAGE_URL) + + graph = build_graph() + if args.traces: + graph.attach_observer(_build_observer()) + + initial = ChatState(user_turns=_scripted_turns(image_url)) + + print("=== openarmature chat-with-multimodal demo ===") + print(f"Image URL: {image_url}") + print(f"Scripted turns: {len(initial.user_turns)}") + if args.traces: + print("OTel traces: ON (spans stream to stderr as each node closes)") + print() + + # Catch the engine-level wrapper ``NodeException`` at the + # ``invoke()`` boundary. The underlying error is attached via + # Python's standard exception-chaining as ``exc.__cause__``; if + # it's an ``LlmProviderError`` we surface the canonical + # ``.category`` string (``provider_rate_limit``, + # ``provider_invalid_request``, etc.) so the failure mode is + # immediately greppable. This is one of three legitimate places + # to handle the error; see the docstring for the other two + # (``RetryMiddleware`` wrapping the node, ``try/except`` inside + # the node body). 
+ final: ChatState | None = None + try: + final = await graph.invoke(initial) + except NodeException as exc: + cause = exc.__cause__ + if isinstance(cause, LlmProviderError): + category = cause.category + else: + category = type(cause).__name__ if cause is not None else "" + print() + print(f"*** node {exc.node_name!r} failed ({category}): {cause} ***") + print() + print("Three places to handle this in production code:") + print(" - Caller-side try/except NodeException (this example).") + print(" - RetryMiddleware on the node for transient categories.") + print(" - try/except inside the node body returning a fallback.") + finally: + await graph.drain() + await _get_provider().aclose() + + if final is None: + return + + print() + print( + f"=== history length: {len(final.history)} messages " + f"({len(final.history) // 2} user/assistant turns) ===" + ) + + +if __name__ == "__main__": + asyncio.run(main()) diff --git a/mkdocs.yml b/mkdocs.yml index 1432f15..b40ca68 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -141,6 +141,7 @@ nav: - Checkpointing and migration: examples/08-checkpointing-and-migration.md - Tool use: examples/09-tool-use.md - Langfuse observability: examples/10-langfuse-observability.md + - Chat with multimodal: examples/11-chat-with-multimodal.md - Patterns: - patterns/index.md - Parameterized entry point: patterns/parameterized-entry-point.md diff --git a/src/openarmature/AGENTS.md b/src/openarmature/AGENTS.md index e1211ad..23c3b36 100644 --- a/src/openarmature/AGENTS.md +++ b/src/openarmature/AGENTS.md @@ -1465,6 +1465,7 @@ _Runnable example programs shipped in the source tree at `examples/`. The full c - **`examples/08-checkpointing-and-migration/main.py`** — openarmature demo: a lunar-mission planning pipeline that checkpoints its progress, then resumes under an upgraded state schema. 
- **`examples/09-tool-use/main.py`** — openarmature demo: a lunar-mission assistant that calls local Python functions as tools to answer fact and physics questions about Apollo / Artemis missions. - **`examples/10-langfuse-observability/main.py`** — openarmature demo: Langfuse observer + prompt linkage on a lunar mission Q&A pipeline. +- **`examples/11-chat-with-multimodal/main.py`** — openarmature demo: multi-turn chat with conversation memory and a multimodal turn, using ChatPrompt + PlaceholderSegment. ## Discovery cross-references