Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **Production observability example.** `examples/12-production-observability/` demonstrates the production-grade observability stack end-to-end: `OTelObserver` + `LangfuseObserver` attached to the same graph (proposal 0031), `trace_input_from_state` / `trace_output_from_state` caller hooks on the Langfuse observer (proposal 0043 §8.4.1) deriving domain dicts from State, the built-in `TimingMiddleware` recording per-node duration via an `on_complete` callback, and `invoke(metadata={...})` carrying multi-tenant identifiers (tenantId / requestId / featureFlag) that both observers pick up at once. `InMemoryLangfuseClient` + `InMemorySpanExporter` capture in-process so the demo prints what each backend would have ingested without needing real production credentials.
- **Chat-with-multimodal example.** `examples/11-chat-with-multimodal/` demonstrates `ChatPrompt` + `PlaceholderSegment` (proposal 0046) end-to-end: a four-turn lunar-mission Q&A conversation with conversation memory threaded through state, one mid-conversation turn attaching a photograph via `ImageURLBlockTemplate`, the agent processing the multimodal turn naturally without changing the chat-history shape. Complementary to example 09 (tool use); chat history threading and tool calling are separate primitives.
- **`docs/examples/index.md` catalog now lists example 10.** A pre-existing gap (the Langfuse-observability example was missing from the catalog) caught and fixed alongside the example 11 entry.
- **PyPI + spec-version shields on the docs homepage.** `docs/index.md` now carries dynamic shields for the published PyPI version and the pinned spec version, sourced from `img.shields.io`. Both auto-update on every publish or spec bump; no maintenance burden. Mirrors the same shield URLs the README already uses.
Expand Down
190 changes: 190 additions & 0 deletions docs/examples/12-production-observability.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,190 @@
# 12 - Production observability with dual observers and timing middleware

A single-turn lunar-mission Q&A endpoint instrumented the way you'd
ship it: BOTH OTel and Langfuse observers attached to the same
graph, caller hooks deriving domain-shaped `trace.input` /
`trace.output` from State, the built-in `TimingMiddleware`
recording per-node duration, and multi-tenant caller-supplied
metadata propagating to both observers in one `invoke()` call.

## Overview

One node, one LLM call, two production-grade observability
backends. The pipeline takes a question, calls the LLM, returns the
answer. The interesting part is the observability wiring:

- `OTelObserver` attached with an `InMemorySpanExporter`
(production swaps this for `BatchSpanProcessor` +
`OTLPSpanExporter` pointed at HyperDX / Honeycomb / Tempo / any
OTLP backend).
- `LangfuseObserver` attached with an `InMemoryLangfuseClient`
(production swaps for `LangfuseSDKAdapter(Langfuse(...))`).
- Both observers consume the same `NodeEvent` stream
independently; node code never knows there are two backends.
- `LangfuseObserver` carries `trace_input_from_state` and
`trace_output_from_state` caller hooks (proposal 0043 §8.4.1)
that derive domain dicts like `{"question": ...}` /
`{"answer": ..., "model": ...}` from State, instead of letting
the observer dump the raw State object.
- `TimingMiddleware` (canonical, from
`openarmature.graph.middleware`) wraps the respond node. An
`on_complete` async callback receives a `TimingRecord` and
prints a one-line timing summary; production would queue to a
metrics backend (StatsD, Prometheus pushgateway, OTLP metrics).
- `invoke(metadata={...})` carries `tenantId`, `requestId`, and
`featureFlag` from the call site. Both observers pick them up:
OTel attaches them as `openarmature.user.*` span attributes,
Langfuse merges them as top-level `trace.metadata` keys.

At the end the demo prints what each backend captured so a reader
sees the same logical events represented two ways.

## What it teaches

- **Two observers on one graph**
([proposal 0031 + the no-double-export claim](../concepts/observability.md)).
Each consumes the `NodeEvent` stream independently; the engine
fans events out to all attached observers. Production deployments
often run both: OTel for infrastructure-side correlation
(logs, distributed tracing across services), Langfuse for
LLM-aware generation rendering.
- **Caller hooks for `trace.input` / `trace.output`**
([proposal 0043, observability §8.4.1](../concepts/observability.md)).
Without the hooks the Langfuse observer either omits the field
(`disable_state_payload=True` default) or dumps the raw State
(when `disable_state_payload=False`). The hooks let you return a
domain dict shaped for the Langfuse UI viewer while keeping PII
the operator hasn't audited out of trace payloads.
- **`TimingMiddleware`**
([reference](../reference/graph.md)).
Wraps a node's execution and dispatches a
`TimingRecord(node_name, duration_ms, outcome, exception_category)`
to an async callback when the chain returns or raises. The
callback fires inline before the chain's result reaches the
engine; keep it fast (queue work, defer I/O).
- **`invoke(metadata={...})` propagation across observers**
([proposal 0034 + 0041 reserved keys](../concepts/observability.md)).
One call site, both backends pick it up: OTel attaches each entry
as `openarmature.user.<key>` cross-cutting span attribute,
Langfuse merges as top-level `trace.metadata` keys plus
per-observation metadata.
- **In-memory captures for both backends**
([reference](../reference/observability.md)).
`InMemoryLangfuseClient` records every Trace / Observation;
`InMemorySpanExporter` records every Span. Production
deployments swap each for a real exporter / SDK adapter; the
observer call surface doesn't change.

## How to run

```bash
uv sync --group examples --all-extras
LLM_API_KEY=sk-... uv run python examples/12-production-observability/main.py
```

`LLM_MODEL` defaults to `gpt-4o-mini`. The pipeline is single-turn
and doesn't need vision capability.

The demo prints in three blocks: a header (the question and the
caller-supplied tenant/request/feature-flag), the LLM answer, then
two captured-trace summaries (OTel spans + Langfuse Trace tree).

## Reading the output

Numbers shown below (durations, token counts, UUIDs) are illustrative
and vary per run; the shape is what matters.

```
=== openarmature production-observability demo ===
question: What was the primary objective of Apollo 11?
tenant id: demo-acme
request id: <uuid>
feature flag:v2-canary

[timing] respond: 1234.5ms (success)
answer: The primary objective of Apollo 11 was ...
model: gpt-4o-mini-2024-07-18

--- captured OTel spans ---
[openarmature.invocation] 1240.0ms openarmature.user.tenantId='demo-acme', ...
[respond] 1235.0ms openarmature.node.name='respond', openarmature.user.tenantId='demo-acme', ...
[openarmature.llm.complete] 1200.0ms gen_ai.system='openai', gen_ai.usage.input_tokens=42, ...

--- captured Langfuse trace ---
Trace id=<uuid>
name='respond'
input={'question': 'What was the primary objective of Apollo 11?'}
output={'answer': '...', 'model': 'gpt-4o-mini-2024-07-18'}
metadata={'tenantId': 'demo-acme', 'requestId': '<uuid>', 'featureFlag': 'v2-canary', ...}
[span] 'respond'
input={'question': '...'}
output={'answer': '...', 'model': '...'}
[generation] 'openarmature.llm.complete'
input=[{'role': 'system', ...}, {'role': 'user', ...}]
output='The primary objective of Apollo 11 ...'
model='gpt-4o-mini-2024-07-18'
usage={'input_tokens': 42, 'output_tokens': 38}
```

- **`[timing] respond: 1234.5ms (success)`**: emitted by the
`TimingMiddleware` callback as soon as the respond chain returns.
`outcome` is `"success"` here; a `ProviderRateLimit` would surface
as `outcome="exception"` with `exception_category="provider_rate_limit"`.
- **OTel spans block**: one line per captured span, sorted by
start time. The relevant attributes shown are a curated subset
for readability; the full attribute set is on each `Span` object
for any reader inspecting them programmatically. Note the
`openarmature.user.*` attributes appearing on every span (the
cross-cutting attribute propagation from `invoke(metadata=...)`).
- **Langfuse trace block**: the same invocation as seen by the
Langfuse data model. `trace.input` / `trace.output` come from the
caller hooks (`{"question": ...}` / `{"answer": ..., "model": ...}`)
rather than the raw State. The Observation tree shows
`[span]` for the node and `[generation]` for the LLM call;
production Langfuse renders these as nested cards in the UI.
- **Identical `correlation_id`** (not shown by the formatter but
present in both captures' metadata): the cross-system join key.
Find a slow Generation in Langfuse, grep for the
`correlation_id` in OTel logs, see the surrounding infrastructure
activity.

## Swapping to production backends

```python
# OTel: real OTLP exporter
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

otel_observer = OTelObserver(
span_processor=BatchSpanProcessor(
OTLPSpanExporter(
endpoint="https://your-collector/v1/traces",
headers={"authorization": os.environ["OTLP_AUTH"]},
)
),
resource=Resource.create({"service.name": "lunar-briefing"}),
)

# Langfuse: real SDK adapter
from langfuse import Langfuse
from openarmature.observability.langfuse import LangfuseSDKAdapter

langfuse_observer = LangfuseObserver(
client=LangfuseSDKAdapter(
Langfuse(
public_key="pk-lf-...",
secret_key="sk-lf-...",
host="https://cloud.langfuse.com",
)
),
trace_input_from_state=_trace_input,
trace_output_from_state=_trace_output,
disable_llm_payload=False,
)
```

Same observer call surface, real exporters underneath. Node and
graph code don't change. Example 03 shows the OTel-only side at
finer granularity (`force_flush`, log bridging, error handling);
example 10 shows the Langfuse + `LangfusePromptBackend` prompt-linkage
side.
6 changes: 6 additions & 0 deletions docs/examples/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,12 @@ in the repo.
through `ChatPrompt` + `PlaceholderSegment`. One turn attaches a
photograph; the agent processes it without changing the chat
shape.
- [**12 - Production observability**](12-production-observability.md).
Dual OTel + Langfuse observers attached to one graph, caller
hooks deriving domain-shaped `trace.input` / `trace.output` from
State, built-in `TimingMiddleware` recording per-node duration,
multi-tenant caller-supplied metadata propagating to both
observers in one `invoke()` call.

## Configuration

Expand Down
Loading