96 changes: 96 additions & 0 deletions issues/embedded-ai-embabel/11-ai-provider-abstraction.md
@@ -0,0 +1,96 @@
# Issue: AI Provider Abstraction Layer

**Milestone:** 5 — Embedded AI via Embabel & Chatbot Interface
**Priority:** Critical (prerequisite for all other Embabel work)
**Depends on:** None

## Summary

Create an `AIProvider` abstraction layer that decouples docgen from direct OpenAI API calls. This enables switching between providers (OpenAI, Embabel via MCP, Ollama for local models) via configuration, and is the prerequisite for all Embabel integration work.

## Background

Today, docgen has three hard-coded OpenAI integration points:

| File | API | Model | Purpose |
|------|-----|-------|---------|
| `wizard.py` | `chat.completions.create` | `gpt-4o` (configurable) | Narration generation |
| `tts.py` | `audio.speech.create` | `gpt-4o-mini-tts` (configurable) | Text-to-speech |
| `timestamps.py` | `audio.transcriptions.create` | `whisper-1` (hard-coded) | Audio timestamps |

Each creates its own `openai.OpenAI()` client. There is no abstraction layer, no provider switching, and no support for non-OpenAI backends.

## Acceptance Criteria

- [ ] Create `AIProvider` protocol with methods:
```python
from pathlib import Path
from typing import Protocol

class AIProvider(Protocol):
def chat(self, model: str, messages: list[dict], **kwargs) -> str: ...
def tts(self, model: str, voice: str, text: str, instructions: str, output_path: Path) -> Path: ...
def transcribe(self, model: str, audio_path: Path, **kwargs) -> dict: ...
```
- [ ] Implement `OpenAIProvider` wrapping all current direct calls
- [ ] Implement `EmbabelProvider` that connects via MCP Python SDK to Embabel server
- [ ] Implement `OllamaProvider` for local model support:
- Chat: Ollama REST API (`/api/chat`)
- TTS: falls back to OpenAI (Ollama doesn't support TTS)
- Transcribe: falls back to OpenAI or local `whisper.cpp`
- [ ] Factory function: `get_ai_provider(config) -> AIProvider`
- [ ] Config in `docgen.yaml`:
```yaml
ai:
provider: openai # "openai", "embabel", "ollama"
embabel_url: http://localhost:8080/sse
ollama_url: http://localhost:11434
ollama_model: llama3.2
```
- [ ] Refactor all three call sites to use `AIProvider`:
- `wizard.py:generate_narration_via_llm` → `provider.chat(...)`
- `tts.py:TTSGenerator.generate` → `provider.tts(...)`
- `timestamps.py:TimestampExtractor.extract` → `provider.transcribe(...)`
- [ ] Backward compatible: no config = default to `openai` with existing behavior
- [ ] Make `whisper-1` model configurable (currently hard-coded in `timestamps.py`)
- [ ] Unit tests with mock providers for each implementation
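The mock providers called for in the last bullet can be as simple as a call-recording stub that structurally satisfies the protocol. A minimal sketch (the `MockProvider` name and canned return values are illustrative, not part of the planned API):

```python
from pathlib import Path
from typing import Protocol, runtime_checkable

@runtime_checkable
class AIProvider(Protocol):
    def chat(self, model: str, messages: list[dict], **kwargs) -> str: ...
    def tts(self, model: str, voice: str, text: str, instructions: str, output_path: Path) -> Path: ...
    def transcribe(self, model: str, audio_path: Path, **kwargs) -> dict: ...

class MockProvider:
    """Test double: records every call and returns canned responses."""

    def __init__(self):
        self.calls = []

    def chat(self, model, messages, **kwargs):
        self.calls.append(("chat", model))
        return "canned narration"

    def tts(self, model, voice, text, instructions, output_path):
        self.calls.append(("tts", model))
        return output_path  # pretend the audio file was written

    def transcribe(self, model, audio_path, **kwargs):
        self.calls.append(("transcribe", model))
        return {"words": []}
```

Because the protocol is `@runtime_checkable`, tests can assert `isinstance(MockProvider(), AIProvider)` and then verify call sequencing through `calls` without touching any network.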

## Technical Notes

### Provider resolution order

1. `DOCGEN_AI_PROVIDER` environment variable → highest precedence, overrides the file setting
2. Explicit `ai.provider` in `docgen.yaml`
3. Default → `openai`
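A sketch of this resolution, with the environment variable taking precedence over the file setting; the helper name and the shape of `config` (a parsed `docgen.yaml` dict) are illustrative:

```python
import os

def resolve_provider_name(config: dict) -> str:
    """Pick the AI provider name: env var, then docgen.yaml, then default."""
    # Environment variable overrides the file setting
    env_value = os.environ.get("DOCGEN_AI_PROVIDER")
    if env_value:
        return env_value
    # Explicit ai.provider block in docgen.yaml
    file_value = (config.get("ai") or {}).get("provider")
    if file_value:
        return file_value
    # Fall back to current behavior
    return "openai"
```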

### EmbabelProvider sketch

```python
class EmbabelProvider:
def __init__(self, url: str):
self.url = url
self._client = None # lazy MCP client

async def _connect(self):
from mcp import ClientSession
# Connect to Embabel SSE endpoint
...

def chat(self, model, messages, **kwargs):
# Invoke Embabel NarrationAgent tool via MCP
return self._call_tool("generate_narration", {...})

def tts(self, model, voice, text, instructions, output_path):
# Invoke Embabel TTSAgent tool via MCP, or fall back to OpenAI
...
```

### Narration lint impact

`NarrationLinter.lint_audio` in `narration_lint.py` also uses `TimestampExtractor` indirectly — it will automatically benefit from the provider abstraction without code changes.

## Files to Create/Modify

- **Create:** `src/docgen/ai_provider.py`
- **Modify:** `src/docgen/wizard.py` (use provider instead of direct openai)
- **Modify:** `src/docgen/tts.py` (use provider instead of direct openai)
- **Modify:** `src/docgen/timestamps.py` (use provider instead of direct openai)
- **Modify:** `src/docgen/config.py` (add `ai` config block)
- **Create:** `tests/test_ai_provider.py`
109 changes: 109 additions & 0 deletions issues/embedded-ai-embabel/12-embabel-agent-definitions.md
@@ -0,0 +1,109 @@
# Issue: Embabel Agent Definitions (JVM Side)

**Milestone:** 5 — Embedded AI via Embabel & Chatbot Interface
**Priority:** High
**Depends on:** Issue 11 (provider abstraction)

## Summary

Create the Embabel Spring Boot application that hosts the AI agents for docgen. These agents are exposed as MCP tools that the Python docgen client can invoke for narration generation, TTS orchestration, pipeline management, script generation, and error diagnosis.

## Background

Embabel is a JVM-based agent framework that uses Goal-Oriented Action Planning (GOAP) to dynamically plan action sequences. By defining docgen-specific agents, we get:

- **Planning**: the agent figures out the optimal sequence of steps (e.g., "to produce a demo video, I need to generate narration, then TTS, then compose...")
- **Tool use**: agents can invoke docgen CLI commands as tools
- **LLM mixing**: use GPT-4o for narration quality, cheaper models for classification
- **MCP exposure**: all agents automatically available to Python via MCP protocol

## Acceptance Criteria

- [ ] Create Embabel Spring Boot project:
- Option A: `docgen-agent/` subdirectory in this repo
- Option B: Companion repo `docgen-agent` (linked from README)
- [ ] Domain model classes (Kotlin data classes):
```kotlin
data class NarrationRequest(val segment: String, val guidance: String, val sources: List<String>)
data class NarrationResponse(val text: String, val wordCount: Int)
data class TTSRequest(val segment: String, val voice: String, val model: String)
data class PipelineRequest(val steps: List<String>, val options: Map<String, Any>)
data class ScriptRequest(val testFile: String, val segment: String, val description: String)
data class DiagnosisRequest(val errorLog: String, val segment: String, val context: Map<String, Any>)
```
- [ ] Agent implementations:
- `NarrationAgent` — generates/revises narration from source docs + guidance
- `TTSAgent` — wraps TTS generation with voice/model selection and preview
- `PipelineAgent` — orchestrates multi-step pipeline via GOAP planning
- `ScriptAgent` — generates Playwright capture scripts or Manim scene code
- `DebugAgent` — analyzes compose/validation errors and suggests fixes
- [ ] All agent goals exported as MCP tools: `@Export(remote = true)`
- [ ] LLM configuration:
- GPT-4o for narration generation (quality-critical)
- Local model via Ollama for simple classification/routing
- Configurable in `application.yml`
- [ ] Docker Compose setup for running Embabel alongside docgen:
```yaml
services:
docgen-agent:
build: ./docgen-agent
ports: ["8080:8080"]
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY}
- SPRING_AI_OPENAI_API_KEY=${OPENAI_API_KEY}
```
- [ ] Integration tests verifying MCP tool discovery and invocation
- [ ] Health endpoint for connection checking

## Technical Notes

### Agent architecture

Each agent is an `@Agent`-annotated class whose `@Action` methods are the plannable steps:

```kotlin
@Agent("Narration generation agent for docgen")
class NarrationAgent {

@Action("Generate narration from source documents")
@Export(remote = true)
fun generateNarration(request: NarrationRequest): NarrationResponse {
// LLM call with docgen-specific system prompt
}

@Action("Revise existing narration based on feedback")
@Export(remote = true)
fun reviseNarration(segment: String, currentText: String, feedback: String): NarrationResponse {
// LLM call with revision context
}
}
```

### MCP server configuration

```yaml
# application.yml
spring:
ai:
mcp:
server:
type: SYNC
openai:
api-key: ${OPENAI_API_KEY}
```
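Since the LLM-configuration bullet calls for a local Ollama model alongside OpenAI, the same `application.yml` could grow an Ollama block. The property names below assume Spring AI's Ollama starter and should be verified against the Embabel/Spring AI versions in use:

```yaml
# application.yml (extended; Ollama property names are an assumption)
spring:
  ai:
    mcp:
      server:
        type: SYNC
    openai:
      api-key: ${OPENAI_API_KEY}
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3.2
```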

## Files to Create

- **Create:** `docgen-agent/` (Spring Boot project)
- `pom.xml` or `build.gradle.kts`
- `src/main/kotlin/com/docgen/agent/`
- `DocgenAgentApplication.kt`
- `agents/NarrationAgent.kt`
- `agents/TTSAgent.kt`
- `agents/PipelineAgent.kt`
- `agents/ScriptAgent.kt`
- `agents/DebugAgent.kt`
- `model/` (domain classes)
- `src/main/resources/application.yml`
- `Dockerfile`
- **Create:** `docker-compose.yml` (root level, optional)
110 changes: 110 additions & 0 deletions issues/embedded-ai-embabel/13-python-mcp-client.md
@@ -0,0 +1,110 @@
# Issue: Python MCP Client Integration

**Milestone:** 5 — Embedded AI via Embabel & Chatbot Interface
**Priority:** High
**Depends on:** Issue 11 (provider abstraction), Issue 12 (Embabel agents)

## Summary

Implement the Python-side MCP client that connects to the Embabel agent server, discovers available tools, and provides the `EmbabelProvider` implementation for the AI provider abstraction layer.

## Background

The official MCP Python SDK (`mcp` on PyPI) provides `ClientSession` for connecting to MCP servers. Embabel exposes its agents as MCP tools over SSE (Server-Sent Events) at `http://localhost:8080/sse`. This issue bridges the two by implementing a robust client that handles connection management, tool discovery, invocation, and streaming.

## Acceptance Criteria

- [ ] Add `mcp` Python SDK as optional dependency:
```toml
[project.optional-dependencies]
embabel = ["mcp>=1.0"]
```
Install with: `pip install "docgen[embabel]"` (quotes keep shells like zsh from globbing the brackets)
- [ ] Implement `EmbabelClient` class:
```python
class EmbabelClient:
def __init__(self, url: str = "http://localhost:8080/sse"):
...

async def connect(self) -> None:
"""Connect to Embabel SSE endpoint."""

async def discover_tools(self) -> list[Tool]:
"""List available MCP tools from Embabel."""

async def invoke(self, tool_name: str, args: dict) -> Any:
"""Invoke an MCP tool and return the result."""

async def stream(self, tool_name: str, args: dict) -> AsyncIterator[str]:
"""Invoke a tool with streaming response."""

async def close(self) -> None:
"""Disconnect from Embabel."""
```
- [ ] Auto-reconnect on connection loss (exponential backoff, max 3 retries)
- [ ] Graceful degradation: if Embabel is unavailable, fall back to direct OpenAI provider
- [ ] Tool invocation wrappers for each agent tool:
```python
async def generate_narration(self, segment: str, guidance: str, sources: list[str]) -> str:
return await self.invoke("generate_narration", {...})
```
- [ ] Handle streaming responses for chat interactions (SSE event stream)
- [ ] Connection health checking (`is_connected`, `ping`)
- [ ] Config integration: read `ai.embabel_url` from `docgen.yaml`
- [ ] Synchronous wrapper for CLI usage (the MCP SDK is async, but docgen CLI is sync)
- [ ] Unit tests with mocked MCP server
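The auto-reconnect bullet (exponential backoff, max 3 retries) can be isolated into a small helper so it is testable without a live server. A sketch under the assumption that `connect` is any awaitable factory; names are illustrative:

```python
import asyncio

async def connect_with_retry(connect, max_retries: int = 3, base_delay: float = 1.0):
    """Await connect(); on ConnectionError, retry with exponential backoff.

    Delays are base_delay * 2**attempt; the last failure is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return await connect()
        except ConnectionError:
            if attempt == max_retries:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))
```

`EmbabelClient.connect` could delegate to this helper, and the graceful-degradation bullet then reduces to catching the final `ConnectionError` in the provider factory.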

## Technical Notes

### MCP Python SDK usage

```python
from mcp import ClientSession
from mcp.client.sse import sse_client

async with sse_client(url="http://localhost:8080/sse") as (read, write):
async with ClientSession(read, write) as session:
await session.initialize()
tools = await session.list_tools()
result = await session.call_tool("generate_narration", arguments={...})
```

### Sync wrapper pattern

Since docgen CLI uses Click (synchronous), we need a sync wrapper:

```python
import asyncio
from typing import Any

class EmbabelClientSync:
    def __init__(self, url: str):
        self._async_client = EmbabelClient(url)
        # A dedicated, long-lived loop keeps the MCP session alive across
        # calls; asyncio.run() would create and tear down a loop per call.
        self._loop = asyncio.new_event_loop()

    def invoke(self, tool_name: str, args: dict) -> Any:
        return self._loop.run_until_complete(
            self._async_client.invoke(tool_name, args)
        )

    def close(self) -> None:
        self._loop.run_until_complete(self._async_client.close())
        self._loop.close()
```

### Fallback behavior

```python
def get_ai_provider(config):
if config.ai_provider == "embabel":
try:
client = EmbabelClientSync(config.embabel_url)
client.connect()
return EmbabelProvider(client)
except ConnectionError:
print("[ai] Embabel unavailable, falling back to OpenAI")
return OpenAIProvider()
...
```

## Files to Create/Modify

- **Create:** `src/docgen/mcp_client.py`
- **Modify:** `src/docgen/ai_provider.py` (implement EmbabelProvider using mcp_client)
- **Modify:** `pyproject.toml` (add `embabel` optional dependency)
- **Create:** `tests/test_mcp_client.py`