feat(monty): run_script/repl_python, HITL approval, multi-server wiring by runyaga · Pull Request #115 · soliplex/frontend

runyaga · 2026-04-15T23:35:47Z

Summary

Adds run_script and repl_python client-side tools to MontyScriptEnvironment, backed by a sandboxed Python interpreter via dart_monty
Wires HITL (human-in-the-loop) approval gate for Python execution — requiresApproval: true suspends the session until the user approves or denies
Fixes multi-server plugin wiring in standard.dart: SoliplexPlugin now receives connections for all registered servers, not just the primary connection
Adds StdoutSink debug logging in standard.dart for development visibility
Enhances tool_call_tile.dart with richer tool call display and clipboard support
Adds SoliplexConnection.alias and serverUrl fields for improved connection identification
Adds HITL unit tests (hitl_test.dart) and expands MontyScriptEnvironment tests
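
The run_script / repl_python split comes down to namespace lifetime: run_script gets a fresh namespace on every call, while repl_python keeps one namespace alive across calls. The sketch below is minimal and uses CPython's `exec` as a stand-in for the monty interpreter. The function and class names are hypothetical and are not dart_monty's API.

```python
import contextlib
import io


def run_script(code: str) -> str:
    """One-shot execution: every call starts from an empty namespace."""
    namespace: dict = {}
    out = io.StringIO()
    with contextlib.redirect_stdout(out):
        exec(code, namespace)
    return out.getvalue()


class ReplPython:
    """Persistent execution: names defined in one call survive into the next."""

    def __init__(self) -> None:
        self._namespace: dict = {}

    def __call__(self, code: str) -> str:
        out = io.StringIO()
        with contextlib.redirect_stdout(out):
            exec(code, self._namespace)
        return out.getvalue()


repl = ReplPython()
repl("total = 40")
print(repl("print(total + 2)"), end="")  # prints 42

try:
    run_script("total = 40")
    run_script("print(total)")  # fresh namespace: total is undefined here
except NameError as exc:
    print(f"run_script: {exc}")
```

Keeping one-shot runs isolated means a variable from one tool call cannot leak into an unrelated one.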

Test plan

dart test passes in soliplex_agent and soliplex_monty_plugin
run_script and repl_python tools appear in LLM context for Monty rooms
Python execution gate: approval banner appears before code runs
Denying a tool call cancels the session (no LLM retry loop)
Multi-server: connections from all servers available inside Python via SoliplexPlugin
Tool call tile renders tool name, arguments, and result with copy support

🤖 Generated with Claude Code

…x API
New package `packages/fe_plugin_soliplex` exposes Soliplex server operations as host functions callable from sandboxed Python via dart_monty's plugin system.
Host functions: soliplex_list_servers, soliplex_list_rooms, soliplex_get_room, soliplex_get_documents, soliplex_get_chunk, soliplex_list_threads, soliplex_create_thread, soliplex_delete_thread, soliplex_converse (stub), soliplex_upload_file, soliplex_upload_to_thread, soliplex_get_mcp_token.
Multi-server support: each function accepts an optional `server` parameter. The default room and server are configurable at construction.
TODO: Wire soliplex_converse to AgUiStreamClient for the full AG-UI conversation flow (SSE streaming, client-side tool calling, state pass-through).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rver API
Replace the stub soliplex_converse with real AG-UI SSE streaming via new_thread/reply_thread. All functions now require an explicit server and room_id; there are no defaults.
- Add SoliplexConnection adapter (avoids a soliplex_agent dependency)
- 11 host functions: list_servers, list_rooms, get_room, get_documents, get_chunk, list_threads, new_thread, reply_thread, upload_file, upload_to_thread, get_mcp_token
- Internal _ThreadState tracks message history and AG-UI state per thread
- 23 unit tests with mocked API/SSE streams
- Integration tests against demo.toughserv.com + localhost:8000 (multi-server simultaneous connections verified)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
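
reply_thread has to turn a stream of AG-UI events into a single reply string. The sketch below shows that accumulation. It assumes each event arrives as a dict with a `type` key, named after the events listed in this PR (RunStarted, TextMessageContent, and so on), and a `delta` field on content events. The real plugin decodes typed Dart events, so the shapes here are illustrative only.

```python
from typing import Iterable, Mapping


def accumulate_reply(events: Iterable[Mapping]) -> str:
    """Concatenate assistant text deltas from an AG-UI-style event stream."""
    parts = []
    in_message = False
    finished = False
    for event in events:
        kind = event["type"]
        if kind == "TextMessageStart":
            in_message = True
        elif kind == "TextMessageContent" and in_message:
            parts.append(event["delta"])
        elif kind == "TextMessageEnd":
            in_message = False
        elif kind == "RunFinished":
            finished = True
        # Thinking* and other events are skipped: they are not part of the reply.
    if not finished:
        raise RuntimeError("stream ended before RunFinished")
    return "".join(parts)


stream = [
    {"type": "RunStarted"},
    {"type": "ThinkingStart"},
    {"type": "ThinkingContent", "delta": "planning..."},
    {"type": "TextMessageStart"},
    {"type": "TextMessageContent", "delta": "Toast the bread, "},
    {"type": "TextMessageContent", "delta": "then add tomatoes."},
    {"type": "TextMessageEnd"},
    {"type": "RunFinished"},
]
print(accumulate_reply(stream))  # Toast the bread, then add tomatoes.
```

If a run emits no TextMessageStart at all, this sketch returns an empty string rather than raising.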

Tests prove the full pipeline: Python → AgentSession FFI bridge → SoliplexPlugin host functions → live Soliplex SSE streaming.
Working tests (sandbox: true):
- list_servers, list_rooms, get_room from Python
- Single SSE new_thread conversation
Known limitation: the FFI native library has global state that becomes corrupted after async I/O host functions. The second execute() SEGFAULTs regardless of sandbox mode; see dart_monty#271. Multi-turn and bwrap codegen tests are written but blocked by this FFI issue. A WASM backend or a Rust fix is needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…Session
Single long-lived AgentSession with SoliplexPlugin. Results:
- Discovery: list_servers, list_rooms (both servers), get_room
- SSE streaming: new_thread on demo cooking room
- Multi-turn: 3-turn bruschetta conversation via reply_thread (thread_id persists across execute() calls)
- Cross-server: new_thread on local chat room
- bwrap codegen → extract → execute
8/10 tests pass. The full pipeline is proven: Python → AgentSession → SoliplexPlugin → SSE streaming → response → state persistence → reply_thread with history → multi-turn works.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Both upload tests pass when run in isolation:
- upload_file: agent-test.txt → bwrap_sandbox room on local
- upload_to_thread: thread-notes.txt → thread on local chat room
The full suite hits an intermittent Rust crash on the 4th execute() call (the same "no active frame" / SEGFAULT as #271). Tests 1-3 and the uploads pass reliably. The crash is in the monty crate's VM recompilation path, not in the plugin or upload code.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Multi-server pipeline tests using AgentSession + SoliplexPlugin with fixed dart_monty (NativeFinalizer race fix):
- Cross-server discovery: both servers, rooms, skills
- Demo recipe → upload to local bwrap_sandbox room
- 3-turn pad thai conversation on demo → cross-server summary on local
- Pancake recipe: demo → upload → local comments
- bwrap codegen with monty rules (LLM formatting inconsistent)
State persists across execute() calls: thread_id, recipe text, and conversation responses all survive for cross-server handoff.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Test 8 proves the complete pipeline:
1. Create thread on bwrap_sandbox
2. Upload monty-rules.md with full API reference
3. Ask agent to read the file and generate code
4. Agent generates valid monty code using host functions
5. Extract code from the ```monty``` block
6. Execute: code calls list_servers, list_rooms, get_room
7. Returns skills map across BOTH demo + local servers
The generated code correctly uses json.loads() on all host function returns, iterates servers, finds rooms with skills, and returns structured data. Zero human intervention after the prompt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
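
Step 5 depends on reliably pulling the code out of the model's reply. The sketch below assumes the reply uses a fence tagged `monty` and takes only the first such block. `extract_monty_block` is a hypothetical helper, not the test's actual code.

```python
import re
from typing import Optional

FENCE = "`" * 3  # built programmatically so this snippet can sit inside a fenced block
_MONTY_BLOCK = re.compile(FENCE + r"monty[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_monty_block(reply: str) -> Optional[str]:
    """Return the body of the first monty-fenced block in an LLM reply, or None."""
    match = _MONTY_BLOCK.search(reply)
    return match.group(1) if match else None


reply = (
    "Here is the code:\n"
    + FENCE + "monty\n"
    + "servers = json.loads(soliplex_list_servers())\n"
    + "servers\n"
    + FENCE + "\n"
    + "Let me know if you need changes."
)
print(extract_monty_block(reply))
```

The non-greedy `(.*?)` stops at the first closing fence, so a reply with several blocks yields only the first.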

Advanced scenarios:
- Upload full monty ruleset with all plugins (soliplex, template, msgbus, fs)
- Codegen: data pipeline with caching + templates
- Codegen: cross-server intelligence gathering
- Codegen: recipe → file → template report card
- Codegen: orchestrate conversations across servers
debug_null_return.dart proves that all SSE calls work under `dart run`: 3 sequential SSE calls, state persistence, and all return non-null. The null returns under `dart test` are a test-runner zone issue, not a code bug. Production (`dart run`) works correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

SSE event flow is correct: RunStarted → ThinkingStart → ThinkingContent → TextMessageStart → TextMessageContent → TextMessageEnd → RunFinished. All events arrive, and content is accumulated properly.
The null returns in tests are caused by transient HTTP 500s from the bwrap_sandbox server when threads are created rapidly after prior SSE streams. The ApiException reaches the try/except that wraps the Python state, which swallows it and silently leaves variables undefined (→ null). This is not an SSE or plugin bug; it is server-side resource management on bwrap_sandbox with bubblewrap sandboxes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
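
This failure mode is easy to reproduce in plain Python. A wrapper that swallows an exception leaves the result unset, so the caller sees null instead of the HTTP 500. The sketch below contrasts that with returning the error explicitly. Both wrappers and the failing stub are hypothetical; neither is the test's actual wrapper.

```python
def soliplex_new_thread_flaky() -> str:
    """Hypothetical host function that fails the way bwrap_sandbox did (HTTP 500)."""
    raise RuntimeError("HTTP 500")


def swallowing_wrapper():
    state = {}
    try:
        state["thread_id"] = soliplex_new_thread_flaky()
    except Exception:
        pass  # the failure is swallowed, so thread_id is simply never set
    return state.get("thread_id")  # None: indistinguishable from "no data"


def explicit_wrapper():
    try:
        return {"ok": True, "thread_id": soliplex_new_thread_flaky()}
    except Exception as exc:
        # Surface the failure as data instead of as an absent variable.
        return {"ok": False, "error": str(exc)}


print(swallowing_wrapper())  # None
print(explicit_wrapper())    # {'ok': False, 'error': 'HTTP 500'}
```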

Upload real experiment files from ioi-experiments to bwrap_sandbox; the agent generates monty code to solve construction scheduling problems.
Test 2 (baseline): the agent generated 2716 chars of scheduling code that runs in monty. It creates a blackboard dict, tracks jobs/deps/weather/workers, and produces a day-by-day schedule. The code executes end-to-end.
Test 4 (disruption): the agent generated code but used `import os` (not available in the monty sandbox). The ruleset needs to document monty's stdlib limitations.
Pipeline: upload experiment files → agent reads files → generates monty code → extract from code block → execute in sandbox → result.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rerun with updated prompt rules:
1. Baseline: executes but is WRONG. It marks jobs done mid-iteration and assigns H1_FRM to the same day as H1_FND (the dependency is not actually satisfied yet).
2. Optimal: same bug; copies the baseline logic.
3. Disruption: used open() despite the rules; needs stronger guidance.
4. Infeasible: CORRECT ✅ (clean f-strings; 9 < 15 = infeasible).
The baseline/optimal bug is a real algorithmic error: jobs are completed inside the same loop pass in which deps are checked. The fix is to collect assignments first, then mark jobs complete after the day loop.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
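
The bug and its fix fit in a few lines. The sketch uses a hypothetical four-job dependency graph with a capacity of two jobs per day; this is not the ioi-experiments data.

- The buggy version marks a job done as soon as it is assigned, so a dependent job later in the same pass lands on the same day.
- The fixed version collects the day's assignments first and commits them after the pass.

This is plain Python and has not been checked against monty's subset.

```python
def schedule_buggy(jobs, days, capacity):
    done = set()
    plan = {}
    for day in days:
        plan[day] = []
        for job, deps in jobs.items():
            if job in done or len(plan[day]) >= capacity:
                continue
            if all(dep in done for dep in deps):
                plan[day].append(job)
                done.add(job)  # BUG: later jobs in this same pass see it as finished
    return plan


def schedule_fixed(jobs, days, capacity):
    done = set()
    plan = {}
    for day in days:
        todays = []
        for job, deps in jobs.items():
            if job in done or len(todays) >= capacity:
                continue
            # Dependencies must have been finished on an earlier day.
            if all(dep in done for dep in deps):
                todays.append(job)
        plan[day] = todays
        done.update(todays)  # commit only after the whole day is planned
    return plan


jobs = {
    "H1_FND": [],
    "H1_FRM": ["H1_FND"],
    "H2_FND": [],
    "H2_FRM": ["H2_FND"],
}
print(schedule_buggy(jobs, [1, 2, 3], capacity=2))
# {1: ['H1_FND', 'H1_FRM'], 2: ['H2_FND', 'H2_FRM'], 3: []}
print(schedule_fixed(jobs, [1, 2, 3], capacity=2))
# {1: ['H1_FND', 'H2_FND'], 2: ['H1_FRM', 'H2_FRM'], 3: []}
```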

With the sandbox filesystem + updated prompts, all experiments produce correct results:
1. Baseline: Day1=rain, Day2=H1_FND, Day3=H1_FRM+H2_FND, Day4=H1_ROF+H2_FRM ✅
2. Optimal: same schedule (already optimal) ✅
3. Disruption: Alice sick on day 2; H1_FND done by Bob, Alice back on day 3 ✅ The generated code correctly collects assignments first and marks jobs done afterwards.
4. Infeasible: 9 slots < 15 jobs = infeasible ✅
Files are now dual-written: server thread (bwrap reads) + sandbox filesystem (generated code reads with Path().read_text()).
Monty limitations discovered and documented in prompt rules:
- No := walrus, no open(), no % format, no enumerate(start=)
- No chained assignment, no dict dot access
- bb_dump is not a real function (from experiment spec)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
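
Each listed limitation has a plain rewrite. The sketch below shows monty-friendly equivalents written as ordinary Python. It assumes f-strings, plain `enumerate()`, and `pathlib.Path` are available in the sandbox. The runs above show f-strings and `Path().read_text()` working; plain `enumerate()` is an assumption. The helper names are hypothetical.

```python
from pathlib import Path


def first_long_word(words, min_len):
    # Instead of: if (n := len(word)) > min_len
    for word in words:
        n = len(word)
        if n > min_len:
            return word, n
    return None


def numbered(items):
    # Instead of enumerate(items, start=1) and "%d. %s" % (i, item)
    lines = []
    for i, item in enumerate(items):
        lines.append(f"{i + 1}. {item}")
    return lines


def reset_counters():
    # Instead of chained assignment: done = pending = 0
    done = 0
    pending = 0
    return done, pending


def worker_name(worker):
    # Instead of dict dot access (worker.name), subscript the dict
    return worker["name"]


def read_notes(path):
    # Instead of open(path).read(), as in the sandbox-filesystem runs
    return Path(path).read_text()
```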

Runs each wave5 experiment on its own fresh thread/session and prints the complete generated code for inspection.
Findings from the run:
- Baseline: the LLM generated pseudocode with := and custom syntax (not valid Python). The model doesn't reliably follow the rules.
- Optimal: used set literals, match expressions, and a ⊆ operator (not Python at all).
- Disruption/Infeasible: server overloaded from too many threads.
The local Ollama model (gpt-oss) is inconsistent: sometimes it generates valid Python, sometimes pseudocode. The prompt rules help but don't guarantee compliance. One of the following is needed:
1. A better model (GPT-4o on demo.toughserv.com is more reliable)
2. A validation + error-correction loop
3. Stronger prompt constraints
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

show_generated_demo.dart uses demo.toughserv.com (GPT-4o):
- All 4 experiments generate code (1254-1554 chars each)
- Full monty tracebacks with line numbers are shown on errors
- GPT-4o generates valid Python syntax (unlike Ollama's pseudocode)
- But it uses msg_send/bb_dump/locals() despite the rules saying otherwise
- Dep-checking logic is wrong in baseline/optimal (checks within-day)
Code analysis per experiment:
1. Baseline: syntax OK, logic bug (deps checked in same day dict)
2. Optimal: syntax OK, logic bug (schedule keys are day numbers)
3. Disruption: defaultdict import crashes monty
4. Infeasible: locals() not available, wrong approach (tries scheduling)
Next: strengthen prompt rules to forbid unlisted functions, and add an error-correction loop to fix generated code.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

qwen_experiments.dart: compare 8B vs 35B on 4 tasks (fibonacci, discovery, scheduling, pipeline).
qwen_room_chat.dart: 8B asks questions → 35B answers → 5 rounds.
Results:
- 35B explains Python decorators correctly
- 8B generates a follow-up question about decorator parameters
- 35B analyzes 8B's response
- Server 500s after ~3 rapid thread creations (server resource limit)
Qwen rooms configured with RAG skill, file tools, attachments:
- qwen_8b: spark-3b12:8002, Qwen3-8B-FP8
- qwen_vllm: spark-3b12:8000, Qwen3.5-35B-A3B-FP8
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…spose() leak
Rename the package directory and pubspec name from fe_plugin_soliplex to soliplex_monty_plugin to align with soliplex_client/soliplex_agent naming.
Fix SoliplexPlugin.onDispose(), which was a no-op: HTTP connections from all registered SoliplexConnection instances were never closed. Closes runyaga/soliplex-audit#3.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Delete 6 debug/experiment scripts that hard-coded /Users/runyaga/dev/... paths
- Replace hardcoded demo.toughserv.com with the SOLIPLEX_DEMO_URL env var in all test files; add the SOLIPLEX_LOCAL_URL env var for the local URL (default localhost:8000)
- Fix fe_plugin_soliplex → soliplex_monty_plugin in all imports (lib + tests)
- wave5 file-reading tests skip gracefully when the IOI_EXPERIMENTS_DIR / MONTY_DOCS_DIR env vars are unset
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

MCP connectivity is a Flutter-layer concern: Python scripts receive pre-authenticated handles and should not fetch raw tokens themselves.
- Remove _getMcpToken getter and HostFunction from SoliplexPlugin
- Remove MCP section from systemPromptContext
- Update functions count assertion 11→10
- Add onDispose test (100% coverage on lib/)
- Add no-TextMessageStartEvent edge case test
- Fix relative import → package: import in soliplex_plugin.dart
- Fix import ordering in test file
- Format integration test files (pre-existing style debt)
Gates: format ✓ analyzer ✓ coverage 100% ✓
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…by dart_monty
Implements the M2 milestone: MontyScriptEnvironment wraps a dart_monty.AgentSession and exposes execute_python as a ClientTool with a reactive ScriptingState signal.
Changes:
- soliplex_agent: add ScriptingState enum and onAttach/scriptingState to the ScriptEnvironment interface; export ToolExecutionContext
- soliplex_monty_plugin: MontyScriptEnvironment (lib/src/), unit tests (test/src/), FFI + WASM integration tests (test/integration/)
- WASM test infra: dart_test.yaml, custom HTML template, bridge/worker JS committed to lib/wasm_assets/ (wasm binary gitignored, build separately)
Gates: format ✓ analyzer ✓ coverage 100% ✓ integration/ffi ✓ integration/wasm ✓
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

MontyScriptEnvironment no longer uses SoliplexPlugin as a MontyPlugin. Soliplex operations are now registered directly as dm.HostFunction on the AgentSession bridge, and the bridge's schema registry is projected to ClientTools visible to the server-side LLM.
Key changes:
- Register soliplex_list_servers, soliplex_list_rooms, soliplex_list_threads, soliplex_new_thread, soliplex_reply_thread directly on the bridge via _register(); no plugin system involved
- _projectToClientTool() converts HostFunctionSchema.toJsonSchema() to Tool.parameters and routes the ClientTool executor directly to the Dart handler (no Python hop)
- _tools built lazily from session.schemas (filtered) + execute_python
- SoliplexConnection.fromServerConnection() factory for clean wiring
- Add soliplex_logging dev_dependency for LoggerFactory extension
- Add integration tests: T0 (secret_number callback proof), T1 (Soliplex tools visible), T2 (execute_python), T3 (state persistence), T4 (signal)
- Add tool/test_integration_ffi.sh and tool/test_integration_wasm.sh
- Add tool/chat_probe.dart for manual inspection
All 5 tests pass on FFI and WASM/Chrome.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add Mutex to serialise concurrent dm.AgentSession.execute() calls so concurrent Python tool invocations on a shared interpreter cannot stomp on each other's variable state
- Add wrapSharedScriptEnvironment() factory to soliplex_agent: wraps a caller-owned ScriptEnvironment without taking dispose ownership, making the shared-env pattern explicit and safe
- Update stateful test group to use wrapSharedScriptEnvironment instead of wrapScriptEnvironmentFactory so the lifecycle contract is unambiguous
- T5: regression guard proving the dart_monty Isolate/Worker is non-blocking (43 FFI / 35 WASM heartbeats confirm the event loop stays free during Python)
- T7: proves fire-and-forget sessions have isolated Python state (fresh dm.AgentSession per spawn = fresh interpreter = no variable leakage)
- Fix pre-existing ScriptEnvironment test fakes missing onAttach() / scriptingState; remove redundant internal imports in agent test helpers
All 7 integration tests pass on both FFI and WASM (Chrome).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
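
T5's heartbeat check follows a general pattern: keep a timer ticking on the event loop while the expensive work runs elsewhere, then count the ticks. The sketch below is a minimal asyncio analogue, not T5 itself:

- `asyncio.to_thread` (Python 3.9+) stands in for the dart_monty Isolate/Worker.
- `time.sleep` stands in for the interpreter.
- The tick counts are illustrative and unrelated to the 43/35 figures above.

```python
import asyncio
import time


async def count_heartbeats(work_seconds: float, interval: float) -> int:
    """Count event-loop ticks while blocking work runs off the loop."""
    beats = 0
    stop = asyncio.Event()

    async def heartbeat() -> None:
        nonlocal beats
        while not stop.is_set():
            beats += 1
            await asyncio.sleep(interval)

    ticker = asyncio.create_task(heartbeat())
    # Blocking work runs on a worker thread, so the loop stays free to run heartbeat().
    await asyncio.to_thread(time.sleep, work_seconds)
    stop.set()
    await ticker
    return beats


print(asyncio.run(count_heartbeats(0.3, 0.01)))
```

If `time.sleep` were called directly on the loop instead, the heartbeat task would not get to run before `stop` is set, and the count would drop to zero.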

Failing tests were written first for each gap; the implementation then made them pass.
**execution timeout** (`_executionTimeout` field, default 30 s / 2 s in tests)
- `Future.timeout()` wraps `_montySession.execute()` inside the mutex; it throws `TimeoutException` and releases the mutex cleanly.
- `forTest` accepts `executionTimeout:` so timeout tests run at 500 ms without waiting 30 s.
**dispose drain** (replaces `unawaited(_montySession.dispose())`)
- `dispose()` now queues `_montySession.dispose()` via `_executeMutex.protect(...)`, guaranteeing the Python interpreter is only destroyed after any in-flight `execute()` releases the mutex.
- Dispose-verify test updated to pump the event loop before verify.
**in-mutex `_disposed` re-check**
- Calls that entered `_executePython` before `dispose()` but are still waiting at the mutex now throw `StateError` after acquiring it, instead of calling the already-destroyed session.
**new unit tests** (19 added, 31 total, 0 warnings):
- `timeout` (3): TimeoutException, idle restored, mutex released after timeout
- `concurrency` (3): serialisation order, exception isolation, signal cycling
- `dispose safety` (2): drain before session.dispose, queued callers rejected
- `isolation` (1): deterministic replacement for weak LLM-mediated T7
- `corner cases` (3): large result, missing code key, mid-flight cancel docs
**pre-existing fix**: stub `mockSession.schemas → []` in setUp so the `late final _tools` initialiser does not throw on first access.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
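
The timeout, dispose drain, and in-mutex re-check all compose around a single lock. The sketch below is a minimal asyncio analogue with these substitutions:

- an `asyncio.Lock` in place of the Dart Mutex;
- `asyncio.wait_for` in place of `Future.timeout()`;
- a hypothetical async `run` callable in place of `_montySession.execute()`.

One difference matters: `wait_for` cancels the timed-out coroutine, whereas Dart's `Future.timeout()` only stops waiting for it.

```python
import asyncio


class SerialInterpreter:
    """Hypothetical stand-in for the execute/dispose paths of MontyScriptEnvironment."""

    def __init__(self, run, timeout: float) -> None:
        self._run = run  # async callable standing in for the interpreter's execute()
        self._timeout = timeout
        self._lock = asyncio.Lock()
        self._disposed = False
        self.session_closed = False

    async def execute(self, code: str):
        if self._disposed:
            raise RuntimeError("environment disposed")
        async with self._lock:  # serialise: one execution at a time
            # Re-check: dispose() may have run while this call waited for the lock.
            if self._disposed:
                raise RuntimeError("environment disposed")
            # On timeout, wait_for raises and `async with` still releases the lock.
            return await asyncio.wait_for(self._run(code), self._timeout)

    async def dispose(self) -> None:
        self._disposed = True
        async with self._lock:  # drain: wait for any in-flight execute() to finish
            self.session_closed = True  # only now tear down the interpreter


async def demo():
    order = []

    async def run(code):
        await asyncio.sleep(0.05 if code == "slow" else 0)
        order.append(code)
        return code

    env = SerialInterpreter(run, timeout=1.0)
    results = await asyncio.gather(env.execute("slow"), env.execute("fast"))
    print(results, order)  # ['slow', 'fast'] ['slow', 'fast']


asyncio.run(demo())
```

Without the lock, "fast" would finish first. With it, calls complete in arrival order.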

@override

- Add dcm_options.yaml with ~80 rules adapted from dart_monty (internal path exclusions stripped; only test/** and *.g.dart kept)
- Wire both linters into analysis_options.yaml via include directives
- monty_script_environment.dart: dynamic→Object?, late final→nullable+??=, async {}→Future.value(), non-null assertion → local var, six newline-before-return fixes, _stateSignal cascade dispose, dispose-class-fields exclusion (the tearoff through unawaited(protect()) is not DCM-traceable)
- soliplex_plugin.dart: move @override methods before private helpers to satisfy member-ordering (DCM classifies them as public-methods)
dart format, dart analyze --fatal-infos, dcm analyze lib: zero issues. 31/31 unit tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace AgentUiDelegate (callback-based, single-tool) with a signal-driven Human-in-the-Loop gate that scales to concurrent tool calls and integrates cleanly with the signals reactive layer.
## What changed
**ClientTool API** (`tool_registry.dart`):
- `requiresApproval: bool`: when true, AgentSession suspends execution before the tool executor runs and emits a PendingApprovalRequest; the UI must call approveToolCall or denyToolCall to resume.
- `platformConsentNote: String? Function()?`: optional callback for tools that trigger an OS-level permission dialog (e.g. clipboard read on web). It returns a human-readable note; AgentSession emits PlatformConsentNotice (non-blocking, informational).
**New types**:
- `PendingApprovalRequest`: immutable data class (toolCallId, toolName, arguments) emitted on the pendingApproval signal.
- `PlatformConsentNotice` / `AwaitingApproval`: ExecutionEvent subclasses for the consent/approval lifecycle.
**AgentSession** (`agent_session.dart`):
- `pendingApproval: ReadonlySignal<PendingApprovalRequest?>`: the UI watches this to render the Allow/Deny UI.
- `approveToolCall(String) / denyToolCall(String)`: resolves the Completer gating the suspended tool call.
- `_awaitApproval()`: internal gate; stores a Completer per toolCallId in _pendingApprovals, signals the UI, and awaits resolution. Auto-denies on session cancel.
**Deleted**:
- `AgentUiDelegate` (58 lines) + its 457-line test file, replaced entirely by the signal approach above.
**session_extension.dart**: `onDispose()` → `dispose()` rename for consistency with Dart disposal conventions.
**Tests** (`hitl_test.dart`): 12 tests covering requiresApproval defaults, three approval tiers (agent-gate / OS-gate / ungated), PendingApprovalRequest fields, platformConsentNote callbacks, and PlatformConsentNotice equality.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
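
The Completer-per-toolCallId gate maps directly onto futures. The sketch below is a minimal asyncio analogue: an `asyncio.Future` replaces a Dart Completer, and a plain attribute replaces the pendingApproval signal. `ApprovalGate` and `run_gated_tool` are hypothetical names, and the sketch assumes one pending approval at a time.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Optional


@dataclass(frozen=True)
class PendingApprovalRequest:
    tool_call_id: str
    tool_name: str
    arguments: dict = field(default_factory=dict)


class ApprovalGate:
    def __init__(self) -> None:
        self._pending: Dict[str, asyncio.Future] = {}
        # Stand-in for the pendingApproval signal the UI watches.
        self.pending_approval: Optional[PendingApprovalRequest] = None

    async def await_approval(self, request: PendingApprovalRequest) -> bool:
        decision = asyncio.get_running_loop().create_future()
        self._pending[request.tool_call_id] = decision
        self.pending_approval = request
        try:
            return await decision  # suspended until approve/deny/cancel
        finally:
            self._pending.pop(request.tool_call_id, None)
            if self.pending_approval == request:
                self.pending_approval = None

    def _resolve(self, tool_call_id: str, approved: bool) -> None:
        decision = self._pending.get(tool_call_id)
        if decision is not None and not decision.done():
            decision.set_result(approved)

    def approve_tool_call(self, tool_call_id: str) -> None:
        self._resolve(tool_call_id, True)

    def deny_tool_call(self, tool_call_id: str) -> None:
        self._resolve(tool_call_id, False)

    def cancel(self) -> None:
        # Session cancel auto-denies everything still waiting.
        for tool_call_id in list(self._pending):
            self._resolve(tool_call_id, False)


async def run_gated_tool(
    gate: ApprovalGate,
    request: PendingApprovalRequest,
    executor: Callable[[dict], Awaitable[str]],
) -> str:
    # The executor only runs after an explicit approval.
    if not await gate.await_approval(request):
        return "denied"
    return await executor(request.arguments)
```

Each gated call waits on its own future, so approvals resolve independently. The later change list also serializes approval-required tools in _executeAll, so only one banner is shown at a time.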

- ScriptEnvironment now implements SessionExtension directly instead of using a separate ScriptEnvironmentExtension adapter class.
- Deleted ScriptEnvironmentExtension (no remaining references).
- wrapScriptEnvironmentFactory → toOwnedFactory
- wrapSharedScriptEnvironment → toSharedFactory
- _SharedScriptEnvironmentExtension → _SharedScriptEnvironmentProxy
- onDispose() → dispose() in the proxy (matches SessionExtension rename)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
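
The renamed factories differ only in who owns `dispose()`. The sketch below shows the two modes in Python, assuming an environment object with a `dispose()` method. `to_owned_factory`, `to_shared_factory`, and `SharedProxy` are hypothetical mirrors of toOwnedFactory, toSharedFactory, and _SharedScriptEnvironmentProxy, not their real signatures.

```python
from typing import Callable


class ScriptEnv:
    """Hypothetical script environment with an explicit lifecycle."""

    def __init__(self) -> None:
        self.disposed = False

    def dispose(self) -> None:
        self.disposed = True


class SharedProxy:
    """Forwards to a caller-owned environment but never disposes it."""

    def __init__(self, inner: ScriptEnv) -> None:
        self._inner = inner

    def __getattr__(self, name):
        # Only called for attributes the proxy lacks, e.g. `disposed`.
        return getattr(self._inner, name)

    def dispose(self) -> None:
        pass  # the caller owns the inner environment's lifecycle


def to_owned_factory(create: Callable[[], ScriptEnv]) -> Callable[[], ScriptEnv]:
    # Each session gets a fresh environment and disposes it itself.
    return lambda: create()


def to_shared_factory(env: ScriptEnv) -> Callable[[], SharedProxy]:
    # Every session sees the same environment through a non-owning proxy.
    return lambda: SharedProxy(env)


owned = to_owned_factory(ScriptEnv)
a, b = owned(), owned()
a.dispose()
print(a.disposed, b.disposed)  # True False

shared_env = ScriptEnv()
session_view = to_shared_factory(shared_env)()
session_view.dispose()
print(shared_env.disposed)  # False
```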

Wire AgentSession.pendingApproval through ThreadViewState into a self-contained ToolApprovalSlot widget so only the slot rebuilds on approval changes, not the entire thread body column.
**New widgets** (`ui/tool_approval_banner.dart`):
- ToolApprovalSlot: owns .watch(context); renders nothing when null.
- ToolApprovalBanner: tool name header, scrollable code preview (max 160 px), Allow (FilledButton) / Deny (TextButton).
**ThreadViewState**: mirrors session.pendingApproval to its own _pendingApproval signal; subscribes in _attachSession, unsubscribes and resets to null in _detachSession.
**Example tools** (`modules/tools/`):
- get_device_info: ungated (requiresApproval: false, no consent).
- confirm_action: agent-gated (requiresApproval: true); shows the action argument in the approval banner preview.
- get_clipboard: platform-gated via platformConsentNote; emits PlatformConsentNotice on web (browser clipboard permission), silent on native.
**standard.dart**: registers all three example tools; wires MontyScriptEnvironment via extensionFactoryBuilder; startup probe.
**macOS / web**: CocoaPods xcconfig includes for shared_preferences re-linking after flutter clean; dart_monty WASM bridge assets.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

MontyScriptEnvironment is rewritten to accept a list of MontyPlugins instead of hard-wiring Soliplex connections. Host functions are registered on the dart_monty bridge and projected as direct ClientTools (no Python hop).
execute_python now has requiresApproval: true; the HITL gate suspends execution until the user allows or denies from the approval banner.
probe() validates the interpreter on startup by running `1 + 1`.
Regression test: error messages must not leak Rust interpreter internals (NodeIndex, ExprSubscript, node_index:). This test currently fails pending the runyaga/monty subscript tuple-unpack fix.
Add a defensive MissingPluginException guard in DefaultBackendUrlStorage for flutter clean / CocoaPods re-link scenarios.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Split execute_python into run_script (one-shot) and repl_python (persistent REPL) with differentiated descriptions so the LLM picks the right tool
- Return Python errors as tool output (status:completed) instead of throwing; the output includes any print() output that occurred before the error
- Return 'None' for in-place ops (arr.sort()) so the LLM knows execution succeeded
- denyToolCall cancels the AgentSession to prevent LLM retry loops
- Serialize approval-required tools in _executeAll to prevent concurrent approval banner deadlock
- Wire SoliplexPlugin with all active ServerManager connections (not just the current room's server)
- ThreadKey (serverId, roomId, threadId) used as the _threadStates map key
- SoliplexConnection gains alias + serverUrl; _listServers returns full metadata
- onDispose no longer closes injected connections (owned by ServerManager)
- Remove unsupported help() from systemPromptContext
- Copy button on tool call tile code/result blocks
- Bold labels + monospace container in ToolCallTile
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
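
The two output rules can be sketched together, using CPython's `ast`, `exec`, and `eval` as a stand-in for monty:

- The value of a trailing expression is reported.
- A bare `None` result becomes the text 'None' when nothing was printed.
- An exception is folded into the output after whatever print() produced first.

`run_as_tool` and the dict shape are hypothetical. Only the `status: completed` convention comes from the change list above.

```python
import ast
import contextlib
import io


def _exec_with_last_value(code: str, namespace: dict):
    """Run code; return the value of a trailing expression, else None."""
    tree = ast.parse(code, mode="exec")
    last = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last = ast.Expression(tree.body.pop().value)
    exec(compile(tree, "<tool>", "exec"), namespace)
    if last is None:
        return None
    return eval(compile(last, "<tool>", "eval"), namespace)


def run_as_tool(code: str, namespace: dict) -> dict:
    out = io.StringIO()
    try:
        with contextlib.redirect_stdout(out):
            value = _exec_with_last_value(code, namespace)
    except Exception as exc:
        # Report the error as ordinary output, keeping earlier print() text.
        message = f"{type(exc).__name__}: {exc}"
        return {"status": "completed", "output": out.getvalue() + message}
    text = out.getvalue()
    if value is not None:
        text += repr(value)
    elif not text:
        text = "None"  # e.g. arr.sort(): tells the LLM the call ran
    return {"status": "completed", "output": text}


ns = {}
print(run_as_tool("1 + 1", ns)["output"])  # 2
print(run_as_tool("arr = [3, 1, 2]\narr.sort()", ns)["output"])  # None
print(run_as_tool("print('before')\n1 / 0", ns)["output"])
```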

…act args
* feat: activity log, state panel, and expandable tool call args
Surfaces previously invisible AG-UI events in the execution UI:
- ActivityLog: collapsible sub-agent call/result log with markdown rendering
- StatePanel: info-chiclet toggle showing aguiState JSON with copy support
- StepLog: expandable args per tool call step (via ToolCallArgsEvent bridge)
- ToolCallTile: upgraded to use ArgsBlock for styled markdown rendering
- ArgsBlock: shared widget converting JSON to readable markdown with platform monospace font (SF Mono on Apple), scrollable, with copy button
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(execution-ui): replace markdown renderer in ArgsBlock, hide LLM-internal activity rows
- ArgsBlock: swap FlutterMarkdownPlusRenderer for SelectableText + _prettyPrint/_renderMap, eliminating the nested CodeBlockBuilder container and fixing newline escaping
- ActivityLog: filter to skill_tool_call only; hide skill_tool_result rows (error tracebacks and JSON arrays are LLM-internal, not user-facing)
- ActivityLog: reject empty-Map args (list_environments '{}') to avoid blank rows
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(execution-ui): remove Thinking steps, drop ExecutionThinkingBlock, compact activity rows
Changes A-D from UX analysis:
- ActivityIndicator: "Calling tools..." (no numeric count) to avoid a mismatch with the step log
- ExecutionTracker: ThinkingStarted no longer adds a step; remove thinkingBlocks/isThinkingStreaming signals (LLM reasoning is internal, not user-facing)
- StepType enum removed; ExecutionStep simplified (no type field)
- ExecutionThinkingBlock removed from LoadingMessageTile and TextMessageTile; the static _ThinkingBlock for persisted message thinkingText is retained
- thinking_block.dart deleted
- ActivityLog: compact inline SelectableText instead of the full ArgsBlock container; row padding tightened to vertical: 2
- args_block.dart: prettyPrintArgs/renderMap promoted to top-level for reuse
- Tests updated: ThinkingStarted no longer creates a step; thinking block tests removed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…ly (#7)
* refactor(agent): ScriptEnvironment implements SessionExtension directly
- ScriptEnvironment now implements SessionExtension, eliminating the ScriptEnvironmentExtension adapter class.
- SessionExtension.onDispose() renamed to dispose() for Dart convention.
- SessionContext added (serverId + roomId) and passed through the extension factory so environments can customize per room.
- ScriptEnvironmentFactory now takes SessionContext.
- toOwnedFactory / toSharedFactory replace wrapScriptEnvironmentFactory.
- SharedScriptEnvironmentProxy replaces ScriptEnvironmentExtension.
- ScriptingState enum added for reactive interpreter lifecycle.
- soliplex_agent exports ScriptingState and SessionContext.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: dart format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: add trailing commas to typedef params (linter)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

#9)
* feat(m2b): add soliplex_monty_plugin: Python scripting via dart_monty
Adds the soliplex_monty_plugin package, which connects the dart_monty Python runtime to the Soliplex agent platform.
## What's included
**MontyScriptEnvironment**: ScriptEnvironment backed by dart_monty's AgentSession. It registers SoliplexTools as HostFunctions, projects them as ClientTools for the LLM, and runs Python in a background isolate/worker.
**SoliplexTool**: flat data struct unifying Python-callable and LLM-callable tool definitions (name, description, parameters, handler).
**SoliplexConnection / buildSoliplexTools**: the full Soliplex API surface callable from Python (list_servers, list_rooms, get_room, get_documents, get_chunk, new_thread, reply_thread, list_threads, upload_file, upload_to_thread).
**toOwnedFactory / toSharedFactory**: two ownership modes, fire-and-forget (isolated Python per session) and stateful (shared interpreter across sessions).
**Integration tests**: agent_session_test, monty_env_chat_test (T0–T7), monty_script_environment_test.
## What's not included
HITL approval gate (requiresApproval): deferred to M3. SoliplexTool and ClientTool have no requiresApproval field in this slice; it will be added when feat/hitl-tool-approval lands.
## soliplex_agent changes
- Export ToolExecutionContext from the public API (needed by the plugin).
- Remove redundant direct imports in test helpers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(monty-plugin): remove coverage artifacts, add .gitignore
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(monty-plugin): migrate os: OsProvider? → OsCallHandler?
Follows dart_monty#335, which replaced the OsProvider class hierarchy with the OsCallHandler typedef from dart_monty_core. The parameter is a pass-through to dm.AgentSession(os:).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(m2b): wire MontyScriptEnvironment into Flutter app
- AgentRuntimeManager accepts extensionFactoryBuilder so each runtime can receive a per-server SessionExtensionFactory
- standard.dart creates a RoomEnvironmentRegistry and wires toRoomSharedFactory + MontyScriptEnvironment with all SoliplexTools; adds a debug logging sink and a startup probe (fire-and-forget)
- Add get_device_info_tool and get_clipboard_tool client tools; confirm_action_tool deferred to M3 (HITL)
- Fix OsCallHandler → OsProvider to match the current dart_monty API
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(monty-plugin): use git dep for dart_monty; OsCallHandler matches origin/main
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci: trigger fresh CI run
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci: fix stale pub cache causing package_graph.json parse failure
Remove the restore-keys fallback from the pub cache step so CI never restores an old cache from a different pubspec.lock state. Add `rm -rf .dart_tool` before pub get to prevent any stale package_graph.json from interfering with dependency resolution.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(deps): upgrade dart_monty_core to remove broken flutter assets
Picks up c3d7a06 from runyaga/dart_monty_core, which removes the flutter: assets section (dart_monty_bridge.js etc. are WASM build artifacts not committed to the repo, causing flutter test/build to fail).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Resolve conflicts with PR #9 (feat/m2b):
- soliplex_monty_plugin: take main's buildSoliplexTools/SoliplexTool API
- script_environment.dart: take main's SessionContext factory signature
- standard.dart: take main's SoliplexTools wiring, remove ConfirmActionTool
- test helpers: drop .readonly() on signal, update extensionFactory signatures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

runyaga and others added 30 commits April 14, 2026 20:34

Merge branch 'soliplex:main' into feat/m2-monty-script-env

2350381

runyaga and others added 3 commits April 18, 2026 05:34

runyaga closed this Apr 21, 2026

runyaga wants to merge 33 commits into soliplex:main from runyaga:feat/m2-monty-script-env
