fix: HTTP transport + RPC endpoint to prevent bridge zombie accumulation#1813
fix: HTTP transport + RPC endpoint to prevent bridge zombie accumulation#1813fayenix wants to merge 3 commits into
Conversation
- Add POST /api/v1/rpc (JSON-RPC-over-HTTP) for out-of-process transport - Add MemosHttpClient class (Python HTTP client, drop-in for MemosBridgeClient) - Provider prefers HTTP when daemon already running on :18800 - Add kill_zombie_bridges() and probe_viewer_status() to daemon_manager - Exempt /api/v1/rpc from session cookie auth (loopback-only) - Remove tracked ephemeral files (daemon/bridge.pid, memos-local/*) - Update .gitignore for runtime state
Memtensor-AI
left a comment
There was a problem hiding this comment.
Needs changes — good architecture, a few issues to address
The three-layer approach is sound and the PR description is excellent. Here are the specific issues:
Security — Auth bypass is too broad
auth.ts — Exempting /api/v1/rpc from session auth entirely is risky. This endpoint gives full access to every dispatched method (turn.start, subagent.record, etc.). Even on loopback, any local process can now invoke arbitrary RPC methods without authentication.
if (pathname === "/api/v1/rpc") return true;Consider restricting to loopback source IP check inside this condition, or requiring the API key for RPC calls. A malicious local process (browser extension, compromised npm script) could abuse this.
Bug — Notification handling returns undefined into a batch array
rpc.ts line ~107 — handleSingle returns undefined for notifications, but in batch mode those undefined values end up in the results array and get serialized as null:
const results = await Promise.all(
parsed.map((item) => handleSingle(item, dispatch)),
);
return results; // contains undefined entries for notificationsPer JSON-RPC 2.0 spec, notifications must not produce response objects. Filter them out:
return results.filter((r) => r !== undefined);Also handle the case where all items are notifications (return 204 or empty).
Bug — kill_zombie_bridges is Linux-only
daemon_manager.py — ss -tlnp doesn't exist on macOS. If this project runs on macOS dev machines, this will silently fail (which is fine) but then ps aux parsing will kill all bridge.cjs processes including the daemon since daemon_pid will be None:
if daemon_pid is not None and pid == daemon_pid:
continue
# If daemon_pid is None, we skip nothing → kill the daemon tooFix: if daemon_pid is None, skip killing entirely or use lsof -iTCP:18800 -sTCP:LISTEN as a macOS fallback.
Minor issues
-
bridge_client.py—MemosHttpClientstores_host_handlersbut they're never invoked. The docstring mentions this, butregister_host_handlersilently swallows the registration. Ifhost.llm.completeis actually needed for certain methods, the HTTP path will silently return degraded results. Consider logging a warning on first call that requires it. -
.gitignore— Adding baredata/anddataat the end is redundant (the first covers the second). Alsodatawithout a trailing slash will match files nameddataanywhere. Likely intentional but worth confirming. -
__init__.py— Thehttp_bridgevariable is used after theexceptblock where it might not be assigned ifMemosHttpClient()constructor itself throws:
http_bridge: MemosHttpClient | None = MemosHttpClient() # can throw
# ...
except Exception as err:
http_bridge.close() # type: ignore — could be unboundInitialize http_bridge = None before the try block and guard the close.
Nit
The import re inside kill_zombie_bridges should be at module level per convention (it's already a stdlib import with no cost).
Overall the design is solid — HTTP-first with stdio fallback is the right call, and the zombie cleanup addresses a real operational pain point. The auth bypass and the "kill everything when daemon_pid is None" bug are the two I'd want fixed before merge.
- auth.ts: add reverse-proxy assumption comment to loopback RPC exemption - rpc.ts: filter undefined from batch notifications (JSON-RPC 2.0 §6); return 204 for all-notification batches - __init__.py: extract _connect_http_bridge() helper; fix unbound http_bridge on constructor exception; conditional cold-start sleep via startup_lock_active() - daemon_manager.py: add macOS lsof fallback; skip killing when daemon PID unidentified; import re to module level - .gitignore: remove redundant bare data entry
Problem
The Hermes MemOS adapter spawns a new
bridge.cjs --no-viewersubprocess for everyAIAgentsession. Over time, 5-16 bridge processes accumulate — only one binds port 18800, the rest are zombies burning CPU and LLM tokens on redundant scoring loops against the same DB.Root cause:
MemTensorProvider.initialize()always callssubprocess.Popen()to spawn a stdio bridge. The daemon HTTP server exists at port 18800 but the Python adapter never connects to it — it always spawns its own process.Fix (Three Layers)
Layer 1 — Zombie cleanup
daemon_manager.py: On startup, enumerate bridge processes and kill any that aren't the port-holding daemon.Layer 2 — HTTP client
New
MemosHttpClientclass inbridge_client.pythat sends JSON-RPC tohttp://127.0.0.1:18800/api/v1/rpcinstead of spawning a subprocess.MemTensorProviderprefers HTTP when the daemon is running.Layer 3 — JSON-RPC endpoint
New
POST /api/v1/rpcroute that accepts JSON-RPC 2.0 envelopes and dispatches to the same handlers the stdio server uses. Gives immediate feature parity for all methods (turn.start,turn.end,subagent.record, etc.) with zero per-method route work.Files Changed
adapters/hermes/memos_provider/bridge_client.py—MemosHttpClientclassadapters/hermes/memos_provider/daemon_manager.py— zombie cleanup + daemon probeadapters/hermes/memos_provider/__init__.py— prefer HTTP when daemon runningapps/memos-local-plugin/server/routes/rpc.ts— new JSON-RPC endpointapps/memos-local-plugin/server/routes/registry.ts— register RPC routesapps/memos-local-plugin/server/routes/auth.ts— exempt loopback RPC from authTesting
All tested on a live Hermes+MemOS instance with 36K traces:
core.health✅turn.start✅ (returns episode + retrieval)skill.list✅ (27 skills)subagent.record✅ (returns traceId + episodeId)Related