Skip to content

fix: HTTP transport + RPC endpoint to prevent bridge zombie accumulation#1813

Open
fayenix wants to merge 3 commits into
MemTensor:mainfrom
chiefmojo:pr/http-client-rpc-endpoint
Open

fix: HTTP transport + RPC endpoint to prevent bridge zombie accumulation#1813
fayenix wants to merge 3 commits into
MemTensor:mainfrom
chiefmojo:pr/http-client-rpc-endpoint

Conversation

@fayenix
Copy link
Copy Markdown

@fayenix fayenix commented May 27, 2026

Problem

The Hermes MemOS adapter spawns a new bridge.cjs --no-viewer subprocess for every AIAgent session. Over time, 5-16 bridge processes accumulate — only one binds port 18800, the rest are zombies burning CPU and LLM tokens on redundant scoring loops against the same DB.

Root cause: MemTensorProvider.initialize() always calls subprocess.Popen() to spawn a stdio bridge. The daemon HTTP server exists at port 18800 but the Python adapter never connects to it — it always spawns its own process.

Fix (Three Layers)

Layer 1 — Zombie cleanup

daemon_manager.py: On startup, enumerate bridge processes and kill any that aren't the port-holding daemon.

Layer 2 — HTTP client

New MemosHttpClient class in bridge_client.py that sends JSON-RPC to http://127.0.0.1:18800/api/v1/rpc instead of spawning a subprocess. MemTensorProvider prefers HTTP when the daemon is running.

Layer 3 — JSON-RPC endpoint

New POST /api/v1/rpc route that accepts JSON-RPC 2.0 envelopes and dispatches to the same handlers the stdio server uses. Gives immediate feature parity for all methods (turn.start, turn.end, subagent.record, etc.) with zero per-method route work.

Files Changed

  • adapters/hermes/memos_provider/bridge_client.pyMemosHttpClient class
  • adapters/hermes/memos_provider/daemon_manager.py — zombie cleanup + daemon probe
  • adapters/hermes/memos_provider/__init__.py — prefer HTTP when daemon running
  • apps/memos-local-plugin/server/routes/rpc.tsnew JSON-RPC endpoint
  • apps/memos-local-plugin/server/routes/registry.ts — register RPC routes
  • apps/memos-local-plugin/server/routes/auth.ts — exempt loopback RPC from auth

Testing

All tested on a live Hermes+MemOS instance with 36K traces:

  • core.health
  • turn.start ✅ (returns episode + retrieval)
  • skill.list ✅ (27 skills)
  • subagent.record ✅ (returns traceId + episodeId)
  • Bridge count stable at 1-2 after 2+ hours (was 16 before fix)

Related

- Add POST /api/v1/rpc (JSON-RPC-over-HTTP) for out-of-process transport
- Add MemosHttpClient class (Python HTTP client, drop-in for MemosBridgeClient)
- Provider prefers HTTP when daemon already running on :18800
- Add kill_zombie_bridges() and probe_viewer_status() to daemon_manager
- Exempt /api/v1/rpc from session cookie auth (loopback-only)
- Remove tracked ephemeral files (daemon/bridge.pid, memos-local/*)
- Update .gitignore for runtime state
@Memtensor-AI Memtensor-AI added the plugin Plugin/adapter/bridge layer (apps/ directory) label May 27, 2026
Copy link
Copy Markdown
Collaborator

@Memtensor-AI Memtensor-AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me review this PR carefully.

Needs changes — good architecture, a few issues to address

The three-layer approach is sound and the PR description is excellent. Here are the specific issues:

Security — Auth bypass is too broad

auth.ts — Exempting /api/v1/rpc from session auth entirely is risky. This endpoint gives full access to every dispatched method (turn.start, subagent.record, etc.). Even on loopback, any local process can now invoke arbitrary RPC methods without authentication.

if (pathname === "/api/v1/rpc") return true;

Consider restricting to loopback source IP check inside this condition, or requiring the API key for RPC calls. A malicious local process (browser extension, compromised npm script) could abuse this.

Bug — Notification handling returns undefined into a batch array

rpc.ts line ~107handleSingle returns undefined for notifications, but in batch mode those undefined values end up in the results array and get serialized as null:

const results = await Promise.all(
  parsed.map((item) => handleSingle(item, dispatch)),
);
return results; // contains undefined entries for notifications

Per JSON-RPC 2.0 spec, notifications must not produce response objects. Filter them out:

return results.filter((r) => r !== undefined);

Also handle the case where all items are notifications (return 204 or empty).

Bug — kill_zombie_bridges is Linux-only

daemon_manager.pyss -tlnp doesn't exist on macOS. If this project runs on macOS dev machines, this will silently fail (which is fine) but then ps aux parsing will kill all bridge.cjs processes including the daemon since daemon_pid will be None:

if daemon_pid is not None and pid == daemon_pid:
    continue
# If daemon_pid is None, we skip nothing → kill the daemon too

Fix: if daemon_pid is None, skip killing entirely or use lsof -iTCP:18800 -sTCP:LISTEN as a macOS fallback.

Minor issues

  • bridge_client.pyMemosHttpClient stores _host_handlers but they're never invoked. The docstring mentions this, but register_host_handler silently swallows the registration. If host.llm.complete is actually needed for certain methods, the HTTP path will silently return degraded results. Consider logging a warning on first call that requires it.

  • .gitignore — Adding bare data/ and data at the end is redundant (the first covers the second). Also data without a trailing slash will match files named data anywhere. Likely intentional but worth confirming.

  • __init__.py — The http_bridge variable is used after the except block where it might not be assigned if MemosHttpClient() constructor itself throws:

http_bridge: MemosHttpClient | None = MemosHttpClient()  # can throw
# ...
except Exception as err:
    http_bridge.close()  # type: ignore — could be unbound

Initialize http_bridge = None before the try block and guard the close.

Nit

The import re inside kill_zombie_bridges should be at module level per convention (it's already a stdlib import with no cost).


Overall the design is solid — HTTP-first with stdio fallback is the right call, and the zombie cleanup addresses a real operational pain point. The auth bypass and the "kill everything when daemon_pid is None" bug are the two I'd want fixed before merge.

fayenix and others added 2 commits May 28, 2026 06:17
- auth.ts: add reverse-proxy assumption comment to loopback RPC exemption
- rpc.ts: filter undefined from batch notifications (JSON-RPC 2.0 §6); return 204 for all-notification batches
- __init__.py: extract _connect_http_bridge() helper; fix unbound http_bridge on constructor exception; conditional cold-start sleep via startup_lock_active()
- daemon_manager.py: add macOS lsof fallback; skip killing when daemon PID unidentified; import re to module level
- .gitignore: remove redundant bare data entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

plugin Plugin/adapter/bridge layer (apps/ directory)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants