native module refinement#1875
Conversation
Greptile SummaryThis PR refines the native-module lifecycle with a stop-lock to prevent re-entrant shutdown races, orphan-process prevention via Confidence Score: 5/5Safe to merge; all remaining findings are P2 style/reliability suggestions that do not block correct operation. Previously flagged P1 issues (TimeoutExpired after SIGKILL, re-entrant stop) are resolved. New findings are P2: a reduced shutdown timeout that may be too short for heavy-teardown processes, missing stderr in the build-failure exception, and a minor find_library edge case. None of these are blocking correctness bugs. dimos/core/native_module.py — shutdown_timeout reduction and build-error message regression deserve a second look before shipping to users with slow-teardown native processes. Important Files Changed
Sequence DiagramsequenceDiagram
participant Parent as NativeModule (parent)
participant Proc as Native Process (child)
participant WD as Watchdog Thread
Parent->>Proc: Popen(start_new_session=True, preexec_fn=PR_SET_PDEATHSIG)
Parent->>WD: Thread(target=_watch_process).start()
WD->>Proc: proc.wait()
alt Normal stop()
Parent->>Parent: _stop_lock: _stopping=True
Parent->>Proc: send_signal(SIGTERM)
Proc-->>Parent: exits within shutdown_timeout
Parent->>WD: watchdog.join(timeout)
WD-->>Parent: joined
Parent->>Parent: _stop_lock: _process=None, _watchdog=None
Parent->>Parent: super().stop()
else Process crashes
Proc-->>WD: proc.wait() returns (unexpected exit)
WD->>Parent: self.stop() [re-entrant, lock prevents double-stop]
else Parent dies unexpectedly
Proc->>Proc: SIGTERM via PR_SET_PDEATHSIG
end
Reviews (3): Last reviewed commit: "-" | Re-trigger Greptile |
Document all unaddressed review comments across these PRs: - PR #1875: 6 comments from leshy/paul-nechifor on native_module.py - PR #1791: 8 comments from leshy on nav stack architecture/naming - PR #1663: 3 comments from paul-nechifor on test_mcp_server.py - PR #1485: 4 greptile bot suggestions (3 fixed, 1 minor typing nit)
Keep both: dev's _WorkerState pattern and our branch's SIGINT ignore for clean coordinator-orchestrated shutdown. Also add pr_responses.yaml for PR #1875 review comments.
Replace the ad-hoc PROD env var check with a proper GlobalConfig field, readable via global_config.prod. pydantic-settings auto-reads from the PROD env var.
Co-authored-by: Paul Nechifor <paul@nechifor.net>
- Restore LogFormat enum and JSON log parsing in _read_log_stream
- Add auto_build config to gate expensive nix rebuilds per-module
- Replace PROD env var with global_config.prod
- Handle TimeoutExpired after SIGKILL in stop()
- Use find_library("c") instead of hardcoded libc.so.6
- Simplify _child_preexec_linux docstring
- Document shell=True trusted-source requirement
…tive_rebuild3 # Conflicts: # dimos/core/coordination/python_worker.py # dimos/core/global_config.py
d7404f9 to
6b07823
Compare
# Conflicts: # dimos/core/native_module.py
Pull request was converted to draft
Documents responses to leshy and paul-nechifor comments on native_module.py covering: stdout/stderr capture restoration, env var vs config system, autobuild behavior, build change detection cost, ctypes portability fix, comment cleanup, and watchdog process note.
Native module refinement (rebuild logic and logging) and keyboard interrupt improvement