Update Terminus env harness and sandbox backends#729
Conversation
Darktex
left a comment
There was a problem hiding this comment.
Note: This is an automated review by Claude Code, not a human review.
Alignment Review — PR #729
Update Terminus env harness and sandbox backends
Automated Checks
- Lint: PASS — no lint violations found in new files
- Format: FAIL —
harness.pyandtest_terminus_harness.pyhave lines exceeding 88 chars; runuv run ruff format envs/terminus_env/harness.py tests/envs/test_terminus_harness.py - Debug code: CLEAN
Tier 1: Bugs / Correctness
-
Format violations —
harness.py:88,harness.py:250,test_terminus_harness.py:154exceed line length. Runuv run ruff formaton both files. -
Unbounded recursion in
_normalize_terminal_payload(harness.py:385-392) — Whenpayload["arguments"]is a dict, the function recurses without a depth bound. Pathological input with deeply nested{"arguments": {"arguments": ...}}could hit Python's recursion limit. Low risk in practice but worth noting. -
Dead branch in
_build_verify(harness.py:226-227) —state is Noneis unreachable becauseStepEnvSessionAdapter._read_state()always returns aTerminusState. Harmless but could use a comment if intentional as a fallback. -
Test coverage gaps (non-blocking):
- No test for
build_terminal_tool_call("")(empty string) - No test for non-JSON language code fences (e.g.
```python ... ```) - No test for
_build_reset_kwargssandbox config key forwarding
- No test for
Tier 2: Alignment
ALIGNMENT FLAG: Sandbox config keys forwarded from task dicts
_build_reset_kwargsforwardssandbox_image,sandbox_flavor,hf_sandbox_image,hf_sandbox_flavoretc. from task dicts intoreset(). Two overlapping key sets with no conflict resolution. If tasks can specify sandbox backend, training orchestration loses control over execution environment, complicating reproducibility. Is this intentional?
ALIGNMENT FLAG: terminus_reward() placement
- The function is exported from
harness.pybut the PR description suggests it may also exist in an example script. If it's canonical, confirm it's inharness.py.__all__. If example-only, update the PR description.
Architecture Assessment
- "Agents cannot reset": PASS —
TerminusSessionFactory.create()wrapsTerminusEnvclient insideStepEnvSessionAdapter. Agent only seesterminaltool. - Client-server separation: PASS —
harness.pyimports from.client, not.server. - Rewards inside environment: PASS —
_build_tool_resultand_build_verifyforward rewards from the environment's step results. No external reward synthesis. - Dual API boundary: PASS — MCP for agent tools, Gym-like API for orchestration via
StepEnvSessionAdapter.
Verdict: COMMENT
The harness architecture is sound and correctly implements OpenEnv patterns. The server-side files (hf_sandbox.py, local_sandbox.py — bwrap command construction and HF credential handling) should receive careful human review before merge. Format violations need fixing.
Automated review by Claude Code | Learn more
Darktex
left a comment
There was a problem hiding this comment.
Note: This is an automated review by Claude Code, not a human review.
Alignment Review Report
Automated Checks
- Lint: SKIPPED (no network access in this environment;
ruff,usortcould not be invoked) - Debug code: CLEAN — no debug artifacts found in changed files
- Tests: SKIPPED (no network access for dependency installation)
Tier 1: Fixes Required
-
envs/terminus_env/pyproject.toml+envs/terminus_env/uv.lock—hf-sandboxdependency classification mismatch.pyproject.tomlcorrectly placeshf-sandbox>=0.1.1under[project.optional-dependencies] hf, butuv.lockrecords it as a core (non-extra) package dependency. This meansuv pip install openenv-terminus-envinstallshf-sandboxunconditionally, defeating the optional backend design. Regenerate the lockfile withuv lockafter confirmingpyproject.tomlcoredependenciesdoes NOT includehf-sandbox. -
envs/terminus_env/server/gradio_ui.py— UI strings hardcode "HF sandbox" after the backend was made pluggable. Four strings now say things like "Reset the environment to create an HF sandbox" and "Session closed. The HF sandbox was released." These are incorrect when using thelocalbackend. Change to backend-agnostic wording (e.g. "Reset the environment to create a sandbox session" / "Session closed. The sandbox was released."). -
envs/terminus_env/models.py—CommandResultdocstring now reads "Outcome of one shell command run inside the HF sandbox." butCommandResultis backend-agnostic (produced by both HF and local backends). Change to "Outcome of one shell command run inside the sandbox." -
envs/terminus_env/harness.py__all__— Theterminus_rewardfunction is exported from__init__.pyand is part of the public surface, butharness.py's own__all__list is missing"terminus_reward". Add it so the module's__all__is consistent with what__init__.pyre-exports. -
envs/terminus_env/harness.py_build_reset_kwargs— The diff includes"sandbox_backend"and"sandbox_root"in the forwarded key list, but the file as seen on the branch is missing them. Verify the PR branch contains the diff-stated version; if not, the harness cannot forward per-task backend selection to the server, silently ignoring those task fields.
Tier 2: Alignment Discussion
ALIGNMENT FLAG: terminus_reward implements reward extraction outside the environment boundary
- Principle at stake: "Rewards inside environment" (PRINCIPLES.md, RFC 002); "Reward computation must stay inside environment boundary" (INVARIANTS.md §Architectural, point 3)
- The concern:
terminus_rewardis a TRL-facing reward function (signaturecompletions, **kwargs -> list[float]) that parses reward values from tool message content on the training side. The reward itself originates in the environment (the environment emitsreward=in the tool message), butterminus_rewardre-parses and surfaces it to TRL as the authoritative reward signal. This creates a parallel reward extraction path that lives outside the environment — if the environment's reward encoding changes,terminus_rewardsilently breaks. Is parsing the reward out of the message content on the client/trainer side acceptable here, or should the reward come through the standardresult.rewardchannel only? - Suggested reviewer: @Darktex
ALIGNMENT FLAG: build_terminal_tool_call uses ast.parse on model-generated text
- Principle at stake: "Design for LLMs" / "Minimize lifecycle deltas" (PRINCIPLES.md)
- The concern: The function parses model output by running
ast.parse(f"terminal({match.group('body')})", mode="eval")wherematch.group('body')is extracted from raw model text viare.DOTALL. Althoughast.literal_evalis used for values (safe), theast.Namefallback path meansterminal(command=malicious_name)silently produces{"command": "malicious_name"}— bare identifiers become string values. The sandbox isolates execution, but is this fallback path intended for production use or only for the Pi-model compatibility case? - Suggested reviewer: @Darktex
Summary
- 5 mechanical issues to fix (1 critical: dependency lockfile mismatch; 2 cosmetic but inaccurate: UI/docstring text; 2 consistency:
__all__and_build_reset_kwargskey list) - 2 alignment points for human review (reward extraction outside environment boundary; AST parsing of model output with identifier fallback)
The sandbox backend refactor itself is clean and the new architecture (pluggable local/hf selection via TERMINUS_SANDBOX_BACKEND) is a good improvement. The harness and test coverage are solid. The blocking issues are the uv.lock mismatch and the UI text that falsely presents all backends as "HF sandbox."
Automated review by Claude Code | Learn more
Summary
This splits the Terminus environment/server support out of the Terminus Pi TRL example PR.
Changes:
terminus_env.harness.TerminusSessionFactoryandbuild_terminal_tool_callValidation
python3 -m py_compile envs/terminus_env/harness.py envs/terminus_env/server/hf_sandbox.py envs/terminus_env/server/local_sandbox.py envs/terminus_env/server/terminus_env_environment.py envs/terminus_env/client.py tests/envs/test_terminus_environment.py tests/envs/test_terminus_harness.pyruff check envs/terminus_env/harness.py envs/terminus_env/server/hf_sandbox.py envs/terminus_env/server/local_sandbox.py envs/terminus_env/server/terminus_env_environment.py envs/terminus_env/client.py tests/envs/test_terminus_environment.py tests/envs/test_terminus_harness.pyPYTHONPATH=src:envs uv run --with pytest python -m pytest tests/envs/test_terminus_environment.py tests/envs/test_terminus_harness.pyResult:
10 passed in 38.12s.