Draft
Conversation
The 'hash' in the old name suggested a Redis HASH data structure, but the key actually addresses a STRING storing a SHA-256 hex digest. The rename removes the ambiguity. Caught in code review before Task 2 wrote tests against the old name.
Implement register, verify_token, renew_alive, is_alive in dispatch/runner.py with TDD (8 test cases). Fix RedisCache fakeredis singleton so cross-instance state is shared in tests (shared FakeServer).
- tests use the RUNNER_ALIVE_TTL_SEC constant instead of a hardcoded 30
- renew_alive() docstring documents the caller's verify_token responsibility
Adds reclaim_orphan_atomic() backed by a single Lua script so that only one runner can claim an orphaned job; also enforces max-attempts policy. Adds lupa dev-dep to enable fakeredis Lua execution in tests.
redis-py's register_script() already handles EVALSHA caching internally. The Python-side _script_cache + _get_reclaim_script() helpers added unnecessary complexity. Caught in code review.
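The redundancy is easy to see in a toy model of what redis-py's `Script` object (the return value of `register_script()`) already does internally — this is a sketch of the pattern, not redis-py's actual implementation:

```python
import hashlib

class NoScriptError(Exception):
    """Stand-in for redis.exceptions.NoScriptError."""

class Script:
    """Toy model of redis-py's Script object: it computes the SHA-1 up
    front and transparently falls back from EVALSHA to EVAL the first
    time a server has not yet cached the script."""

    def __init__(self, client, body: str):
        self.client = client
        self.body = body
        self.sha = hashlib.sha1(body.encode()).hexdigest()

    def __call__(self, keys=(), args=()):
        try:
            return self.client.evalsha(self.sha, len(keys), *keys, *args)
        except NoScriptError:
            # Server-side cache miss: EVAL both loads and executes.
            return self.client.eval(self.body, len(keys), *keys, *args)

class FakeClient:
    """Minimal fake server showing the fallback happens exactly once."""
    def __init__(self):
        self.loaded = set()
        self.calls = []

    def evalsha(self, sha, numkeys, *rest):
        self.calls.append("EVALSHA")
        if sha not in self.loaded:
            raise NoScriptError(sha)
        return "ok"

    def eval(self, body, numkeys, *rest):
        self.calls.append("EVAL")
        self.loaded.add(hashlib.sha1(body.encode()).hexdigest())
        return "ok"

script = Script(FakeClient(), "return 1")
script(); script()
print(script.client.calls)  # ['EVALSHA', 'EVAL', 'EVALSHA']
```

Since the Script object carries its own SHA and fallback, a second Python-side `_script_cache` only duplicates this bookkeeping.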
Implements POST /runners/register, POST /runners/&lt;id&gt;/heartbeat, GET /runners/&lt;id&gt;/next-job, and PUT /runners/&lt;id&gt;/jobs/&lt;jb&gt;/complete. Wires the blueprint into app.py and adds a test_case_info property to engine.Problem so the dispatch layer can read task metadata through the same duck-typing interface used in both production and mock tests.
Add _on_attempts_exhausted helper that updates Submission.status to 6 (JE) and deletes the orphaned job hash when Lua reclaim returns -1.
…o sandbox

- mongo/submission.py::submit() removes the TESTING shortcut and the old send() call; it now calls dispatch.job.enqueue_job() to push to the Redis queue
- Tests updated: removed manual enqueue_job() calls in test_runner_api.py that were workarounds before submit() was wired
- New test added: test_submit_enqueues_job_to_redis_pending verifies submit() pushes exactly one job to JOBS_PENDING with the correct submission_id
…fixture

- mongo/submission.py::rejudge() now calls dispatch.job.enqueue_job() (consistent with the submit() refactor in the previous commit)
- tests/conftest.py: add an autouse fixture that runs flushdb between every test to prevent fakeredis state leaking across tests (this was causing pre-existing flakiness in TestTeacherGetSubmission)
Delete target_sandbox, send, sandbox_resp_handler, assign_token, and verify_token from Submission class; also drop unused `import requests as rq`.
Drop on_submission_complete handler and OnSubmissionCompleteBody schema; these were part of the push-based dispatch that is now replaced by the pull-based runner API.
…onfig Remove sandbox_instances field from UpdateConfigBody and simplify update_config to only update rate_limit; runners now self-register via the pull-based dispatch. Also drop unused imports (requests, current_app). Update test_edit_config to match new API shape.
…ntyped ListField Delete class Sandbox(EmbeddedDocument) which is no longer referenced. Convert SubmissionConfig.sandbox_instances to a plain ListField with no schema to preserve DB compatibility on production until the PG migration.
…token

- dispatch/runner.py: add verify_any_token(), which scans registered runners
- model/problem.py: get_testdata/get_checksum/get_meta now accept any runner's token
- mongo/sandbox.py: deleted (find_by_token replaced)

Resolves the gap where Plan A Task 14 was blocked by these legacy endpoints. Runners (Sandbox) now authenticate to the testdata fetch path with their per-runner rk_token instead of the shared SANDBOX_TOKEN.
Covers Plan A spec Section 13 failure modes:

- Happy path: submit → pull → complete → status=AC
- Orphan reclaim: rn1 dies mid-job, rn2 picks up, rn1's zombie complete is rejected
- Max attempts: 3 reclaims exhausted → Submission marked JE

These complete Plan A's verification — the Backend dispatch path is end-to-end exercised without needing the real Sandbox.
After task.to_mongo() the task is a SON dict (no .cases attribute). The same fix is already applied in get_result() above; this method was missed. Triggered 500 on GET /api/submission/<id> for any submission with at least one task result.
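A minimal stdlib illustration of the failure mode — a plain dict stands in here for the SON document that `to_mongo()` returns, and the field names are only illustrative:

```python
# After to_mongo(), the task behaves like a dict, not a Document:
# attribute access raises AttributeError, subscript access works.
task = {"status": 0, "cases": [{"status": 0}]}

try:
    task.cases  # what the broken code path did
except AttributeError:
    cases = task["cases"]  # the fix, matching get_result() above

print(len(cases))  # 1
```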
Pull-Based Job Dispatch — Backend Implementation Plan (Plan A)
Goal: Refactor Back-End submission dispatch from "push to Sandbox" to "Sandbox-pulls-from-Redis-queue", with self-registration, heartbeat, and orphan-job reclaim. Runner-side changes deferred to Plan B.
Architecture: Add a `dispatch/` module (Redis-backed runner registry + job queue) and a `model/runner.py` blueprint exposing 4 endpoints (register / heartbeat / next-job / complete). Modify `mongo/submission.py::submit()` to enqueue to Redis instead of POSTing to Sandbox. Remove all push-related code paths. Tests use a "mock runner" (HTTP client simulating Sandbox) to verify the full flow without needing the real Sandbox.

Tech Stack: Python 3.11, Flask, mongoengine, redis-py (with fakeredis in tests), Pydantic, ULID, pytest, mongomock, testcontainers.
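As a sketch, the Redis key layout implied by this architecture might look like the following — only `JOBS_PENDING` is a name taken from this plan; every other key pattern here is a hypothetical illustration, and the real names live in `dispatch/redis_keys.py`:

```python
# Hypothetical key helpers; actual names may differ in dispatch/redis_keys.py.
JOBS_PENDING = "jobs:pending"  # LIST of queued jobs awaiting a runner

def runner_token_key(runner_id: str) -> str:
    # STRING holding the per-runner rk_token issued at registration
    return f"runner:{runner_id}:token"

def runner_alive_key(runner_id: str) -> str:
    # STRING with TTL; refreshed by heartbeat, expiry marks an orphan candidate
    return f"runner:{runner_id}:alive"

def job_key(submission_id: str) -> str:
    # HASH per in-flight job: owner runner, attempt count, timestamps
    return f"job:{submission_id}"

print(runner_alive_key("rn1"))  # runner:rn1:alive
```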
Working directory for this plan: `Back-End/` (all file paths in this plan are relative to `Back-End/` unless stated otherwise).

Spec Reference

Source design: `docs/superpowers/specs/2026-04-28-pull-based-job-dispatch-design.md`

This Plan A covers Sections 5, 6, 7, 8, 9, 13 (Backend pieces only). Plan B will cover Section 10 (Runner refactor) + Sections 11-12 (Infra/Migration).
File Structure
New files (Back-End)
Modified files
Deleted files
Phases
Phase 1: dispatch/ module foundation
Task 1: Create dispatch package skeleton + config + redis_keys
Files:
- Create: `dispatch/__init__.py`
- Create: `dispatch/config.py`
- Create: `dispatch/redis_keys.py`
- Create: `tests/unittest/dispatch/__init__.py`

Step 1: Create empty package init

```python
# tests/unittest/dispatch/__init__.py
```

Task 2: Implement dispatch/runner.py with TDD
Files:
- Test: `tests/unittest/dispatch/test_runner.py`
- Create: `dispatch/runner.py`

Step 1: Write failing tests for register / verify_token / renew_alive / is_alive

Run: `cd Back-End && poetry run pytest tests/unittest/dispatch/test_runner.py -v`

Expected: FAIL with `ImportError: cannot import name 'runner' from 'dispatch'` (or similar)

Run: `cd Back-End && poetry run pytest tests/unittest/dispatch/test_runner.py -v`

Expected: All 8 tests PASS
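For orientation, here is a Redis-free sketch of the four functions' intended semantics — the real `dispatch/runner.py` presumably backs these with Redis keys and TTLs, so the dict-plus-clock store below is only a stand-in:

```python
import secrets
import time

ALIVE_TTL_SEC = 30  # mirrors the RUNNER_ALIVE_TTL_SEC constant

_tokens: dict[str, str] = {}          # runner_id -> rk_token
_alive_until: dict[str, float] = {}   # runner_id -> monotonic deadline

def register(runner_id: str) -> str:
    """Issue a per-runner token and mark the runner alive."""
    token = secrets.token_hex(16)
    _tokens[runner_id] = token
    renew_alive(runner_id)
    return token

def verify_token(runner_id: str, token: str) -> bool:
    return _tokens.get(runner_id) == token

def renew_alive(runner_id: str) -> None:
    """Refresh the liveness deadline. Per the docstring note above, the
    caller is responsible for calling verify_token() first."""
    _alive_until[runner_id] = time.monotonic() + ALIVE_TTL_SEC

def is_alive(runner_id: str) -> bool:
    # In the real module this is a TTL'd Redis key: expired means dead.
    return time.monotonic() < _alive_until.get(runner_id, 0.0)

tok = register("rn1")
print(verify_token("rn1", tok), is_alive("rn1"))  # True True
```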
Task 3: Implement dispatch/scripts.py (Lua reclaim) with TDD
Files:
- Test: `tests/unittest/dispatch/test_reclaim_script.py`
- Create: `dispatch/scripts.py`

Step 1: Write failing tests

Run: `cd Back-End && poetry run pytest tests/unittest/dispatch/test_reclaim_script.py -v`

Expected: FAIL with `ImportError: cannot import name 'reclaim_orphan_atomic'`

Run: `cd Back-End && poetry run pytest tests/unittest/dispatch/test_reclaim_script.py -v`

Expected: All 4 tests PASS
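The decision the Lua script makes atomically can be modeled in plain Python. The `-1` sentinel and the max-attempts policy come from this plan; the field names and the limit of 3 are illustrative assumptions:

```python
MAX_ATTEMPTS = 3  # assumed limit; 3 reclaims are mentioned elsewhere in the plan

def reclaim_orphan(job: dict, claimer: str, runner_alive: bool) -> int:
    """Return the new attempt count, 0 if the owner is still alive,
    or -1 if attempts are exhausted. In production this whole check-and-
    reassign runs as ONE Lua script, so two runners racing for the same
    orphan can never both claim it."""
    if runner_alive:
        return 0                      # not an orphan; leave it alone
    if job["attempts"] >= MAX_ATTEMPTS:
        return -1                     # exhausted -> caller marks Submission JE
    job["attempts"] += 1
    job["owner"] = claimer            # atomically reassign ownership
    return job["attempts"]

job = {"owner": "rn1", "attempts": 2}
print(reclaim_orphan(job, "rn2", runner_alive=False))  # 3
print(reclaim_orphan(job, "rn3", runner_alive=False))  # -1
```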
Task 4: Implement dispatch/job.py — enqueue + claim_next_job (pending path)
Files:
- Test: `tests/unittest/dispatch/test_job.py`
- Create: `dispatch/job.py`

This task covers the basic enqueue + claim from the pending queue. The orphan reclaim path is added in Task 5.

Run: `cd Back-End && poetry run pytest tests/unittest/dispatch/test_job.py -v`

Expected: FAIL with `ImportError: cannot import name 'job'` or similar

Run: `cd Back-End && poetry run pytest tests/unittest/dispatch/test_job.py -v`

Expected: All 4 tests PASS
Task 5: Add orphan reclaim path to claim_next_job + complete_job
Files:
- Modify: `tests/unittest/dispatch/test_job.py` (add tests)
- Modify: `dispatch/job.py` (add orphan reclaim + complete_job)

Step 1: Write failing tests for orphan reclaim and complete_job

Append to `tests/unittest/dispatch/test_job.py`:

Run: `cd Back-End && poetry run pytest tests/unittest/dispatch/test_job.py -v`

Expected: New tests FAIL (existing 4 still pass)

Replace the existing `claim_next_job` and add `complete_job` in `dispatch/job.py`:

Run: `cd Back-End && poetry run pytest tests/unittest/dispatch/test_job.py -v`

Expected: All 9 tests PASS
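An in-memory model of the claim/complete flow these tests exercise — the real `dispatch/job.py` is assumed to use Redis lists and hashes instead of the deque and dict below, and the zombie-complete rejection is what the Task 20 integration test later relies on:

```python
from collections import deque

JOBS_PENDING: deque = deque()      # models the Redis pending LIST
IN_PROGRESS: dict[str, dict] = {}  # models per-job Redis hashes

def enqueue_job(submission_id: str) -> None:
    JOBS_PENDING.appendleft({"submission_id": submission_id})

def claim_next_job(runner_id: str):
    """Pop from pending first; the orphan-reclaim path (Task 5) would be
    consulted when pending is empty."""
    if not JOBS_PENDING:
        return None
    job = JOBS_PENDING.pop()           # FIFO relative to appendleft
    job["owner"] = runner_id
    IN_PROGRESS[job["submission_id"]] = job
    return job

def complete_job(runner_id: str, submission_id: str) -> bool:
    """Reject zombie completes: only the current owner may complete."""
    job = IN_PROGRESS.get(submission_id)
    if job is None or job["owner"] != runner_id:
        return False
    del IN_PROGRESS[submission_id]
    return True

enqueue_job("s1")
claim_next_job("rn1")
print(complete_job("rn2", "s1"), complete_job("rn1", "s1"))  # False True
```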
Phase 2: Runner HTTP API blueprint
Task 6: Add Pydantic schemas for runner API
Files:
- Create: `model/schemas/runner.py`
- Modify: `model/schemas/__init__.py`

Step 1: Create schema file

`model/schemas/__init__.py` uses explicit imports (no wildcard). Add this block at the end:

Run: `cd Back-End && poetry run python -c "from model.schemas import RegisterRunnerBody, CompleteJobBody; print('ok')"`

Expected: prints `ok`

Task 7: Add @require_runner_token decorator
Files:
- Test: `tests/unittest/test_runner_auth.py`
- Create: `model/utils/runner_auth.py`
- Modify: `model/utils/__init__.py` (export the decorator)

Step 1: Write failing tests

Run: `cd Back-End && poetry run pytest tests/unittest/test_runner_auth.py -v`

Expected: FAIL with import error

Add to `model/utils/__init__.py`:

Run: `cd Back-End && poetry run pytest tests/unittest/test_runner_auth.py -v`

Expected: All 5 tests PASS
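A framework-free sketch of the decorator's shape — the real `@require_runner_token` is assumed to pull the token from a Flask request header and call `dispatch.runner.verify_token`; here the verifier is injected and the token is an explicit argument so the idea stands alone:

```python
import functools

def require_runner_token(verify):
    """Wrap a view so it only runs when the runner's token checks out."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(runner_id, token, *args, **kwargs):
            if not verify(runner_id, token):
                return ("invalid token", 401)  # Flask-style (body, status)
            return view(runner_id, *args, **kwargs)
        return wrapper
    return decorator

# Toy verifier standing in for dispatch.runner.verify_token:
@require_runner_token(lambda rid, tok: tok == "good")
def heartbeat(runner_id):
    return ("ok", 200)

print(heartbeat("rn1", "bad"))   # ('invalid token', 401)
print(heartbeat("rn1", "good"))  # ('ok', 200)
```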
Task 8: Implement model/runner.py blueprint with all 4 endpoints
Files:
- Test: `tests/unittest/test_runner_api.py`
- Create: `model/runner.py`
- Modify: `model/__init__.py` (export `runner_api`)

Step 1: Write failing tests for all 4 endpoints

Run: `cd Back-End && poetry run pytest tests/unittest/test_runner_api.py -v`

Expected: FAIL — `runner_api` not registered, blueprint doesn't exist

`model/__init__.py` uses both `from . import X` and `from .X import *`, with an aggregated `__all__`. Add three changes. The new `model/runner.py` already defines `__all__ = ["runner_api"]`, so this propagates correctly.

Modify `app.py`'s `api2prefix` list to add the runner_api at prefix `/runners`:

URL convention note: Other blueprints register without `/api/` because Caddy rewrites `/api/*` → backend's `/*`. So in production, runners hit `https://noj.tw/api/runners/register`, but inside Flask the route is just `/runners/register`. Tests use the Flask test client (no Caddy), so test URLs use `/runners/...` (already correct in Step 1).

Run: `cd Back-End && poetry run pytest tests/unittest/test_runner_api.py -v`

Expected: All 9 tests PASS (after URL adjustment)
Task 9: Wire up exhaustion handling (max attempts → mark Submission JE)
Files:
- Modify: `dispatch/job.py` (claim_next_job: handle reclaim_result == -1 by marking JE)
- Modify: `tests/unittest/dispatch/test_job.py` (verify JE marking happens)

The Lua script returns `-1` when attempts are exhausted. The Python wrapper currently swallows this — we need to mark the affected Submission as JE.

Append to `tests/unittest/dispatch/test_job.py`:

Run: `cd Back-End && poetry run pytest tests/unittest/dispatch/test_job.py::test_claim_next_job_marks_submission_je_when_exhausted -v`

Expected: FAIL — Submission.status not changed

Modify the orphan reclaim loop in `claim_next_job` to call into a callback when the result is -1:

Run: `cd Back-End && poetry run pytest tests/unittest/dispatch/test_job.py -v`

Expected: All 10 tests PASS
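A sketch of this wiring: a reclaim result of `-1` triggers the exhaustion callback, which marks the Submission JE (status 6, per the commit notes in this PR) and drops the orphaned job hash. The in-memory stores below stand in for Mongo and Redis:

```python
JE = 6  # Submission.status value for Judge Error, per the commit note

submissions = {"s1": {"status": 0}}                       # stand-in for Mongo
job_hashes = {"job:s1": {"owner": "rn1", "attempts": 3}}  # stand-in for Redis

def _on_attempts_exhausted(submission_id: str) -> None:
    submissions[submission_id]["status"] = JE
    job_hashes.pop(f"job:{submission_id}", None)  # delete orphaned job hash

def handle_reclaim_result(result: int, submission_id: str):
    if result == -1:
        _on_attempts_exhausted(submission_id)
        return None            # nothing claimed; caller keeps scanning
    return submission_id       # claimed: hand the job to the runner

handle_reclaim_result(-1, "s1")
print(submissions["s1"]["status"], "job:s1" in job_hashes)  # 6 False
```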
Phase 3: Submission flow integration
Task 10: Refactor `Submission.submit()` to use enqueue_job

Files:

- Modify: `mongo/submission.py` (submit method)
- Modify: `tests/test_submission.py` (verify enqueue happens, not HTTP POST)

Step 1: Write failing test

Add to `tests/test_submission.py` (new test, doesn't replace existing yet):

Run: `cd Back-End && poetry run pytest tests/test_submission.py::test_submit_enqueues_job_to_redis_pending -v`

Expected: FAIL — the pending list is empty (the current submit() POSTs to the sandbox; in TESTING mode it returns True without doing anything)

Find the existing `submit` method (around line 384) and replace its tail. The key changes:

- Remove the `if current_app.config['TESTING'] or self.handwritten:` shortcut for non-handwritten submissions.
- Replace `return self.send()` with `enqueue_job(self); return True`.

Step 4: Run test to verify pass

Run: `cd Back-End && poetry run pytest tests/test_submission.py::test_submit_enqueues_job_to_redis_pending -v`

Expected: PASS

Run: `cd Back-End && poetry run pytest tests/test_submission.py -v`

Expected: Some tests may fail because they assumed `submit()` was a no-op in TESTING mode and it now enqueues. Review the failures.

For each failing test that calls `submit()` and then asserts something about state, call `add_fake_output()` after submit (this fixture already exists in `tests/utils/submission.py`). Document any test changes needed in the commit message.
Task 11: Refactor `Submission.rejudge()` to use enqueue_job

Files:

- Modify: `mongo/submission.py` (rejudge method)
- Modify: `tests/test_submission.py` (test for rejudge enqueuing)

Step 1: Write failing test

Run: `cd Back-End && poetry run pytest tests/test_submission.py::test_rejudge_enqueues_new_job_to_pending -v`

Expected: FAIL

Find `rejudge()` (around line 346) and replace its body with this (note: NO `TESTING` shortcut — we want the enqueue to happen even in tests so we can verify the flow):

Run: `cd Back-End && poetry run pytest tests/test_submission.py::test_rejudge_enqueues_new_job_to_pending -v`

Expected: PASS

Run: `cd Back-End && poetry run pytest tests/test_submission.py -k rejudge -v`

Expected: All PASS, or document any required test updates
Task 12: Add Redis cleanup fixture to conftest.py
Files:
- Modify: `tests/conftest.py`

Without this, fakeredis state leaks across tests, causing flakes.

Modify `tests/conftest.py` — add after the existing fixtures:

Run: `cd Back-End && poetry run pytest -x`

Expected: Full suite passes (or document any pre-existing issues separately)
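What the autouse fixture buys can be modeled with a context manager wrapped around each test — a pytest fixture with `autouse=True` and a `yield` is the equivalent, and the real fixture would call `flushdb()` on the shared fakeredis client rather than clearing a dict:

```python
import contextlib

fake_redis: dict = {}  # stand-in for the shared fakeredis FakeServer state

@contextlib.contextmanager
def clean_redis():
    try:
        yield  # the test body runs here
    finally:
        # Always flush, even if the test failed, so no state leaks
        # into the next test (the real fixture calls flushdb()).
        fake_redis.clear()

with clean_redis():
    fake_redis["job:s1"] = {"owner": "rn1"}  # test-local state

print(fake_redis)  # {}
```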
Phase 4: Cleanup — remove old push code
Task 13: Remove old push methods from mongo/submission.py
Files:
- Modify: `mongo/submission.py`

Remove these methods (no longer reachable):

- `target_sandbox(self)` (~lines 292-307)
- `send(self)` (~lines 423-459)
- `sandbox_resp_handler(self, resp)` (~lines 263-290)
- `assign_token` classmethod
- `verify_token` classmethod

Also remove the `import requests as rq` if it is unused after the removals.

Open `mongo/submission.py` and delete the methods listed above. Keep `process_result()` — it's still used by the new complete endpoint.

Run: `cd Back-End && poetry run pytest -x`

Expected: All pass. If anything imports the removed methods, update those imports.

Run: `cd Back-End && grep -rn "target_sandbox\|sandbox_resp_handler\|assign_token\|verify_token" --include="*.py" .`

Expected: No results (or only in deleted/test code about to be cleaned up)
Task 14: Delete mongo/sandbox.py
Files:
- Delete: `mongo/sandbox.py`
- Modify: `mongo/__init__.py` (remove import if any)

Step 1: Search for usages of find_by_token

Run: `cd Back-End && grep -rn "find_by_token\|from mongo.sandbox\|from mongo import sandbox" --include="*.py" .`

Update each file to no longer import `find_by_token`. If a route used it for the old callback, that route is being removed in Task 15 anyway.

```
cd Back-End
git rm mongo/sandbox.py
```

Check `mongo/__init__.py` for any line referring to `sandbox` and remove it.

Run: `cd Back-End && poetry run pytest -x`

Expected: All pass
Task 15: Remove the old `PUT /<submission>/complete` route from model/submission.py

Files:

- Modify: `model/submission.py`
- Modify: `model/schemas/submission.py` (remove OnSubmissionCompleteBody)
- Modify: `model/schemas/__init__.py` (remove export)

Step 1: Locate the route

Find `@submission_api.put('/<submission>/complete')` in `model/submission.py` (around lines 345-362). Read the function body to confirm it's the old callback.

Remove the entire route handler function from `model/submission.py`.

Remove the `OnSubmissionCompleteBody` schema: delete the `OnSubmissionCompleteBody` class from `model/schemas/submission.py`.

Also remove its name from `model/schemas/__init__.py` — find the explicit import block `from .schemas import (... OnSubmissionCompleteBody ...)` and remove that name from the import.

Run: `cd Back-End && poetry run pytest -x`

Expected: Tests that hit the old endpoint now fail with 404. Update or delete those tests.
Task 16: Remove the `PUT /api/submission/config` (sandbox_instances) endpoint

Files:

- Modify: `model/submission.py` (remove update_config function)
- Modify: `model/schemas/submission.py` (remove UpdateConfigBody)
- Modify: `model/schemas/__init__.py` (remove export)

Step 1: Locate the route

Find `@submission_api.put('/config')` (or similar) decorating `update_config` in `model/submission.py` (~lines 489-537).

The endpoint also handles `rate_limit`. Two options:

a. Delete the entire endpoint (simpler — `rate_limit` could be a config file / env var instead)

b. Strip out only the `sandbox_instances` handling, keep `rate_limit`

Choose (b): keep the endpoint for `rate_limit` to preserve admin functionality, just remove the `sandbox_instances` field handling.

In `model/submission.py::update_config`:

Run: `cd Back-End && poetry run pytest -x -k config`

Expected: Tests that exercise the `sandbox_instances` part fail. Update or remove them.

Task 17: Clean references to SubmissionConfig.sandbox_instances in engine.py
Files:
- Modify: `mongo/engine.py`

The field itself stays (to avoid mongo schema conflicts in production), but any default values referencing the (now-removed) `Sandbox` EmbeddedDocument should be cleaned.

Look in `mongo/engine.py` for `sandbox_instances = EmbeddedDocumentListField(...)` (~line 432) and replace it with a plain untyped `ListField`.

Find `class Sandbox(EmbeddedDocument)` (~line 422) and delete it entirely.

Run: `cd Back-End && poetry run pytest -x`

Expected: All pass. If anything imported `engine.Sandbox`, those imports must be updated/removed.

Task 18: Verification pass — full test suite + lint
No code changes — verification only.
Run: `cd Back-End && poetry run pytest --cov=./ --cov-config=.coveragerc -v`

Expected: All tests pass; the coverage report shows the new dispatch module covered

Run: `cd Back-End && poetry run yapf --recursive --parallel --diff .`

Expected: No diff (CI compliance)

If there is a diff, run `cd Back-End && poetry run yapf -ir .`, then verify the diff with git and commit.
Phase 5: End-to-end integration tests
Task 19: Mock-runner happy-path integration test
Files:
- Create: `tests/integration/__init__.py`
- Create: `tests/integration/test_runner_flow.py`

This test exercises the full Backend dispatch path via HTTP, simulating a runner with a Python test client.

```python
# tests/integration/__init__.py
```

Run: `cd Back-End && poetry run pytest tests/integration/test_runner_flow.py -v`

Expected: All 3 tests PASS
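The request sequence the mock runner walks through has this shape — `api` below is a toy stand-in for the Flask test client with canned responses, so the payload fields are illustrative; only the endpoint paths come from this PR. A real mock runner would also POST `/runners/<id>/heartbeat` between pulls:

```python
def api(method: str, path: str):
    """Canned responses standing in for the Flask test client."""
    canned = {
        ("POST", "/runners/register"): {"runner_id": "rn1", "token": "t1"},
        ("GET", "/runners/rn1/next-job"): {"submission_id": "s1"},
        ("PUT", "/runners/rn1/jobs/s1/complete"): {"status": "AC"},
    }
    return canned[(method, path)]

# Happy path: register -> pull next job -> report completion -> AC.
creds = api("POST", "/runners/register")
job = api("GET", f"/runners/{creds['runner_id']}/next-job")
done = api("PUT",
           f"/runners/{creds['runner_id']}/jobs/{job['submission_id']}/complete")
print(done["status"])  # AC
```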
Task 20: Mock-runner orphan reclaim integration test
Files:
- Modify: `tests/integration/test_runner_flow.py`

Step 1: Add a test for orphan reclaim end-to-end

Append to `tests/integration/test_runner_flow.py`:

Run: `cd Back-End && poetry run pytest tests/integration/test_runner_flow.py::test_orphan_reclaim_when_runner_dies -v`

Expected: PASS
Task 21: Mock-runner max-attempts → JE integration test
Files:
- Modify: `tests/integration/test_runner_flow.py`

Step 1: Add a test for max-attempts exhaustion

Append:

Run: `cd Back-End && poetry run pytest tests/integration/test_runner_flow.py::test_max_attempts_marks_submission_je -v`

Expected: PASS

Run: `cd Back-End && poetry run pytest tests/integration/ -v`

Expected: All tests PASS

Run: `cd Back-End && poetry run pytest --cov=./ --cov-config=.coveragerc`

Expected: All pass; coverage on `dispatch/` and `model/runner.py` at >85%
Plan A Done — Verification Checklist
Before declaring Plan A complete, verify:
- `poetry run pytest` passes 100%
- `poetry run yapf --recursive --parallel --diff .` shows no diff
- Coverage on `dispatch/` and `model/runner.py` >85%
- No remaining references to the removed push-path code (`target_sandbox`, `find_by_token`, etc.)

Handoff to Plan B
Plan B (Runner refactor) is unblocked once this plan is complete and the Backend is deployed in a test/staging environment. Plan B will cover:

- `Sandbox/app.py` and `Sandbox/dispatcher/dispatcher.py`
- `Sandbox/Dockerfile`
- `docker-compose.yml` for Redis AOF + new env vars