
Fix workflow stop in online mode by sending stop command to RQ worker #383

Merged: t0mdavid-m merged 2 commits into main from claude/fix-vendor-queue-error-EwPar on Apr 28, 2026

Summary

Fixes a bug where clicking "Stop Workflow" in online mode (vendor queue) would not actually interrupt a running workflow. The worker would continue executing while the UI showed inconsistent state.

Problem

When QueueManager.cancel_job() was called on a job in the "started" state, it only called Job.cancel(), which marks the job as canceled in Redis registries but does not interrupt the worker that is actively executing the workflow. This left the worker running the job to completion while the UI showed the job as canceled.

Solution

  • Send stop command to worker: For jobs in "started" state with an assigned worker, call rq.command.send_stop_job_command() to message the worker over Redis pubsub and interrupt the work-horse process.
  • Handle edge cases gracefully:
    • Jobs without a worker_name assigned (race condition) fall back to Job.cancel()
    • Double-clicks on "Stop Workflow" don't raise InvalidJobOperation errors
    • Missing jobs return False instead of raising
  • Map RQ "stopped" status: Added "stopped" → JobStatus.CANCELED mapping in get_job_info() so the UI correctly displays stopped jobs as canceled rather than queued.
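
The decision flow in the bullets above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `fetch_job` and `send_stop` parameters and the `FakeJob` stub are hypothetical, introduced so the sketch runs without a Redis server. In a real deployment they would presumably wrap `Job.fetch(job_id, connection=...)` and `rq.command.send_stop_job_command(connection, job_id)`.

```python
try:
    from rq.exceptions import InvalidJobOperation, NoSuchJobError
except ImportError:
    # Stand-ins so the sketch runs without RQ installed.
    class NoSuchJobError(Exception):
        pass

    class InvalidJobOperation(Exception):
        pass


def cancel_job(job_id, fetch_job, send_stop):
    """Return True if the job is (or already was) canceled, False if missing.

    fetch_job(job_id) -> job object   (hypothetical wrapper around Job.fetch)
    send_stop(job_id) -> None         (hypothetical wrapper around
                                       rq.command.send_stop_job_command)
    """
    try:
        job = fetch_job(job_id)
    except NoSuchJobError:
        return False  # missing job: report failure instead of raising

    status = job.get_status()
    if status in ("canceled", "stopped"):
        return True  # double-click on Stop: already done, treat as success

    try:
        if status == "started" and job.worker_name:
            # A worker is executing the job: ask it to kill its work-horse.
            send_stop(job_id)
        else:
            # Not started yet, or started but no worker recorded (race):
            # fall back to the registry-only cancel.
            job.cancel()
    except InvalidJobOperation:
        # The job changed state between the check and the call.
        return True
    return True


class FakeJob:
    """Hypothetical stub standing in for rq.job.Job."""

    def __init__(self, status, worker_name=None):
        self.status = status
        self.worker_name = worker_name
        self.cancel_calls = 0

    def get_status(self):
        return self.status

    def cancel(self):
        self.cancel_calls += 1
        self.status = "canceled"


sent = []
running = FakeJob("started", worker_name="worker-1")
cancel_job("job-1", lambda _id: running, sent.append)
```

Passing the two callables in keeps the branching logic testable without a live queue. Whether the actual `cancel_job()` also calls `Job.cancel()` after sending the stop command is not settled by the description, so the sketch leaves that out.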

Key Changes

  • Enhanced cancel_job() to detect started jobs and send stop command before canceling
  • Added proper exception handling for InvalidJobOperation and NoSuchJobError
  • Made cancellation idempotent (safe to call multiple times)
  • Updated status mapping to recognize RQ's "stopped" state
  • Added comprehensive test coverage for all cancellation scenarios
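
As a rough sketch of the status-mapping change, assuming a lookup table with a default: only `JobStatus.CANCELED` and the "stopped" mapping come from the description. The other enum members, the `RQ_STATUS_MAP` name, the `map_rq_status` helper, and the fallback to `QUEUED` for unknown statuses are assumptions made to show why an unmapped "stopped" status could surface as queued.

```python
from enum import Enum


class JobStatus(Enum):
    # Only CANCELED is named in the PR; the other members are assumed.
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


# Hypothetical table from RQ status strings to the application's enum.
RQ_STATUS_MAP = {
    "queued": JobStatus.QUEUED,
    "started": JobStatus.RUNNING,
    "finished": JobStatus.COMPLETED,
    "failed": JobStatus.FAILED,
    "canceled": JobStatus.CANCELED,
    "stopped": JobStatus.CANCELED,  # the mapping this PR adds
}


def map_rq_status(rq_status):
    # Assumed default: unknown statuses fall back to QUEUED, which is how a
    # missing "stopped" entry would make a stopped job look queued.
    return RQ_STATUS_MAP.get(rq_status, JobStatus.QUEUED)
```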

https://claude.ai/code/session_01Ny1NgFejDt9w6mNnNpEFuB

Job.cancel() only updates Redis registries; for jobs already executing in
a worker it leaves the work-horse running, so the workflow keeps producing
log output and the UI shows inconsistent state after Stop is pressed.

cancel_job now sends rq.command.send_stop_job_command to the worker for
started jobs, treats already-canceled/stopped jobs as success (idempotent
double-clicks), and maps RQ's 'stopped' status to CANCELED in get_job_info
so stopped jobs don't appear stuck in 'queued'.

coderabbitai Bot commented Apr 27, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between b855b5e and 44c48d1.

📒 Files selected for processing (3)
  • .github/workflows/ci.yml
  • src/workflow/QueueManager.py
  • tests/test_queue_manager_cancel.py

t0mdavid-m merged commit ce4092f into main on Apr 28, 2026
13 checks passed
t0mdavid-m pushed a commit that referenced this pull request Apr 28, 2026
After PR #383 the RQ worker actually terminates on Stop, but the UI
kept showing "running" and a second Stop click rendered "Errors
occurred". Two causes:

1. stop_workflow cleared .job_id on success, so get_workflow_status
   fell through to the local-mode pid_dir fallback. The killed worker
   left stale child PID files there, so the fallback flipped running
   back to True forever.
2. The static log-display branch only knew "WORKFLOW FINISHED" vs
   error, so a cancelled run was misreported as an error.

Fix: stop_workflow now writes a "WORKFLOW CANCELLED" marker via
Logger and removes the stale pid_dir; .job_id is kept so the queue
status flow stays authoritative and renders the Cancelled pill.
StreamlitUI's static display dispatches through a new pure helper
classify_log_outcome (finished/cancelled/error). Also fills in the
missing canceled branch in _show_queue_status so the queue pill
actually renders for canceled jobs.
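
A pure helper along the lines of classify_log_outcome might look like the sketch below. The function name, the two marker strings, and the three outcomes come from the commit message above. The signature (a single string of log text), the lowercase return values, and checking the cancel marker first are assumptions.

```python
def classify_log_outcome(log_text):
    """Classify a finished run's log as 'finished', 'cancelled', or 'error'.

    Assumed behavior: the cancel marker wins if both markers are present,
    and a log with neither marker is treated as an error.
    """
    if "WORKFLOW CANCELLED" in log_text:
        return "cancelled"
    if "WORKFLOW FINISHED" in log_text:
        return "finished"
    return "error"


outcome = classify_log_outcome("step 1 done\nWORKFLOW CANCELLED\n")
```

Because the helper takes text and returns a label, it can be unit-tested without Streamlit or a running queue.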