Skip to content

fix(_run): terminate loop on config parse errors (#14)#15

Merged
mhusbynflow merged 6 commits into
masterfrom
mhusbynflow/investigate-github-issue-14
May 26, 2026
Merged

fix(_run): terminate loop on config parse errors (#14)#15
mhusbynflow merged 6 commits into
masterfrom
mhusbynflow/investigate-github-issue-14

Conversation

@mhusbynflow

Copy link
Copy Markdown
Collaborator

Summary

  • Fixes Errors parsing the configuration do not stop the Python process #14: the _run polling loop hangs when Nextflow exits early on a config parse error
  • Loop now breaks on execution.return_code alone — rc.txt is written unconditionally by the shell wrapper after Nextflow exits, so it's a more reliable signal than execution.finished (which is parsed from the log)
  • Bumps version to 0.13.1

Why this regressed

In v0.11 the loop polled the subprocess directly (process.poll()). The v0.12 refactor moved subprocess creation into submit_execution, and commit d681e3e ("Fix process reference problem") deliberately stopped holding a reference to the Popen object — to avoid subprocess GC/finalizer noise — and substituted execution.finished as the done-signal. That parses from .nextflow.log, which doesn't terminate with a recognised pattern when Nextflow exits on ConfigParseException, so the loop never breaks.

Switching to execution.return_code preserves d681e3e's "no Popen reference" design goal while restoring v0.11's actual behaviour.

Test plan

  • New unit test test_loop_terminates_when_return_code_set_without_finished covering the bug (fails without the fix, passes with it)
  • Updated test_can_run_and_poll to gate on return_code (the new semantics) instead of finished
  • All 124 unit tests pass
  • Manual smoke test with the repro from Errors parsing the configuration do not stop the Python process #14 (nextflow.config referencing an unset variable)

🤖 Generated with Claude Code

mhusbynflow and others added 3 commits May 26, 2026 11:31
The polling loop in _run required both `execution.return_code` and
`execution.finished` to terminate. `execution.finished` is parsed from
`.nextflow.log` and only set when the log ends with "Goodbye" or a
recognised stack trace. When Nextflow exits early on a config parse
error (e.g. an unresolved ${VAR} in nextflow.config), the log ends with
the offending source line, `finished` is never set, and the loop spins
forever even though the subprocess has exited and rc.txt has been
written.

`rc.txt` is the more reliable signal: the shell `; echo $? >rc.txt`
runs unconditionally after Nextflow exits, so a non-empty return_code
means the subprocess is done. This restores the behaviour from v0.11
without re-introducing the Popen reference that d681e3e was deliberately
removing.

Bumps version to 0.13.1.

Fixes #14.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Point the published package's author metadata at the team rather than
an individual maintainer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous fix made the polling loop break on `execution.return_code`
alone. That works for fresh runs, but when the same output_path is
reused (e.g. resume runs), rc.txt still holds the return code from
the previous run while nextflow is starting up. The shell only writes
the new return code after nextflow exits (`; echo $? >rc.txt`), so the
polling loop would see the stale rc, treat the new run as finished
immediately, and break before any output was captured.

Prepending `:>rc.txt;` to the shell command truncates the file to zero
bytes at the start, so the polling loop reads an empty rc until the
new run actually finishes. `:` is a POSIX no-op, so this is portable.

Caught by CI: `test_can_handle_pipeline_error` in both run and poll
integration suites — the resume run after an errored first run was
returning immediately with stale state.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mhusbynflow

Copy link
Copy Markdown
Collaborator Author

@claude please review this PR

Previous push (bcd0cfd) did not trigger a CI run — webhook dropped.
This empty commit forces Actions to schedule a fresh run on the
fixed code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mhusbynflow

Copy link
Copy Markdown
Collaborator Author

@claude please review this PR

mhusbynflow and others added 2 commits May 26, 2026 14:12
GitHub Actions outage dropped earlier webhook deliveries; pushing
an empty commit to schedule a fresh run on the fixed code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ale poll

Breaking the loop on rc.txt alone races the log on slower I/O: nextflow
writes 'Task completed' lines for killed processes and 'Goodbye' just
before exiting, but those entries can land on disk *after* rc.txt is
read in the same poll iteration. The previous fix worked on macOS but
the integration test 'test_can_handle_pipeline_error' failed on Linux
CI because lower2.status hadn't been parsed yet when the loop broke.

Restore execution.finished as the primary terminate signal (which
chronologically guarantees log entries have settled). Fall back to
rc-only after a second poll where rc is still set but finished hasn't
arrived — covers the issue #14 case (config parse error) where the
log doesn't end with 'Goodbye' or a stack trace and finished will
never be parsed.

Costs at most one extra poll interval (default 1s) on the early-exit
path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mhusbynflow mhusbynflow merged commit 9ebf001 into master May 26, 2026
12 checks passed
@mhusbynflow mhusbynflow deleted the mhusbynflow/investigate-github-issue-14 branch May 26, 2026 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Errors parsing the configuration do not stop the Python process

1 participant