fix(_run): terminate loop on config parse errors (#14)#15
Merged
Conversation
The polling loop in _run required both `execution.return_code` and
`execution.finished` to terminate. `execution.finished` is parsed from
`.nextflow.log` and only set when the log ends with "Goodbye" or a
recognised stack trace. When Nextflow exits early on a config parse
error (e.g. an unresolved ${VAR} in nextflow.config), the log ends with
the offending source line, `finished` is never set, and the loop spins
forever even though the subprocess has exited and rc.txt has been
written.
`rc.txt` is the more reliable signal: the shell `; echo $? >rc.txt`
runs unconditionally after Nextflow exits, so a non-empty return_code
means the subprocess is done. This restores the behaviour from v0.11
without re-introducing the Popen reference that d681e3e was deliberately
removing.
Bumps version to 0.13.1.
Fixes #14.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Point the published package's author metadata at the team rather than an individual maintainer. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous fix made the polling loop break on `execution.return_code` alone. That works for fresh runs, but when the same output_path is reused (e.g. resume runs), rc.txt still holds the return code from the previous run while nextflow is starting up. The shell only writes the new return code after nextflow exits (`; echo $? >rc.txt`), so the polling loop would see the stale rc, treat the new run as finished immediately, and break before any output was captured. Prepending `:>rc.txt;` to the shell command truncates the file to zero bytes at the start, so the polling loop reads an empty rc until the new run actually finishes. `:` is a POSIX no-op, so this is portable. Caught by CI: `test_can_handle_pipeline_error` in both run and poll integration suites — the resume run after an errored first run was returning immediately with stale state. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Collaborator
Author
|
@claude please review this PR |
Previous push (bcd0cfd) did not trigger a CI run — webhook dropped. This empty commit forces Actions to schedule a fresh run on the fixed code. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Collaborator
Author
|
@claude please review this PR |
GitHub Actions outage dropped earlier webhook deliveries; pushing an empty commit to schedule a fresh run on the fixed code. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ale poll Breaking the loop on rc.txt alone races the log on slower I/O: nextflow writes 'Task completed' lines for killed processes and 'Goodbye' just before exiting, but those entries can land on disk *after* rc.txt is read in the same poll iteration. The previous fix worked on macOS but the integration test 'test_can_handle_pipeline_error' failed on Linux CI because lower2.status hadn't been parsed yet when the loop broke. Restore execution.finished as the primary terminate signal (which chronologically guarantees log entries have settled). Fall back to rc-only after a second poll where rc is still set but finished hasn't arrived — covers the issue #14 case (config parse error) where the log doesn't end with 'Goodbye' or a stack trace and finished will never be parsed. Costs at most one extra poll interval (default 1s) on the early-exit path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_runpolling loop hangs when Nextflow exits early on a config parse errorexecution.return_codealone —rc.txtis written unconditionally by the shell wrapper after Nextflow exits, so it's a more reliable signal thanexecution.finished(which is parsed from the log)Why this regressed
In v0.11 the loop polled the subprocess directly (
process.poll()). The v0.12 refactor moved subprocess creation intosubmit_execution, and commit d681e3e ("Fix process reference problem") deliberately stopped holding a reference to thePopenobject — to avoid subprocess GC/finalizer noise — and substitutedexecution.finishedas the done-signal. That parses from.nextflow.log, which doesn't terminate with a recognised pattern when Nextflow exits onConfigParseException, so the loop never breaks.Switching to
execution.return_codepreserves d681e3e's "no Popen reference" design goal while restoring v0.11's actual behaviour.Test plan
test_loop_terminates_when_return_code_set_without_finishedcovering the bug (fails without the fix, passes with it)test_can_run_and_pollto gate onreturn_code(the new semantics) instead offinished🤖 Generated with Claude Code