Symptom
test/cli/logs.test.ts > bitsocial logs -f log file rotation (synthetic) > continues watching old file if no new file appears (test/cli/logs.test.ts:426) intermittently fails only on windows-latest. ubuntu and macos consistently pass.
When it fails, the assertion error is:
AssertionError: expected '[2026-05-01T00:00:00.000Z] initial li…' to contain 'APPENDED_LINE'
- Expected
+ Received
- APPENDED_LINE
+ [2026-05-01T00:00:00.000Z] initial line
+
+
i.e. bitsocial logs -f printed the seed line correctly but never picked up the line the test appended to the file. The test waits 8 seconds (visible from 8047ms/8041ms durations across runs) before timing out the assertion.
Observed history
Both #43 (fix/logs-follow-windows-stat-stale) and #41 (fix/logs-follow-windows-polling) targeted this code path. The test is much less flaky than it used to be — but clearly still not deterministic.
What this PR is asking for
Not "fix the bug now" — we don't have enough signal from CI to know whether this is:
- a real watcher race that still exists on Windows under load,
- a filesystem-timestamp issue (mtime resolution differs on Windows),
- a buffering/flush issue in the child
bitsocial logs -f process,
- a vitest fork-pool scheduling issue specific to the Windows runner,
- or something else.
Right now the failure reproduces ~1 time in N on a hosted runner, never (or very rarely) on a developer's own Windows machine. Without more data we'd be guessing.
Concrete asks, in priority order:
- Have the test dump more diagnostics on failure. When the assertion fails, log:
- the full contents of the watched log file (so we can see whether the appended line landed on disk at all);
- the directory listing (so we can see whether file rotation kicked in unexpectedly);
- the captured stdout+stderr of the spawned
bitsocial logs -f child (currently it's discarded);
- the wall-clock timeline of:
write seed line → spawn child → expect "initial line" → append APPENDED_LINE → expect "APPENDED_LINE" (with timestamps), so we can tell whether the watcher saw the append at all or whether it saw it but the assertion polled too soon.
- the file's
stat result (mtime/atime/size in ms) before and after the append.
- Tee the spawned child's stderr/stdout to a file and upload as a CI artifact on failure — so the next Windows flake gives us its complete
logs -f trace instead of just the test's expectation diff.
- Run the test in a loop (e.g.
repeats: 20) on Windows specifically in a separate scheduled CI job, not the main PR check. A "5/20 fails" signal is much more actionable than "1/many merges fails."
Once #1 and #2 land, we can let the natural flake rate produce one good failure log and root-cause it. Until then every retry just re-rolls the dice.
Related
Symptom
test/cli/logs.test.ts > bitsocial logs -f log file rotation (synthetic) > continues watching old file if no new file appears(test/cli/logs.test.ts:426) intermittently fails only onwindows-latest. ubuntu and macos consistently pass.When it fails, the assertion error is:
i.e.
bitsocial logs -fprinted the seed line correctly but never picked up the line the test appended to the file. The test waits 8 seconds (visible from8047ms/8041msdurations across runs) before timing out the assertion.Observed history
Both #43 (
fix/logs-follow-windows-stat-stale) and #41 (fix/logs-follow-windows-polling) targeted this code path. The test is much less flaky than it used to be — but clearly still not deterministic.What this PR is asking for
Not "fix the bug now" — we don't have enough signal from CI to know whether this is:
bitsocial logs -fprocess,Right now the failure reproduces ~1 time in N on a hosted runner, never (or very rarely) on a developer's own Windows machine. Without more data we'd be guessing.
Concrete asks, in priority order:
bitsocial logs -fchild (currently it's discarded);write seed line→spawn child→expect "initial line"→append APPENDED_LINE→expect "APPENDED_LINE"(with timestamps), so we can tell whether the watcher saw the append at all or whether it saw it but the assertion polled too soon.statresult (mtime/atime/size in ms) before and after the append.logs -ftrace instead of just the test's expectation diff.repeats: 20) on Windows specifically in a separate scheduled CI job, not the main PR check. A "5/20 fails" signal is much more actionable than "1/many merges fails."Once #1 and #2 land, we can let the natural flake rate produce one good failure log and root-cause it. Until then every retry just re-rolls the dice.
Related