fix(worker): degrade gracefully when analytics file is empty (#81)
Merged
When the user un-ticks 'Activity history' in Discord's data-request form, Discord still ships an analytics file path in the zip but the file is empty. The old code:

1. Set is_partial=False the moment a file path was found, regardless of whether the file had any events.
2. Crashed at the end of the loop because the average-time print divides by len(compute_time_per_line), which stays [] for an empty file. ZeroDivisionError surfaced as UNKNOWN_ERROR.

Originally I had this raise MISSING_ACTIVITY_DATA, but a Discord re-export takes ~30 days to land, and the user's package still has real value without analytics: owner profile, server list, DMs, payments, and per-channel message totals (which come from the CSVs, not analytics) are all available. Erroring out makes us throw away all of that for a month-long round-trip the user may not realize they signed up for.

So: keep is_partial=True (already the default), guard the divide, let processing complete. The frontend already has package_is_partial in the SQLite output and can render activity-driven stats as N/A when set. (Issue dumpus-app/dumpus-app#232 tracks the proper banner UX for this.)
67c2bca to a3fadf2
Androz2091 added a commit that referenced this pull request on May 4, 2026
…tics_line_count (#82)

PR #81 used analytics_line_count > 0 as the divide-by-zero guard, but that variable is incremented at the top of the analytics loop, before the 'if not event_type: continue' skip. So a file whose lines all lack event_type (Discord apparently ships these for some kinds of partial exports) re-hit the same ZeroDivisionError, surfaced as UNKNOWN_ERROR for package 14be73071701467d3aae144ab2e72f86.

The actual condition for 'we have nothing to average' is that compute_time_per_line is empty. Guard on it directly.

Behavior is identical for the two clean cases (empty file, fully-populated file) and additionally handles the lines-without-event_type case gracefully: is_partial stays True, no stats are produced, and nothing crashes.
A user reported `UNKNOWN_ERROR` for package `14be73071701467d3aae144ab2e72f86`. CloudWatch:
```
File "/app/tasks.py", line 429, in read_analytics_file
print(f'Average compute time per line: {sum(compute_time_per_line) / len(compute_time_per_line)}')
ZeroDivisionError: division by zero
```
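Two interacting bugs:

1. `is_partial` was set to `False` the moment an analytics file path was found, regardless of whether the file had any events.
2. The average-time print at the end of the loop divides by `len(compute_time_per_line)`. That list stays empty for an empty file, so the print raises `ZeroDivisionError`, which surfaced as `UNKNOWN_ERROR`.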
This shows up when the user un-ticks Activity history in Discord's export request form. Discord ships an empty analytics file rather than omitting the path.
Why graceful instead of erroring
I initially shipped this as a `MISSING_ACTIVITY_DATA` hard error (analogous to `MISSING_USER_DATA` from #80) with matching frontend copy in dumpus-app/dumpus-app#422. On reflection that's the wrong call: a Discord re-export takes ~30 days, and the user's package still has real value without analytics:

- owner profile
- server list
- DMs
- payments
- per-channel message totals (these come from the CSVs, not analytics)
What's lost: daily-sent-messages chart, sending-times-by-hour, sessions, voice sessions, top items ranked by activity. Asking the user to throw away the rest for a month-long round-trip is unkind.
Fix
```diff
if analytics_file_name:
```
`is_partial` defaults to `True` at the top of `read_analytics_file` (line 152). It was being incorrectly flipped to `False` before we knew whether the file had real events. Fix: only flip to `False` when we actually parsed at least one event.
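A minimal sketch of the guarded logic, not the actual `tasks.py` code. The function name `summarize_analytics`, the JSON-lines input, and the returned tuple are assumptions made for illustration; only `is_partial`, `compute_time_per_line`, `analytics_line_count`, and `event_type` are names taken from this PR and its follow-up (#82).

```python
import json
import time


def summarize_analytics(lines):
    """Hypothetical stand-in for the analytics loop in read_analytics_file."""
    is_partial = True            # default: no usable analytics until proven otherwise
    compute_time_per_line = []   # one timing per line that yielded a real event
    analytics_line_count = 0     # counts every line, including skipped ones

    for raw_line in lines:
        analytics_line_count += 1  # incremented before any skip
        started = time.perf_counter()

        raw_line = raw_line.strip()
        if not raw_line:
            continue  # blank line: nothing to parse

        event = json.loads(raw_line)  # assumes one JSON object per line
        if not event.get("event_type"):
            continue  # line without an event_type contributes no stats

        compute_time_per_line.append(time.perf_counter() - started)

    if compute_time_per_line:
        # At least one real event was parsed: safe to average and to
        # mark the package as complete.
        is_partial = False
        average = sum(compute_time_per_line) / len(compute_time_per_line)
    else:
        # Empty file, or only lines without event_type: skip the divide
        # and leave is_partial at its default of True.
        average = None

    return is_partial, average, analytics_line_count


print(summarize_analytics([]))               # (True, None, 0)
print(summarize_analytics(['{"other": 1}'])) # (True, None, 1)
```

The second call is the case a line-count guard would miss: one line was counted, yet there is nothing to average, so the emptiness of `compute_time_per_line` is the condition worth checking.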
Companion frontend work (separate concern, not this PR)
The SQLite output already has `package_data.package_is_partial`. Whether the frontend currently renders activity-driven stats gracefully when partial is its own thread — tracked by dumpus-app/dumpus-app#232 ("Add a banner informing users that some stats are missing"). The bare minimum is that nothing crashes; the polish is showing N/A on the activity-only screens.
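As a rough illustration of the "show N/A when partial" idea, here is a sketch of a consumer reading the flag. It is not the frontend's code (which is a separate project and not written in Python). The table and column names come from `package_data.package_is_partial` above; the single-row, integer-flag schema and the helper name `activity_stat_or_na` are assumptions.

```python
import sqlite3


def activity_stat_or_na(conn, stat_value):
    """Return a display string for an activity-driven stat.

    Hypothetical helper: assumes package_data holds one row whose
    package_is_partial column is 1 (partial) or 0 (complete).
    """
    row = conn.execute(
        "SELECT package_is_partial FROM package_data LIMIT 1"
    ).fetchone()
    # Treat a missing row as partial so nothing misleading is shown.
    is_partial = bool(row[0]) if row is not None else True
    return "N/A" if is_partial else str(stat_value)


# Demo against an in-memory database with the assumed schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE package_data (package_is_partial INTEGER)")
demo.execute("INSERT INTO package_data VALUES (1)")
print(activity_stat_or_na(demo, 1234))  # N/A
```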
The `MISSING_ACTIVITY_DATA` frontend PR (dumpus-app/dumpus-app#422) is no longer needed and will be closed.
Test plan