fix(worker): degrade gracefully when analytics file is empty #81

Merged
Androz2091 merged 1 commit into main from fix/empty-analytics-file on May 4, 2026

@Androz2091 Androz2091 commented May 4, 2026

A user reported `UNKNOWN_ERROR` for package `14be73071701467d3aae144ab2e72f86`. CloudWatch:

```
File "/app/tasks.py", line 429, in read_analytics_file
print(f'Average compute time per line: {sum(compute_time_per_line) / len(compute_time_per_line)}')
ZeroDivisionError: division by zero
```

Two interacting bugs:

  1. Crash: the averaging print divides by `len(compute_time_per_line)`. When the package's analytics file path exists in the zip but the file is empty, the for-loop runs zero times, the list stays `[]`, and the print raises `ZeroDivisionError`, which the worker reports as `UNKNOWN_ERROR`.
  2. `is_partial` lie: `is_partial` was set to `False` the moment a file path was found, regardless of whether the file had any events. So even without the crash above, the run would have been reported as successful with all-zero activity stats.

This shows up when the user un-ticks Activity history in Discord's export request form. Discord ships an empty analytics file rather than omitting the path.
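
For reference, the crash can be reproduced outside the worker with an in-memory zip. This is only a sketch: `ANALYTICS_PATH`, `build_package`, and `unguarded_average` are hypothetical names for illustration, and the constant `0.001` stands in for the real per-line timing.

```python
import io
import zipfile
from io import TextIOWrapper

# Hypothetical path; the real analytics file name inside a package differs.
ANALYTICS_PATH = "activity/analytics/events.json"


def build_package(analytics_body: bytes) -> zipfile.ZipFile:
    """Build an in-memory zip whose analytics file has the given contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(ANALYTICS_PATH, analytics_body)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


def unguarded_average(package: zipfile.ZipFile) -> float:
    """Mimic the old averaging step, which has no guard for an empty list."""
    compute_time_per_line = []
    for _line in TextIOWrapper(package.open(ANALYTICS_PATH)):
        compute_time_per_line.append(0.001)  # stand-in for a measured duration
    return sum(compute_time_per_line) / len(compute_time_per_line)


# The path exists in the zip, but the file is empty: the loop body never runs.
empty_package = build_package(b"")
try:
    unguarded_average(empty_package)
    crashed = False
except ZeroDivisionError:
    crashed = True
```

With an empty file, `crashed` ends up `True`; with at least one line in the file, the same function returns an average normally.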

Why graceful instead of erroring

I initially shipped this as a `MISSING_ACTIVITY_DATA` hard error (analogous to `MISSING_USER_DATA` from #80) and a matching frontend copy in dumpus-app/dumpus-app#422. On reflection that's the wrong call: a Discord re-export takes ~30 days, and the user's package still has real value without analytics:

  • ✅ Owner profile + avatar (`user.json`)
  • ✅ Server list + names (`servers/index.json`)
  • ✅ Friends / DM partners
  • ✅ Payments / Nitro spent
  • ✅ Per-channel message counts (CSV-derived, not analytics-derived)

What's lost: daily-sent-messages chart, sending-times-by-hour, sessions, voice sessions, top items ranked by activity. Asking the user to throw away the rest for a month-long round-trip is unkind.

Fix

```diff
 if analytics_file_name:
-        is_partial = False
         compute_time_per_line = []
         for line in TextIOWrapper(zip.open(analytics_file_name)):
             …
-        print(f'Average compute time per line: {sum(compute_time_per_line) / len(compute_time_per_line)}')
+        if analytics_line_count > 0:
+            is_partial = False
+            print(f'Average compute time per line: {sum(compute_time_per_line) / len(compute_time_per_line)}')
+        else:
+            print('Analytics file is empty — keeping is_partial=True; activity-driven stats will be N/A on the client.')
```

`is_partial` defaults to `True` at the top of `read_analytics_file` (line 152). It was being incorrectly flipped to `False` before we knew whether the file had real events. Fix: only flip to `False` when we actually parsed at least one event.
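
A stripped-down, runnable sketch of that control flow, for anyone who wants to poke at it in isolation. It is not the real `read_analytics_file`: the function name `read_analytics_lines`, the `json.loads` call, and the constant timing value are assumptions made for illustration.

```python
import json


def read_analytics_lines(lines):
    """Return (is_partial, average_time) for an iterable of analytics lines."""
    is_partial = True  # default: assume activity data is missing
    analytics_line_count = 0
    compute_time_per_line = []

    for line in lines:
        analytics_line_count += 1
        json.loads(line)  # stand-in for the real per-event processing
        compute_time_per_line.append(0.001)  # stand-in for a measured duration

    average = None
    if analytics_line_count > 0:
        # Only now do we know the file carried real lines.
        is_partial = False
        average = sum(compute_time_per_line) / len(compute_time_per_line)
    return is_partial, average
```

An empty iterable leaves `is_partial` at `True` and skips the division; any non-empty input flips it to `False` and computes the average.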

Companion frontend work (separate concern, not this PR)

The SQLite output already has `package_data.package_is_partial`. Whether the frontend currently renders activity-driven stats gracefully when partial is its own thread — tracked by dumpus-app/dumpus-app#232 ("Add a banner informing users that some stats are missing"). The bare minimum is that nothing crashes; the polish is showing N/A on the activity-only screens.

The `MISSING_ACTIVITY_DATA` frontend PR (dumpus-app/dumpus-app#422) is no longer needed and will be closed.
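
To illustrate what "render N/A when partial" could look like for a consumer of the SQLite output, here is a small sketch. Only the `package_data.package_is_partial` name comes from this PR; the single-column schema, the `activity_stats_available` helper, and the label strings are assumptions, and the real frontend is not written in Python.

```python
import sqlite3


def activity_stats_available(connection: sqlite3.Connection) -> bool:
    """Return True only when the package is marked as not partial."""
    row = connection.execute(
        "SELECT package_is_partial FROM package_data LIMIT 1"
    ).fetchone()
    return row is not None and not bool(row[0])


# Assumed minimal schema: the real table has more columns.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE package_data (package_is_partial INTEGER)")
connection.execute("INSERT INTO package_data (package_is_partial) VALUES (1)")

# A partial package falls back to a placeholder instead of all-zero stats.
label = "daily messages" if activity_stats_available(connection) else "N/A"
```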

Test plan

  • Merge → CI deploys.
  • Re-submit the failing package — expect it to PROCESS successfully (no UNKNOWN_ERROR) with `is_partial=true` in the resulting SQLite.
  • Open the package in the web app — sanity-check that activity-driven screens render without crashing (they may show empty/zero values until #232 lands).
  • No regression on packages with real activity data: `is_partial` resolves to `false`, all screens populate.

When the user un-ticks 'Activity history' in Discord's data-request
form, Discord still ships an analytics file path in the zip but the
file is empty. The old code:

1. Set is_partial=False the moment a file path was found, regardless
   of whether the file had any events.
2. Crashed right after the loop because the average-time print
   divides by len(compute_time_per_line), and that list stays [] for
   an empty file. ZeroDivisionError surfaced as UNKNOWN_ERROR.

Originally I had this raise MISSING_ACTIVITY_DATA, but a Discord
re-export takes ~30 days to land, and the user's package still has
real value without analytics: owner profile, server list, DMs,
payments, and per-channel message totals (which come from the CSVs,
not analytics) are all available. Erroring out makes us throw away
all of that for a month-long round-trip the user may not realize
they signed up for.

So: keep is_partial=True (already the default), guard the divide,
let processing complete. The frontend already has package_is_partial
in the SQLite output and can render activity-driven stats as N/A
when set. (Issue dumpus-app/dumpus-app#232 tracks the proper banner
UX for this.)

Androz2091 force-pushed the fix/empty-analytics-file branch from 67c2bca to a3fadf2 on May 4, 2026 08:04
Androz2091 changed the title from "fix(worker): MISSING_ACTIVITY_DATA when analytics file is empty" to "fix(worker): degrade gracefully when analytics file is empty" on May 4, 2026
Androz2091 merged commit bb688fb into main on May 4, 2026
1 check passed
Androz2091 deleted the fix/empty-analytics-file branch on May 4, 2026 08:07
Androz2091 added a commit that referenced this pull request May 4, 2026
…tics_line_count (#82)

PR #81 used analytics_line_count > 0 as the divide-by-zero guard, but
that variable is incremented at the top of the analytics loop —
before the 'if not event_type: continue' skip. So a file whose lines
all lack event_type (Discord apparently ships these for some kinds of
partial exports) re-hit the same ZeroDivisionError surfaced as
UNKNOWN_ERROR for package 14be73071701467d3aae144ab2e72f86.

The actual condition for 'we have nothing to average' is that
compute_time_per_line is empty. Guard on it directly. Behavior is
identical for the two clean cases (empty file, fully-populated file)
and additionally handles the lines-without-event_type case gracefully
— is_partial stays True, no stats produced, no crash.
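
The difference between the two guards can be seen in a small sketch. The names `summarize` and the `event.get("event_type")` lookup are assumptions about how the skip is implemented; only `analytics_line_count`, `compute_time_per_line`, and `is_partial` come from the commit messages.

```python
import json


def summarize(lines):
    """Return (is_partial, average_time, analytics_line_count)."""
    is_partial = True
    analytics_line_count = 0
    compute_time_per_line = []

    for line in lines:
        analytics_line_count += 1  # incremented before the skip below
        event = json.loads(line)
        if not event.get("event_type"):
            continue  # line is counted but contributes no timing sample
        compute_time_per_line.append(0.001)  # stand-in for a measured duration

    average = None
    if compute_time_per_line:  # guard on the list actually being averaged
        is_partial = False
        average = sum(compute_time_per_line) / len(compute_time_per_line)
    return is_partial, average, analytics_line_count
```

For input where every line lacks `event_type`, `analytics_line_count` is non-zero while `compute_time_per_line` is still empty, which is exactly the case a line-count guard lets through to the division.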