fix(worker): surface EXPIRED_LINK / INVALID_LINK instead of UNKNOWN_ERROR #79

Merged
Androz2091 merged 1 commit into main from fix/expired-link-error-mapping on May 2, 2026

@Androz2091
Member

Summary

Two interacting bugs were causing every link-download failure to surface as `UNKNOWN_ERROR`, including the very common case of a Discord link that expired before the worker picked it up.

Bug 1: `download_file` treated 4xx as `INVALID_LINK`

`click.discord.com` 302s to a Google Cloud Storage signed URL. Once the signature TTL is up (Discord exports expire fairly quickly), the GCS URL returns `HTTP 400`. The old code:

```python
if r.status_code != 200 or 'content-type' not in r.headers or 'application/octet-stream' not in r.headers['content-type']:
    raise Exception('INVALID_LINK')
```

…raised `INVALID_LINK`, which was wrong — the link wasn't malformed, it was expired.

Bug 2: `process_package` masked everything except `EXPIRED_LINK`

```python
expected = ('EXPIRED_LINK')  # ← this is a STRING, not a tuple!
if expected not in current:
    current = 'UNKNOWN_ERROR'
```

A single-element tuple needs a trailing comma: `('EXPIRED_LINK',)`. Without it the parentheses only group, so `expected` was the bare string `'EXPIRED_LINK'`, and `expected not in current` became a substring test for that one code. As a result, even when the worker raised `INVALID_LINK`, the route relabeled it `UNKNOWN_ERROR`.
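
For anyone who wants to reproduce the tuple pitfall in isolation, here is a minimal standalone sketch. The variable names `as_tuple` and `known` are illustrative only and do not appear in the worker code.

```python
expected = ('EXPIRED_LINK')      # parentheses only group: this is a str
as_tuple = ('EXPIRED_LINK',)     # the trailing comma makes a one-element tuple

print(type(expected).__name__)   # str
print(type(as_tuple).__name__)   # tuple

# With a str on the left, `in` is a substring test against `current`.
current = 'INVALID_LINK'
print(expected not in current)   # True, so the old check relabeled it UNKNOWN_ERROR

# Membership in a tuple of known codes recognises both codes.
known = ('EXPIRED_LINK', 'INVALID_LINK')
print(current in known)          # True
```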

Fix

  • HEAD returns 4xx → raise `EXPIRED_LINK` (verified: the user's example expired link returns `HTTP 400` from the GCS redirect target).
  • Any other non-200 status, or a missing or wrong content-type → `INVALID_LINK`.
  • Anything else → `UNKNOWN_ERROR` with the full traceback preserved.
  • `EXPECTED_ERROR_CODES` is now a proper tuple of both known codes.
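
The mapping in the list above can be sketched roughly as follows. This is not the merged code: `classify_head_response` and `public_error_code` are hypothetical helper names, the exact-match membership check is an assumption about how the tuple is used, and the function takes a plain status code and headers dict so that it runs without any network access.

```python
EXPECTED_ERROR_CODES = ('EXPIRED_LINK', 'INVALID_LINK')


def classify_head_response(status_code, headers):
    """Return an error code for a HEAD response, or None if it looks downloadable."""
    # Lowercase header names so a plain dict behaves like case-insensitive headers.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get('content-type', '')

    # 4xx from the redirect target: treat as an expired signed URL.
    if 400 <= status_code < 500:
        return 'EXPIRED_LINK'
    # Any other non-200 status, or a missing/wrong content-type.
    if status_code != 200 or 'application/octet-stream' not in content_type:
        return 'INVALID_LINK'
    return None


def public_error_code(raised):
    """Keep known codes as-is; relabel anything else as UNKNOWN_ERROR."""
    return raised if raised in EXPECTED_ERROR_CODES else 'UNKNOWN_ERROR'
```

Keeping the classification separate from the HTTP call makes each branch easy to unit-test with literal status codes and header dicts.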

Verified against a real expired link

```
$ curl -sS -I -L "<user's example expired upn>"
HTTP/2 302 ← click.discord.com
location: https://storage.googleapis.com/discord-harvest-prd/.../package.zip?...

HTTP/2 400 ← GCS, signature expired
```

So `requests.head(link, allow_redirects=True).status_code` is `400` for an expired link → my new branch catches it and raises `EXPIRED_LINK`.

Frontend follow-up (separate)

The frontend's status-response type doesn't currently list `INVALID_LINK` as a possible `errorMessageCode` (only the four: UNKNOWN_PACKAGE_ID, UNKNOWN_ERROR, UNAUTHORIZED, EXPIRED_LINK). If we want the friendly "Link expired" copy to render for INVALID_LINK too, that's a one-line type widening in dumpus-app — happy to open it as a follow-up PR.

Test plan

  • Merge → CI deploys.
  • Submit the user's example expired link via `/process`. Expect status to settle on `isErrored: true, errorMessageCode: 'EXPIRED_LINK'` (not `UNKNOWN_ERROR`).
  • CloudWatch should show `Link is expired (upstream returned 400).` instead of `The link does not point to a valid file.` for these.

…RROR

Two bugs were combining to mask real failures:

1. download_file treated any non-200 HEAD as INVALID_LINK, including the
   4xx response that click.discord.com → GCS gives once the signed URL
   has expired. The user's link looked fine when they pasted it; it just
   timed out before the worker got to it.

2. process_package had `expected = ('EXPIRED_LINK')` — a string, not a
   tuple, because the parens are grouping rather than building a tuple.
   The substring check (`if expected not in current`) only recognized
   EXPIRED_LINK; INVALID_LINK and any future codes got relabeled as
   UNKNOWN_ERROR.

Net effect from the user's POV: they'd retry an expired link and see
'UNKNOWN_ERROR' three times before realizing they needed a fresh export.

Now:
- HEAD 4xx → EXPIRED_LINK (the common case for stale links).
- HEAD non-200 / wrong content-type → INVALID_LINK.
- Anything else → UNKNOWN_ERROR with the full traceback preserved.
- The expected-codes list is a real tuple containing both known codes.

Androz2091 merged commit 4aa85e5 into main on May 2, 2026
1 check passed
Androz2091 deleted the fix/expired-link-error-mapping branch on May 2, 2026 02:56