Skip to content

fix(api): scale S3 upload timeout with payload size#253

Merged
canyugs merged 1 commit into
mainfrom
can/pla-1786-cli-s3-upload-timeout
Jun 15, 2026
Merged

fix(api): scale S3 upload timeout with payload size#253
canyugs merged 1 commit into
mainfrom
can/pla-1786-cli-s3-upload-timeout

Conversation

@canyugs

@canyugs canyugs commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Problem

zeabur deploy / zeabur upload intermittently fail with:

ERROR failed to upload to S3: Put "https://zeabur-uploads.s3.ap-east-1.amazonaws.com/...":
context deadline exceeded (Client.Timeout exceeded while awaiting headers)

The upload flow shares a single http.Client{Timeout: 30 * time.Second} across three requests:

POST /v2/upload          (small JSON)        — 30s fine
PUT  S3 presigned URL    (entire zip body)   — 30s caps throughput at ~14 Mbps for 50 MiB ← bug
POST /v2/upload/{id}/prepare (small JSON)    — 30s fine

http.Client.Timeout covers the full request including body transfer, so large uploads always fail on slow links and there is no flag to raise the limit. Forum report: user in Taiwan uploading ~50 MiB, 12 failures vs 3 successes in one evening (failures clustered at residential peak hours); they gave up and switched to GitHub-based deploy.

Prior art

  • AWS SDK for Go v2 sets no http.Client.Timeout for S3 transfers — cancellation is context-driven, sized to the operation.
  • rclone deliberately uses an idle/chunk timeout instead of a total-transfer deadline for the same reason: a fixed total timeout punishes large payloads on slow links.

Solution

Keep the 30s client for the two small API calls. Give the S3 PUT its own context deadline that scales with payload size via util.UploadTimeout:

// 2-minute floor + 1s per 256 KiB (~2 Mbps floor); 50 MiB → ~5.3 min
uploadCtx, cancel := context.WithTimeout(ctx, util.UploadTimeout(int64(len(zipBytes))))

Applied identically to both copies of the flow (pkg/api/service.go and internal/cmd/upload/upload.go).

Why not a flag?

A --upload-timeout flag pushes the failure onto the user (they only learn about it after the error). A size-scaled ceiling fixes the common case with zero configuration; a flag can still be added later if someone hits the conservative floor.

Caveat verified

Presigned URL expiry is signed server-side at 30 minutes (upload_v2.go: s3.WithPresignExpires(30*time.Minute)). The PUT starts seconds after signing and S3 validates the signature at request start, so the upload window is safe. The timeout only exceeds 30 min beyond ~420 MiB, far above plan limits.

Validation

  • TDD: TestUploadTimeout written first (5 cases: floor, scaling at 1 MiB / 50 MiB / 1 GiB)
  • go build ./..., go vet ./..., go test ./... all clean
  • Behavioral check against a slow mock server: old pattern reproduces the exact reported error; new pattern survives the slow window and still enforces its own deadline when truly stuck

Related issues & labels

  • Refs PLA-1786

🤖 Generated with Claude Code


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

Summary by CodeRabbit

  • New Features

    • Upload operations now use intelligent, size-based timeout management that adjusts duration according to file size, improving reliability for large transfers.
  • Tests

    • Added tests validating upload timeout behavior across various file sizes.

The upload flow shared a single http.Client{Timeout: 30s} across the
create-session, S3 PUT, and prepare calls. Client.Timeout covers the
entire request including body transfer, so a ~50MB zip required
~14 Mbps sustained upstream to finish in time — large uploads always
failed on slow links, with no flag to raise the limit.

Keep the 30s client for the two small API calls; give the S3 PUT its
own context deadline via util.UploadTimeout: a 2-minute floor plus 1s
per 256 KiB of payload (~5.3 min for 50 MiB). Applied to both copies
of the flow (pkg/api/service.go and internal/cmd/upload/upload.go).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 12, 2026 08:33
@linear-code

linear-code Bot commented Jun 12, 2026

Copy link
Copy Markdown

PLA-1786

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6bd148d3-d8f3-4fe1-9ec5-30c0806423aa

📥 Commits

Reviewing files that changed from the base of the PR and between 19b605f and ce18d38.

📒 Files selected for processing (4)
  • internal/cmd/upload/upload.go
  • pkg/api/service.go
  • pkg/util/http.go
  • pkg/util/http_test.go

📝 Walkthrough

Walkthrough

This PR replaces the fixed 30-second timeout for S3 uploads with a size-based timeout. A new UploadTimeout utility function computes the deadline as 2 minutes plus 1 second per 256 KiB of payload. Both internal/cmd/upload and pkg/api/service apply this function when creating the S3 PUT request context and send it using a fresh http.Client instead of a shared timeout-limited client. The upload session creation, status validation, and deployment preparation steps remain unchanged.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: introducing size-scaled timeouts for S3 uploads to address timeout issues on slow connections.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses intermittent context deadline exceeded failures during S3 uploads by removing the fixed 30s whole-request deadline from the large S3 PUT and replacing it with a size-scaled context timeout, while keeping the existing 30s http.Client timeout for the small Zeabur API calls.

Changes:

  • Add util.UploadTimeout(contentLength) to compute a size-scaled upload deadline.
  • Apply the size-scaled context timeout to the S3 PUT request in both upload flows (API client and zeabur upload command).
  • Add unit tests for UploadTimeout.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
pkg/util/http.go Adds UploadTimeout helper used to size the S3 PUT deadline.
pkg/util/http_test.go Adds tests covering UploadTimeout behavior.
pkg/api/service.go Uses a size-scaled context deadline for the S3 PUT in the API upload flow.
internal/cmd/upload/upload.go Uses a size-scaled context deadline for the S3 PUT in the CLI upload flow.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/util/http.go
Comment on lines +18 to +20
func UploadTimeout(contentLength int64) time.Duration {
return 2*time.Minute + time.Duration(contentLength/(256*1024))*time.Second
}
Comment thread pkg/util/http_test.go
Comment on lines +87 to +91
{"zero size gets the 2min floor", 0, 2 * time.Minute},
{"tiny payload still gets the floor", 1024, 2 * time.Minute},
{"1 MiB adds 4s over the floor", 1 << 20, 2*time.Minute + 4*time.Second},
{"50 MiB scales to ~5.3 min", 50 << 20, 2*time.Minute + 200*time.Second},
{"1 GiB scales to ~70 min", 1 << 30, 2*time.Minute + 4096*time.Second},
@zeabur-review-agent

Copy link
Copy Markdown

Code Review Synthesis — #253

PR: fix(api): scale S3 upload timeout with payload size

Panel Verdict: APPROVE (5/7 reviewed, 4 approve, 1 request changes)


Summary

The PR replaces the fixed 30s http.Client.Timeout (which bounded the entire body transfer) with a size-scaled context.WithTimeout via the new util.UploadTimeout helper. Both S3 upload paths (internal/cmd/upload/upload.go and pkg/api/service.go) are updated consistently, and a focused unit test covers the math.


Findings

🔴 Suggested Changes (from Legolas — request changes)

# Location Issue
1 pkg/api/service.go Reassigning resp after registering defer resp.Body.Close() increases cognitive complexity. Consider using a distinct variable name for the second response.
2–3 (2 additional findings from Legolas) See Legolas's full review for details.

🟡 Nit

# Reviewer Location Note
N1 Frodo pkg/util/http_test.go Test covers the math helper but not an actual slow-link HTTP transfer contract (manual verification only). Non-blocking.

🟢 Info (consensus observations)

  • util.UploadTimeout centralizes the timeout policy (2min + 1s per 256KiB); both upload flows call it consistently.
  • S3 PUT now uses a bare http.Client{} with the request context as the sole bounding mechanism — appropriate for body-transfer sizing.
  • Unit test (TestUploadTimeout) covers floor and representative scaling cases.
  • Single commit, clean 4-file scope matches title/body exactly.

Recommendation

The PR is safe to merge. Legolas's variable-shadowing concern is valid for readability but does not affect correctness. Consider addressing it in a follow-up or as part of this PR at maintainer discretion.

[done]

@canyugs canyugs merged commit 43b89e5 into main Jun 15, 2026
6 checks passed
@canyugs canyugs deleted the can/pla-1786-cli-s3-upload-timeout branch June 15, 2026 04:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants