feat(proxy): configurable per-app transport-retry interval #300

Open
VioletVenti wants to merge 1 commit into SaladDay:main from VioletVenti:feat/proxy-retry-interval


Summary

The local proxy retried upstream connection/timeout (transport) errors immediately: the interval between retries was hardcoded to 0s, with no config field or CLI flag to change it and no proxy show line to display it. This PR makes that interval configurable per app.

  • New per-app retry_interval_seconds field on proxy_config (default 0 = immediate retry).
  • CLI setter: cc-switch proxy config --retry-interval-seconds <N>.
  • Shown per-app in cc-switch proxy show under a new Retry interval section.
  • Real schema migration (v11 → v12); old DBs upgrade without data loss and keep prior behavior.

Motivation

RequestForwarder retries retryable transport errors (connect/timeout) for non-Claude apps (uses_internal_transport_retry), but the retry loop only did attempt += 1; continue; with no sleep between attempts. Under upstream instability, hammering an unresponsive endpoint with zero backoff is rarely what you want, yet there was no knob to tune it. This adds that knob.

Changes

Config / DB

  • proxy_config.retry_interval_seconds INTEGER NOT NULL DEFAULT 0, added to CREATE TABLE (fresh DBs) and via schema migration v11 → v12 (add_column_if_missing, guarded on table existence, idempotent). SCHEMA_VERSION 11 → 12.
  • DAO: both per-app SELECTs, the UPDATE, and default_app_proxy_config carry the field.
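
The guarded, idempotent column addition described above can be sketched as a pure planning function. This is an illustrative sketch, not the PR's code: `plan_add_column` and its signature are hypothetical, and the real add_column_if_missing helper presumably inspects the live SQLite schema rather than a slice of column names.

```rust
/// Hypothetical planner: decide which SQL (if any) a migration step should run.
/// `existing_columns` is `None` when the table does not exist yet.
fn plan_add_column(
    table: &str,
    existing_columns: Option<&[&str]>,
    column: &str,
    definition: &str,
) -> Option<String> {
    // Guard on table existence: there is nothing to alter if the table is absent
    // (a fresh DB gets the column from CREATE TABLE instead).
    let columns = existing_columns?;

    // Idempotence: running the step a second time must be a no-op.
    if columns.iter().any(|c| c.eq_ignore_ascii_case(column)) {
        return None;
    }

    Some(format!(
        "ALTER TABLE {} ADD COLUMN {} {}",
        table, column, definition
    ))
}
```

Because the column is declared NOT NULL with DEFAULT 0, SQLite can add it to a table that already holds rows, and those rows read back 0 for the new field.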

Runtime (forwarder)

  • ForwardOptions.retry_interval_seconds: Option<Duration>, threaded from HandlerContext at the 4 handler sites.
  • A maybe_sleep_retry_interval helper sleeps the configured interval at all four transport-retry continue points (streaming × {connect, timeout}, buffered × {connect, timeout}). None / Duration::ZERO skips the sleep entirely, so the existing fast path adds zero async overhead.
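
The skip-on-None-or-zero behavior can be sketched as follows. This is a synchronous stand-in under stated assumptions: the PR's helper runs inside an async forwarder and presumably awaits an async sleep, whereas this sketch blocks with `std::thread::sleep` so that it stays dependency-free. `retry_transport` and its closure-based signature are hypothetical and only illustrate where the sleep sits relative to the retry step.

```rust
use std::thread;
use std::time::Duration;

/// Sleep between transport retries; returns whether a sleep happened.
/// `None` and `Duration::ZERO` both mean "retry immediately".
fn maybe_sleep_retry_interval(interval: Option<Duration>) -> bool {
    match interval {
        Some(d) if !d.is_zero() => {
            thread::sleep(d);
            true
        }
        // Fast path: no sleep call at all.
        _ => false,
    }
}

/// Hypothetical retry loop: call `send` until it succeeds or retries run out.
/// Assumption: `max_retries` counts retries after the first attempt.
fn retry_transport<T, E>(
    max_retries: u32,
    interval: Option<Duration>,
    mut send: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match send(attempt) {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last transport error.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                // Wait the configured interval, then retry.
                maybe_sleep_retry_interval(interval);
                attempt += 1;
            }
        }
    }
}
```

With this shape, a failing call that is retried twice at a 20 ms interval takes at least 40 ms of wall-clock time, which is the kind of lower bound the timing test in this PR asserts.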

CLI

  • cc-switch proxy config --retry-interval-seconds <N> (range 0–300; Claude/Codex/Gemini only). Read-modify-write via the standard per-app DAO path.
  • cc-switch proxy show: new per-app Retry interval section (e.g. Codex: 3s; 0 rendered as 立即/immediate).
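
A minimal sketch of the validation and display rules listed above, assuming the flag value arrives as a string. The function names and error wording are hypothetical; only the 0 to 300 range, the `3s` format, and the special rendering of 0 come from the description. The sketch prints the English label `immediate`; the actual CLI output may be localized.

```rust
/// Upper bound accepted by the CLI, per the PR description.
const MAX_RETRY_INTERVAL_SECONDS: u64 = 300;

/// Hypothetical parser for the `--retry-interval-seconds` value.
fn parse_retry_interval_seconds(raw: &str) -> Result<u64, String> {
    // Parsing as u64 rejects negative numbers and non-numeric input.
    let seconds: u64 = raw
        .trim()
        .parse()
        .map_err(|_| format!("invalid retry interval: {:?}", raw))?;

    if seconds > MAX_RETRY_INTERVAL_SECONDS {
        return Err(format!(
            "retry interval must be between 0 and {} seconds",
            MAX_RETRY_INTERVAL_SECONDS
        ));
    }
    Ok(seconds)
}

/// Hypothetical renderer for the per-app "Retry interval" line.
fn format_retry_interval(seconds: u64) -> String {
    if seconds == 0 {
        "immediate".to_string()
    } else {
        format!("{}s", seconds)
    }
}
```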

Backward compatibility

  • Default is 0, preserving the exact prior immediate-retry behavior.
  • Migration is additive and idempotent; existing rows keep their data and get 0 for the new column.
  • New DBs get the column from CREATE TABLE; the migration is a no-op there.

Tests

  • schema_migration_v11_adds_retry_interval_seconds — migration adds the column (type/default/not-null), preserves pre-existing row data, lands at v12.
  • retry_interval_seconds_round_trips_per_app — DAO write/read, per-app independence.
  • buffered_transport_retry_sleeps_configured_interval_between_attempts: proves the proxy actually sleeps the configured interval between retries (Codex + closed-port connect-refused; asserts wall-clock ≥ interval).
  • buffered_transport_retry_with_zero_interval_does_not_sleep — control: None stays fast (no sleep).
  • proxy_config_retry_interval_persists_across_reopen — CLI setter persists across DB reopen (simulates daemon restart); plus an out-of-range rejection test.
  • All ~58 existing ForwardOptions test sites updated mechanically (retry_interval_seconds: None).

cargo fmt --check, cargo clippy --all-targets, and cargo test are green. (One unrelated, pre-existing flaky test provider_service_switch_claude_merges_live_and_state fails on a clean main checkout as well — not introduced here.)

Scope notes / out of scope

  • Only the pre-first-byte transport-retry layer is affected (same layer that already consumes max_retries).
  • Auth is untouched (the proxy is token-transparent; only the retry cadence changes).
  • Mid-stream disconnect retries (after bytes are already sent) are an inherent limitation and remain out of scope.
  • The retry interval does not count against each attempt's request_timeout budget; the CLI caps it at 300s to avoid pathological misconfiguration.

🤖 Generated with Claude Code

The local proxy retried upstream connection/timeout errors immediately
(hardcoded 0s interval between attempts) with no way to configure it.
Add a per-app `retry_interval_seconds` field so the wait between
transport retries is tunable.

- New `retry_interval_seconds` column on `proxy_config` (INTEGER NOT NULL
  DEFAULT 0) via schema migration v11 -> v12; default 0 preserves the
  prior immediate-retry behavior (fully backward compatible). Fresh DBs
  get the column in CREATE TABLE; the migration guards on table existence
  and is idempotent.
- DAO read/write (per-app SELECT x2, UPDATE, defaults) carries the field.
- Forwarder sleeps the configured interval between retries at all four
  transport-retry continue points (streaming/buffered x connect/timeout);
  None or Duration::ZERO skips the sleep, so the fast path adds no async
  overhead. Only the pre-first-byte transport-retry layer is affected.
- CLI: `cc-switch proxy config --retry-interval-seconds <N>` (0-300,
  Claude/Codex/Gemini only), shown per-app in `cc-switch proxy show` under
  a new "Retry interval" section.
- Tests: migration (column + default + data preserved), DAO round-trip,
  forwarder timing (proves the proxy actually sleeps >= interval on retry,
  plus a no-interval control), CLI setter persistence across DB reopen.

Auth is untouched; mid-stream disconnect retries remain out of scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>