Skip to content

Add opt-in persistent connection pooling to PostsData#166

Draft
maximus1108 wants to merge 1 commit intomainfrom
maxperry/persistent-connection-pooling
Draft

Add opt-in persistent connection pooling to PostsData#166
maximus1108 wants to merge 1 commit intomainfrom
maxperry/persistent-connection-pooling

Conversation

@maximus1108
Copy link
Copy Markdown
Contributor

Summary

  • Adds opt-in persistent_connections class attribute to PostsData module
  • When enabled, raw_ssl_request uses a per-class Net::HTTP::Persistent pool instead of creating a fresh Net::HTTP per request
  • Eliminates TCP+TLS handshake overhead (~345ms at 115ms RTT) on repeat requests to the same host
  • Zero behavior change for consumers that don't opt in (default: false)

Motivation

Carrier service rate requests in Shopify Core create a new TCP+TLS connection for every single HTTP request via ActiveUtils::PostsDataConnectionNet::HTTP.new. For high-latency third-party hosts (e.g., carriers in Argentina at ~115ms RTT), the handshake alone costs ~345ms — 63% of the total request time.

By pooling connections and reusing them for subsequent requests to the same host, we can eliminate this overhead entirely on warm hits, reducing p95 latency from ~600ms to ~250ms for these carriers.

See the full context document for detailed analysis of the problem, host distribution (55K hosts, power-law — top 50 handle 84% of volume), worker threading model (29 threads/pod), and pool sizing rationale.

Design

  • Opt-in per class: self.persistent_connections = true after include ActiveUtils::PostsData
  • Per-class pool isolation: Each including class gets its own Net::HTTP::Persistent instance
  • Per-request timeout override: pool.read_timeout is set before each request call, supporting dynamic timeouts
  • Error compatibility: Network errors caught and re-raised as ActiveUtils::ConnectionError / ResponseError
  • Return type unchanged: raw_ssl_request still returns Net::HTTP::HTTPResponse

Configuration

class ShippingRateProvider
  include ActiveUtils::PostsData

  self.persistent_connections = true
  self.pool_size = 100          # 29 threads + warm connection headroom
  self.pool_idle_timeout = 60   # seconds
  self.pool_keep_alive = 60     # seconds
  self.pool_max_requests = 100  # recycle after N requests
end

Test plan

  • All existing posts_data_test.rb tests pass unchanged (verified: 8/8)
  • All existing connection_test.rb tests pass unchanged (verified: 32/32)
  • All existing network_connection_retries_test.rb tests pass unchanged (verified: 23/23)
  • 13 new tests covering: default off, pool reuse, per-request timeout, error mapping (timeout/reset/refused → ConnectionError, non-2xx → ResponseError), GET/POST, headers, pool clearing, config defaults
  • Consumer opt-in in Core to be done as a follow-up PR (ShippingRateProvider behind feature flag)

🤖 Generated with Claude Code

When a class that includes PostsData sets `self.persistent_connections = true`,
`raw_ssl_request` will use a per-class Net::HTTP::Persistent pool instead of
creating a fresh Net::HTTP instance per request. This enables TCP+TLS connection
reuse across requests to the same host, eliminating repeated handshake overhead.

Key design decisions:
- Opt-in via class attribute (default: false) — zero behavior change for existing consumers
- Per-class pool isolation — each including class gets its own pool instance
- Per-request timeout override — pool.read_timeout is set before each request,
  supporting dynamic timeouts (e.g., carrier services with 3-10s based on throughput)
- Error compatibility — network errors are caught and re-raised as
  ActiveUtils::ConnectionError, preserving existing rescue clauses
- Return type unchanged — raw_ssl_request still returns Net::HTTP::HTTPResponse

Pool configuration attributes:
- pool_size (default: 100) — global cap on total connections across all hosts/threads
- pool_idle_timeout (default: 60s) — idle connections closed on next checkout
- pool_keep_alive (default: 60s) — HTTP Keep-Alive header timeout
- pool_max_requests (default: 100) — recycle connection after N requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant