
fix: resolve #932 #935 #936 #938 — security, reliability, and observability#1041

Merged
hman38705 merged 1 commit into solutions-plug:main from rejoicetukura-blip:fix/issues-932-935-936-938 on Jun 30, 2026


rejoicetukura-blip (Contributor) commented on Jun 29, 2026

Summary

Resolves four issues across security, blockchain reliability, and observability.


closes #935 — RPC retry backoff lacks jitter (thundering herd on recovery)

Problem: Exponential backoff was deterministic — all API instances retried simultaneously after an RPC outage.

Fix:

  • Replaced deterministic backoff with full-jitter: delay = random(0, min(cap, base * 2^attempt))
  • Added RPC_BACKOFF_JITTER_FACTOR env var (default 1.0 = full jitter, 0.0 = no jitter)
  • Added rpc_backoff_jitter_factor to Config and BlockchainClient
  • Unit tests: 100-simulation run verifies delays are unique; zero-jitter produces deterministic values

Files: src/blockchain.rs, src/config.rs, Cargo.toml (adds rand = "0.8")
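
For reviewers, the full-jitter formula above can be sketched as follows. This is a minimal illustration, not the code in src/blockchain.rs: the function names are hypothetical, the random sample is passed in as a parameter (the PR draws it with the rand crate), and the way a fractional jitter factor blends the fixed and random parts is an assumption, since the description only defines the 0.0 and 1.0 endpoints.

```rust
use std::time::Duration;

/// Upper bound for a retry: min(cap, base * 2^attempt), saturating on overflow.
pub fn backoff_ceiling(base: Duration, cap: Duration, attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(cap).min(cap)
}

/// Applies jitter to a ceiling.
/// `unit_sample` is a uniform random value in [0, 1] supplied by the caller.
/// Assumed blend: factor 1.0 gives random(0, ceiling), factor 0.0 gives the
/// ceiling itself. Inputs are assumed to be finite numbers.
pub fn jittered_delay(ceiling: Duration, jitter_factor: f64, unit_sample: f64) -> Duration {
    let f = jitter_factor.clamp(0.0, 1.0);
    let u = unit_sample.clamp(0.0, 1.0);
    // The fixed share shrinks as the jitter factor grows; the rest is random.
    let scale = (1.0 - f) + f * u;
    ceiling.mul_f64(scale)
}
```

Taking the sample as an argument keeps the sketch deterministic under test; in production the caller would pass a fresh random value on every attempt.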


closes #938 — Ledger divergence not handled in blockchain sync worker

Problem: Sync worker had no checkpoint persistence or gap detection — restarts and missed events silently corrupted market state.

Fix:

  • Persist last processed ledger as a checkpoint key in Redis (7-day TTL) separate from the in-memory cursor
  • On restart, compare stored checkpoint against current ledger; detect and log any gap with size
  • During normal sync, detect sequence skips and emit log::warn with gap_size
  • log::error alert fires when gap exceeds 10 ledgers (market state may be inconsistent)
  • New Prometheus counter blockchain_ledger_gaps_total{gap_type="sync"}

Files: src/blockchain.rs, src/metrics.rs
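
The gap check described above could look roughly like this. It is a sketch under assumptions: the names are hypothetical, "gap size" is taken to mean the number of ledgers missing between the checkpoint and the next observed ledger, and the Redis read/write and metric increment are left out.

```rust
/// Log level a gap should be reported at.
#[derive(Debug, PartialEq, Eq)]
pub enum GapSeverity {
    Warn,
    Error,
}

/// Hypothetical constant mirroring the "exceeds 10 ledgers" rule.
pub const GAP_ALERT_THRESHOLD: u64 = 10;

/// Returns the number of ledgers missing between the stored checkpoint and
/// the next observed ledger, or None when the sequence is contiguous.
/// An observed ledger at or below the checkpoint (a replay) is not a gap here.
pub fn detect_gap(checkpoint: u64, observed: u64) -> Option<u64> {
    let missed = observed.saturating_sub(checkpoint).saturating_sub(1);
    if missed > 0 {
        Some(missed)
    } else {
        None
    }
}

/// Gaps above the threshold are reported as errors, smaller ones as warnings.
pub fn classify_gap(gap_size: u64) -> GapSeverity {
    if gap_size > GAP_ALERT_THRESHOLD {
        GapSeverity::Error
    } else {
        GapSeverity::Warn
    }
}
```

The same check would serve both paths in the description: on restart with the checkpoint read from Redis, and during normal sync with the in-memory cursor.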


closes #936 — Blockchain sync worker crash not tracked or alerted

Problem: Sync worker was fire-and-forget — panics silently stopped market data updates with no metric, alert, or restart.

Fix:

  • Extracted inner run_sync_loop (no coordinator coupling) from run_sync_worker
  • start_background_tasks wraps the loop in a supervised restart: catches panics via JoinHandle, increments counter, restarts after 1 s back-off
  • New Prometheus counter blockchain_sync_worker_restarts_total
  • New Prometheus gauge blockchain_sync_worker_last_heartbeat_ts updated on every successful poll cycle
  • New GET /health/ready endpoint: returns 200 when heartbeat is ≤ 60 s old, 503 SERVICE_UNAVAILABLE otherwise — suitable as a Kubernetes readiness probe

Files: src/blockchain.rs, src/metrics.rs, src/handlers.rs, src/main.rs
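
The supervised-restart pattern can be illustrated with standard-library threads. This is only a stand-in for the PR's implementation: the function name and the `max_restarts` bound are hypothetical (the bound exists so the sketch terminates), a plain atomic takes the place of the Prometheus counter, and the real worker presumably runs on the service's async runtime rather than on an OS thread.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Runs `worker` on its own thread and restarts it whenever it panics.
/// Returns when the worker exits normally or `max_restarts` is reached.
pub fn supervise<F>(worker: F, restarts: Arc<AtomicU64>, backoff: Duration, max_restarts: u64)
where
    F: Fn() + Send + Sync + 'static,
{
    let worker = Arc::new(worker);
    loop {
        let w = Arc::clone(&worker);
        let handle = thread::spawn(move || w());
        // `join` returns Err when the spawned thread panicked.
        match handle.join() {
            Ok(()) => return,
            Err(_) => {
                // Stand-in for incrementing blockchain_sync_worker_restarts_total.
                let count = restarts.fetch_add(1, Ordering::SeqCst) + 1;
                if count >= max_restarts {
                    return;
                }
                thread::sleep(backoff);
            }
        }
    }
}
```

Observing the panic through the join handle is what turns a silent fire-and-forget failure into a countable event.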


closes #932 — Email idempotency key allows pre-computation

Problem: SHA-256(recipient || template) is fully deterministic from public inputs — attackers can pre-compute keys and poison the idempotency cache.

Fix:

  • Replaced with HMAC-SHA256(secret, recipient || "|" || template || "|" || hour_bucket)
  • EMAIL_IDEMPOTENCY_SECRET env var (falls back to HMAC_KEY) is required for production; prevents external pre-computation
  • Hour-boundary timestamp bucket bounds each key's validity window to ~1 hour: a pre-computed or leaked key is valid only for its own hour bucket
  • Updated idempotency_key signature; secret threaded through EmailService (idempotency_secret field) and EmailQueue
  • Unit test: different_secret_produces_different_key directly asserts the acceptance criterion

Files: src/email/service.rs, src/email/queue.rs, src/config.rs
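
A rough sketch of how the HMAC input might be assembled is shown below. The helper names are hypothetical, and representing the bucket as the Unix timestamp floored to the hour is an assumption. The HMAC-SHA256 step itself is omitted because it needs a crypto crate; the string returned here would be the message passed to it together with the secret.

```rust
/// Width of one bucket in seconds (one hour).
pub const BUCKET_SECS: u64 = 3600;

/// Floors a Unix timestamp to the start of its hour.
pub fn hour_bucket(unix_secs: u64) -> u64 {
    unix_secs - unix_secs % BUCKET_SECS
}

/// Builds the message to be authenticated: recipient | template | hour bucket.
pub fn idempotency_message(recipient: &str, template: &str, unix_secs: u64) -> String {
    format!("{}|{}|{}", recipient, template, hour_bucket(unix_secs))
}
```

The "|" separators keep field boundaries unambiguous as long as neither field contains the separator itself.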


New environment variables

| Variable | Default | Purpose |
| --- | --- | --- |
| RPC_BACKOFF_JITTER_FACTOR | 1.0 | Jitter fraction for RPC retry backoff |
| EMAIL_IDEMPOTENCY_SECRET | falls back to HMAC_KEY | HMAC secret for email idempotency keys |
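
One way these two settings might be resolved at startup is sketched here. The function names are hypothetical, and the handling of invalid input (falling back to the default, clamping to the 0.0 to 1.0 range, treating an empty secret as unset) is an assumption rather than the behavior of src/config.rs.

```rust
/// Parses the jitter factor, defaulting to 1.0 (full jitter) when the value
/// is missing or not a finite number, and clamping it to [0.0, 1.0].
pub fn parse_jitter_factor(raw: Option<&str>) -> f64 {
    raw.and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite())
        .map(|v| v.clamp(0.0, 1.0))
        .unwrap_or(1.0)
}

/// Picks the email idempotency secret, falling back to the general HMAC key.
/// Empty strings count as unset.
pub fn resolve_idempotency_secret(
    primary: Option<String>,
    fallback: Option<String>,
) -> Option<String> {
    primary
        .filter(|s| !s.is_empty())
        .or_else(|| fallback.filter(|s| !s.is_empty()))
}
```

A caller could feed these from the environment, for example `parse_jitter_factor(std::env::var("RPC_BACKOFF_JITTER_FACTOR").ok().as_deref())`.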

New metrics

| Metric | Type | Description |
| --- | --- | --- |
| blockchain_sync_worker_restarts_total | Counter | Worker restart count after panics |
| blockchain_sync_worker_last_heartbeat_ts | Gauge | Unix timestamp of last sync heartbeat |
| blockchain_ledger_gaps_total | Counter | Ledger sequence gaps detected |

New endpoints

| Endpoint | Description |
| --- | --- |
| GET /health/ready | Readiness probe; returns 503 if the sync worker heartbeat is more than 60 s old |
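
The readiness decision reduces to a comparison of two timestamps. The sketch below uses hypothetical names and returns a bare status code; the actual handler in src/handlers.rs reads the heartbeat gauge and builds a proper HTTP response.

```rust
/// Maximum heartbeat age, in seconds, for the service to count as ready.
pub const HEARTBEAT_MAX_AGE_SECS: u64 = 60;

/// Returns 200 when the last heartbeat is at most 60 s old, 503 otherwise.
/// A heartbeat of 0 (never recorded) therefore yields 503.
pub fn readiness_status(now_unix: u64, last_heartbeat_unix: u64) -> u16 {
    // saturating_sub treats a heartbeat slightly in the future as age 0.
    let age = now_unix.saturating_sub(last_heartbeat_unix);
    if age <= HEARTBEAT_MAX_AGE_SECS {
        200
    } else {
        503
    }
}
```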

Testing

…-plug#936, solutions-plug#938

solutions-plug#935 — Add full jitter to RPC retry backoff
- Introduce RPC_BACKOFF_JITTER_FACTOR env var (default 1.0 = full jitter)
- Replace deterministic exponential backoff with random(0, min(cap, base*2^n))
- Add rpc_backoff_jitter_factor field to Config and BlockchainClient
- Add unit tests verifying unique delays and zero-jitter determinism

solutions-plug#938 — Handle ledger gaps/forks in blockchain sync worker
- Persist last processed ledger as a checkpoint key in Redis (7d TTL)
- On restart, detect and log gaps between checkpoint and current ledger
- During normal sync, detect sequence skips and emit log::warn with gap size
- Emit log::error alert when gap exceeds 10 ledgers
- Add observe_ledger_gap metric (blockchain_ledger_gaps_total)

solutions-plug#936 — Supervised sync worker with Prometheus metrics and health endpoint
- Extract inner run_sync_loop (no coordinator) from run_sync_worker
- Wrap in supervised restart loop in start_background_tasks: catches panics,
  increments blockchain_sync_worker_restarts_total, restarts after 1s
- Add blockchain_sync_worker_last_heartbeat_ts gauge updated each poll cycle
- Add GET /health/ready endpoint: returns 503 if heartbeat older than 60s

solutions-plug#932 — HMAC-SHA256 email idempotency key
- Replace SHA-256(recipient||template||data) with HMAC-SHA256(secret, recipient||template||hour_bucket)
- Add EMAIL_IDEMPOTENCY_SECRET env var (falls back to HMAC_KEY)
- Add hour-boundary timestamp bucket to bound pre-computation window
- Update idempotency_key signature; propagate secret through EmailService and queue
- Add test verifying key changes when secret changes

hman38705 added a commit that referenced this pull request Jun 30, 2026
…ity, and observability

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
hman38705 merged commit 22e7522 into solutions-plug:main on Jun 30, 2026
4 of 11 checks passed