Skip to content

Implement Edge Cookie identity system with KV graph and integration tests#621

Open
ChristianPavilonis wants to merge 63 commits intomainfrom
feature/edge-cookies-final
Open

Implement Edge Cookie identity system with KV graph and integration tests#621
ChristianPavilonis wants to merge 63 commits intomainfrom
feature/edge-cookies-final

Conversation

@ChristianPavilonis
Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis commented Apr 7, 2026

Summary

  • Implement the Edge Cookie (EC) identity system — a publisher-owned, first-party cookie-based identity mechanism that replaces the previous Synthetic ID concept. EC enables consent-gated identity generation, a KV-backed identity graph with CAS concurrency, partner sync (pixel, batch, pull), an identify endpoint, and auction bidstream decoration with partner EIDs.
  • Add 7 integration test scenarios covering the full EC lifecycle: generation, consent withdrawal, identify edge cases, concurrent partner syncs, and batch sync flows.
  • Rename "Synthetic ID" → "Edge Cookie" across the entire codebase, configuration, and documentation.

Changes

File / Area Change
crates/trusted-server-core/src/ec/ (new module) Full EC implementation: mod.rs (lifecycle orchestration), generation.rs (HMAC-SHA256), cookies.rs (cookie read/write), consent.rs (consent gating), device.rs (device signal derivation, bot gate), kv.rs + kv_types.rs (KV identity graph with CAS), partner.rs (partner registry + admin), sync_pixel.rs (pixel sync), batch_sync.rs (S2S batch sync), pull_sync.rs (background pull sync), identify.rs (identity lookup with CORS), eids.rs (EID encoding), finalize.rs (response finalization middleware), admin.rs (admin endpoints)
crates/integration-tests/ 7 new EC lifecycle test scenarios in tests/common/ec.rs and tests/frameworks/scenarios.rs; updated integration.rs test runner and Viceroy config fixtures
crates/trusted-server-adapter-fastly/src/main.rs New EC endpoint routes namespaced under /_ts/api/v1/ and /_ts/admin/; send_to_client() pattern with background pull sync dispatch
crates/trusted-server-core/src/auction/ endpoints.rs and formats.rs updated to decorate bid requests with partner EIDs from the KV identity graph (user.id, user.eids, user.consent)
crates/trusted-server-core/src/integrations/prebid.rs Bidder deduplication fix; blank auction EC header handling
crates/js/lib/src/integrations/prebid/index.ts JS-side Prebid integration updates for EC headers
crates/trusted-server-core/src/synthetic.rs Deleted — replaced by the EC module
crates/trusted-server-core/src/cookies.rs Slimmed down; EC-specific cookie logic moved to ec/cookies.rs
crates/trusted-server-core/src/settings.rs + settings_data.rs EC configuration fields, partner config, consent config migration
crates/trusted-server-core/src/consent/ Updated consent module for EC-specific consent checks; KV consent storage
crates/trusted-server-core/src/http_util.rs New HTTP helpers for EC endpoints (CORS, request parsing)
crates/trusted-server-core/src/proxy.rs EC header propagation in proxy layer
crates/trusted-server-core/src/publisher.rs Publisher config updates for EC
crates/trusted-server-core/src/constants.rs New EC-related constants
docs/guide/ec-setup-guide.md (new) End-to-end EC setup documentation
docs/guide/edge-cookies.md (new) EC concept overview
docs/guide/api-reference.md API reference updated with all EC endpoints
docs/guide/configuration.md Configuration docs updated for EC settings
docs/guide/synthetic-ids.md Deleted — replaced by EC docs
docs/ (various) Documentation updates reflecting Synthetic ID → Edge Cookie rename
docs/internal/superpowers/specs/2026-03-24-ssc-technical-spec-design.md EC technical spec updated with schema extensions, device signals, bot gate
fastly.toml New KV store bindings for EC
trusted-server.toml EC configuration section added

Closes

Closes #532
Closes #533
Closes #534
Closes #535
Closes #536
Closes #537
Closes #538
Closes #539
Closes #540
Closes #541
Closes #542
Closes #543
Closes #544
Closes #611
Closes #612

Follow-up

Follow-up: The pull sync mechanism (ec/pull_sync.rs) currently passes ec_id and client_ip as query parameters to partner endpoints. While connections are HTTPS-only, these values can appear in partner server access logs and CDN logs. A future improvement should migrate to POST body or hash/encrypt identifiers before transmission. Tracked separately.

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • Other: 7 EC lifecycle integration test scenarios (full lifecycle, consent withdrawal, identify edge cases, concurrent partner syncs, batch sync)

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses tracing macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@ChristianPavilonis ChristianPavilonis self-assigned this Apr 7, 2026
@ChristianPavilonis ChristianPavilonis force-pushed the feature/edge-cookies-final branch from de0525d to 0940c34 Compare April 7, 2026 19:01
@ChristianPavilonis ChristianPavilonis requested review from aram356 and prk-Jr and removed request for aram356 April 7, 2026 19:04
Comment thread crates/trusted-server-core/src/ec/kv.rs Fixed
Comment thread crates/trusted-server-core/src/ec/kv.rs Fixed
Comment thread crates/trusted-server-core/src/ec/kv.rs Fixed
Comment thread crates/trusted-server-core/src/ec/kv.rs Fixed
Comment thread crates/trusted-server-core/src/storage/kv_store.rs Fixed
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Comprehensive EC identity subsystem: HMAC-based ID generation, KV identity graph with CAS concurrency, partner sync (pixel, batch, pull), identify endpoint, and auction bidstream decoration. The architecture is well-designed — clean separation of concerns, strong input validation, and solid concurrency model. A few cleartext logging issues and a docs inconsistency need addressing before merge.

Blocking

🔧 wrench

  • Cleartext EC ID logging: Full EC IDs are logged at warn!/error!/trace! levels in 5 locations across ec/mod.rs (lines 128, 211, 296), sync_pixel.rs (line 91), and pull_sync.rs (line 91). The log_id() truncation helper was introduced in this PR for exactly this purpose but is not used consistently. See inline comments for fixes.
  • Stale env var in docs: TRUSTED_SERVER__EDGE_COOKIE__SECRET_KEY still appears in configuration.md (lines 40, 956) and error-reference.md (line 154). The PR renamed the TOML section from [edge_cookie] to [ec] and updated some env var references but missed these. Users following the docs will set a variable that is silently ignored, resulting in a startup failure from the placeholder passphrase. Should be TRUSTED_SERVER__EC__PASSPHRASE.

❓ question

  • HttpOnly omitted from EC cookie (ec/cookies.rs:11): Intentionally omitted for a hypothetical future JS use case. Is there a concrete plan? The identify endpoint already exposes the EC ID. If not, HttpOnly would be cheap XSS defense-in-depth.

Non-blocking

🤔 thinking

  • Uppercase EC ID rejection in batch sync (batch_sync.rs:209): is_valid_ec_id rejects uppercase hex, but batch sync validates before normalizing (line 225). Partners submitting uppercase EC IDs get invalid_ec_id instead of normalization.

♻️ refactor

  • _pull_enabled index lacks CAS (partner.rs:564): Read-modify-write without generation markers. Concurrent partner registrations can overwrite each other's index updates. Self-healing via fallback, but inconsistent with the CAS discipline used elsewhere.

🌱 seedling

  • Integration tests don't verify identify response shape: FullLifecycle and ConcurrentPartnerSyncs only assert uids.<partner>. The ec, consent, degraded, eids, and cluster_size fields from the API reference are never checked.
  • Pixel sync failure paths untested end-to-end: ts_reason=no_ec, no_consent, domain mismatch redirects are unit-tested but not covered in integration tests.

📝 note

  • trusted-server.toml ships banned placeholder: passphrase = "trusted-server" is rejected by reject_placeholder_secrets. Works in CI via env override, but new contributors will hit a confusing startup error. A TOML comment would help.

⛏ nitpick

  • Eid/Uid missing Deserialize: openrtb.rs types derive Serialize only. If the auction ever needs to parse EIDs from provider responses, Deserialize will be needed.

CI Status

  • cargo fmt: PASS
  • clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • integration tests: FAIL
  • CodeQL: FAIL (likely related to cleartext logging findings above)

Comment thread crates/trusted-server-core/src/ec/mod.rs Outdated
Comment thread crates/trusted-server-core/src/ec/mod.rs
Comment thread crates/trusted-server-core/src/ec/sync_pixel.rs Outdated
Comment thread crates/trusted-server-core/src/ec/pull_sync.rs Outdated
Comment thread crates/trusted-server-core/src/ec/cookies.rs Outdated
Comment thread crates/trusted-server-core/src/ec/batch_sync.rs Outdated
Comment thread crates/trusted-server-core/src/ec/partner.rs Outdated
Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Comprehensive EC identity subsystem: HMAC-based ID generation, KV identity graph with CAS concurrency, partner sync (pixel, batch, pull), identify endpoint, and auction bidstream decoration. The architecture is well-designed with clean separation of concerns, strong concurrency discipline, and solid security choices throughout. Four new blocking findings join the existing ones — three are input-validation gaps and one is a PII leak in the pull sync URL builder that should be resolved before merge.

Blocking

🔧 wrench

  • PII leak: client IP forwarded to partnerspull_sync.rs:273: build_pull_request_url appends the raw user IP as the ip query parameter on every outbound pull-sync request, exposing it in partner access logs. Contradicts the privacy-preserving design intent and is likely a GDPR Article 5 concern. Remove the ip parameter or gate it behind an explicit per-partner allow_ip_forwarding config flag.
  • Pull sync response body unboundedpull_sync.rs:325: take_body_bytes() with no size cap before JSON parsing. A malicious partner can exhaust WASM memory. Batch sync and admin endpoints both have MAX_BODY_SIZE guards; this path needs one too (64 KiB is generous for a {"uid":"..."} response).
  • Whitespace-only UIDs bypass validationbatch_sync.rs:217 and sync_pixel.rs:41: is_empty() passes " " and "\t", which get stored as garbage EID values in the KV graph. Replace with trim().is_empty() at both sites.
  • rand::thread_rng() WASM compatibility unverifiedgeneration.rs:52: requires getrandom with the wasi feature on wasm32-wasip1. Native cargo test passing does not prove the WASM build is safe; integration tests are already failing so this may not have been caught. Switch to OsRng or add getrandom = { version = "0.2", features = ["wasi"] } explicitly.

Non-blocking

🤔 thinking

  • process_mappings UpsertResult branches untestedbatch_sync.rs:232: NotFound, ConsentWithdrawn, and Stale have zero unit test coverage. Stale is documented as "counted as accepted" with no regression test.
  • Shared error messages make pull sync validation tests non-isolatingpartner.rs:152,165: both missing-URL and missing-token conditions return the identical error string, so the tests asserting on substrings pass even if the wrong branch fires. Use distinct messages per condition.

♻️ refactor

  • ec_consent_granted has no testsconsent.rs:20: the entry-point gate for all EC creation has no #[cfg(test)] section. Add smoke tests for granted and denied paths.
  • KV tombstone path never exercised in finalize testsfinalize.rs:149: all finalize tests pass kv: None, so the tombstone write and the cookie-EC ≠ active-EC case in withdrawal_ec_ids are untested.
  • Missing HMAC prefix stability testgeneration.rs:228: no test asserts that the same IP + passphrase always produces the same 64-char hash prefix, which is the core deduplication guarantee for the KV graph.
  • normalize_ec_id duplicated in integration testsintegration-tests/tests/common/ec.rs:374: reimplements normalize_ec_id_for_kv from core. Re-export and use the canonical implementation.

⛏ nitpick

  • Use saturating_sub for consistencykv.rs:605: the subtraction is safe due to the guard above but inconsistent with saturating_sub used throughout the rest of the module.
  • log_id should encapsulate the suffixmod.rs:51: every call site manually appends "…" in the format string; move it inside the function.

Praises 👍

  • CAS-based optimistic concurrency (kv.rs): bounded retries, graceful ItemPreconditionFailed handling, and MAX_CAS_RETRIES preventing infinite loops — textbook correct for a single-writer KV model.
  • Constant-time API key comparison (partner.rs): subtle::ConstantTimeEq for timing attack prevention on key lookups. Many implementations miss this entirely.
  • KvMetadata fast-path consent check (kv_types.rs): mirroring ok/country/cluster_size in metadata to avoid streaming the full KV body is an excellent performance optimisation with the right constraint test.
  • evaluate_known_browser with OnceLock-cached hash table (device.rs): pre-hashing fingerprints once and caching via OnceLock is the right WASM-compatible lazy-init pattern.
  • HMAC stability explicitly documented (generation.rs:30-33): noting that the output format is a "stable contract" that would invalidate all existing identities if changed is exactly the kind of correctness annotation that prevents future breakage.

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • integration tests: FAIL
  • CodeQL: FAIL (cleartext logging — covered in prior review)

Comment thread crates/trusted-server-core/src/ec/pull_sync.rs Outdated
Comment thread crates/trusted-server-core/src/ec/pull_sync.rs
Comment thread crates/trusted-server-core/src/ec/batch_sync.rs Outdated
Comment thread crates/trusted-server-core/src/ec/sync_pixel.rs Outdated
Comment thread crates/trusted-server-core/src/ec/generation.rs
Comment thread crates/trusted-server-core/src/ec/finalize.rs
Comment thread crates/trusted-server-core/src/ec/generation.rs
Comment thread crates/integration-tests/tests/common/ec.rs Outdated
Comment thread crates/trusted-server-core/src/ec/kv.rs Outdated
Comment thread crates/trusted-server-core/src/ec/mod.rs Outdated
Comment thread crates/trusted-server-core/src/ec/kv.rs Fixed
Comment thread crates/trusted-server-core/src/ec/kv.rs Fixed
Comment thread crates/trusted-server-core/src/storage/kv_store.rs Fixed
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR lands a large, well-partitioned Edge Cookie identity subsystem, and the prior blocking feedback (PII logging via log_id(), whitespace UID checks, pull-sync response size cap, dropping raw client IP from pull-sync query strings, constant-time auth comparisons) has been materially addressed. Blocking issues remaining are: a CI-red integration test (deterministic, root cause in the test client, not the server), an unverified rand::thread_rng() on wasm32-wasip1 that CI cannot catch because no wasm build runs, two new CodeQL alerts on ec/kv.rs riding the pre-existing settings-taint false-positive flow, one untested backend-state machine in batch_sync, and two questions on the auction / partner surfaces.

Blocking

🔧 wrench

  • CI red — test_ec_lifecycle_fastly deterministic failure: the test client omits Sec-Fetch-Dest: document and Accept: text/html, so is_navigation_request() correctly returns false and EC never generates (crates/integration-tests/tests/common/ec.rs:93).
  • rand::thread_rng() on wasm32-wasip1 unverified; CI builds no wasm target (crates/trusted-server-core/src/ec/generation.rs:52). Add a wasm build to CI and/or switch to an explicit getrandom with the wasi feature.
  • Two new CodeQL alerts on ec/kv.rs:637 and :789 ride the same settings.reject_placeholder_secrets() taint flow that already produces noise on main. Both are false positives but block the CodeQL gate. Fix with a scoped suppression or a repo-level CodeQL config filter on rust/cleartext-logging for settings-derived values.
  • UpsertResult::{NotFound, ConsentWithdrawn, Stale} untested in batch_sync::process_mappings (crates/trusted-server-core/src/ec/batch_sync.rs:231-244). Prior reviewer flagged this; still unresolved.

❓ question

  • Auction user.id = "" on the no-consent path is emitted into the OpenRTB bid request JSON (crates/trusted-server-core/src/auction/endpoints.rs:60-77, formats.rs:186). Has this been validated against the live Prebid Server target? If not, change UserInfo.id to Option<String> with skip_serializing_if.
  • update_pull_enabled_index race self-heal path untested (crates/trusted-server-core/src/ec/partner.rs:552-604). The documented fallback at partner.rs:350 is the only thing keeping the index correct under concurrent upserts — please add a concurrency test or file a tracked follow-up.

Non-blocking

🤔 thinking

  • MAX_CAS_RETRIES = 3, no backoff — likely to starve under contention on hot prefixes; consider exponential backoff + jitter (crates/trusted-server-core/src/ec/kv.rs:300-339).
  • HashMap<String, KvPartnerId> non-deterministic serialization — breaks future hash-based dedup and byte-diffing of stored values; use BTreeMap or IndexMap (crates/trusted-server-core/src/ec/kv_types.rs:59).
  • seen_domains drop-newest eviction freezes long-lived ECs on their first 50 domains, defeating the last timestamp; switch to LRU or document (crates/trusted-server-core/src/ec/kv.rs:627-642).

♻️ refactor

  • EID gating failures log at debug! — bump to warn! so ad-stack anomalies surface in production (crates/trusted-server-core/src/auction/endpoints.rs:80).
  • ec_consent_granted() is a 1-line pass-through — inline or document the sealing-point intent (crates/trusted-server-core/src/ec/consent.rs:20-22).
  • Unvalidated domain strings in EC graph writes — add a 255-byte length cap and hostname-shape check at the write boundary (crates/trusted-server-core/src/ec/kv.rs update_last_seen, kv_types.rs).

🌱 seedling

  • No integration test for the GPC / consent-denied identify path. Once the Sec-Fetch-Dest fix lands, add ec_full_lifecycle_with_gpc sending Sec-GPC: 1 and asserting /_ts/api/v1/identify returns 403 / no EID payload (crates/integration-tests/tests/frameworks/scenarios.rs:501-565).
  • No wasm-target build in CI — root cause of how the rand::thread_rng() concern reached this point unnoticed. Adding cargo build -p trusted-server-adapter-fastly --target wasm32-wasip1 would catch an entire class of deps-pin regressions.

📌 out of scope

  • Pull-sync EC ID still travels as a URL query parameter — acknowledged in the PR description as deferred. Please file the follow-up issue and reference it from the commit that ships this PR so it is not lost (crates/trusted-server-core/src/ec/pull_sync.rs:260-263).

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test (native): PASS
  • vitest: PASS
  • integration tests: FAILtest_ec_lifecycle_fastly (see blocking #1)
  • CodeQL: FAIL — 15 high alerts; 13 are pre-existing settings.reject_placeholder_secrets() taint-flow false positives already present on main, 2 are new on ec/kv.rs riding the same flow (see blocking #3)

Comment thread crates/integration-tests/tests/common/ec.rs Outdated
Comment thread crates/trusted-server-core/src/ec/generation.rs
Comment thread crates/trusted-server-core/src/ec/kv.rs Outdated
Comment thread crates/trusted-server-core/src/ec/batch_sync.rs
Comment thread crates/trusted-server-core/src/auction/endpoints.rs Outdated
Comment thread crates/trusted-server-core/src/ec/kv_types.rs Outdated
Comment thread crates/trusted-server-core/src/ec/kv.rs Outdated
Comment thread crates/trusted-server-core/src/auction/endpoints.rs Outdated
Comment thread crates/trusted-server-core/src/ec/consent.rs
Comment thread crates/trusted-server-core/src/ec/kv.rs Outdated
Comment thread crates/trusted-server-core/src/ec/kv.rs Fixed
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Large PR (98 files, +11,974 / −3,099). Architecture and consent plumbing are solid; this review focuses on concrete defects and hardening gaps verified against 109c929c. Three CI checks currently fail (integration tests, format-typescript, CodeQL); two of those are trivially fixable and inline-commented below. The CodeQL failure appears to be a false positive that needs a team decision.

Blocking

🔧 wrench

  • Integration tests fail to compile: crates/integration-tests/Cargo.toml missing trusted-server-core dev-dependency while tests/common/ec.rs:19 and tests/frameworks/scenarios.rs:5 import it. (inline)
  • format-typescript CI failure: crates/js/lib/src/integrations/prebid/index.ts:413 fails prettier — one-line fix via npm run format. (inline)
  • Unbounded cookie payloads: ec/prebid_eids.rs:43 (ts-eids) and ec/prebid_eids.rs:111 (sharedId) decode/clone attacker-controlled cookie values with no size ceiling. (inline)

❓ question

  • Consent-withdrawal split-brain at ec/finalize.rs:56: consent-store delete failures are logged-and-continued while KV tombstones still write. Intentional (KV = source of truth) or should withdrawal fail-closed? (inline)
  • CodeQL cross-cutting: 15 high-severity rust/cleartext-logging alerts all name settings.reject_placeholder_secrets() as the tainted source, pinned to log-call lines that do not actually reference that method (e.g. auction/orchestrator.rs:125 is a timeout warning). This is taint-analyzer propagation through a boolean validator that uses .expose() internally; the logged bytes are field names, not secret material. How would the team like to resolve: (a) #[allow] on the validator with a comment explaining the false positive, (b) rename reject_placeholder_secretsvalidate_secrets_not_placeholders to break the taint-name heuristic, or (c) configure a CodeQL suppression?

Non-blocking

🤔 thinking

  • Bot gate is presence-only (ec/device.rs:76) — attackers with spoofed JA4 pass. Document threat model. (inline)
  • user.id = None when EC not allowed (auction/endpoints.rs:60) — correct for consent, but may depress bidder dedup/yield. Consider a session-scoped ephemeral ID. (inline)
  • Pull-sync only on organic routes (trusted-server-adapter-fastly/src/main.rs:373) — auction-heavy SPA publishers never refresh KV. Document or extend. (inline)

♻️ refactor

  • Duplicated bearer parser in ec/identify.rs:34 and ec/batch_sync.rs:101 — extract to ec/auth.rs. (inline)
  • Rate-limiter hourly→minute conversion is lossy by up to ~2× (ec/rate_limiter.rs:59) — document and test the overshoot bound. (inline)

🌱 seedling

  • Non-deterministic EID order on timestamp ties (ec/eids.rs:64) — add partner_id as secondary sort key. (inline)
  • No post-deserialize bounds on KvEntry (ec/kv.rs:158) — legacy/corrupt oversized entries will round-trip through. (inline)
  • platform/test_support.rs defines stubs (NoopConfigStore, NoopSecretStore, noop_services) that no unit test exercises, so trait drift will silently break them. Either consume in ≥1 unit test or remove.
  • Auction ext metadata unsrialized: auction/formats.rs computes strategy/provider-count/timing into result.metadata but never emits it into ext.trustedServer on the OpenRTB response — a useful debug surface.

⛏ nitpick

  • has_cookie_mismatch naming (ec/mod.rs:398) — ambiguous; prefer header_overrides_cookie. (inline)

📌 out of scope

  • ec_id/client_ip as query parameters in pull sync — PR description acknowledges as follow-up.

CI Status

  • cargo fmt: PASS
  • clippy: PASS
  • cargo test (workspace): PASS
  • vitest: PASS
  • format-typescript: FAIL (see inline)
  • integration tests: FAIL — compile error (see inline)
  • CodeQL: FAIL — 15 high (likely false positives) (see question above)
  • browser integration tests: PASS
  • format-docs: PASS

Comment thread crates/integration-tests/Cargo.toml
Comment thread crates/js/lib/src/integrations/prebid/index.ts Outdated
Comment thread crates/trusted-server-core/src/ec/prebid_eids.rs Outdated
Comment thread crates/trusted-server-core/src/ec/prebid_eids.rs
Comment thread crates/trusted-server-core/src/ec/finalize.rs Outdated
Comment thread crates/trusted-server-core/src/ec/identify.rs Outdated
Comment thread crates/trusted-server-core/src/ec/rate_limiter.rs
Comment thread crates/trusted-server-core/src/ec/eids.rs Outdated
Comment thread crates/trusted-server-core/src/ec/kv.rs
Comment thread crates/trusted-server-core/src/ec/mod.rs Outdated
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Addressed the remaining merge blockers in 4d2ef30:

  • added the missing trusted-server-core dev-dependency for crates/integration-tests
  • formatted crates/js/lib/src/integrations/prebid/index.ts to clear the Prettier failure
  • added raw size guards for ts-eids and sharedId cookie ingestion, with focused regression tests
  • removed the remaining EC-ID-bearing log fields from the two CodeQL false-positive sites to unblock the cleartext-logging findings

Validation run locally:

  • cargo test --workspace
  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cd crates/js/lib && npx vitest run
  • cd crates/js/lib && npx prettier --check src/integrations/prebid/index.ts
  • cargo test --manifest-path crates/integration-tests/Cargo.toml --target "$(rustc -vV | sed -n 's/^host: //p')" --no-run

Comment thread crates/trusted-server-core/src/ec/kv.rs Fixed
Comment thread crates/trusted-server-core/src/storage/kv_store.rs Fixed
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Follow-up update in 6016c48 to address the remaining open review threads:

  • extracted shared Bearer auth to crates/trusted-server-core/src/ec/auth.rs and removed duplication from identify.rs / batch_sync.rs
  • made EID ordering deterministic on synced ties with a stable partner_id secondary key
  • documented + tested the rate-limiter hourly→per-minute rounding behavior
  • added post-deserialize validation for loaded KvEntry values
  • renamed has_cookie_mismatch to cookie_differs_from_active_ec
  • documented the withdrawal authority model (KV tombstone authoritative, consent-store cleanup best-effort)
  • clarified the browser-heuristic threat model, the no-consent auction identity choice, and the current pull-sync trigger contract

Validation re-run locally:

  • cargo test --workspace
  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features -- -D warnings

I also replied to and resolved the remaining open review threads tied to these changes.

@ChristianPavilonis ChristianPavilonis linked an issue Apr 21, 2026 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR lands the Edge Cookie identity system end to end, but I found two merge blockers in the request lifecycle.

Blocking

🔧 wrench

  • Identify GET still missing browser-direct CORS support: OPTIONS /_ts/api/v1/identify reflects CORS headers, but GET /_ts/api/v1/identify never applies them, so the documented browser-direct flow still fails after a successful preflight. classify_origin() also accepts matching hosts regardless of scheme, while the spec only allows https origins.

Fix: classify origin in the GET path too, return 403 for denied origins, reflect CORS headers on allowed GET responses, and require origin_url.scheme() == "https" in classify_origin(). (crates/trusted-server-core/src/ec/identify.rs:31, crates/trusted-server-core/src/ec/identify.rs:171)

CI Status

  • cargo fmt: PASS
  • cargo test: PASS
  • vitest: PASS
  • browser integration tests: PASS
  • integration tests: FAIL (test_ec_lifecycle_fastly: organic GET / should set ts-ec cookie)
  • CodeQL: FAIL (12 open alerts on the merge ref)

Comment thread crates/trusted-server-adapter-fastly/src/main.rs Outdated
Comment thread crates/trusted-server-core/src/ec/identify.rs
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Edge Cookie identity subsystem is materially complete and the prior review's blockers have been addressed: navigation-headers fix unblocks test_ec_lifecycle_fastly, UpsertResult variants are tested, user.id is Option<String> with skip_serializing_if, the partner-registry race is gone (architecture is now config-derived), MAX_CAS_RETRIES is 5 with backoff+jitter, raw client IP is no longer forwarded to pull-sync partners, and the WASM build now runs in CI.

Remaining blocking issues: a plaintext partner pull-token sitting on a Debug-deriving struct, an unintentional-looking read/write asymmetry in the bot gate, and a still-failing CodeQL gate riding the same Settings Debug taint-flow false positive that was flagged on the prior review.

Blocking

🔧 wrench

  • Plaintext ts_pull_token on PartnerConfig (crates/trusted-server-core/src/ec/registry.rs:44, :208)

❓ question

  • Bot-gate read/write asymmetry intentional? (crates/trusted-server-adapter-fastly/src/main.rs:208-212 vs. :272-276)
  • CodeQL CI gate is failing on 12 new rust/cleartext-logging alerts, all riding the settings.reject_placeholder_secrets()log::debug!("Settings {settings:?}") taint flow that was flagged on the prior review. Settings Debug is redacted via Redacted<T>, so the alerts are benign — but they block the gate. Sites: crates/trusted-server-adapter-fastly/src/main.rs:71, crates/trusted-server-core/src/publisher.rs:178/:341/:382, crates/trusted-server-core/src/html_processor.rs:70, crates/trusted-server-core/src/consent/mod.rs:173, crates/trusted-server-core/src/auction/orchestrator.rs:125/:278/:338/:408, crates/trusted-server-core/src/integrations/nextjs/html_post_process.rs:123/:456. Plan to suppress at PR level or add a repo-wide CodeQL filter on Redacted-derived flows before merge?

Non-blocking

🤔 thinking

  • std::thread::sleep in CAS retry loop on wasm32-wasip1 (crates/trusted-server-core/src/ec/kv.rs:343, :426, :471, :558)
  • seen_domains still uses HashMap while ids was converted to BTreeMap (crates/trusted-server-core/src/ec/kv_types.rs:133)
  • MIN_PASSPHRASE_LENGTH = 8 is weak entropy for the HMAC-SHA256 anchor (crates/trusted-server-core/src/settings.rs:353)
  • seen_domains drop-newest eviction freezes long-lived ECs at the 50-domain cap (crates/trusted-server-core/src/ec/kv.rs:667-683) — repeat from prior review

♻️ refactor

  • PartnerRegistry::all() exposes HashMap iteration order to callers (crates/trusted-server-core/src/ec/registry.rs:174-176)
  • RateLimiter::exceeded_per_minute round-trips through hourly math (crates/trusted-server-core/src/ec/rate_limiter.rs:42-46, :49-52)

🌱 seedling

  • No end-to-end integration tests for batch sync 429 / 400 / 413 paths (crates/integration-tests/tests/frameworks/scenarios.rs:761-779)
  • Bot-gate observability hook (crates/trusted-server-core/src/ec/device.rs:84-86) — operators have no metric for what fraction of traffic is excluded

📌 out of scope

  • Repo-level CodeQL filter for the Redacted-derived rust/cleartext-logging taint flow so this stops gating PRs. Second consecutive PR review where the same false-positive flow blocks CI.

⛏ nitpick

  • derive(Debug) on PartnerRegistry / PartnerConfig (crates/trusted-server-core/src/ec/registry.rs:17, :53) — combined with the plaintext token wrench, this is the loaded gun.

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test (workspace): PASS
  • vitest: PASS
  • integration tests: PASS
  • browser integration tests: PASS
  • WASM release build: PASS
  • CodeQL: FAIL — 12 new alerts, all on the same Settings Debug taint-flow false-positive flow (see ❓ above)

Comment thread crates/trusted-server-core/src/ec/registry.rs Outdated
Comment thread crates/trusted-server-adapter-fastly/src/main.rs
Comment thread crates/trusted-server-core/src/ec/kv.rs Outdated
Comment thread crates/trusted-server-core/src/ec/kv_types.rs Outdated
Comment thread crates/trusted-server-core/src/settings.rs
Comment thread crates/trusted-server-core/src/ec/registry.rs
Comment thread crates/trusted-server-core/src/ec/rate_limiter.rs
Comment thread crates/integration-tests/tests/frameworks/scenarios.rs
Comment thread crates/trusted-server-core/src/ec/device.rs
Comment thread crates/trusted-server-core/src/ec/registry.rs
Copy link
Copy Markdown
Collaborator Author

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearing an old pending local review so I can reply to the current threads from the branch update.

Comment thread crates/integration-tests/tests/common/ec.rs Outdated
Comment thread crates/trusted-server-core/src/ec/kv.rs Fixed
Comment thread crates/trusted-server-core/src/ec/kv.rs Fixed
Comment thread crates/trusted-server-core/src/storage/kv_store.rs Fixed
Comment thread crates/trusted-server-core/src/ec/generation.rs
Comment thread crates/trusted-server-core/src/ec/eids.rs Outdated
Comment thread crates/trusted-server-core/src/ec/kv.rs
Comment thread crates/trusted-server-core/src/ec/mod.rs Outdated
Comment thread crates/trusted-server-adapter-fastly/src/main.rs Outdated
Comment thread crates/trusted-server-core/src/ec/identify.rs
ChristianPavilonis and others added 18 commits April 27, 2026 11:24
…bid EID cookie sync

Remove the pixel sync endpoint and admin partner registration endpoint.
Partners are now defined in [[ec.partners]] TOML config and loaded into an
in-memory PartnerRegistry at startup with O(1) lookups by ID, API key hash,
and source domain.

The identify endpoint now requires Bearer token authentication and returns
only the requesting partner's UID (scoped response).

Client-side ID collection uses Prebid's getUserIdsAsEids() API — the TSJS
Prebid integration writes a ts-eids cookie after each auction, and the
backend ingests matched partner UIDs into the KV identity graph during
ec_finalize_response.

Batch sync and pull sync are retained, updated to use PartnerRegistry.
… sync coverage

Fix integration test deterministic failure by adding Sec-Fetch-Dest: document
and Accept: text/html headers to EcTestClient.get() so is_navigation_request()
recognizes organic requests. Update all EC scenarios for the new architecture:
config-based partners, Bearer-authenticated identify with scoped responses,
batch sync instead of pixel sync.

Change UserInfo.id from String to Option<String> with skip_serializing_if to
avoid emitting empty user.id in OpenRTB bid requests on the no-consent path.

Add batch_sync unit tests for UpsertResult::{NotFound, ConsentWithdrawn, Stale}
rejection paths (previously untested).

Add safety comments on ec/kv.rs log sites where CodeQL flags false positives
(log_id() already truncates the EC ID).

Add integration test partners (inttest, inttest2) to trusted-server.toml.
Read the sharedId cookie (set by Prebid's SharedID User ID Module) on
organic requests and store the value as a partner UID in the EC identity
graph. Uses the same debounce and best-effort semantics as the ts-eids
cookie ingestion path.

Add sharedid partner config (source_domain: sharedid.org, atype: 1) to
trusted-server.toml. The backend matches the cookie value to this partner
via PartnerRegistry.find_by_source_domain() during ec_finalize_response.
… module

Origin/main introduced RuntimeServices platform abstraction (PR2-PR6) after
our branch diverged. The rebase left inconsistencies between the trait
definitions from our branch and the implementations from origin/main.

Restore source files to their ORIG_HEAD state to reconcile:
- Remove services: &RuntimeServices from IntegrationProxy::handle
  implementations (datadome, didomi, gpt, lockr, permutive, testlight, gtm)
- Remove services from run_auction, proxy_request, AuctionContext
- Restore publisher, proxy, orchestrator, types to branch-correct state
- Restore streaming/html/rsc_flight modules to branch state
… sensitive information'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
… sensitive information'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
… sensitive information'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@ChristianPavilonis ChristianPavilonis force-pushed the feature/edge-cookies-final branch from 9261993 to d8c943d Compare April 27, 2026 16:26
Comment thread crates/trusted-server-core/src/publisher.rs Dismissed
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Deep review of the EC identity subsystem (crates/trusted-server-core/src/ec/**), the auction-bidstream wiring, and the Fastly adapter route table. The design is sound and the test coverage is strong — the blockers below are concrete hardening gaps in pre-validation memory bounds, HMAC key strength, and post-send compute safety, not architectural concerns.

Blocking

🔧 wrench

  • DoS via unbounded request body in batch syncreq.take_body_bytes() reads the full body before MAX_BODY_SIZE is checked, so an authenticated partner can pressure the WASM heap before the limit fires. Check Content-Length first. (crates/trusted-server-core/src/ec/batch_sync.rs:130-136)
  • MIN_PASSPHRASE_LENGTH = 8 is too short for the long-lived HMAC-SHA256 key that derives every visitor's EC ID. NIST SP 800-107 recommends ≥ the security strength (32 bytes). Raise to 32 (or at least 24). (crates/trusted-server-core/src/settings.rs:412)

❓ question

  • Pull-sync pending.wait() has no per-partner timeout — with max_concurrency = 3, a single hung partner pins the worker compute window post-send. Is the intent to rely on Fastly's compute cap, or should each wait be bounded? (crates/trusted-server-core/src/ec/pull_sync.rs:264)

Non-blocking

🤔 thinking

  • Broad x-ts-* header strip in ec_finalize_response — works today only because the outer finalize_response re-adds x-ts-version / x-ts-env afterwards. Fragile if middleware order changes. (crates/trusted-server-core/src/ec/finalize.rs:154-168)
  • CORS reflects the original-cased Origin even though validation lowercases — browsers do byte-exact compare on Access-Control-Allow-Origin. (crates/trusted-server-core/src/ec/identify.rs:204-212)
  • upsert_partner_id recovery path writes a synthetic entry with country = "ZZ", no consent context, and consent.ok = true when the root row is missing. Could fail-closed instead. (crates/trusted-server-core/src/ec/kv.rs:393-415)
  • Public surface in device.rs is broader than needed — most internals can be pub(super). (crates/trusted-server-core/src/ec/device.rs:102-176)

♻️ refactor

  • EC_RESPONSE_HEADERS const is redundant with the dynamic x-ts-* strip — pick one. (crates/trusted-server-core/src/ec/finalize.rs:23-28)
  • Group batch-sync mappings by EC ID to halve CAS round-trips when the same EC repeats. (crates/trusted-server-core/src/ec/batch_sync.rs:166-224)

🌱 seedling

  • Read-side entry.validate() runs on every KV load — write-side validation already exists via serialize_entry, so reads can trust the round-trip. (crates/trusted-server-core/src/ec/kv.rs:150-169)
  • Schema version is enforced but no migration path is documented — define the strategy now so a future v2 bump isn't an emergency. (crates/trusted-server-core/src/ec/kv_types.rs:17, 360-365)

⛏ nitpick

  • is_valid_ec_id case asymmetry between hash prefix (lowercase only) and suffix (mixed-case alphanumeric) is intentional but undocumented. (crates/trusted-server-core/src/ec/generation.rs:166-186)
  • let _timestamp = mapping.timestamp; is dead code — either remove or replace with an intent comment. (crates/trusted-server-core/src/ec/batch_sync.rs:191)

👍 praise

  • Strong unit-test coverage across every EC module — CAS conflict simulation, withdrawal dedupe, JA4/H2 known-browser fingerprinting, schema-version validation, base64 EID truncation binary search, and CORS classification all have explicit tests.
  • Centralised ec_finalize_response is invoked from every route (auth-fail, streaming, organic) so withdrawal and cookie semantics stay consistent.
  • Clean separation between create_or_revive (writer-side, allows revive) and upsert_partner_id_if_exists (reader-rejecting via UpsertResult::ConsentWithdrawn) — maps directly to the per-mapping rejection reasons in batch sync.

CI Status

  • The status-check rollup for 4166637 shows only Code Quality / Analyze (javascript-typescript): SUCCESS. The repository's Rust CI workflow (fmt / clippy / cargo test --workspace) and JS test workflow do not appear to have run on this head SHA. Please confirm with the project owners and run them locally before merge — the diff (+15,470 / −4,431 across 106 files) is too large to land on a partial check.

StatusCode::PAYLOAD_TOO_LARGE,
"body_too_large",
));
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Body size check runs after the full body is read. req.take_body_bytes() materialises the entire request body into a Vec<u8> before MAX_BODY_SIZE is consulted, so an authenticated partner with rate-limit room can submit a multi-hundred-MB body and pressure the WASM heap before any limit fires.

Suggested fix — short-circuit on Content-Length first, then keep the post-read fallback:

if let Some(cl) = req
    .get_header_str("content-length")
    .and_then(|s| s.parse::<usize>().ok())
{
    if cl > MAX_BODY_SIZE {
        return Ok(error_response(StatusCode::PAYLOAD_TOO_LARGE, "body_too_large"));
    }
}
let body_bytes = req.take_body_bytes();
if body_bytes.len() > MAX_BODY_SIZE {
    return Ok(error_response(StatusCode::PAYLOAD_TOO_LARGE, "body_too_large"));
}

/// This lower bound is only meant to reject obviously bad values; operators
/// are still expected to use a high-entropy random passphrase per the EC
/// setup and key-rotation documentation.
const MIN_PASSPHRASE_LENGTH: usize = 8;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrenchMIN_PASSPHRASE_LENGTH = 8 is too short for the long-lived HMAC-SHA256 key used to derive every visitor's EC ID. NIST SP 800-107 recommends the key be ≥ the security strength (32 bytes for SHA-256). 8 ASCII chars is at most ~56 bits of entropy. The placeholder allowlist doesn't help here — non-placeholder weak keys still validate.

Suggested fix — raise to 32 (or 24 if operator burden is a concern) and document the change in docs/guide/key-rotation.md:

const MIN_PASSPHRASE_LENGTH: usize = 32;

fn drain_pull_batch(kv: &KvIdentityGraph, ec_id: &str, in_flight: &mut Vec<InFlightPull>) {
for pending in in_flight.drain(..) {
let partner_id = pending.partner_id;
let response = match pending.pending.wait() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

questionpending.wait() has no per-partner timeout. With max_concurrency = 3 and post-send dispatch, a single hung partner pins the worker compute window until Fastly's hard cutoff. That hits the per-request compute quota and prevents the next batch from dispatching at all.

Is the intent to rely solely on Fastly's compute cap, or should each wait be bounded (e.g. 2–3s) with timeouts treated as transport errors? If the former, please add a comment documenting the contract; if the latter, switch to wait_with_timeout and log+skip on timeout.


for header in &dynamic_ts_headers {
response.remove_header(header.as_str());
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — The dynamic strip removes every x-ts-* response header on consent-denied paths. It's correct today only because the outer finalize_response in main.rs:484-498 re-adds x-ts-version / x-ts-env afterwards. That ordering is load-bearing but documented only in main.rs. A future middleware that intentionally sets a non-EC x-ts-* header before ec_finalize_response runs would silently lose it.

Consider scoping the strip to the registered partner-ID set (using PartnerRegistry) plus the explicit EC_RESPONSE_HEADERS list, instead of the raw prefix match.

.to_ascii_lowercase();

if host == publisher_host || host.ends_with(&format!(".{publisher_host}")) {
return CorsDecision::Allowed(origin.to_owned());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkinghost is lowercased for the equality check, but CorsDecision::Allowed(origin.to_owned()) reflects the unmodified Origin header. Browsers compare Access-Control-Allow-Origin to Origin byte-exactly, so a request from https://Foo.test-publisher.com would pass server-side validation but be rejected by the browser at the CORS layer.

Real-world browsers send Origin lowercase, so this is a corner case rather than a hot bug — but tightening it (either by lowercasing the reflected value or requiring exact-case match) avoids surprising failures.

})?;

Ok(entry)
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 seedlingdeserialize_entry calls entry.validate() on every read. With ~5KB entries and validation cost scaling with seen_domains.len(), a hot /identify path that hits N entries/sec pays the cost in tail latency. Validation already runs on every write via serialize_entry at line 101, so reads can trust the round-trip.

Not a blocker for this PR, but worth a follow-up to gate read-side validation behind a cfg(debug_assertions) or remove it once the schema is settled.

"unsupported KV entry schema version {} (expected {})",
self.v, SCHEMA_VERSION
));
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 seedling — Schema version is enforced (any v != SCHEMA_VERSION is a hard read failure), but no migration path is documented. If a v2 schema ever ships, every v1 entry becomes an Err on read until migrated.

Define the migration story now (lazy convert? best-effort tombstone? backfill job?) so a future schema bump isn't an emergency.

.bytes()
.all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
&& suffix_part.bytes().all(|b| b.is_ascii_alphanumeric())
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpickis_valid_ec_id allows mixed-case alphanumeric in the suffix but only lowercase hex in the prefix. Intentional, but worth one line of doc on why (suffix comes from the random-charset that includes uppercase; prefix comes from hex::encode which is always lowercase). Otherwise a future reader will read this as inconsistency.

});
continue;
}
let _timestamp = mapping.timestamp;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicklet _timestamp = mapping.timestamp; is dead code. Either remove it (the field already deserialises into mapping.timestamp and is retained-for-compat per the doc comment), or replace with // timestamp retained for API compat — not used for ordering so the intent is explicit.

cluster_size,
};

let mut response = json_response_with_origin(StatusCode::OK, &body, allowed_origin.as_deref())?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 praise — Clean IdentifyResponse design: degraded is set on KV failure so callers can distinguish "no synced UID" from "KV down" without inspecting status codes; CORS reflection is gated on a strict origin classification; consent enforcement runs after auth so partners get 403 consent denied rather than a generic 401.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants