Support Sourcepoint GPP consent for EC generation #642
ChristianPavilonis wants to merge 80 commits into feature/edge-cookies-final
- Rename 'Synthetic ID' to 'Edge Cookie (EC)' across all external-facing identifiers, config, internal Rust code, and documentation
- Simplify EC hash generation to use only client IP (IPv4 or /64-masked IPv6) with HMAC-SHA256, removing User-Agent, Accept-Language, Accept-Encoding, random_uuid inputs and Handlebars template rendering
- Downgrade EC ID generation logs to trace level since client IP and EC IDs are sensitive data
- Remove unused counter_store and opid_store config fields and KV store declarations (vestigial from template-based generation)
- Remove handlebars dependency

Breaking changes: wire field synthetic_fresh → ec_fresh, response headers X-Synthetic-ID → X-TS-EC, cookie synthetic_id → ts-ec, query param synthetic_id → ts-ec, config section [synthetic] → [edge_cookie].

Closes #462
…igration
- Add ec/ module with EcContext lifecycle, generation, cookies, and consent
- Compute cookie domain from publisher.domain, move EC cookie helpers
- Fix auction consent gating, restore cookie_domain for non-EC cookies
- Add integration proxy revocation, refactor EC parsing, clean up ec_hash
- Remove fresh_id and ec_fresh per EC spec §12.1
- Migrate [edge_cookie] config to [ec] per spec §14
Implement Story 3 (#536): KV-backed identity graph with compare-and-swap concurrency, partner ID upserts, tombstone writes for consent withdrawal, and revive semantics. Includes schema types, metadata, 300s last-seen debounce, and comprehensive unit tests. Also incorporates earlier foundation work: EC module restructure, config migration from [edge_cookie] to [ec], cookie domain computation, consent gating fixes, and integration proxy revocation support.
Implement Story 4 (#537): partner KV store with API key hashing, POST /admin/partners/register with basic-auth protection, strict field validation (ID format, URL allowlists, domain normalization), and pull-sync config validation. Includes index-based API key lookup and comprehensive unit tests.
Implement Story 5 (#538): centralize EC cookie set/delete and KV tombstone writes in finalize_response(), replacing inline mutation scattered across publisher and proxy handlers. Adds consent-withdrawal cleanup, EC header propagation on proxy requests, and docs formatting.
Implement Story 8 (#541): POST /api/v1/sync with Bearer API key auth, per-partner rate limiting, batch size cap, per-mapping validation and rejection reasons, 200/207 response semantics, tolerant Bearer parsing, and KV-abort on store unavailability.
Implement Story 9 (#542): server-to-server pull sync that runs after send_to_client() on organic traffic only. Refactors the Fastly adapter entrypoint from #[fastly::main] to explicit Request::from_client() + send_to_client() to enable post-send background work. Pull sync enumerates pull-enabled partners, checks staleness against pull_sync_ttl_sec, validates URL hosts against the partner allowlist, enforces hourly rate limits, and dispatches concurrent outbound GETs with Bearer auth. Responses with uid:null or 404 are no-ops; valid UIDs are upserted into the identity graph. Includes EC ID format validation to prevent dispatch on spoofed values, partner list_registered() for KV store enumeration, and configurable pull_sync_concurrency (default 3).
Implement Story 11 (#544): Viceroy-driven E2E tests covering full EC lifecycle (generation, pixel sync, identify, batch sync, consent withdrawal, auth rejection). Adds EC test helpers with manual cookie tracking, minimal origin server with graceful shutdown, and required KV store fixtures. Fixes integration build env vars.
Consolidate is_valid_ec_hash and current_timestamp into single canonical definitions to eliminate copy-paste drift across the ec/ module tree. Fix serialization error variants in admin and batch_sync to use Ec instead of Configuration. Add scaling and design-decision documentation for partner store enumeration, rate limiter burstiness, and plaintext pull token storage. Use test constructors consistently in identify and finalize tests.
- Rename ssc_hash → ec_hash in batch sync wire format (§9.3)
- Strip x-ts-* prefix headers in copy_custom_headers (§15)
- Strip dynamic x-ts-<partner_id> headers in clear_ec_on_response (§5.2)
- Add PartnerNotFound and PartnerAuthFailed error variants (§16)
- Rename Ec error variant → EdgeCookie (§16)
- Validate EC IDs at read time, discard malformed values (§4.2)
- Add rotating hourly offset for pull sync partner dispatch (§10.3)
- Add _pull_enabled secondary index for O(1+N) pull sync reads (§13.1)
…nd cleanup
- Add body size limit (64 KiB) to partner registration
- Validate partner UID length (max 512 bytes) in batch sync and sync pixel
- Replace linear scan with binary search in encode_eids_header
- Use constant-time comparison inline in partner lookup, remove unused verify_api_key
- Remove unused PartnerAuthFailed error variant, fix PartnerNotFound → 404
- Add Access-Control-Max-Age CORS header to identify endpoint
- Tighten consent-denied integration test to expect only 403
- Add stability doc-comment to normalize_ip
- Log warning instead of silent fallback on SystemTime failure
…ror variants
Resolve integration issues from rebasing onto feature/ssc-update:
- Restore prepare_runtime() and validate_cookie_domain() lost in conflict resolution
- Add InsecureDefault error variant and wire reject_placeholder_secrets() into get_settings()
- Add sha2/subtle imports for constant-time auth comparison
- Fix error match arms (Ec → EdgeCookie, remove nonexistent PartnerAuthFailed)
- Fix orchestrator error handling to use send_to_client() pattern
- Remove dead cookie helpers superseded by ec/cookies module
Subresource requests (fonts, images, CSS) may omit the Sec-GPC header, causing the server to incorrectly generate ts-ec cookies when the user has opted out via Global Privacy Control. Gate generate_if_needed() on the request Accept header containing text/html so only navigations trigger EC identity creation.
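The Accept-header gate described in this commit can be illustrated with a small sketch. The server code is Rust and is not shown on this page, so the following is only a TypeScript illustration of the idea; the function name `acceptsHtml` is hypothetical, and parsing media ranges (rather than a plain substring check) is an assumption.

```typescript
// Hypothetical helper: returns true only when the Accept header lists text/html,
// which is what a top-level navigation typically sends.
function acceptsHtml(acceptHeader: string | null): boolean {
  if (acceptHeader === null) {
    return false; // no Accept header: treat as a non-navigation request
  }
  return acceptHeader
    .split(',')
    // Drop parameters such as ";q=0.8" and normalise case/whitespace.
    .map((part) => (part.split(';')[0] ?? '').trim().toLowerCase())
    .some((mediaType) => mediaType === 'text/html');
}
```

With a gate like this, an image or font request whose Accept header lacks `text/html` would skip EC generation even if it also lacks `Sec-GPC`.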
Move admin route matching and basic-auth coverage to /_ts/admin for a hard cutover, and align tests and docs so operational guidance matches runtime behavior.
Addresses issue #612: spec now correctly documents that the full EC ID ({64-hex}.{6-alnum}) is used as the KV store key, not just the 64-char hash prefix.

Changes:
- Updated §4.1: ec_hash() now documented as for logging/debugging only
- Updated §7.2: KV key description changed from '64-character hex hash' to 'Full EC ID in {64-char hex}.{6-char alphanumeric} format'
- Updated §7.3: All KvIdentityGraph method parameters renamed from ec_hash to ec_id with proper documentation
- Updated §9.3: Batch sync request field renamed from ec_hash to ec_id
- Updated §9.4: Validation and error reasons updated (invalid_ec_hash → invalid_ec_id, ec_hash_not_found → ec_id_not_found)
- Updated §10.4: Pull sync URL parameter changed from ec_hash to ec_id
- Updated consent pipeline integration throughout to use full EC ID
- Updated all rate limiting descriptions (per EC ID, not per hash)

Rationale: The random suffix provides uniqueness for users behind the same NAT/proxy infrastructure who would otherwise share identical IP-derived hash prefixes.
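To make the `{64-hex}.{6-alnum}` format concrete, here is a minimal TypeScript sketch of a validator and a prefix extractor. The real validation lives in the Rust `ec/` module and is not shown here; the names `isValidEcId` and `ecHashPrefix` are hypothetical, and the restriction to lowercase hex is an assumption.

```typescript
// Assumed shape: 64 lowercase hex chars, a dot, then 6 alphanumeric chars.
const EC_ID_PATTERN = /^[0-9a-f]{64}\.[A-Za-z0-9]{6}$/;

// True when the value matches the full EC ID format used as the KV key.
function isValidEcId(value: string): boolean {
  return EC_ID_PATTERN.test(value);
}

// Returns the 64-char hash prefix (for logging/debugging only), or null if malformed.
function ecHashPrefix(ecId: string): string | null {
  return isValidEcId(ecId) ? ecId.slice(0, 64) : null;
}
```

Two users behind the same NAT would share the value returned by `ecHashPrefix` but differ in the 6-character suffix, which is why the full ID is the key.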
Extends EC KV schema for cross-property identity resolution:
- Add asn field to GeoInfo (from Fastly geo.as_number())
- Add asn and dma fields to KvGeo for network identification
- Add KvDomainVisit and KvPubProperties for consortium-level domain tracking
- Add pub_properties field to KvEntry with 50-domain cap
- Track publisher domain visits in KvEntry::new() and update_last_seen()
- Respect existing 300s debounce for organic requests only

All new fields use Option types or serde(default) for backward compatibility. Existing v1 entries continue to deserialize without error.
Implements cluster size evaluation to distinguish individual users from shared networks (VPNs, corporate offices):
- Add KvNetwork struct with cluster_size and last_evaluated timestamp
- Add network field to KvEntry with TTL-gated cluster rechecks
- Add cluster_size to KvMetadata and IdentifyResponse
- Implement count_hash_prefix_keys() to list keys with common prefix
- Implement evaluate_cluster() on KvIdentityGraph (one-page, 100-key limit)
- Call cluster evaluation in handle_identify endpoint
- Return cluster_size in JSON body and x-ts-cluster-size header
- Add cluster_trust_threshold (default 10) and cluster_recheck_secs (default 3600) config

Cluster evaluation uses best-effort semantics: size unknown if list exceeds 100 keys. Cache hit avoids re-evaluation within recheck interval.
Derives coarse browser signals from TLS/H2/UA on every request to gate EC identity operations. Unrecognized clients (known_browser != true) are proxied normally but leave no trace in the identity graph.
- Add KvDevice struct (is_mobile, ja4_class, platform_class, h2_fp_hash, known_browser) and device field on KvEntry, written once on creation
- Add ec/device.rs with DeviceSignals::derive(), UA parsing, JA4 Section 1 extraction, H2 fingerprint hashing, known browser allowlist (Chrome/Safari/Firefox)
- Add is_mobile and known_browser to KvMetadata for fast propagation checks
- Wire DeviceSignals through EcContext to KvEntry creation path
- Add bot gate in Fastly adapter: suppress KV graph, ec_finalize_response, and pull sync when known_browser != Some(true)
…bot gate
Document all KV schema additions implemented in the preceding commits: geo extensions (asn/dma), publisher domain tracking, network cluster evaluation, device signal derivation, and the bot gate architecture.
- Add §7A Device Signals and Bot Gate (signal derivation, allowlist, bot gate behavior matrix, KvDevice write policy, privacy rationale)
- Update §7.2 with full KvEntry schema including KvGeo, KvPubProperties, KvDomainVisit, KvDevice, KvNetwork, and extended KvMetadata
- Update §2 architecture diagram with Phase 0 bot gate step
- Update §4.3 EcContext with device_signals field
- Update §5.4 lifecycle with Phase 0 and ec_finalize gating
- Update §11 /identify with cluster_size in JSON and x-ts-cluster-size header
- Update §14 config with cluster_trust_threshold and cluster_recheck_secs
- Update §17.1 main.rs pseudocode with full bot gate wiring
The known_browser fingerprint allowlist (3 entries) was too narrow and blocked legitimate browsers whose JA4/H2 combinations were not listed. Replace the gate with DeviceSignals::looks_like_browser() which checks for signal presence: ja4_class.is_some() && platform_class.is_some(). Real browsers always produce both; raw HTTP clients typically lack one or both. known_browser is still computed and stored on KvDevice for analytics but no longer gates identity operations.
Prebid's liveIntentIdSystem.js uses a dynamic require() inside a build-flag-guarded branch that their gulp pipeline dead-codes via constant folding. esbuild leaves the require() in the output, causing ReferenceError: require is not defined at browser runtime. Remove from the bundle until we add an esbuild resolver plugin (or switch to Prebid's own build pipeline) — tracked as a follow-up in the design spec.
Introduces TSJS_PREBID_USER_IDS env var (mirroring TSJS_PREBID_ADAPTERS) to control which Prebid User ID submodules are bundled. The hardcoded imports in index.ts are replaced with a generated file written by build-all.mjs at build time, defaulting to the same 13-submodule set.
- build-all.mjs: generatePrebidUserIds() validates names, denylists liveIntentIdSystem, and writes _user_ids.generated.ts. Existence check also probes dist/src/public/ to handle modules shipped as .ts in sources (sharedIdSystem).
- index.ts: replaces 13 hardcoded submodule imports with import './_user_ids.generated'
- _user_ids.generated.ts: committed default with all 13 submodules
- Tests: updated mocks and regression guard; added 9 syncPrebidEidsCookie behavior tests
- Docs: new "User ID Modules" section in prebid.md with TSJS_PREBID_USER_IDS usage; spec follow-up #1 marked complete
__gpp and __gpp_sid are read by the Rust server over HTTPS; they must be Secure. Also sets Max-Age=86400 (matching ts-eids) so stale consent state doesn't outlast the session, and replaces the legacy expires= deletion pattern with Max-Age=0.
3bbdb1d to 8a6df3a
9261993 to d8c943d
aram356 left a comment
Summary
Follow-up review focused on areas not covered in the prior pass. Sourcepoint flow (mirror + GPP US decoding + EC gating) is functional and well-tested, but two doc/file-stale issues from the scoping commit (8a6df3af) need correction before merge, and the always-shipped Sourcepoint module has cross-CMP failure modes worth designing around.
PR effective scope note: vs main the diff is 80 commits / 111 files, because the intended base (feature/edge-cookies-final) has merged. The Sourcepoint-specific changes are a small subset (~14 files); reviewers landing on this PR via GitHub should read the description with that in mind.
Blocking
🔧 wrench
- Stale Prebid User ID docs reference removed `TSJS_PREBID_USER_IDS` env var (docs/guide/integrations/prebid.md:226-261)
- `_user_ids.generated.ts` claims to be auto-generated but is now hand-edited (crates/js/lib/src/integrations/prebid/_user_ids.generated.ts:1)
Non-blocking
🤔 thinking
- Sourcepoint hardcoded as always-shipped will clobber `__gpp` set by other CMPs (registry.rs:792 + sourcepoint/index.ts:60-72)
- First page load race: mirror runs before Sourcepoint CMP populates localStorage (sourcepoint/index.ts:91-93)
- TCF presence silently overrides explicit GPP US `sale_opt_out=Yes` in US states (consent/mod.rs:506-515)
- Session-scoped cookie + run-once mirror leaves stale `__gpp` after mid-session retraction (sourcepoint/index.ts:39)
♻️ refactor
- Stale comment displaced by `us_sale_opt_out` insertion (consent/gpp.rs:74-75)
- Make clearing logic CMP-safe by tracking write source (sourcepoint/index.ts:42-46)
⛏ nitpick
- Unused/inconsistent default export (sourcepoint/index.ts:95)
CI Status
- fmt: PASS
- clippy: PASS
- rust tests: PASS (1001 lib + 21 misc)
- js tests: PASS (294)
    ### How it works

    1. `userId.js` is statically imported in `index.ts` — always bundled, not operator-configurable.
    2. The set of ID submodules is controlled by `TSJS_PREBID_USER_IDS` at build time and emitted into `_user_ids.generated.ts`.
🔧 wrench — Stale Prebid User ID docs reference a removed env var.
The "Scope Sourcepoint consent PR" commit (8a6df3af) removed generatePrebidUserIds() from build-all.mjs along with the Prebid User ID design/plan docs. But this guide still claims:
- L233: "controlled by `TSJS_PREBID_USER_IDS` at build time and emitted into `_user_ids.generated.ts`"
- L247–250: examples of slim builds via the env var
- L253: "`build-all.mjs` validates that each exists … and generates `_user_ids.generated.ts`"
- L256: claims a `liveIntentIdSystem` denylist exists
None of this is true after the scoping commit. The env var is silently ignored. Operators will set it expecting a slim build and ship the full bundle instead.
Fix: delete the "User ID Modules" section (or at least the env var subsections + denylist warning), keeping only what's accurate post-scoping. If the build-time configurability is meant to come back, hold these docs until the generator returns.
    @@ -0,0 +1,22 @@
    // Auto-generated by build-all.mjs — manual edits will be overwritten at build time.
🔧 wrench — File header claims it is auto-generated, but it is not.
// Auto-generated by build-all.mjs — manual edits will be overwritten at build time.
The 8a6df3af scoping commit removed generatePrebidUserIds() from build-all.mjs. This file is now hand-edited but still warns future contributors that their edits will be overwritten — which will either discourage edits or cause confusion when the rebuild does not in fact overwrite them.
Fix: either (a) restore the generator in build-all.mjs (preferred, since the doc still pitches TSJS_PREBID_USER_IDS as a feature), or (b) rename to _user_ids.ts and replace the header with a static-file comment that documents the curated default and the rationale for liveIntentIdSystem exclusion.
      const JS_EXCLUDED: &[&str] = &["nextjs", "aps", "adserver_mock"];
      // JS-only modules always included (no Rust-side registration)
    - const JS_ALWAYS: &[&str] = &["creative"];
    + const JS_ALWAYS: &[&str] = &["creative", "sourcepoint"];
🤔 thinking — Sourcepoint hardcoded as always-shipped will clobber __gpp/__gpp_sid set by other CMPs.
const JS_ALWAYS: &[&str] = &["creative", "sourcepoint"];
Every Trusted Server deployment ships and auto-runs the Sourcepoint module on every page load, regardless of which CMP the publisher uses. When no _sp_user_consent_* localStorage entry is found, the module unconditionally clears the __gpp and __gpp_sid cookies (see crates/js/lib/src/integrations/sourcepoint/index.ts:60-72). Publishers using OneTrust, Didomi, or any CMP that writes __gpp directly to cookies (allowed by the GPP spec) will have those cookies erased on every page load.
The design doc says "same pattern as creative," but creative is universally applicable while sourcepoint is publisher-CMP-specific.
Suggested fixes:
- (a) Make sourcepoint opt-in via `[integrations.sourcepoint] enabled = true` with a small Rust `IntegrationRegistration` whose only job is to gate JS inclusion.
- (b) Only clear cookies the module previously wrote: track via a marker (e.g. `_ts_gpp_src=sp`) written alongside `__gpp` and check it before clearing.
    }

    if (typeof window !== 'undefined') {
      mirrorSourcepointConsent();
🤔 thinking — First page load race: mirror runs at script load before Sourcepoint CMP populates localStorage.
The integration runs once at script load and is included in the immediate (synchronous) bundle. If the bundle loads in <head> before Sourcepoint's CMP script populates _sp_user_consent_*, the mirror reads empty localStorage, writes Max-Age=0 clearing cookies for __gpp/__gpp_sid, and gives up.
Combined with allows_ec_creation fail-closed for UsState with no signals (consent/mod.rs:520-523), users in regulated US states get EC blocked on first page load even though Sourcepoint is properly installed.
The design spec ("integration runs once — no polling or event listeners") acknowledges this but underestimates the impact for first-time visitors.
Suggested fixes:
- Register sourcepoint with `with_deferred_js()` so it loads after the CMP.
- Listen for Sourcepoint's __sp__consent-ready event.
- Retry once on `DOMContentLoaded` / short timer when initial scan finds nothing.
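The third suggestion (retry once when the initial scan finds nothing) could look roughly like the sketch below. It is not the module's actual code: `mirrorWithSingleRetry` and its callback parameters are hypothetical names, and the scan, write, and clear steps are injected so the control flow can be read and tested without a browser.

```typescript
// Hypothetical callback types; the real module reads localStorage and writes cookies directly.
type ScanConsent = () => string | null; // GPP string if the CMP entry exists, else null
type ScheduleRetry = (retry: () => void) => void;

function mirrorWithSingleRetry(
  scan: ScanConsent,
  onFound: (gpp: string) => void,
  onMissing: () => void,
  schedule: ScheduleRetry,
): void {
  const first = scan();
  if (first !== null) {
    onFound(first);
    return;
  }
  // Nothing yet: defer the clear, since the CMP may not have populated storage.
  schedule(() => {
    const second = scan();
    if (second !== null) {
      onFound(second);
    } else {
      onMissing(); // still nothing after one retry: now it is reasonable to clear
    }
  });
}
```

In a browser, `schedule` might register a one-shot `DOMContentLoaded` listener or a short `setTimeout`; either way the cookies are only cleared after the second scan also comes back empty.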
    // When a CMP uses TCF in the US (e.g. Didomi), respect the
    // TCF Purpose 1 decision — this is an explicit opt-in signal.
    if let Some(tcf) = effective_tcf(ctx) {
        return tcf.has_storage_consent();
🤔 thinking — TCF presence silently overrides explicit GPP US sale_opt_out=Yes in US states.
if let Some(tcf) = effective_tcf(ctx) {
return tcf.has_storage_consent();
}
// Check GPP US section for sale opt-out.
In Jurisdiction::UsState, effective_tcf(ctx) short-circuits with tcf.has_storage_consent() and never falls through to the GPP US opt-out check. The test ec_us_state_tcf_takes_priority_over_gpp_us (L1228) asserts this is intentional.
It is defensible if you treat TCF Purpose 1 as the authoritative storage opt-in for the EC cookie itself — but the user has explicitly told the CMP "don't sell my info" via the GPP US section, and EC is itself a tracking identifier. Some CMPs (Didomi, Sourcepoint) will write both sections in the US for spec compliance. The current precedence may surprise privacy-counsel reviewers.
Suggested fixes:
- Document this precedence decision explicitly in `docs/superpowers/specs/2026-04-15-sourcepoint-gpp-consent-design.md` (currently silent on the conflict).
- Add a config knob: `consent.us_tcf_priority = strict|permissive` so publishers in stricter regimes can opt into US-section-wins behavior.
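To show what the proposed knob would change, here is a small decision-table sketch. The real logic is Rust in `consent/mod.rs`; this TypeScript model is illustrative only. The mapping of `permissive` to the current TCF-first behavior and `strict` to US-section-wins is an assumption (the comment does not say which value means which), the type and function names are hypothetical, and the `us_privacy` checks are left out.

```typescript
type UsTcfPriority = 'strict' | 'permissive';

// Hypothetical flattened view of the decoded signals for a US-state request.
interface UsStateSignals {
  tcfStorageConsent?: boolean; // undefined when no TCF string is present
  gppUsSaleOptOut?: boolean;   // undefined when no GPP US section is present
}

function allowsEcCreationInUsState(
  signals: UsStateSignals,
  priority: UsTcfPriority,
): boolean {
  const { tcfStorageConsent, gppUsSaleOptOut } = signals;
  // Assumed 'strict' semantics: an explicit sale opt-out always blocks EC creation.
  if (priority === 'strict' && gppUsSaleOptOut === true) {
    return false;
  }
  // Current behavior per the review: TCF, when present, decides.
  if (tcfStorageConsent !== undefined) {
    return tcfStorageConsent;
  }
  if (gppUsSaleOptOut !== undefined) {
    return !gppUsSaleOptOut;
  }
  return false; // fail closed when no signals are present
}
```

The only input on which the two modes differ is TCF storage consent granted together with a GPP US sale opt-out, which is exactly the conflict case raised above.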
    }

    function writeCookie(name: string, value: string): void {
      document.cookie = `${name}=${value}; path=/; Secure; SameSite=Lax`;
🤔 thinking — Session-scoped cookie + run-once mirror leaves stale __gpp after mid-session retraction.
document.cookie = `${name}=${value}; path=/; Secure; SameSite=Lax`;
writeCookie deliberately omits Max-Age (per spec). Combined with "runs once on script load," if the user retracts consent mid-session via the Sourcepoint CMP UI, the localStorage entry updates but __gpp does not refresh until the next full page navigation. Subsequent same-tab requests read stale consent.
Suggested fix: hook Sourcepoint's onConsentUpdate callback or rerun the mirror on visibilitychange so the cookie tracks localStorage within a session.
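One way to picture the "rerun the mirror" option is a small resync handler that only touches cookies when the stored value actually changed. This is a sketch under assumptions: `createResync` and its injected `read`/`write`/`clear` callbacks are hypothetical, standing in for the module's localStorage scan and cookie helpers.

```typescript
// Builds a handler that re-mirrors consent only when the stored value has changed.
function createResync(
  read: () => string | null,    // current GPP string from storage, or null
  write: (gpp: string) => void, // mirror the value into cookies
  clear: () => void,            // remove the mirrored cookies
): () => void {
  // undefined means "never mirrored yet", which differs from null ("nothing stored").
  let last: string | null | undefined = undefined;
  return () => {
    const current = read();
    if (current === last) {
      return; // unchanged since the last run: skip the cookie write
    }
    last = current;
    if (current === null) {
      clear();
    } else {
      write(current);
    }
  };
}
```

The returned handler could be called once at load and again from a `visibilitychange` listener, so a mid-session retraction reaches the cookie without waiting for a full navigation.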
    let eu_tcf = decode_tcf_from_gpp(&parsed);

    // The GPP header version is always 1 for current spec.
    let us_sale_opt_out = decode_us_sale_opt_out(&parsed);
♻️ refactor — Stale comment displaced by us_sale_opt_out insertion.
// The GPP header version is always 1 for current spec.
let us_sale_opt_out = decode_us_sale_opt_out(&parsed);
The comment was originally above version: 1 inside the struct literal, but the new us_sale_opt_out line was inserted between them. It now reads as if it documents us_sale_opt_out.
Fix: move the comment back next to version: 1, in the struct literal, or delete it — version: 1 is self-documenting.
    // Trusted Server is the only intended writer for these mirrored cookies, so
    // clearing the origin-scoped cookie is sufficient for this integration.
    document.cookie = `${name}=; path=/; Secure; SameSite=Lax; Max-Age=0`;
    }
♻️ refactor — Make the clearing logic CMP-safe by tracking write source.
If finding #3 (always-shipped sourcepoint) cannot be addressed in this PR, at least make the clearing safe in mixed-CMP scenarios:
- On `writeCookie`, also write a marker cookie: `_ts_gpp_src=sp; path=/; Secure; SameSite=Lax`.
- In `clearCookie`, check `document.cookie` for `_ts_gpp_src=sp` and skip clearing if absent.
This way, a publisher using a different CMP that happens to ship this module will not have their CMP's __gpp cookies clobbered.
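A minimal sketch of the marker approach, assuming the marker name and value from the suggestion above. The helper names (`hasMirrorMarker`, `buildMirrorWrites`, `buildMirrorClears`) are hypothetical; they return the cookie strings to assign rather than touching `document.cookie`, so the logic stays testable outside a browser.

```typescript
const MARKER_NAME = '_ts_gpp_src';
const MARKER_VALUE = 'sp';
const COOKIE_ATTRS = 'path=/; Secure; SameSite=Lax';

// True when the cookie header carries the marker written alongside mirrored cookies.
function hasMirrorMarker(cookieHeader: string): boolean {
  return cookieHeader
    .split(';')
    .some((pair) => pair.trim() === `${MARKER_NAME}=${MARKER_VALUE}`);
}

// Cookie strings to assign when mirroring a value: the cookie itself plus the marker.
function buildMirrorWrites(cookieName: string, value: string): string[] {
  return [
    `${cookieName}=${value}; ${COOKIE_ATTRS}`,
    `${MARKER_NAME}=${MARKER_VALUE}; ${COOKIE_ATTRS}`,
  ];
}

// Cookie strings to assign when clearing; empty when this module never wrote them.
function buildMirrorClears(cookieHeader: string, cookieNames: string[]): string[] {
  if (!hasMirrorMarker(cookieHeader)) {
    return []; // another CMP owns these cookies: leave them alone
  }
  return [...cookieNames, MARKER_NAME].map(
    (cookieName) => `${cookieName}=; ${COOKIE_ATTRS}; Max-Age=0`,
  );
}
```

A caller would assign each returned string to `document.cookie` in turn, passing the current `document.cookie` value as `cookieHeader`. Clearing the marker together with the mirrored cookies keeps a later, unrelated `__gpp` from being treated as this module's own.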
      mirrorSourcepointConsent();
    }

    export default mirrorSourcepointConsent;
⛏ nitpick — Unused/inconsistent default export.
export default mirrorSourcepointConsent;
Nothing imports the default: the module self-initializes via if (typeof window !== 'undefined') at L91-93, and tests use the named export. No other Trusted Server integration exports a default. Drop it for consistency.
Summary
- … `_sp_user_consent_*` from localStorage and mirrors GPP consent into `__gpp`/`__gpp_sid` cookies
- … `sale_opt_out` from US GPP sections (IDs 7–23)
- … `allows_ec_creation()` between existing TCF and `us_privacy` checks

Closes #640
Test plan
- `cargo test --workspace` (992 tests including 8 new)
- `npx vitest run` (288 tests including 6 new)

🤖 Generated with Claude Code