Skip to content

Add Sourcepoint first-party CMP integration#625

Open
ChristianPavilonis wants to merge 19 commits intomainfrom
feature/sourcepoint-integration
Open

Add Sourcepoint first-party CMP integration#625
ChristianPavilonis wants to merge 19 commits intomainfrom
feature/sourcepoint-integration

Conversation

@ChristianPavilonis
Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis commented Apr 8, 2026

Summary

  • Adds a new Sourcepoint CMP (Consent Management Platform) first-party proxy integration that routes cdn.privacy-mgmt.com and geo.privacymanager.io traffic through Trusted Server, eliminating third-party requests for consent management assets.
  • Implements four rewriting layers (HTML attributes, JS response bodies, runtime config trap, client-side DOM guard) to ensure comprehensive URL coverage across static markup, dynamically loaded scripts, and Next.js hydration payloads.
  • Ships disabled by default with full TOML configuration, integration documentation, 14 Rust unit tests, and 6 TypeScript tests.

Changes

File Change
crates/trusted-server-core/src/integrations/sourcepoint.rs New ~860-line integration module: route registration, CDN/geo proxy handlers, HTML attribute rewriter, JS body rewriter (rewrite_script_content), head injector (window._sp_ property trap), Accept-Encoding scoping, backend selection, and 14 unit tests
crates/trusted-server-core/src/integrations/mod.rs Register sourcepoint module and wire it into the integration registry
crates/js/lib/src/integrations/sourcepoint/index.ts TypeScript entry point that bootstraps the client-side script guard
crates/js/lib/src/integrations/sourcepoint/script_guard.ts Client-side DOM insertion dispatcher handler that intercepts dynamically inserted <script>/<link> elements pointing at Sourcepoint CDN and rewrites them to first-party paths
crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts 6 Vitest tests covering the script guard (rewrites, skips, and teardown)
docs/guide/integrations/sourcepoint.md Integration documentation: overview, setup steps, config reference, architecture
docs/guide/integrations-overview.md Updated overview with Sourcepoint in the Mermaid flowchart, performance characteristics table, and environment variables section
trusted-server.toml Added [integrations.sourcepoint] config stanza (disabled by default)

Closes

Closes #145
Closes #344
Closes #345

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • WASM build: cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses log macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@ChristianPavilonis ChristianPavilonis self-assigned this Apr 8, 2026
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
@aram356 aram356 requested review from aram356 and prk-Jr April 9, 2026 16:36
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Adds a well-structured Sourcepoint CMP first-party proxy with four rewriting layers (HTML attributes, JS body, runtime config trap, client-side DOM guard). The implementation follows existing integration patterns closely, has 14 Rust + 6 JS tests, and ships disabled by default. A few items need attention before merge.

Blocking

🔧 wrench

  • take_body_str() panics on non-UTF-8 upstream responses: response.take_body_str() at line 400 will panic if the upstream CDN returns invalid UTF-8 with a JS Content-Type. Read bytes and attempt conversion with a fallback instead. (sourcepoint.rs:400)
  • CI failure — format-typescript: Prettier check is failing on test/integrations/sourcepoint/script_guard.test.ts. Must be fixed before merge.

❓ question

  • Redirect Location headers not rewritten: The JS body rewrite is gated on status == 200. If the CDN returns a 3xx redirect with a Location pointing to cdn.privacy-mgmt.com, the browser follows it back to the third-party CDN, defeating the proxy. (sourcepoint.rs:394)

Non-blocking

🤔 thinking

  • Upstream response headers forwarded verbatim: Set-Cookie and other headers from cdn.privacy-mgmt.com pass through to the client on the non-rewritten path. Same as Didomi, but worth confirming this is intended. (sourcepoint.rs:418)
  • Regex matches mismatched quotes: The CDN URL pattern can match 'url" (single open, double close). Extremely unlikely in minified JS but could produce malformed output. (sourcepoint.rs:65)

♻️ refactor

  • Redundant rewrite_sdk guard: Checked in both handles_attribute() and rewrite() — the second is unreachable. (sourcepoint.rs:437)
  • Head injector config property names: Hardcoded Sourcepoint config keys could go stale. A test asserting known property names appear in the generated script would catch omissions. (sourcepoint.rs:467)

🌱 seedling

  • No body size limit: The body is read entirely into memory with no upper bound. A Content-Length check would prevent OOM from unexpected large responses. (sourcepoint.rs:400)
  • Missing validation test for invalid cdn_origin: No test proves that register() fails when cdn_origin is not a valid URL. (sourcepoint.rs:153)

⛏ nitpick

  • Manual [sourcepoint] log prefix: Other integrations rely on the log crate's module path rather than a manual bracketed prefix. (sourcepoint.rs:355)

CI Status

  • cargo fmt: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: FAIL (Prettier issue in script_guard.test.ts)
  • format-docs: PASS
  • browser integration tests: PASS
  • integration tests: PASS
  • CodeQL: FAIL (likely infra)

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed commit a211eb0. Three blocking issues require fixes before merge; the rest are non-blocking.


Blocking (must fix before merge)

B1 — CI: format-typescript still failing on current HEAD

The latest commit (a211eb02) re-introduced a Prettier violation in crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts. CI's format-typescript check will fail on this commit. Fix with:

cd crates/js/lib && npm run format -- --write

then commit the result.

B2 — SSRF via cdn_origin (inline comment on line 96)

See inline comment. A syntactically valid but host-unrestricted cdn_origin allows proxying to any IP including 169.254.169.254 (cloud metadata). Needs a custom host validator.

B3 — PUT/PATCH unreachable (inline comment on line 337)

routes() only registers GET and POST; the PUT/PATCH arms in handle() are dead code. Either register the methods or remove the arms.


Important (non-blocking)

I1 — Single-quoted '/unified/' missed by SP_ORIGIN_UNIFIED_PATTERN (inline on line 81) — fix is a one-character change to the regex character class.

I2 — BackendConfig::from_url per request (inline on line 357) — pre-compute in new() following the pattern in aps.rs / prebid.rs.

I3 — PR checklist says "Uses tracing macros" but codebase uses log crate

The checklist item reads "Uses tracing macros (not println!)" but CLAUDE.md specifies log. The source correctly uses log::info! throughout. The checklist entry should read "Uses log macros (not println!)" to avoid confusion for future contributors.

I4 — JS-rewrite path drops all upstream headers (inline on line 403) — Response::new() discards Vary, CORS, and security headers. Start from response.clone_without_body() instead.

I5 — No test for build_target_url with a path-bearing cdn_origin (inline on line 337) — set_path silently drops any path prefix in cdn_origin. Should be tested and either fixed or documented as an explicit constraint.


Suggestions (non-blocking)

S1 — URL normalisation logic duplicated across Rust and TypeScript

parse_sourcepoint_url in sourcepoint.rs and normalizeSourcepointUrl in script_guard.ts implement identical protocol-relative URL normalisation in parallel. Not a defect, but a cross-reference comment in each file would prevent future fixes being applied to only one side.

S2 — Redundant rewrite_sdk guard in rewrite() (inline on line 437) — unreachable because handles_attribute already gates on it. Either remove with a comment or document the intentional belt-and-suspenders.

S3 — No test for single-quoted CDN URLs (inline on line 81)

S4 — Unchecked WASM build in PR checklist

The test plan has an unchecked - [ ] WASM build item, but CI's WASM build passes. The checkbox should be checked to accurately reflect build status.

S5 — register missing # Examples doc section (inline on line 314) — CLAUDE.md requires this for all public API functions.

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
ChristianPavilonis and others added 3 commits April 10, 2026 10:48
- Replace #[validate(url)] with a custom validator that restricts
  cdn_origin to *.privacy-mgmt.com hosts, preventing SSRF via
  arbitrary origins (e.g. cloud metadata endpoints).
- Remove unreachable PUT/PATCH arms from the request body match
  since routes() only registers GET and POST.
- Fix Prettier formatting in script_guard.test.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace take_body_str() with take_body_bytes() + String::from_utf8()
  to avoid panicking on non-UTF-8 upstream responses.
- Rewrite Location headers on 3xx redirects that point to
  cdn.privacy-mgmt.com so browsers stay on the first-party proxy.
- Preserve upstream CORS headers on the JS-rewrite path instead of
  discarding them when building a fresh Response.
- Extend SP_ORIGIN_UNIFIED_PATTERN to match both single- and
  double-quoted "/unified/" chunk paths, preserving the original
  quote character in the replacement.
- Normalise log prefixes from [sourcepoint] to Sourcepoint: for
  consistency with APS/Prebid style.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove redundant rewrite_sdk check from rewrite() since
  handles_attribute() already gates on it; update test to verify
  the guard at the handles_attribute level.
- Add # Examples section to register() per documentation standards.
- Add tests for cdn_origin validation (rejects non-privacy-mgmt.com
  hosts, accepts valid origins).
- Add test for single-quoted origin+'/unified/' rewrite pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Review feedback addressed

All blocking and important findings have been fixed across three commits. Replies posted on each inline thread.

Blocking (fixed)

  • SSRF guard — Custom validate_cdn_origin() restricts host to *.privacy-mgmt.com
  • PUT/PATCH dead code — Removed unreachable arms
  • Prettier — Formatting fixed

Important non-blocking (fixed)

  • UTF-8 safetytake_body_bytes() + String::from_utf8() with graceful fallback
  • Redirect rewriting — 3xx Location headers containing CDN host are rewritten
  • CORS headers — JS-rewrite path now preserves upstream CORS headers
  • Single-quote regexSP_ORIGIN_UNIFIED_PATTERN handles both ' and " quotes
  • Log prefix — Normalised to Sourcepoint: style

Suggestions (fixed)

  • Removed redundant rewrite_sdk guard, added invariant comment
  • Added # Examples to register() docs
  • Added validation tests for cdn_origin
  • Added test for single-quoted unified path

Intentionally skipped (with rationale in thread replies)

  • Regex mismatched quotes — documented limitation, no real-world impact
  • BackendConfig::from_url per-request — matches APS/Prebid pattern
  • Body size limit — future enhancement
  • Config property name test — covered by existing head injector test

All CI checks pass locally: cargo fmt, clippy, cargo test, vitest, Prettier.

Refactor normalizeSourcepointUrl to remove the bare-domain startsWith
check that triggered CodeQL "Incomplete URL substring sanitization"
alerts. The host === exact match was already the security boundary;
now the normalization layer no longer references the CDN hostname at
all, eliminating the static analysis finding.

Add a Content-Length guard (5 MB) before reading upstream response
bodies into memory for JavaScript rewriting, preventing unbounded
memory consumption from unexpectedly large responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aram356 aram356 linked an issue Apr 13, 2026 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Well-executed revision — every prior blocker is addressed and the four-layer rewriting architecture (HTML attrs / JS body / window._sp_ trap / client DOM guard) is cleanly separated. Two remaining correctness issues in the upstream response handling need fixes before merge: the body-size guard is bypassed when Content-Length is absent, and the UTF-8 fallback path silently drops upstream headers. A few non-blocking refinements around redirect rewriting, config scope, and test coverage follow.

Blocking

🔧 wrench

  • Body-size guard bypassed on chunked/unknown-length responsesMAX_REWRITE_BODY_SIZE only applies when Content-Length is present; otherwise take_body_bytes() reads unbounded (sourcepoint.rs:456-470)
  • UTF-8 fallback drops upstream headers — the fallback constructs a fresh Response with only status + Content-Type, discarding Set-Cookie, CORS, Cache-Control, Vary, etc., diverging from the passthrough path (sourcepoint.rs:480-488)

Non-blocking

♻️ refactor

  • Redirect Location rewrite is a literal String::replace — misses protocol-relative //cdn.privacy-mgmt.com/... values (sourcepoint.rs:430-438)
  • validate_cdn_origin does not check schemeftp://cdn.privacy-mgmt.com passes validation and only fails at the first proxied request (sourcepoint.rs:135-150)

🤔 thinking

  • Config/rewriter host scope mismatchvalidate_cdn_origin accepts *.privacy-mgmt.com but rewriters hardcode cdn.privacy-mgmt.com; configuring a different subdomain silently half-enables the integration (sourcepoint.rs:143, 69, 197, 598)
  • Rewritten-JS path hard-codes Cache-Control — diverges from the passthrough path's apply_cache_headers policy without explanation (sourcepoint.rs:498-501)
  • Property trap only catches reassignment of window._sp_ — nested mutation (window._sp_.config.X = ...) is not patched; also s.replace() in the JS helper is first-occurrence only (sourcepoint.rs:568-600)

🌱 seedling

  • copy_headers forwards Authorization — credential-leak footgun if publishers reuse bearer tokens across origins (sourcepoint.rs:222-233)
  • No direct test for handle() — redirect rewriting, UTF-8 fallback, Content-Length guard, Content-Type gating all lack unit coverage (sourcepoint.rs:372-522)

👍 praise

  • Solid response to the prior review — every prior 🔧/❓ addressed with well-chosen fixes; four-layer rewriting architecture is cleanly separated and well documented

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • browser integration tests: PASS
  • CodeQL: PASS

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
…t rewrite

- Skip JS rewrite when Content-Length is absent (chunked/unknown) to
  prevent unbounded memory reads
- Reuse original response on UTF-8 fallback to preserve upstream headers
- Extract redirect Location rewriting into a URL-parsing helper that
  handles absolute, protocol-relative, and relative redirects
- Tighten cdn_origin validation to exactly cdn.privacy-mgmt.com and
  reject non-HTTP(S) schemes
- Drop Authorization from forwarded headers to prevent credential leakage
- Document Cache-Control divergence and property trap limitations
- Add 7 new tests for validation and redirect rewriting
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Round-3 blockers are all correctly addressed — UTF-8 fallback now reuses the original response (preserving headers), Content-Length-missing short-circuits the rewrite before take_body_bytes, rewrite_redirect_location handles absolute / protocol-relative / relative via Url::join, validate_cdn_origin checks scheme and pins the host, and Authorization is excluded from forwarded headers. No remaining 🔧/❓ findings. Submitting as change-request per author request; the items below are refinements (🤔/♻️/🌱/⛏) rather than correctness blockers — each can be resolved or deferred to a follow-up at the author's discretion.

Non-blocking

♻️ refactor

  • apply_cache_headers invoked at five return sites — redirect branch, CL-too-large branch, CL-missing branch, UTF-8-fallback branch, and the fallthrough (sourcepoint.rs:476, 505, 513, 528, 567). Restructuring handle() around a single exit point (or wrapping the proxied Response in a post-handle adapter) would remove the per-branch bookkeeping. Low priority.

🌱 seedling

  • No direct test for handle() (sourcepoint.rs:414-569) — redirect rewriting, UTF-8 fallback, CL-missing skip, CL-too-large skip, and non-JS passthrough branches all lack unit coverage. Acknowledged in the round-3 response; tracking here for follow-up.
  • CSP interaction not documented (docs/guide/integrations/sourcepoint.md) — publishers with Content-Security-Policy: script-src https://cdn.privacy-mgmt.com will see rewritten first-party scripts blocked the moment they enable the integration. A note under "HTML Rewriting" calling out that CSP must allow the first-party host would save an incident report.

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • browser integration tests: PASS
  • CodeQL: PASS

Comment thread crates/js/lib/src/integrations/sourcepoint/index.ts
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts
@ChristianPavilonis ChristianPavilonis removed the request for review from prk-Jr April 21, 2026 13:59
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Implemented Sourcepoint follow-up updates and pushed commit.

What I changed:

  • Added __tsjs_sourcepoint runtime config bootstrap in Sourcepoint head inject and gated client-side guard installation with it.
  • Sourcepoint head injection now always emits config script; runtime trap is only included when rewrite_sdk is enabled.
  • Added strict cdn_origin validation to reject URL paths.
  • Preserved response headers on JS rewrite success path while normalizing encoding headers and content type.
  • Added regression coverage for the above behaviors.

I also resolved remaining open review threads and re-requested review from @prk-Jr.

Validation:

  • cargo test --workspace
  • cargo clippy --package trusted-server-core --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • cd crates/js/lib && npx vitest run
  • cd crates/js/lib && npm run -s format -- --check

Commit: e2fdb0c

Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

The Sourcepoint integration covers the expected rewrite layers, but I found two blocking issues around cookie semantics plus one follow-up consistency concern in the JS guard.

Blocking

🔧 wrench

  • Sourcepoint cookie state never makes the round trip: the proxy does not forward any Cookie header upstream, so Sourcepoint cookies previously set on the first-party endpoint are never sent back to Sourcepoint. That breaks the main baseEndpoint / CNAME flow and authenticated-consent-style cookie-backed state. (crates/trusted-server-core/src/integrations/sourcepoint.rs:256)

  • Cookie-setting Sourcepoint responses are still marked public-cacheable: both the passthrough path and the rewritten-JS path can return Cache-Control: public, max-age=... even when upstream sends Set-Cookie. If Sourcepoint uses server-set cookies on the first-party endpoint, that risks caching user-specific consent state. (crates/trusted-server-core/src/integrations/sourcepoint.rs:269, crates/trusted-server-core/src/integrations/sourcepoint.rs:377)

Non-blocking

🤔 thinking

  • JS and Rust host checks diverge on explicit ports: the client-side DOM guard compares URL.host, while the Rust rewriter compares hostname-only. That creates inconsistent behavior for URLs like https://cdn.privacy-mgmt.com:443/.... (crates/js/lib/src/integrations/sourcepoint/script_guard.ts:35, crates/trusted-server-core/src/integrations/sourcepoint.rs:228)

CI Status

  • fmt: PASS
  • clippy: FAIL
  • rust tests: FAIL
  • js tests: PASS
  • integration artifact prep: FAIL

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Outdated
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Addressed the latest review round in commit 1a40868 and pushed to feature/sourcepoint-integration.

What changed:

  • Forwarded a narrowly scoped Sourcepoint cookie allowlist upstream instead of dropping the Cookie header
  • Added optional auth_cookie_name support for authenticated-consent deployments
  • Forced cookie-bearing Sourcepoint responses to Cache-Control: private, no-store on both passthrough and rewritten-JS paths
  • Aligned the client-side Sourcepoint guard with Rust by switching from URL.host to URL.hostname
  • Added Rust + Vitest regression coverage and updated Sourcepoint docs/sample config

Validation:

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo test --workspace
  • cd crates/js/lib && npx vitest run

I replied to and resolved the remaining review threads and re-requested review from @prk-Jr and @aram356.

Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Four-layer rewriting architecture (HTML attrs / JS body regex / window._sp_ trap / client DOM guard) is well-structured, and most prior 🔧 / ❓ findings have been addressed across rounds 1–4 (SSRF guard, UTF-8 safety, redirect rewriting, cookie allowlist, no-store on cookie responses, hostname-only host checks, scheme/host/path/query/fragment validation). However, a stale merge from main introduced a hard build break that fails three CI jobs.

Blocking

🔧 wrench

  • Build break: IntegrationProxy::handle signature drifted from the trait#581 added services: &RuntimeServices to the trait; the merge from main did not update Sourcepoint's impl. cargo fmt, cargo test, and prepare integration artifacts all fail with E0050. (sourcepoint.rs:540-549)
  • Use platform-agnostic client_ip from RuntimeServicescopy_headers calls Fastly-specific original_req.get_client_ip_addr(). With RuntimeServices now plumbed through, switch to services.client_info().client_ip to match Didomi's pattern and keep trusted-server-core portable. (sourcepoint.rs:269-272)

Non-blocking

🤔 thinking

  • Property-trap config field allowlist is hardcoded and silently goes stale — only baseEndpoint, mmsDomain, wrapperAPIOrigin, cmpOrigin, and metricUrl are patched. New Sourcepoint fields would silently leak. A snapshot test asserting the exact patched set would force review when the list changes. (sourcepoint.rs:735-742)
  • auth_cookie_name is unvalidated — empty string, whitespace, or ; / = characters are accepted. Empty string especially: a malformed cookie pair =value parses to name "", which auth_cookie_name = "" would forward. Add #[validate] (non-empty trimmed, length ≤ 64, [A-Za-z0-9_-]). (sourcepoint.rs:124-129)

♻️ refactor

  • is_likely_javascript_path matches non-JS API paths/wrapper/v2/* is JSON but is treated as JS-likely, forcing Accept-Encoding: identity for every CMP API hit. Tightening the heuristic (e.g. only .js and /unified/) preserves the optimisation only where applicable. (sourcepoint.rs:431-433)

📌 out of scope (carryover from earlier rounds, still open)

  • No direct unit test for handle() — redirect rewriting, UTF-8 fallback, Content-Length missing/too-large branches still lack coverage.
  • apply_cache_headers invoked at five return sites in handle() — restructuring around a single exit point would remove the per-branch bookkeeping.
  • CSP interaction undocumented — publishers with Content-Security-Policy: script-src https://cdn.privacy-mgmt.com will see rewritten first-party scripts blocked the moment they enable rewriting. A note in docs/guide/integrations/sourcepoint.md under "HTML Rewriting" would save an incident report.

CI Status

  • cargo fmt: FAIL (caused by trait-signature drift)
  • cargo test: FAIL (caused by trait-signature drift)
  • prepare integration artifacts: FAIL (caused by trait-signature drift)
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • CodeQL: PASS
  • Analyze (rust / actions / javascript-typescript): PASS

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Fifth-round review against main. CI is fully green and the prior 84 review comments across 4 rounds have all been addressed. New findings below are incremental hardening — one cache-poisoning concern that depends on upstream Cache-Control behavior and several non-blocking quality observations.

Blocking

🔧 wrench

  • Cache poisoning risk: forwarded Cookie not reflected in Vary for default cache headers (sourcepoint.rs:386)

Non-blocking

🤔 thinking

  • Inline <script> IIFE has no try/catch — can crash publisher hydration (sourcepoint.rs:784)
  • JS-rewrite success path leaves stale Vary: Accept-Encoding (sourcepoint.rs:499)
  • BackendConfig::from_url called per-request even though cdn_origin is now static (sourcepoint.rs:593)
  • apply_cache_headers is a no-op for redirects without Set-Cookie (sourcepoint.rs:637)

♻️ refactor

  • Duplicated Set-Cookie → private, no-store policy between apply_cache_headers and rewrite_javascript_response (sourcepoint.rs:492)
  • parse_sourcepoint_url's bare-host branch uses literal starts_with (sourcepoint.rs:516)

🌱 seedling

  • validate_auth_cookie_name does not reject reserved RFC 6265 attribute names (sourcepoint.rs:235)
  • is_likely_javascript_path does not include .mjs (sourcepoint.rs:467)

CI Status

  • fmt: PASS
  • clippy: PASS
  • rust tests: PASS
  • js tests: PASS
  • CodeQL: PASS
  • browser/integration tests: PASS

format!("public, max-age={}", self.config.cache_ttl_seconds),
);
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Cache poisoning risk: forwarded Cookie not reflected in Vary for default cache headers

copy_headers (line 327-330) forwards an allowlisted Cookie header upstream. On the non-rewrite, non-3xx response path, when upstream omits both Set-Cookie and Cache-Control, this method stamps public, max-age=cache_ttl_seconds (default 3600). Without Vary: Cookie, Fastly's edge can cache a response that varied on consent state and serve it cross-user.

For /unified/*.js chunks this is fine (truly static), but /wrapper/v2/messages and similar endpoints can vary by consentUUID. Today the integration is shielded by Sourcepoint typically returning Cache-Control: no-store on personalized endpoints — so safety here depends on upstream behavior, not on our policy.

Fix: Track whether filtered_sourcepoint_cookie_header(...) returned Some(_) for the request, and either (a) skip the default public Cache-Control entirely on the cookie-forwarded path, or (b) stamp private, max-age=0 instead. Sketch:

fn apply_cache_headers(&self, response: &mut Response, forwarded_cookies: bool) {
    if Self::response_sets_cookie(response) {
        response.set_header(header::CACHE_CONTROL, "private, no-store");
        return;
    }
    if response.get_header(header::CACHE_CONTROL).is_none()
        && response.get_status().is_success()
    {
        if forwarded_cookies {
            response.set_header(header::CACHE_CONTROL, "private, max-age=0");
        } else {
            response.set_header(
                header::CACHE_CONTROL,
                format!("public, max-age={}", self.config.cache_ttl_seconds),
            );
        }
    }
}

"configurable:true,",
"get:function(){{return v}},",
"set:function(n){{v=p(n)}}",
"}});",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — Inline <script> IIFE has no try/catch — can crash publisher hydration

Object.defineProperty(window, "_sp_", …) will throw a TypeError if _sp_ was already defined as configurable: false by another script that executed earlier (publisher polyfill, browser extension, or a re-injection of this script). Since this IIFE is at the top of <head>, an unhandled throw aborts the script tag and may cascade through hydration.

The trap is a best-effort optimisation — failing closed (no rewrite) is preferable to throwing. Wrap the IIFE body (or just the defineProperty call) in try { … } catch (e) { /* swallow */ }.

);

response.set_body(rewritten);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — JS-rewrite success path leaves stale Vary: Accept-Encoding

rewrite_javascript_response strips Content-Encoding but does not remove Vary: Accept-Encoding if upstream sent it. Caches keying on it will create unnecessary cache fragmentation. Minor performance concern — flagged for a future tightening, not strictly for this PR.

log::info!("Sourcepoint: proxying {method} {path} → {target_url}");

let backend_name = BackendConfig::from_url(&self.config.cdn_origin, true)
.change_context(Self::error("Failed to configure Sourcepoint backend"))?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingBackendConfig::from_url called per-request even though cdn_origin is now static

This was raised in round 2 and acknowledged. Since validate_cdn_origin now pins the host to a single value, the backend name is invariant per integration instance. A OnceLock<String> initialised in register() would eliminate the per-request URL parse + Fastly registration roundtrip without changing the contract. Worth revisiting because the constraint that made it expensive to fix no longer applies after the host-pinning fix.

response.set_header(header::LOCATION, &rewritten);
}
}
self.apply_cache_headers(&mut response);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingapply_cache_headers is a no-op for redirects without Set-Cookie

For 3xx responses, apply_cache_headers is called here, but its default-cache branch is gated on is_success() (line 379), which is false for 3xx. So a CDN redirect with no Cache-Control and no Set-Cookie falls through with no policy, and browsers/Fastly apply heuristic caching to a Location-rewritten redirect. Probably benign for Sourcepoint's traffic, but a one-line documenting comment here would prevent the next reader from second-guessing.

header::CACHE_CONTROL,
format!("public, max-age={}", self.config.cache_ttl_seconds),
);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactor — Duplicated Set-Cookie → private, no-store policy between apply_cache_headers and rewrite_javascript_response

Both branches (here and at line 373-376) encode the same policy: Set-Cookie present → private, no-store. A small apply_cookie_safety(&mut response) -> bool helper that returns whether it short-circuited the default would let both call sites share the rule and let any new caller (e.g., redirect handling) get it for free.

} else if trimmed.starts_with("http://") || trimmed.starts_with("https://") {
trimmed.to_string()
} else if trimmed.starts_with(SOURCEPOINT_CDN_HOST) {
format!("https://{trimmed}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactorparse_sourcepoint_url's bare-host branch uses literal starts_with

trimmed.starts_with(SOURCEPOINT_CDN_HOST) would also accept cdn.privacy-mgmt.com.evil.com/foo and prepend https://. The downstream host_str() != SOURCEPOINT_CDN_HOST check at line 282 catches it correctly, so there is no security regression — purely a defense-in-depth nit. Tighter would be: starts_with(SOURCEPOINT_CDN_HOST) AND next byte ∈ {':', '/'} or end-of-string.

}

Ok(())
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 seedlingvalidate_auth_cookie_name does not reject reserved RFC 6265 attribute names

A misconfigured auth_cookie_name = "Domain" (or Path, Secure, HttpOnly, SameSite, Max-Age, Expires) is currently accepted but would clash with cookie attribute parsing on the Sourcepoint side. Practically Sourcepoint will not name its authCookie after a reserved attribute, but a case-insensitive blacklist would make the validator more defensive.

/// `Accept-Encoding: identity` optimisation for that request.
fn is_likely_javascript_path(path: &str) -> bool {
path.ends_with(".js") || path.starts_with("/unified/")
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 seedlingis_likely_javascript_path does not include .mjs

If Sourcepoint ever ships native ES modules, the Accept-Encoding: identity preflight will be skipped, the response will come back compressed, and the UTF-8 / Content-Encoding handling on the JS-rewrite path will silently fail. Trivial preventive add: path.ends_with(".js") || path.ends_with(".mjs") || path.starts_with("/unified/").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants