Skip to content

Add Sourcepoint first-party CMP integration#625

Open
ChristianPavilonis wants to merge 12 commits intomainfrom
feature/sourcepoint-integration
Open

Add Sourcepoint first-party CMP integration#625
ChristianPavilonis wants to merge 12 commits intomainfrom
feature/sourcepoint-integration

Conversation

@ChristianPavilonis
Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis commented Apr 8, 2026

Summary

  • Adds a new Sourcepoint CMP (Consent Management Platform) first-party proxy integration that routes cdn.privacy-mgmt.com and geo.privacymanager.io traffic through Trusted Server, eliminating third-party requests for consent management assets.
  • Implements four rewriting layers (HTML attributes, JS response bodies, runtime config trap, client-side DOM guard) to ensure comprehensive URL coverage across static markup, dynamically loaded scripts, and Next.js hydration payloads.
  • Ships disabled by default with full TOML configuration, integration documentation, 14 Rust unit tests, and 6 TypeScript tests.

Changes

File Change
crates/trusted-server-core/src/integrations/sourcepoint.rs New ~860-line integration module: route registration, CDN/geo proxy handlers, HTML attribute rewriter, JS body rewriter (rewrite_script_content), head injector (window._sp_ property trap), Accept-Encoding scoping, backend selection, and 14 unit tests
crates/trusted-server-core/src/integrations/mod.rs Register sourcepoint module and wire it into the integration registry
crates/js/lib/src/integrations/sourcepoint/index.ts TypeScript entry point that bootstraps the client-side script guard
crates/js/lib/src/integrations/sourcepoint/script_guard.ts Client-side DOM insertion dispatcher handler that intercepts dynamically inserted <script>/<link> elements pointing at Sourcepoint CDN and rewrites them to first-party paths
crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts 6 Vitest tests covering the script guard (rewrites, skips, and teardown)
docs/guide/integrations/sourcepoint.md Integration documentation: overview, setup steps, config reference, architecture
docs/guide/integrations-overview.md Updated overview with Sourcepoint in the Mermaid flowchart, performance characteristics table, and environment variables section
trusted-server.toml Added [integrations.sourcepoint] config stanza (disabled by default)

Closes

Closes #145
Closes #344
Closes #345

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • WASM build: cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses log macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@ChristianPavilonis ChristianPavilonis self-assigned this Apr 8, 2026
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
@aram356 aram356 requested review from aram356 and prk-Jr April 9, 2026 16:36
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Adds a well-structured Sourcepoint CMP first-party proxy with four rewriting layers (HTML attributes, JS body, runtime config trap, client-side DOM guard). The implementation follows existing integration patterns closely, has 14 Rust + 6 JS tests, and ships disabled by default. A few items need attention before merge.

Blocking

🔧 wrench

  • take_body_str() panics on non-UTF-8 upstream responses: response.take_body_str() at line 400 will panic if the upstream CDN returns invalid UTF-8 with a JS Content-Type. Read bytes and attempt conversion with a fallback instead. (sourcepoint.rs:400)
  • CI failure — format-typescript: Prettier check is failing on test/integrations/sourcepoint/script_guard.test.ts. Must be fixed before merge.

❓ question

  • Redirect Location headers not rewritten: The JS body rewrite is gated on status == 200. If the CDN returns a 3xx redirect with a Location pointing to cdn.privacy-mgmt.com, the browser follows it back to the third-party CDN, defeating the proxy. (sourcepoint.rs:394)

Non-blocking

🤔 thinking

  • Upstream response headers forwarded verbatim: Set-Cookie and other headers from cdn.privacy-mgmt.com pass through to the client on the non-rewritten path. Same as Didomi, but worth confirming this is intended. (sourcepoint.rs:418)
  • Regex matches mismatched quotes: The CDN URL pattern can match 'url" (single open, double close). Extremely unlikely in minified JS but could produce malformed output. (sourcepoint.rs:65)

♻️ refactor

  • Redundant rewrite_sdk guard: Checked in both handles_attribute() and rewrite() — the second is unreachable. (sourcepoint.rs:437)
  • Head injector config property names: Hardcoded Sourcepoint config keys could go stale. A test asserting known property names appear in the generated script would catch omissions. (sourcepoint.rs:467)

🌱 seedling

  • No body size limit: The body is read entirely into memory with no upper bound. A Content-Length check would prevent OOM from unexpected large responses. (sourcepoint.rs:400)
  • Missing validation test for invalid cdn_origin: No test proves that register() fails when cdn_origin is not a valid URL. (sourcepoint.rs:153)

⛏ nitpick

  • Manual [sourcepoint] log prefix: Other integrations rely on the log crate's module path rather than a manual bracketed prefix. (sourcepoint.rs:355)

CI Status

  • cargo fmt: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: FAIL (Prettier issue in script_guard.test.ts)
  • format-docs: PASS
  • browser integration tests: PASS
  • integration tests: PASS
  • CodeQL: FAIL (likely infra)

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed commit a211eb0. Three blocking issues require fixes before merge; the rest are non-blocking.


Blocking (must fix before merge)

B1 — CI: format-typescript still failing on current HEAD

The latest commit (a211eb02) re-introduced a Prettier violation in crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts. CI's format-typescript check will fail on this commit. Fix with:

cd crates/js/lib && npm run format -- --write

then commit the result.

B2 — SSRF via cdn_origin (inline comment on line 96)

See inline comment. A syntactically valid but host-unrestricted cdn_origin allows proxying to any IP including 169.254.169.254 (cloud metadata). Needs a custom host validator.

B3 — PUT/PATCH unreachable (inline comment on line 337)

routes() only registers GET and POST; the PUT/PATCH arms in handle() are dead code. Either register the methods or remove the arms.


Important (non-blocking)

I1 — Single-quoted '/unified/' missed by SP_ORIGIN_UNIFIED_PATTERN (inline on line 81) — fix is a one-character change to the regex character class.

I2 — BackendConfig::from_url per request (inline on line 357) — pre-compute in new() following the pattern in aps.rs / prebid.rs.

I3 — PR checklist says "Uses tracing macros" but codebase uses log crate

The checklist item reads "Uses tracing macros (not println!)" but CLAUDE.md specifies log. The source correctly uses log::info! throughout. The checklist entry should read "Uses log macros (not println!)" to avoid confusion for future contributors.

I4 — JS-rewrite path drops all upstream headers (inline on line 403) — Response::new() discards Vary, CORS, and security headers. Start from response.clone_without_body() instead.

I5 — No test for build_target_url with a path-bearing cdn_origin (inline on line 337) — set_path silently drops any path prefix in cdn_origin. Should be tested and either fixed or documented as an explicit constraint.


Suggestions (non-blocking)

S1 — URL normalisation logic duplicated across Rust and TypeScript

parse_sourcepoint_url in sourcepoint.rs and normalizeSourcepointUrl in script_guard.ts implement identical protocol-relative URL normalisation in parallel. Not a defect, but a cross-reference comment in each file would prevent future fixes being applied to only one side.

S2 — Redundant rewrite_sdk guard in rewrite() (inline on line 437) — unreachable because handles_attribute already gates on it. Either remove with a comment or document the intentional belt-and-suspenders.

S3 — No test for single-quoted CDN URLs (inline on line 81)

S4 — Unchecked WASM build in PR checklist

The test plan has an unchecked - [ ] WASM build item, but CI's WASM build passes. The checkbox should be checked to accurately reflect build status.

S5 — register missing # Examples doc section (inline on line 314) — CLAUDE.md requires this for all public API functions.

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
ChristianPavilonis and others added 3 commits April 10, 2026 10:48
- Replace #[validate(url)] with a custom validator that restricts
  cdn_origin to *.privacy-mgmt.com hosts, preventing SSRF via
  arbitrary origins (e.g. cloud metadata endpoints).
- Remove unreachable PUT/PATCH arms from the request body match
  since routes() only registers GET and POST.
- Fix Prettier formatting in script_guard.test.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace take_body_str() with take_body_bytes() + String::from_utf8()
  to avoid panicking on non-UTF-8 upstream responses.
- Rewrite Location headers on 3xx redirects that point to
  cdn.privacy-mgmt.com so browsers stay on the first-party proxy.
- Preserve upstream CORS headers on the JS-rewrite path instead of
  discarding them when building a fresh Response.
- Extend SP_ORIGIN_UNIFIED_PATTERN to match both single- and
  double-quoted "/unified/" chunk paths, preserving the original
  quote character in the replacement.
- Normalise log prefixes from [sourcepoint] to Sourcepoint: for
  consistency with APS/Prebid style.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove redundant rewrite_sdk check from rewrite() since
  handles_attribute() already gates on it; update test to verify
  the guard at the handles_attribute level.
- Add # Examples section to register() per documentation standards.
- Add tests for cdn_origin validation (rejects non-privacy-mgmt.com
  hosts, accepts valid origins).
- Add test for single-quoted origin+'/unified/' rewrite pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Review feedback addressed

All blocking and important findings have been fixed across three commits. Replies posted on each inline thread.

Blocking (fixed)

  • SSRF guard — Custom validate_cdn_origin() restricts host to *.privacy-mgmt.com
  • PUT/PATCH dead code — Removed unreachable arms
  • Prettier — Formatting fixed

Important non-blocking (fixed)

  • UTF-8 safetytake_body_bytes() + String::from_utf8() with graceful fallback
  • Redirect rewriting — 3xx Location headers containing CDN host are rewritten
  • CORS headers — JS-rewrite path now preserves upstream CORS headers
  • Single-quote regexSP_ORIGIN_UNIFIED_PATTERN handles both ' and " quotes
  • Log prefix — Normalised to Sourcepoint: style

Suggestions (fixed)

  • Removed redundant rewrite_sdk guard, added invariant comment
  • Added # Examples to register() docs
  • Added validation tests for cdn_origin
  • Added test for single-quoted unified path

Intentionally skipped (with rationale in thread replies)

  • Regex mismatched quotes — documented limitation, no real-world impact
  • BackendConfig::from_url per-request — matches APS/Prebid pattern
  • Body size limit — future enhancement
  • Config property name test — covered by existing head injector test

All CI checks pass locally: cargo fmt, clippy, cargo test, vitest, Prettier.

Refactor normalizeSourcepointUrl to remove the bare-domain startsWith
check that triggered CodeQL "Incomplete URL substring sanitization"
alerts. The host === exact match was already the security boundary;
now the normalization layer no longer references the CDN hostname at
all, eliminating the static analysis finding.

Add a Content-Length guard (5 MB) before reading upstream response
bodies into memory for JavaScript rewriting, preventing unbounded
memory consumption from unexpectedly large responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aram356 aram356 linked an issue Apr 13, 2026 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Well-executed revision — every prior blocker is addressed and the four-layer rewriting architecture (HTML attrs / JS body / window._sp_ trap / client DOM guard) is cleanly separated. Two remaining correctness issues in the upstream response handling need fixes before merge: the body-size guard is bypassed when Content-Length is absent, and the UTF-8 fallback path silently drops upstream headers. A few non-blocking refinements around redirect rewriting, config scope, and test coverage follow.

Blocking

🔧 wrench

  • Body-size guard bypassed on chunked/unknown-length responsesMAX_REWRITE_BODY_SIZE only applies when Content-Length is present; otherwise take_body_bytes() reads unbounded (sourcepoint.rs:456-470)
  • UTF-8 fallback drops upstream headers — the fallback constructs a fresh Response with only status + Content-Type, discarding Set-Cookie, CORS, Cache-Control, Vary, etc., diverging from the passthrough path (sourcepoint.rs:480-488)

Non-blocking

♻️ refactor

  • Redirect Location rewrite is a literal String::replace — misses protocol-relative //cdn.privacy-mgmt.com/... values (sourcepoint.rs:430-438)
  • validate_cdn_origin does not check schemeftp://cdn.privacy-mgmt.com passes validation and only fails at the first proxied request (sourcepoint.rs:135-150)

🤔 thinking

  • Config/rewriter host scope mismatchvalidate_cdn_origin accepts *.privacy-mgmt.com but rewriters hardcode cdn.privacy-mgmt.com; configuring a different subdomain silently half-enables the integration (sourcepoint.rs:143, 69, 197, 598)
  • Rewritten-JS path hard-codes Cache-Control — diverges from the passthrough path's apply_cache_headers policy without explanation (sourcepoint.rs:498-501)
  • Property trap only catches reassignment of window._sp_ — nested mutation (window._sp_.config.X = ...) is not patched; also s.replace() in the JS helper is first-occurrence only (sourcepoint.rs:568-600)

🌱 seedling

  • copy_headers forwards Authorization — credential-leak footgun if publishers reuse bearer tokens across origins (sourcepoint.rs:222-233)
  • No direct test for handle() — redirect rewriting, UTF-8 fallback, Content-Length guard, Content-Type gating all lack unit coverage (sourcepoint.rs:372-522)

👍 praise

  • Solid response to the prior review — every prior 🔧/❓ addressed with well-chosen fixes; four-layer rewriting architecture is cleanly separated and well documented

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • browser integration tests: PASS
  • CodeQL: PASS

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
…t rewrite

- Skip JS rewrite when Content-Length is absent (chunked/unknown) to
  prevent unbounded memory reads
- Reuse original response on UTF-8 fallback to preserve upstream headers
- Extract redirect Location rewriting into a URL-parsing helper that
  handles absolute, protocol-relative, and relative redirects
- Tighten cdn_origin validation to exactly cdn.privacy-mgmt.com and
  reject non-HTTP(S) schemes
- Drop Authorization from forwarded headers to prevent credential leakage
- Document Cache-Control divergence and property trap limitations
- Add 7 new tests for validation and redirect rewriting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants