Add Sourcepoint first-party CMP integration#625
Add Sourcepoint first-party CMP integration#625ChristianPavilonis wants to merge 12 commits intomainfrom
Conversation
… through first-party proxy
… URLs to first-party paths
…visibility, improve docs
aram356
left a comment
There was a problem hiding this comment.
Summary
Adds a well-structured Sourcepoint CMP first-party proxy with four rewriting layers (HTML attributes, JS body, runtime config trap, client-side DOM guard). The implementation follows existing integration patterns closely, has 14 Rust + 6 JS tests, and ships disabled by default. A few items need attention before merge.
Blocking
🔧 wrench
take_body_str()panics on non-UTF-8 upstream responses:response.take_body_str()at line 400 will panic if the upstream CDN returns invalid UTF-8 with a JS Content-Type. Read bytes and attempt conversion with a fallback instead. (sourcepoint.rs:400)- CI failure —
format-typescript: Prettier check is failing ontest/integrations/sourcepoint/script_guard.test.ts. Must be fixed before merge.
❓ question
- Redirect
Locationheaders not rewritten: The JS body rewrite is gated onstatus == 200. If the CDN returns a 3xx redirect with aLocationpointing tocdn.privacy-mgmt.com, the browser follows it back to the third-party CDN, defeating the proxy. (sourcepoint.rs:394)
Non-blocking
🤔 thinking
- Upstream response headers forwarded verbatim:
Set-Cookieand other headers fromcdn.privacy-mgmt.compass through to the client on the non-rewritten path. Same as Didomi, but worth confirming this is intended. (sourcepoint.rs:418) - Regex matches mismatched quotes: The CDN URL pattern can match
'url"(single open, double close). Extremely unlikely in minified JS but could produce malformed output. (sourcepoint.rs:65)
♻️ refactor
- Redundant
rewrite_sdkguard: Checked in bothhandles_attribute()andrewrite()— the second is unreachable. (sourcepoint.rs:437) - Head injector config property names: Hardcoded Sourcepoint config keys could go stale. A test asserting known property names appear in the generated script would catch omissions. (
sourcepoint.rs:467)
🌱 seedling
- No body size limit: The body is read entirely into memory with no upper bound. A
Content-Lengthcheck would prevent OOM from unexpected large responses. (sourcepoint.rs:400) - Missing validation test for invalid
cdn_origin: No test proves thatregister()fails whencdn_originis not a valid URL. (sourcepoint.rs:153)
⛏ nitpick
- Manual
[sourcepoint]log prefix: Other integrations rely on thelogcrate's module path rather than a manual bracketed prefix. (sourcepoint.rs:355)
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- format-typescript: FAIL (Prettier issue in
script_guard.test.ts) - format-docs: PASS
- browser integration tests: PASS
- integration tests: PASS
- CodeQL: FAIL (likely infra)
prk-Jr
left a comment
There was a problem hiding this comment.
Reviewed commit a211eb0. Three blocking issues require fixes before merge; the rest are non-blocking.
Blocking (must fix before merge)
B1 — CI: format-typescript still failing on current HEAD
The latest commit (a211eb02) re-introduced a Prettier violation in crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts. CI's format-typescript check will fail on this commit. Fix with:
cd crates/js/lib && npm run format -- --write
then commit the result.
B2 — SSRF via cdn_origin (inline comment on line 96)
See inline comment. A syntactically valid but host-unrestricted cdn_origin allows proxying to any IP including 169.254.169.254 (cloud metadata). Needs a custom host validator.
B3 — PUT/PATCH unreachable (inline comment on line 337)
routes() only registers GET and POST; the PUT/PATCH arms in handle() are dead code. Either register the methods or remove the arms.
Important (non-blocking)
I1 — Single-quoted '/unified/' missed by SP_ORIGIN_UNIFIED_PATTERN (inline on line 81) — fix is a one-character change to the regex character class.
I2 — BackendConfig::from_url per request (inline on line 357) — pre-compute in new() following the pattern in aps.rs / prebid.rs.
I3 — PR checklist says "Uses tracing macros" but codebase uses log crate
The checklist item reads "Uses tracing macros (not println!)" but CLAUDE.md specifies log. The source correctly uses log::info! throughout. The checklist entry should read "Uses log macros (not println!)" to avoid confusion for future contributors.
I4 — JS-rewrite path drops all upstream headers (inline on line 403) — Response::new() discards Vary, CORS, and security headers. Start from response.clone_without_body() instead.
I5 — No test for build_target_url with a path-bearing cdn_origin (inline on line 337) — set_path silently drops any path prefix in cdn_origin. Should be tested and either fixed or documented as an explicit constraint.
Suggestions (non-blocking)
S1 — URL normalisation logic duplicated across Rust and TypeScript
parse_sourcepoint_url in sourcepoint.rs and normalizeSourcepointUrl in script_guard.ts implement identical protocol-relative URL normalisation in parallel. Not a defect, but a cross-reference comment in each file would prevent future fixes being applied to only one side.
S2 — Redundant rewrite_sdk guard in rewrite() (inline on line 437) — unreachable because handles_attribute already gates on it. Either remove with a comment or document the intentional belt-and-suspenders.
S3 — No test for single-quoted CDN URLs (inline on line 81)
S4 — Unchecked WASM build in PR checklist
The test plan has an unchecked - [ ] WASM build item, but CI's WASM build passes. The checkbox should be checked to accurately reflect build status.
S5 — register missing # Examples doc section (inline on line 314) — CLAUDE.md requires this for all public API functions.
- Replace #[validate(url)] with a custom validator that restricts cdn_origin to *.privacy-mgmt.com hosts, preventing SSRF via arbitrary origins (e.g. cloud metadata endpoints). - Remove unreachable PUT/PATCH arms from the request body match since routes() only registers GET and POST. - Fix Prettier formatting in script_guard.test.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace take_body_str() with take_body_bytes() + String::from_utf8() to avoid panicking on non-UTF-8 upstream responses. - Rewrite Location headers on 3xx redirects that point to cdn.privacy-mgmt.com so browsers stay on the first-party proxy. - Preserve upstream CORS headers on the JS-rewrite path instead of discarding them when building a fresh Response. - Extend SP_ORIGIN_UNIFIED_PATTERN to match both single- and double-quoted "/unified/" chunk paths, preserving the original quote character in the replacement. - Normalise log prefixes from [sourcepoint] to Sourcepoint: for consistency with APS/Prebid style. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove redundant rewrite_sdk check from rewrite() since handles_attribute() already gates on it; update test to verify the guard at the handles_attribute level. - Add # Examples section to register() per documentation standards. - Add tests for cdn_origin validation (rejects non-privacy-mgmt.com hosts, accepts valid origins). - Add test for single-quoted origin+'/unified/' rewrite pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review feedback addressedAll blocking and important findings have been fixed across three commits. Replies posted on each inline thread. Blocking (fixed)
Important non-blocking (fixed)
Suggestions (fixed)
Intentionally skipped (with rationale in thread replies)
All CI checks pass locally: |
Refactor normalizeSourcepointUrl to remove the bare-domain startsWith check that triggered CodeQL "Incomplete URL substring sanitization" alerts. The host === exact match was already the security boundary; now the normalization layer no longer references the CDN hostname at all, eliminating the static analysis finding. Add a Content-Length guard (5 MB) before reading upstream response bodies into memory for JavaScript rewriting, preventing unbounded memory consumption from unexpectedly large responses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aram356
left a comment
There was a problem hiding this comment.
Summary
Well-executed revision — every prior blocker is addressed and the four-layer rewriting architecture (HTML attrs / JS body / window._sp_ trap / client DOM guard) is cleanly separated. Two remaining correctness issues in the upstream response handling need fixes before merge: the body-size guard is bypassed when Content-Length is absent, and the UTF-8 fallback path silently drops upstream headers. A few non-blocking refinements around redirect rewriting, config scope, and test coverage follow.
Blocking
🔧 wrench
- Body-size guard bypassed on chunked/unknown-length responses —
MAX_REWRITE_BODY_SIZEonly applies whenContent-Lengthis present; otherwisetake_body_bytes()reads unbounded (sourcepoint.rs:456-470) - UTF-8 fallback drops upstream headers — the fallback constructs a fresh
Responsewith only status + Content-Type, discarding Set-Cookie, CORS, Cache-Control, Vary, etc., diverging from the passthrough path (sourcepoint.rs:480-488)
Non-blocking
♻️ refactor
- Redirect
Locationrewrite is a literalString::replace— misses protocol-relative//cdn.privacy-mgmt.com/...values (sourcepoint.rs:430-438) validate_cdn_origindoes not check scheme —ftp://cdn.privacy-mgmt.compasses validation and only fails at the first proxied request (sourcepoint.rs:135-150)
🤔 thinking
- Config/rewriter host scope mismatch —
validate_cdn_originaccepts*.privacy-mgmt.combut rewriters hardcodecdn.privacy-mgmt.com; configuring a different subdomain silently half-enables the integration (sourcepoint.rs:143, 69, 197, 598) - Rewritten-JS path hard-codes
Cache-Control— diverges from the passthrough path'sapply_cache_headerspolicy without explanation (sourcepoint.rs:498-501) - Property trap only catches reassignment of
window._sp_— nested mutation (window._sp_.config.X = ...) is not patched; alsos.replace()in the JS helper is first-occurrence only (sourcepoint.rs:568-600)
🌱 seedling
copy_headersforwardsAuthorization— credential-leak footgun if publishers reuse bearer tokens across origins (sourcepoint.rs:222-233)- No direct test for
handle()— redirect rewriting, UTF-8 fallback, Content-Length guard, Content-Type gating all lack unit coverage (sourcepoint.rs:372-522)
👍 praise
- Solid response to the prior review — every prior 🔧/❓ addressed with well-chosen fixes; four-layer rewriting architecture is cleanly separated and well documented
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test: PASS
- vitest: PASS
- format-typescript: PASS
- format-docs: PASS
- browser integration tests: PASS
- CodeQL: PASS
…t rewrite - Skip JS rewrite when Content-Length is absent (chunked/unknown) to prevent unbounded memory reads - Reuse original response on UTF-8 fallback to preserve upstream headers - Extract redirect Location rewriting into a URL-parsing helper that handles absolute, protocol-relative, and relative redirects - Tighten cdn_origin validation to exactly cdn.privacy-mgmt.com and reject non-HTTP(S) schemes - Drop Authorization from forwarded headers to prevent credential leakage - Document Cache-Control divergence and property trap limitations - Add 7 new tests for validation and redirect rewriting
Summary
cdn.privacy-mgmt.comandgeo.privacymanager.iotraffic through Trusted Server, eliminating third-party requests for consent management assets.Changes
crates/trusted-server-core/src/integrations/sourcepoint.rsrewrite_script_content), head injector (window._sp_property trap), Accept-Encoding scoping, backend selection, and 14 unit testscrates/trusted-server-core/src/integrations/mod.rssourcepointmodule and wire it into the integration registrycrates/js/lib/src/integrations/sourcepoint/index.tscrates/js/lib/src/integrations/sourcepoint/script_guard.ts<script>/<link>elements pointing at Sourcepoint CDN and rewrites them to first-party pathscrates/js/lib/test/integrations/sourcepoint/script_guard.test.tsdocs/guide/integrations/sourcepoint.mddocs/guide/integrations-overview.mdtrusted-server.toml[integrations.sourcepoint]config stanza (disabled by default)Closes
Closes #145
Closes #344
Closes #345
Test plan
cargo test --workspacecargo clippy --workspace --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest runcd crates/js/lib && npm run formatcd docs && npm run formatcargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1fastly compute serveChecklist
unwrap()in production code — useexpect("should ...")logmacros (notprintln!)