draft: D1 strawman — system-reference primitive + fidelity enum (RFC #4616)#4622
draft: D1 strawman — system-reference primitive + fidelity enum (RFC #4616)#4622EvgenyAndroid wants to merge 8 commits into
Conversation
…rawman for adcontextprotocol#4616 NOT A MERGE BID — discussion artifact for the WG to react to on the shape of decision D1 (shared {system, id, version?} primitive vs per-dimension types) from the signal epistemic-model umbrella issue adcontextprotocol#4616. Two files added, no existing schema modified: static/schemas/source/core/system-reference.json The canonical {system, id, version?, name?} shape for any value defined against an external identity / taxonomy / geographic / measurement system. `system` is intentionally an open string at the primitive level — per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's oneOf or enum, not here. `version` is optional but recommended whenever the system has a versioned definition history. `name` is informational only. static/schemas/source/enums/system-reference-fidelity.json exact | approximated | unsupported. Generalizes the market_fidelity enum proposed in adcontextprotocol#4475 to all reference-system axes (markets, ID graphs, taxonomies, measurement currencies). Used as a property on deployments[] entries or analogous per-destination structures. Plus changeset entry following adcontextprotocol#2506 / adcontextprotocol#4271 precedent. Non-normative on its own: neither primitive is referenced by any existing schema in this PR. Adoption happens row-by-row in the follow-up RFCs (adcontextprotocol#4472, adcontextprotocol#4475, identity-substrate) against whatever shape the WG settles on. If the WG counter-proposes, this PR is one file + one enum + this changeset — close and re-draft. No sunk cost. Closes (when decided): the D1 thread of adcontextprotocol#4616. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
I have read the IPR Policy |
| "title": "System Reference Fidelity", | ||
| "description": "How faithfully a deployment can honor a value defined against an external reference system (see /schemas/core/system-reference.json). Surfaces the cross-system fidelity question at the deployment layer — buyer agents read this BEFORE activating a signal so they can make an informed decision about geographic, taxonomic, or identity drift introduced by the destination's mapping. Generalizes the `market_fidelity` mechanism proposed in #4475 to all reference-system axes (markets, ID graphs, taxonomies, measurement currencies). Used as a property on `deployments[]` entries or analogous per-destination structures; not used on the signal definition itself (the signal's reference is canonical; fidelity is a function of where it's activated).", | ||
| "type": "string", | ||
| "enum": [ |
There was a problem hiding this comment.
Should there be a converted (or translated) option i.e. if nielsen has been (or can be) converted exactly into comscore, or UID2 has been translated to ID5, similar to temp from C to F. I'm a bit torn, but I don't think approximated covers this, especially if you can guarantee it or carry high degree of confidence.
There was a problem hiding this comment.
Pushed converted + a new system-reference-conversion.json schema in 5a0ebfe. Factored your two questions ("should there be a converted option?" and "from what to what / how was it done?") as separate concerns:
- Fidelity enum answers "how preserved?":
exact | converted | approximated | unsupported.converted= different system, deterministic & lossless (your C→F analogy). system-reference-conversion.jsonanswers "via what?":{from, to, method, method_details?}withmethodenum.
The conversion structure is REQUIRED when fidelity is converted, OPTIONAL when approximated (so buyers can see WHY a deployment is approximated, not just that it is). Splitting the two means buyers who only need the simple signal stop at the enum; buyers who need to reason about the federation chain read the conversion block.
One question on the method enum — strawman picks four values:
id_graph— translation via identity-graph link (UID2 → ID5 via LiveRamp / TTD / similar)name_match— shared external identifier present in both systems (e.g. hashed email)crosswalk— published 1:1 mapping with no information loss (Nielsen DMA → Comscore Market via the canonical county crosswalk; IAB v2 → v3 via published migration map)custom— vendor-specific mapping
Does that cover the ID-graph cases you have in mind? Anything missing that you'd want represented — e.g. inferred (ML-predicted match), aggregation (geographic upscaling), sampled (panel projection)? Easier to lock the method-enum scope now than after row-level RFCs adopt the primitive.
There was a problem hiding this comment.
Thanks @EvgenyAndroid, AdCP never sleeps ;) To answer your follow up questions:
The upscaled for geographic aggregation is a good one for sure, inferred and projected (for sampled data) as well.
For the ID-graph case I would consider adding one more optional i.e. method_provider? -> {from, to, method, method_provider?, method_details?} to allow to specify the vendor.
There was a problem hiding this comment.
All four accepted, pushed in ac4c618. Method enum extends to 7 values, ordered by semantic grouping:
| Group | Methods |
|---|---|
| Identity-domain | id_graph, name_match |
| Structural-mapping | crosswalk, upscaled (new) |
| Probabilistic | inferred (new), projected (new) |
| Escape hatch | custom |
Semantics for the three new values (in the schema's enumDescriptions):
upscaled— deterministic aggregation of finer-grained units into a coarser system (zip → DMA, county → state, postcode → MSA). Forward-lossless; inverse undefined. Important nuance I added in the description: for modeled audiences whose statistical validity depends on the finer granularity, deploymentfidelitySHOULD beapproximatedeven when the method isupscaled. The method describes the transform; fidelity describes preservation for the use case.inferred— ML-predicted or rule-based inference (device graph → IAB taxonomy node; behavior → demographic membership). Probabilistic, no per-record correctness guarantee.method_providerSHOULD identify the model or vendor.projected— statistical projection from a sample to a population (panel-projected reach, currency-projected impression counts, audience extrapolation). Used primarily for measurement currencies and modeled audiences.method_detailsSHOULD describe sample size, weighting methodology, and confidence intervals where relevant.
method_provider? added as an optional opaque-by-convention string. Consumers MAY branch on well-known names (LiveRamp, ID5, IAB, Nielsen, Comscore, TTD) without it being a closed enum — new providers enter additively, same pattern as the top-level system axis. Most meaningful for id_graph / inferred where the vendor IS the meaningful identifier; less load-bearing for crosswalk where the published mapping is canonical regardless of who hosts it.
Updated examples now cover all 7 method values with realistic method_provider values — see commit ac4c6183 for the diff.
Anything else missing? With these additions the strawman is starting to feel covered for the ID-graph + measurement-currency cases; happy to keep iterating if you can think of method types the seven values still don't capture cleanly.
| "omb_msa" | ||
| ] | ||
| }, | ||
| "id": { |
There was a problem hiding this comment.
I have a feeling that using the word id is confusing, should this be simply value?
There was a problem hiding this comment.
Agreed — pushed the rename in 5a0ebfe. Same reasoning at three levels:
- AdCP's existing
*_idfields all point at AdCP-issued entities (account_id,brand_id,media_buy_id,signal_agent_segment_id) with AdCP lifecycle. Reusingidfor external-system values invites buyer-agent code to conflate the two — same word, materially different semantics. - For taxonomies (IAB Audience Taxonomy paths like
4-1-2-3) and measurement currencies (methodology labels),valuereads more naturally thanid. IAB docs themselves say "values" / "terms." valueis cross-axis-neutral;idis identity-axis-coded and prejudices the primitive's read for non-identity uses.
The (system, value) tuple now replaces (system, id) throughout the schema description, the version description, the name description, the required array, and all examples.
Responding to @lukasz-pubx's review on this PR and his D1 vote in adcontextprotocol#4616: 1. id→value rename — "I have a feeling that using the word `id` is confusing" (PR comment r3253676569 + adcontextprotocol#4616 vote). Agreed. The primitive is intentionally cross-axis (markets, taxonomies, ID graphs, measurement currencies); `id` is identity-axis-coded and creates a connotation collision with existing AdCP `*_id` fields that point at AdCP-issued entities. `value` is system-axis-neutral and matches how taxonomy authors (IAB) refer to taxonomy terms. 2. Add `converted` to the fidelity enum (PR comment r3253674753 + adcontextprotocol#4616 vote). Covers the case where the destination uses a different system but the conversion is deterministic and lossless — Nielsen DMA → Comscore Market via canonical crosswalk, UID2 → ID5 via direct identity-graph link. Materially different from `approximated` (which implies drift) and from `exact` (which requires same system). Lukasz's framing: "similar to temp from C to F." 3. New `system-reference-conversion.json` (response to "I'm not sure if in this case it should be indicated somehow from what to what was the conversion and/or how was it done i.e. ID Graph / by name / custom" in adcontextprotocol#4616). Factors the question correctly: the fidelity enum says "how preserved?"; the conversion structure says "via what?". Buyers who want the simple signal stop at the enum; buyers who want to reason about the federation chain read the conversion block. Method enum: id_graph | name_match | crosswalk | custom. REQUIRED when fidelity is `converted`; OPTIONAL surfacing when `approximated` (so buyers can see WHY). All three changes preserve the strawman's "shape only, no row-level adoption" framing. Still a draft / discussion artifact, still cheap to re-draft if the WG counter-proposes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lukasz-pubx Round 2 of @lukasz-pubx's feedback on adcontextprotocol#4616 / adcontextprotocol#4622 — three new method enum values + an optional method_provider field on the conversion structure. Method enum extends from 4 → 7 values, ordered by semantic grouping: identity-domain: id_graph, name_match structural-mapping: crosswalk, upscaled (new) probabilistic: inferred (new), projected (new) escape hatch: custom New value semantics: upscaled — Deterministic aggregation of finer-grained units into a coarser system (zip→DMA, county→state). Forward- lossless; inverse undefined. For modeled audiences where statistical validity depends on the finer granularity, deployment fidelity SHOULD be `approximated` even when the method is `upscaled`. inferred — ML-predicted or rule-based inference (device-graph → IAB taxonomy node; behavior → demographic membership). Probabilistic; no per-record correctness guarantee. method_provider SHOULD identify the model or vendor. projected — Statistical projection from a sample to a population (panel-projected reach, currency-projected impressions, audience extrapolation). Used primarily for measurement currencies and modeled audiences. method_details SHOULD describe sample size + weighting + confidence intervals where relevant. method_provider (new optional field): Opaque-by-convention string identifying the vendor or organization providing the conversion method (e.g. `LiveRamp`, `ID5`, `IAB`, `Nielsen`). Consumers MAY branch on well-known provider names but the field is not constrained to a closed enum — new providers enter the ecosystem additively. Particularly meaningful for `id_graph` and `inferred` methods; less load-bearing for `crosswalk` where the published mapping is canonical regardless of who hosts it. Updated examples cover all 7 method values with realistic method_provider values. Strawman remains shape-only — no row-level adoption, still draft, still cheap to re-draft. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…umbrella update Round 3 of @bokelley's third-sibling framing on adcontextprotocol#4616 (issuecomment-4470049814): the epistemic-model table maps 1:1 onto delivery measurement, not just signals. The strawman's shape was already cross-axis, but the descriptions buried measurement as a secondary axis. Promoting it to first-class. Changes are description + examples only — no structural change to the primitive, the fidelity enum, or the conversion schema. system-reference.json: - Description rewritten to explicitly enumerate the four primary axes (identity / taxonomy / geographic / measurement). Calls out measurement currencies (Nielsen P18-49) AND methodology sources (panel / set_top_box / ACR / census / server_logs / SDK per the construction-methodology row in adcontextprotocol#4616) as distinct measurement- side use cases. - `system` examples list extended with `nielsen_p_18_49` and `measurement_source` to surface the measurement axis. - Two new top-level examples: a measurement currency (`nielsen_p_18_49:P18-49:2025`) and a measurement-source row (`measurement_source:set_top_box`) — the latter is the canonical adcontextprotocol#2041 use case. system-reference-fidelity.json: - Description expanded to name two consumer types: buyer agents activating signals, and measurement consumers accepting delivery reports. Adds "methodology sources" to the generalized axis list. system-reference-conversion.json: - Description expanded to name the two surfaces (signal deployments + delivery measurement). References adcontextprotocol#3877's `completion_source` qualifier as shipped prior art for the D3 seller-attested vs vendor-attested split. - New example: STB measurement projected to Nielsen P18-49 currency via Samba TV methodology — the measurement-side analog of the existing signal-side panel projection example. Changeset updated to reflect the broadened scope. Strawman remains shape-only — no row-level adoption, no structural change to the primitive. adcontextprotocol#2041 / adcontextprotocol#4472 / adcontextprotocol#4475 / identity-substrate RFCs still adopt independently against whatever shape D1 settles on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SimonaNemes endorsed round 2 of the strawman on adcontextprotocol#4616 (issuecomment-4493606229) and offered a sharpening of the inferred / projected distinction. Both methods can use ML; the distinction is the LEVEL at which uncertainty operates: inferred = ENTITY-LEVEL attribution from observed signals. Answers: "given these clues, who/what is this entity?" Uncertainty lies in the correctness of attributes assigned to individual entities. projected = POPULATION-LEVEL estimation from a sample. Answers: "given this sample, what should we expect at scale?" Uncertainty lies in the estimate itself, NOT in any individual entity's attributes. Same underlying data may drive both — the distinction is the question being answered and where the uncertainty lives. Changes: - enumDescriptions for `inferred` + `projected` rewritten with the entity-level vs population-level framing and explicit cross- reference to each other so consumers see the contrast. - New paired example: same device_graph input data routed through `inferred` (entity-level attribution with per-record confidence 0.78) and through `projected` (population-level estimate with 95% CI ±0.4pp). Makes the distinction concrete. - Changeset updated to reflect the round-4 sharpening + cite @SimonaNemes's framing. Strawman remains shape-only — no row-level adoption, no structural change. Sharpening the prose so the distinction survives consumer interpretation across implementer teams. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley posted WG-level acceptance of D1 on adcontextprotocol#4616 (issuecomment-4526559566): "Yes on D1 as a shared primitive for newly-modeled epistemic rows" — with four substantive tightening notes before he's comfortable merging. All four addressed in this revision; the changes are normative tightening, not structural. 1. VERSION SEMANTICS REVERSED (system-reference.json) Strawman previously said "treat omitted version as a wildcard match against any version." Bokelley: this recreates the fuzzy matching problem D1 was supposed to avoid. Reversed. Omitted `version` now means UNKNOWN / unpinned — NOT a wildcard. Exact equality requires (system, value, version) to all match for versioned systems. Row-level schemas MAY declare a system version-insensitive (UID2 / RampID etc.); otherwise omitted version is a buyer-decision point. 2. CONVERTED FIDELITY TIGHTENED (system-reference-fidelity.json) Reserved for deterministic AND row-semantics-preserving mappings. Deterministic ≠ lossless preservation: a crosswalk may be mathematically deterministic but lose semantic information the row's downstream consumers depend on (DMA vs Comscore Market methodologies; IAB v2 nodes that split or merge in v3). Sellers MUST NOT advertise `converted` when the conversion changes the row's meaningful semantics. 3. UPSCALED + CROSSWALK CAUTIONS (system-reference-conversion.json) `upscaled` typically pairs with `approximated` fidelity — undefined inverse means lost granularity. Only `converted` if the row explicitly declares granularity doesn't matter. Same caution for `crosswalk` — deterministic mapping is not automatically lossless semantic preservation. 4. INTEROP CAVEAT (system-reference.json description) Explicit note: the primitive alone does NOT create interop. Consuming row-level schemas MUST constrain or document which `system` values are meaningful for that row. D1's value is consistent SHAPE across rows, not a universal vocabulary. Plus updated the description to scope D1 explicitly to newly-modeled rows (per bokelley's "do not replace per-dimension schemas that already work" guardrail — geo_metros / geo_postal_areas stay where they are) and added @tescoboy's product-level audience-construction- metadata row as the fifth D1 customer per bokelley's endorsement. Strawman remains shape-only — no row-level adoption. After this round bokelley said he's "comfortable using it as the D1 foundation for adcontextprotocol#2041, the identity-substrate RFC, and the adcontextprotocol#4472 split." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 6 picks up the ads.txt-pattern thread on adcontextprotocol#4616 between @bokelley, @lukasz-pubx, @SimonaNemes, and Addie. Brian's philosophical question: "Should the protocol carry the weight of fully describing signals and methodology OR should this be something that the signal provider handles separately (thinking about how a media kit carries weight on the inventory side)?" Addie's recommendation in the thread: - Geo identifiers + activation fidelity: KEEP structured (bilaterally verifiable; ads.txt pattern). - Methodology fields: COLLAPSE to a pointer (link out to attested document). One field like methodology_url or audience_model_ref. The strawman primitives (system-reference + fidelity enum + conversion structure) already sit in the "keep structured" half of Addie's recommendation — bilaterally verifiable identifiers and activation- time facts. The conversion structure adds the round-6 anchor field for the "link out" half: method_doc_url (optional URI) — points at the seller's published methodology document (vendor's identity-graph page, published crosswalk specification, IAB v2-to-v3 migration map, etc.). Strictly informational; buyer agents MAY follow out-of-band to verify but MUST NOT branch on its content programmatically. Why at the primitive layer: - Gives downstream row-level RFCs (adcontextprotocol#4472, adcontextprotocol#2041, identity-substrate, tescoboy product-level) a canonical place to anchor link-out fields. Without it, each row reinvents the field name. - Picks up Addie's "collapse to a pointer" recommendation as an OPTION at the primitive layer; row-level RFCs that want to require methodology disclosure mark this required in their binding. - Complements existing method_details (free-text inline) — doc_url is the canonical-source pointer; method_details is the triage-time elaboration. Plus a description note that consuming rows MAY add their own row-level last_updated field on the row itself (per @SimonaNemes: signal-record freshness IS verifiable even though underlying methodology freshness isn't). Updated one example (LiveRamp UID2→ID5 conversion) to demonstrate the field in context. Strawman remains shape-only; no row-level adoption. Net delta: +1 optional field, +1 description sentence, +1 example field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| "method_details": "Canonical county-union mapping per Nielsen DMA 2024 spec + Comscore Market 2025-Q1 definition." | ||
| }, | ||
| { | ||
| "from": { "system": "uid2", "value": "AAAA..." }, |
There was a problem hiding this comment.
this conversion is not per-signal, it's happening upon signal activation. The problem is that there is not necessarily a signal conversion happening - in the programmatic context, LiveRamp has a graph, the publisher may have a graph, the SSP has a graph, the DSP has a graph. Crap, the agency has a graph. We don't have any idea how many times this conversion is happening, and honestly neither do the participants.
Main point is that we need to separate "what is this signal" from "how is this signal being connected to the delivery surface".
There was a problem hiding this comment.
I'm not sure if I understand why does it matter that SSP, DSP and agency have a graph? isn't this information from singals provider's perspective? We can't control all the hops.
There was a problem hiding this comment.
You're right that real chains have multiple parties (LiveRamp graph + publisher graph + SSP graph + DSP graph + agency graph), and no single party sees them all. The strawman as written reads like it's trying to model the full chain — which it can't, and shouldn't pretend to.
Round 7 (ad712795) tightens the description to single-party observable scope explicitly: this structure describes ONE party's observable conversion claim — typically the signals seller's in-agent translation, or the measurement vendor's projection. Downstream conversions are out of scope; the protocol intentionally doesn't pretend the seller can speak to what happens after activation handoff.
Lukasz's framing in this thread is the right one — "we can't control all the hops" is exactly the design constraint. The structure is useful for the conversions one party CAN describe (signal-side identity translation before destination handoff, measurement vendor's panel-to-currency projection). It's not useful for, and shouldn't try to model, the multi-hop chain that follows.
If even single-party conversion descriptions don't earn their keep, then this entire schema collapses to "trust the seller's free-text" and the structure is dead weight. I think the single-party case is genuinely useful (especially for measurement projection and signal-provider in-agent translation), but if you read the multi-hop reality as making even that case suspect, happy to drop the entire conversion structure in round 8 and keep just system-reference + system-reference-fidelity.
| "description": "Reference to a value within a named external system. Four primary axes of use: (1) **identity** — UID2, ID5, RampID, hashed-email, MAID; (2) **taxonomy** — IAB Audience Taxonomy, IAB Content Taxonomy, vendor-specific category trees; (3) **geographic** — Nielsen DMA, Comscore Market, OMB MSA, postcode aggregates (NOTE: where per-dimension geographic schemas already exist in AdCP — `geo_metros`, `geo_postal_areas` — they are the right shape for current geo targeting; D1 does not replace them. D1 applies to geographic surfaces that lack a per-dimension schema today); (4) **measurement** — measurement currencies (Nielsen P18-49), methodology sources (panel / set_top_box / ACR / census / server_logs / SDK per the construction-methodology row in #4616), vendor measurement schemes. Provides a single canonical shape `{system, value, version?, name?}` reused across signals, deployments, buy-terms, delivery reporting, and measurement attribution wherever a value is defined against an external reference frame. The `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. **Important: the primitive shape alone does NOT create interoperability — consuming row-level schemas MUST constrain or document which `system` values are meaningful for that row.** D1's value is consistent SHAPE across rows, not a universal cross-row vocabulary. Decision D1 from the signal-and-measurement epistemic-model umbrella (see issue #4616); per @bokelley's WG-acceptance comment, D1 is scoped to newly-modeled rows (identity substrate, taxonomy / ranking references, measurement source / methodology, product-level seller-built audience metadata) and does NOT replace per-dimension schemas that already work.", | ||
| "type": "object", | ||
| "properties": { | ||
| "system": { |
There was a problem hiding this comment.
calling these "systems" is confusing. Ramp ID is clearly an identifier and is defined elsewhere in the protocol. Taxonomies like are too. Not clear to me how this generalized system helps us. We do have enums with the various geo metros providers which makes sense: we know that geo metro must be a supported type there.
There was a problem hiding this comment.
We could rename system to type? But I think system illustrates the fact that it's "more complicated" than just a label.
There was a problem hiding this comment.
Fair pushback. The primitive's value isn't in wrapping RampID — it's in giving union-axis rows a single shape when the row can carry any of N systems.
Round 7 (ad712795) leads the description with that:
The primitive's purpose is union-axis rows (identity substrate, measurement source, PBA taxonomy) where a single field can carry any of N systems with the same comparator semantics. For single-axis rows where only one system applies, inline per-dimension fields remain simpler — the primitive is overhead. The strawman explicitly does NOT replace per-dimension schemas like
geo_metros,geo_postal_areas, or existingramp_id.
Where the primitive earns its keep (and where it doesn't):
| Case | Without primitive | With primitive |
|---|---|---|
Single-axis row: signal always references RampID → ramp_id field, full stop |
inline ramp_id: "..." — simple ✅ |
identity: {system: "ramp_id", value: "..."} — wrapper overhead ❌ |
| Union-axis row: identity substrate can be UID2 OR RampID OR ID5 OR custom | oneOf: [{ uid2: ... }, { ramp_id: ... }, { id5: ... }, { custom: ... }] — N inline shapes per row |
identity: SystemReference — ONE shape ✅ |
| Union-axis row: measurement source can be set_top_box OR ACR OR census OR panel | Same N-way oneOf repeated in #2041 |
Same SystemReference ✅ |
| Union-axis row: PBA ranking reference can be IAB v3 OR vendor-X OR custom audience model | Same N-way oneOf repeated in #4472 |
Same SystemReference ✅ |
The motivating cases that opened #4616 (#4472, #2041, identity-substrate, @tescoboy's product-level) are all union-axis. If the union-axis case turns out to be thinner than the thread assumed — i.e., if each row really wants a single dominant system and the union case is hypothetical — then the primitive's surface area isn't justified and the right answer is per-row inline shapes plus the row defining its own enum. In that case I'd close this PR and let row-RFCs proceed independently.
But I think the union case is real for at least identity-substrate (no consensus on which graph wins) and measurement-source (the construction-methodology row is genuinely heterogeneous). Curious whether you read those rows as union-axis or as eventually-single-axis when the WG picks winners.
On @lukasz-pubx's system → type rename suggestion: round 7 acknowledges it in a naming note but keeps system — type is overloaded across AdCP for general type discrimination on discriminated unions, and system carries the connotation we want (named external reference frame with its own lifecycle).
There was a problem hiding this comment.
Agreed system carries the "more complicated than a label" connotation. Keeping it — round 7 (ad712795) adds a naming note in the description acknowledging the type alternative was considered but type is overloaded across AdCP for general type discrimination on discriminated unions, while system cleanly says "named external reference frame with its own lifecycle." Thanks for the framing.
…views (round 7) @bokelley posted two substantive line-level reviews on 2026-05-25 that went deeper than the round-5 tightening notes — they raised abstraction-level questions about whether the primitive justifies its surface area. 1. system-reference.json:8 — "calling these 'systems' is confusing. RampID is clearly an identifier and is defined elsewhere in the protocol. Taxonomies like are too. Not clear to me how this generalized system helps us." 2. system-reference-conversion.json:78 — "this conversion is not per-signal, it's happening upon signal activation. The problem is that there is not necessarily a signal conversion happening - in the programmatic context, LiveRamp has a graph, the publisher may have a graph, the SSP has a graph, the DSP has a graph. Crap, the agency has a graph. We don't have any idea how many times this conversion is happening, and honestly neither do they." Description-only sharpening (no schema shape changes): system-reference.json — Description rewritten to lead with the UNION-AXIS value proposition. The primitive's purpose is union- axis rows (identity substrate, measurement source, PBA taxonomy) where a single field can carry any of N systems with the same comparator semantics. Per @bokelley's "RampID is defined elsewhere" point: yes, and that's fine — D1 does NOT replace per-dimension schemas where they exist (geo_metros, ramp_id, existing taxonomy schemas). For single-axis rows, inline per- dimension fields are simpler. D1 earns its keep specifically on union-axis rows that don't yet have a shape. Without it, each union-axis row independently reinvents oneOf discriminators across N inline shapes; with it, ONE comparator + ONE extension story + ONE schema slot. Also addresses @lukasz-pubx's system → type rename suggestion in a naming note — keeping `system` because the connotation of "named external reference frame with its own lifecycle" is what we want vs. `type` which is overloaded across AdCP for general type discrimination on discriminated unions. system-reference-conversion.json — Description rewritten to clarify SINGLE-PARTY OBSERVABLE SCOPE. Real programmatic chains have multiple parties (publisher / SSP / DSP / agency / vendor) each potentially performing their own conversions; no single party observes the full chain. The structure describes ONE party's observable conversion (signals seller's in-agent translation in the deployment case; measurement vendor's projection in the reporting case), NOT the multi-hop chain. Downstream conversions are out of scope, observed by other parties; the protocol intentionally doesn't pretend the seller can speak to them. Buyer agents reading this should understand they're seeing one party's view, not the full provenance from origin to activation. Strawman remains shape-only — no row-level adoption, no field additions / removals / renames. The schema is unchanged; descriptions catch up to what the structure actually means. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This is a draft / discussion artifact, NOT a merge bid. Filed proactively per the process proposed in #4616 (comment-4468168781) so the WG has something concrete to react to on the shape of decision D1 (shared primitive vs per-dimension types).
Closing this and re-drafting is the expected path if the WG counter-proposes a fundamentally different shape. No sunk cost — three files + a changeset.
What's in the PR
Three reusable schema primitives, no row-level adoption:
1.
static/schemas/source/core/system-reference.jsonThe canonical
{system, value, version?, name?}shape for any value defined against an external identity / taxonomy / geographic / measurement system. Used wherever a row-level RFC needs to reference a Nielsen DMA market, an IAB Audience Taxonomy node, a UID2 identity, etc.Design choices spelled out for review:
value, notid. The primitive is cross-axis: identity systems issue IDs, taxonomies issue values/terms, measurement systems issue methodology labels.idis identity-axis-coded and collides with AdCP's existing*_idfields that point at AdCP-issued entities with AdCP lifecycle.valueis the cross-vocabulary least common denominator.systemis an open string at the primitive level. Per-use constraints (closed enums, vendor allowlists) belong in the consuming schema'soneOforenum, not here. Rationale: a closed enum at the primitive level forces every new system addition through a primitive bump; per-use constraints let row-RFCs add systems independently. Recommended convention: kebab_case or snake_case stable identifiers.version?is RECOMMENDED, not REQUIRED. Some systems (UID2, RampID) are version-less; others (Nielsen DMA, IAB Audience Taxonomy) have boundary/semantic drift between versions. Omittingversionimplies "latest stable as known to the emitting party." Buyer-side comparators SHOULD treat omittedversionas a wildcard against any version. Required wherever the system has version history is the kind of constraint that belongs in the row-RFC, not the primitive.name?is informational only. Never used for equality or routing. Strictly for UI display. The canonical reference is(system, value, version?).additionalProperties: false— primitives are tight; extensions belong on the consuming schema, not on the reference itself.2.
static/schemas/source/enums/system-reference-fidelity.jsonexact | converted | approximated | unsupported— generalizes themarket_fidelitymechanism originally proposed in #4475 (now closed; mechanism preserved here as a reusable primitive for the remaining customer rows) to all reference-system axes. Used as a property ondeployments[]entries or analogous per-destination structures.exact— destination uses the samesystem(andversionwhen both are present); reference frame preserved byte-for-byte.converted— destination uses a DIFFERENTsystembut the seller asserts the conversion is deterministic and lossless. Examples: Nielsen DMA → Comscore Market via canonical county crosswalk; UID2 → ID5 via direct identity-graph link; IAB v2 → v3 via published migration map. Functionally equivalent toexactfor buyers who only need preservation; structurally distinct because the system axis changed. Buyers reasoning about the federation chain read the deployment'sconversionblock (file 3 below).approximated— destination maps to a differentsystemor differentversion; close but not byte-equivalent. Buyer-decision point. For modeled audiences whose statistical validity depends on the reference frame,approximatedactivation may invalidate the model.unsupported— destination cannot resolve thesystem. Activation will fail or silently degrade. Buyer MUST NOT proceed without explicit opt-in to seller's advertised fallback.Fidelity is per-deployment, not per-signal — the signal's reference is canonical; fidelity is a function of where it's activated.
3.
static/schemas/source/core/system-reference-conversion.json{from, to, method, method_details?}describing how a deployment converts between systems. REQUIRED when fidelity isconverted; OPTIONAL surfacing whenapproximated(lets buyers see WHY a deployment is approximated, not just that it is).from— signal's original reference system (a fullsystem-reference)to— system the destination natively uses (a fullsystem-reference)methodenum —id_graph | name_match | crosswalk | upscaled | inferred | projected | custom(semantics in the schema'senumDescriptions)method_provider?— optional opaque-by-convention vendor identity (LiveRamp,ID5,IAB,Nielsen, etc.). Consumers MAY branch on well-known providers; not a closed enummethod_details?— free-text vendor-specific elaboration; strictly informational; buyer agents MUST NOT branch on this fieldFactors the question correctly: the enum says "how preserved?"; the conversion structure says "via what?". Buyers who want the simple signal stop at the enum; buyers who want to reason about the federation chain (e.g. for ID-graph translation correctness) read the conversion block.
What's NOT in this PR (deliberately)
system-referencefortaxonomyorranking_reference. Measurement source attribution on delivery reports #2041 doesn't adopt it forsource_type. Identity-substrate RFC isn't filed yet. Each row adopts independently in its own RFC against whatever shape D1 settles on. (RFC: Structured geographic market identifiers for signals with market-bounded audiences #4475, the originally cited market-identifier customer, was closed on 5/17 in favor of extending existinggeo_metro/geo_postal_areas— see customer-set update above.)nielsen_dma | comscore_market | …. That's a per-row decision (some rows want closed enums for safety; others want open strings for extensibility).attestations: [...]as a sibling field later without breaking, but doesn't ship it.Open questions for WG review
{system, value, version?, name?}the right field set? Alternatives raised in adjacent work include{kind, identifier, vocabulary?}and{type, code, scheme?}. Field names matter for downstream codegen.systemas open string vs closed enum at the primitive level. Open string is the strawman's choice (extensibility at row-RFC level); a closed enum at the primitive level is the more restrictive alternative. Trade-off explained above.methodenum scope. Round 2 extended the enum from 4 → 7 values per @lukasz-pubx feedback. Current set:id_graph | name_match | crosswalk | upscaled | inferred | projected | custom. Anything still missing for ID-graph / measurement / probabilistic cases?system-reference-fidelityis the strawman's choice.system-fidelity,reference-fidelity, or domain-prefixed names (market-fidelity,taxonomy-fidelityper-dimension) are alternatives. Strawman picks the canonical one. (Originally mirrored RFC: Structured geographic market identifiers for signals with market-bounded audiences #4475'smarket_fidelitynaming; that RFC is now closed but the mechanism generalizes cleanly to the remaining customer rows.)Test plan
Cross-references
Signals row-level RFCs (D1 customers, pending):
system-referencefortaxonomy/ranking_referencein its own PR once D1 lands)system-referenceforidentity+ use the fidelity enum + conversion structure for ID-graph translation)Closed / handled by existing per-dimension schemas (no longer a D1 customer):
geo_metroalready covers DMA/MSA;geo_postal_areascovers zip aggregates. Per-dimension geo schemas already exist, so D1 doesn't apply to this row.Measurement row-level RFC (D1 customer, pending — added per @bokelley's third-sibling framing 5/17):
system-referenceforsourceinstead of a flatsource_typeenum; methodology blend becomes aconversionblock)Measurement prior art (already shipped, informs D3 direction):
completion_source: seller_attested | vendor_attested— pattern for "who attested?" as a structured field; precedent for the D3 attestation hook)delivery_measurement.vendorsasBrandRef[]— publishing/authorization-half infrastructure on the measurement side)Adjacent (not blocked here):
Precedents (PR shape):
I have read the IPR Policy