From 1c86df20333c73f9e28b61f3ad2d534336e0cfcd Mon Sep 17 00:00:00 2001 From: evgen Date: Sat, 16 May 2026 17:27:37 -0400 Subject: [PATCH 1/8] =?UTF-8?q?draft(schema):=20add=20system-reference=20p?= =?UTF-8?q?rimitive=20+=20fidelity=20enum=20=E2=80=94=20D1=20strawman=20fo?= =?UTF-8?q?r=20#4616?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit NOT A MERGE BID — discussion artifact for the WG to react to on the shape of decision D1 (shared {system, id, version?} primitive vs per-dimension types) from the signal epistemic-model umbrella issue #4616. Two files added, no existing schema modified: static/schemas/source/core/system-reference.json The canonical {system, id, version?, name?} shape for any value defined against an external identity / taxonomy / geographic / measurement system. `system` is intentionally an open string at the primitive level — per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's oneOf or enum, not here. `version` is optional but recommended whenever the system has a versioned definition history. `name` is informational only. static/schemas/source/enums/system-reference-fidelity.json exact | approximated | unsupported. Generalizes the market_fidelity enum proposed in #4475 to all reference-system axes (markets, ID graphs, taxonomies, measurement currencies). Used as a property on deployments[] entries or analogous per-destination structures. Plus changeset entry following #2506 / #4271 precedent. Non-normative on its own: neither primitive is referenced by any existing schema in this PR. Adoption happens row-by-row in the follow-up RFCs (#4472, #4475, identity-substrate) against whatever shape the WG settles on. If the WG counter-proposes, this PR is one file + one enum + this changeset — close and re-draft. No sunk cost. Closes (when decided): the D1 thread of #4616. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../4616-strawman-system-reference-d1.md | 14 ++++ .../schemas/source/core/system-reference.json | 72 +++++++++++++++++++ .../enums/system-reference-fidelity.json | 17 +++++ 3 files changed, 103 insertions(+) create mode 100644 .changeset/4616-strawman-system-reference-d1.md create mode 100644 static/schemas/source/core/system-reference.json create mode 100644 static/schemas/source/enums/system-reference-fidelity.json diff --git a/.changeset/4616-strawman-system-reference-d1.md b/.changeset/4616-strawman-system-reference-d1.md new file mode 100644 index 0000000000..6a24885e6c --- /dev/null +++ b/.changeset/4616-strawman-system-reference-d1.md @@ -0,0 +1,14 @@ +--- +"adcontextprotocol": minor +--- + +Add `system-reference` primitive + `system-reference-fidelity` enum — strawman for decision D1 of the signal epistemic-model umbrella (#4616). + +Adds two reusable schema primitives that future row-level RFCs (#4472 PBA audience_model, #4475 structured market identifiers, @lszczesiak's pending ID-graph RFC) can adopt without re-litigating the shape per-RFC: + +- `core/system-reference.json` — the canonical `{system, id, version?, name?}` shape for a value defined against an external identity, taxonomy, geographic, or measurement system. `system` is intentionally an open string at the primitive level; per-use constraints live in consuming schemas. +- `enums/system-reference-fidelity.json` — `exact | approximated | unsupported` for deployment-side fidelity. Generalizes #4475's `market_fidelity` mechanism to all reference-system axes. + +**Non-normative on its own.** Neither primitive is referenced by any existing schema in this PR. Adoption happens row-by-row in the follow-up RFCs (#4472 / #4475 / identity-substrate) against whatever D1 shape the WG settles on. If the WG counter-proposes a different shape, this PR is one file + one enum + this changeset — close and re-draft. No sunk cost. + +Discussion: see https://github.com/adcontextprotocol/adcp/issues/4616 diff --git a/static/schemas/source/core/system-reference.json b/static/schemas/source/core/system-reference.json new file mode 100644 index 0000000000..9b0dd0816e --- /dev/null +++ b/static/schemas/source/core/system-reference.json @@ -0,0 +1,72 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "/schemas/core/system-reference.json", + "title": "System Reference", + "description": "Reference to a value within a named identity, taxonomy, geographic, or measurement system — e.g., a Nielsen DMA market, an IAB Audience Taxonomy node, a UID2 identity, a Comscore audience. Provides a single canonical shape `{system, id, version?, name?}` reused across signals, deployments, buy-terms, and delivery reporting wherever a value is defined against an external reference frame. The `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. Decision D1 from the signal epistemic-model umbrella (see issue #4616).", + "type": "object", + "properties": { + "system": { + "type": "string", + "description": "Identifier for the reference system. Open string — consumers SHOULD use kebab_case or snake_case stable identifiers (e.g. `nielsen_dma`, `iab_audience`, `uid2`, `ramp_id`, `comscore_market`, `omb_msa`). Custom or vendor-specific values are permitted but interoperate only with parties that recognize the same `system` string.", + "minLength": 1, + "examples": [ + "nielsen_dma", + "iab_audience", + "uid2", + "ramp_id", + "comscore_market", + "omb_msa" + ] + }, + "id": { + "type": "string", + "description": "Identifier within the system. Format is system-dependent — Nielsen DMA codes are numeric strings (e.g. `501`), IAB Audience Taxonomy nodes are dotted paths (e.g. `4-1-2-3`), UID2 values are opaque tokens. The pair `(system, id)` is the durable wire reference; consumers MUST compare on both axes.", + "minLength": 1 + }, + "version": { + "type": "string", + "description": "Optional version of the reference system this `id` is defined against. RECOMMENDED whenever the system has a versioned definition history where boundaries, IDs, or semantics drift between versions (e.g. `nielsen_dma` boundaries differ between 2020 and 2024 releases; IAB Audience Taxonomy v2 ≠ v3). Omitting `version` implies the latest stable release as known to the emitting party. Buyer-side comparators SHOULD treat omitted `version` as a wildcard match against any version.", + "minLength": 1, + "examples": [ + "2024", + "v3", + "2.1.0" + ] + }, + "name": { + "type": "string", + "description": "Optional human-readable label for UI display (e.g. `\"New York\"` for `nielsen_dma:501`). Strictly informational — never used for equality or routing; the canonical reference is `(system, id, version?)`.", + "minLength": 1 + } + }, + "required": ["system", "id"], + "additionalProperties": false, + "examples": [ + { + "system": "nielsen_dma", + "id": "501", + "name": "New York" + }, + { + "system": "nielsen_dma", + "id": "501", + "version": "2024", + "name": "New York" + }, + { + "system": "iab_audience", + "id": "4-1-2-3", + "version": "v3", + "name": "Affluent Households" + }, + { + "system": "uid2", + "id": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" + }, + { + "system": "comscore_market", + "id": "NYC-100", + "version": "2025-Q1" + } + ] +} diff --git a/static/schemas/source/enums/system-reference-fidelity.json b/static/schemas/source/enums/system-reference-fidelity.json new file mode 100644 index 0000000000..b6c2a1c299 --- /dev/null +++ b/static/schemas/source/enums/system-reference-fidelity.json @@ -0,0 +1,17 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "/schemas/enums/system-reference-fidelity.json", + "title": "System Reference Fidelity", + "description": "How faithfully a deployment can honor a value defined against an external reference system (see /schemas/core/system-reference.json). Surfaces the cross-system fidelity question at the deployment layer — buyer agents read this BEFORE activating a signal so they can make an informed decision about geographic, taxonomic, or identity drift introduced by the destination's mapping. Generalizes the `market_fidelity` mechanism proposed in #4475 to all reference-system axes (markets, ID graphs, taxonomies, measurement currencies). Used as a property on `deployments[]` entries or analogous per-destination structures; not used on the signal definition itself (the signal's reference is canonical; fidelity is a function of where it's activated).", + "type": "string", + "enum": [ + "exact", + "approximated", + "unsupported" + ], + "enumDescriptions": { + "exact": "Destination uses the same `system` (and `version` when both are present) — activation preserves the reference frame byte-for-byte.", + "approximated": "Destination maps to a different `system` or a different `version` of the same system; the resulting activation is geographically, taxonomically, or identity-wise close but NOT byte-equivalent. Buyer agents SHOULD treat `approximated` as a buyer-decision point — for modeled audiences whose statistical validity depends on the reference frame, `approximated` activation may invalidate the model.", + "unsupported": "Destination cannot resolve the `system` at all. Activation against this destination will fail or silently degrade. Buyer agents MUST NOT proceed without an explicit opt-in to whatever fallback behavior the seller advertises." + } +} From 5a0ebfe758d8bd60f6a7a2c89f4aa7e256b67fe5 Mon Sep 17 00:00:00 2001 From: evgen Date: Sat, 16 May 2026 19:06:18 -0400 Subject: [PATCH 2/8] =?UTF-8?q?review:=20rename=20id=E2=86=92value,=20add?= =?UTF-8?q?=20`converted`=20fidelity=20+=20conversion=20structure?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Responding to @lukasz-pubx's review on this PR and his D1 vote in #4616: 1. id→value rename — "I have a feeling that using the word `id` is confusing" (PR comment r3253676569 + #4616 vote). Agreed. The primitive is intentionally cross-axis (markets, taxonomies, ID graphs, measurement currencies); `id` is identity-axis-coded and creates a connotation collision with existing AdCP `*_id` fields that point at AdCP-issued entities. `value` is system-axis-neutral and matches how taxonomy authors (IAB) refer to taxonomy terms. 2. Add `converted` to the fidelity enum (PR comment r3253674753 + #4616 vote). Covers the case where the destination uses a different system but the conversion is deterministic and lossless — Nielsen DMA → Comscore Market via canonical crosswalk, UID2 → ID5 via direct identity-graph link. Materially different from `approximated` (which implies drift) and from `exact` (which requires same system). Lukasz's framing: "similar to temp from C to F." 3. New `system-reference-conversion.json` (response to "I'm not sure if in this case it should be indicated somehow from what to what was the conversion and/or how was it done i.e. ID Graph / by name / custom" in #4616). Factors the question correctly: the fidelity enum says "how preserved?"; the conversion structure says "via what?". Buyers who want the simple signal stop at the enum; buyers who want to reason about the federation chain read the conversion block. Method enum: id_graph | name_match | crosswalk | custom. REQUIRED when fidelity is `converted`; OPTIONAL surfacing when `approximated` (so buyers can see WHY). All three changes preserve the strawman's "shape only, no row-level adoption" framing. Still a draft / discussion artifact, still cheap to re-draft if the WG counter-proposes. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../4616-strawman-system-reference-d1.md | 11 ++-- .../core/system-reference-conversion.json | 65 +++++++++++++++++++ .../schemas/source/core/system-reference.json | 22 +++---- .../enums/system-reference-fidelity.json | 4 +- 4 files changed, 85 insertions(+), 17 deletions(-) create mode 100644 static/schemas/source/core/system-reference-conversion.json diff --git a/.changeset/4616-strawman-system-reference-d1.md b/.changeset/4616-strawman-system-reference-d1.md index 6a24885e6c..c6e366c3be 100644 --- a/.changeset/4616-strawman-system-reference-d1.md +++ b/.changeset/4616-strawman-system-reference-d1.md @@ -2,13 +2,14 @@ "adcontextprotocol": minor --- -Add `system-reference` primitive + `system-reference-fidelity` enum — strawman for decision D1 of the signal epistemic-model umbrella (#4616). +Add `system-reference` primitive + `system-reference-fidelity` enum + `system-reference-conversion` schema — strawman for decision D1 of the signal epistemic-model umbrella (#4616). -Adds two reusable schema primitives that future row-level RFCs (#4472 PBA audience_model, #4475 structured market identifiers, @lszczesiak's pending ID-graph RFC) can adopt without re-litigating the shape per-RFC: +Adds three reusable schema primitives that future row-level RFCs (#4472 PBA audience_model, #4475 structured market identifiers, @lszczesiak's pending ID-graph RFC) can adopt without re-litigating the shape per-RFC: -- `core/system-reference.json` — the canonical `{system, id, version?, name?}` shape for a value defined against an external identity, taxonomy, geographic, or measurement system. `system` is intentionally an open string at the primitive level; per-use constraints live in consuming schemas. -- `enums/system-reference-fidelity.json` — `exact | approximated | unsupported` for deployment-side fidelity. Generalizes #4475's `market_fidelity` mechanism to all reference-system axes. +- `core/system-reference.json` — the canonical `{system, value, version?, name?}` shape for a value defined against an external identity, taxonomy, geographic, or measurement system. `system` is intentionally an open string at the primitive level; per-use constraints live in consuming schemas. Field named `value` (not `id`) because the primitive is cross-axis: identity systems issue IDs, taxonomies issue values/terms, measurement systems issue methodology labels. +- `enums/system-reference-fidelity.json` — `exact | converted | approximated | unsupported` for deployment-side fidelity. `converted` covers the case where the destination uses a different system but the conversion is deterministic and lossless (e.g. Nielsen DMA → Comscore Market via crosswalk, UID2 → ID5 via identity graph). Generalizes #4475's `market_fidelity` mechanism to all reference-system axes. +- `core/system-reference-conversion.json` — `{from, to, method, method_details?}` structure describing how a deployment converts between systems. REQUIRED when fidelity is `converted`, OPTIONAL when `approximated`. `method` enum covers `id_graph | name_match | crosswalk | custom`. -**Non-normative on its own.** Neither primitive is referenced by any existing schema in this PR. Adoption happens row-by-row in the follow-up RFCs (#4472 / #4475 / identity-substrate) against whatever D1 shape the WG settles on. If the WG counter-proposes a different shape, this PR is one file + one enum + this changeset — close and re-draft. No sunk cost. +**Non-normative on its own.** None of the primitives is referenced by any existing schema in this PR. Adoption happens row-by-row in the follow-up RFCs (#4472 / #4475 / identity-substrate) against whatever D1 shape the WG settles on. If the WG counter-proposes a different shape, this PR is three files + one changeset — close and re-draft. No sunk cost. Discussion: see https://github.com/adcontextprotocol/adcp/issues/4616 diff --git a/static/schemas/source/core/system-reference-conversion.json b/static/schemas/source/core/system-reference-conversion.json new file mode 100644 index 0000000000..e6116b2d77 --- /dev/null +++ b/static/schemas/source/core/system-reference-conversion.json @@ -0,0 +1,65 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "/schemas/core/system-reference-conversion.json", + "title": "System Reference Conversion", + "description": "Describes the conversion from a signal's reference system to the destination's native system. REQUIRED when the deployment's fidelity is `converted`; OPTIONAL surfacing when fidelity is `approximated` (lets buyers see WHY a deployment is approximated, not just that it is). Lets buyer agents reason about source, destination, and conversion method without parsing free-text descriptions. The conversion claim itself (`from` system → `to` system, by `method`) is self-reported by the seller; third-party verification of the claim is a future attestation concern (decision D3 in #4616).", + "type": "object", + "properties": { + "from": { + "$ref": "/schemas/core/system-reference.json", + "description": "The signal's original reference system as defined by the seller. Identifies what the seller advertises in the signal definition before conversion." + }, + "to": { + "$ref": "/schemas/core/system-reference.json", + "description": "The system the destination natively uses. After conversion, the activated value lives in this system." + }, + "method": { + "type": "string", + "enum": [ + "id_graph", + "name_match", + "crosswalk", + "custom" + ], + "enumDescriptions": { + "id_graph": "Translation via an identity-graph link (e.g. UID2 → ID5 via LiveRamp, TTD, or similar graph provider). Lossless when the graph link is direct; buyer should inspect graph coverage for edge cases (universe shrinkage, link freshness, regional gaps).", + "name_match": "Match based on a shared external identifier present in both systems (e.g. hashed email common to both sides). Match rate is finite — buyer-decision point on accepted match-rate floor.", + "crosswalk": "Published 1:1 mapping with no information loss (e.g. Nielsen DMA → Comscore Market via the canonical county crosswalk; IAB Audience Taxonomy v2 → v3 via the published migration mapping). When `method` is `crosswalk`, the seller asserts the conversion is mathematically reversible.", + "custom": "Vendor-specific mapping; buyer should consult the destination's documentation for the conversion's semantics, coverage, and edge cases. Buyer agents SHOULD treat `custom` as the weakest preservation claim and inspect `method_details`." + } + }, + "method_details": { + "type": "string", + "description": "Free-text vendor-specific elaboration on the method (e.g. graph provider name, crosswalk version, match-rate floor). Strictly informational; buyer agents MUST NOT branch on this field. Useful for human triage and audit log surfacing.", + "maxLength": 1024 + } + }, + "required": ["from", "to", "method"], + "additionalProperties": false, + "examples": [ + { + "from": { "system": "nielsen_dma", "value": "501", "version": "2024", "name": "New York" }, + "to": { "system": "comscore_market", "value": "NYC-100", "version": "2025-Q1" }, + "method": "crosswalk", + "method_details": "Canonical county-union mapping per Nielsen DMA 2024 spec + Comscore Market 2025-Q1 definition." + }, + { + "from": { "system": "uid2", "value": "AAAA..." }, + "to": { "system": "ramp_id", "value": "RA-9876" }, + "method": "id_graph", + "method_details": "LiveRamp graph v2026.05. Coverage ~94% in US, ~71% in EU." + }, + { + "from": { "system": "iab_audience", "value": "4-1-2-3", "version": "v2" }, + "to": { "system": "iab_audience", "value": "4-1-2-3", "version": "v3" }, + "method": "crosswalk", + "method_details": "IAB published v2→v3 migration map. Direct 1:1 for this node; some sibling nodes are merged in v3." + }, + { + "from": { "system": "publisher_panel", "value": "panel_A_18-49" }, + "to": { "system": "nielsen_p_18_49", "value": "P18-49" }, + "method": "custom", + "method_details": "Publisher's panel reweighted to Nielsen P18-49 distribution. Match-rate floor 80%; see publisher methodology docs." + } + ] +} diff --git a/static/schemas/source/core/system-reference.json b/static/schemas/source/core/system-reference.json index 9b0dd0816e..53f9ac64ae 100644 --- a/static/schemas/source/core/system-reference.json +++ b/static/schemas/source/core/system-reference.json @@ -2,7 +2,7 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/core/system-reference.json", "title": "System Reference", - "description": "Reference to a value within a named identity, taxonomy, geographic, or measurement system — e.g., a Nielsen DMA market, an IAB Audience Taxonomy node, a UID2 identity, a Comscore audience. Provides a single canonical shape `{system, id, version?, name?}` reused across signals, deployments, buy-terms, and delivery reporting wherever a value is defined against an external reference frame. The `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. Decision D1 from the signal epistemic-model umbrella (see issue #4616).", + "description": "Reference to a value within a named identity, taxonomy, geographic, or measurement system — e.g., a Nielsen DMA market, an IAB Audience Taxonomy node, a UID2 identity, a Comscore audience. Provides a single canonical shape `{system, value, version?, name?}` reused across signals, deployments, buy-terms, and delivery reporting wherever a value is defined against an external reference frame. The `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. Decision D1 from the signal epistemic-model umbrella (see issue #4616).", "type": "object", "properties": { "system": { @@ -18,14 +18,14 @@ "omb_msa" ] }, - "id": { + "value": { "type": "string", - "description": "Identifier within the system. Format is system-dependent — Nielsen DMA codes are numeric strings (e.g. `501`), IAB Audience Taxonomy nodes are dotted paths (e.g. `4-1-2-3`), UID2 values are opaque tokens. The pair `(system, id)` is the durable wire reference; consumers MUST compare on both axes.", + "description": "The value within the system that this reference points to. Format is system-dependent — Nielsen DMA codes are numeric strings (e.g. `501`), IAB Audience Taxonomy nodes are dotted paths (e.g. `4-1-2-3`), UID2 values are opaque tokens, measurement currencies use methodology labels. Named `value` rather than `id` because the primitive is cross-axis: identity systems issue IDs, taxonomies issue values/terms, measurement systems issue methodology labels. The pair `(system, value)` is the durable wire reference; consumers MUST compare on both axes.", "minLength": 1 }, "version": { "type": "string", - "description": "Optional version of the reference system this `id` is defined against. RECOMMENDED whenever the system has a versioned definition history where boundaries, IDs, or semantics drift between versions (e.g. `nielsen_dma` boundaries differ between 2020 and 2024 releases; IAB Audience Taxonomy v2 ≠ v3). Omitting `version` implies the latest stable release as known to the emitting party. Buyer-side comparators SHOULD treat omitted `version` as a wildcard match against any version.", + "description": "Optional version of the reference system this `value` is defined against. RECOMMENDED whenever the system has a versioned definition history where boundaries, values, or semantics drift between versions (e.g. `nielsen_dma` boundaries differ between 2020 and 2024 releases; IAB Audience Taxonomy v2 ≠ v3). Omitting `version` implies the latest stable release as known to the emitting party. Buyer-side comparators SHOULD treat omitted `version` as a wildcard match against any version.", "minLength": 1, "examples": [ "2024", @@ -35,37 +35,37 @@ }, "name": { "type": "string", - "description": "Optional human-readable label for UI display (e.g. `\"New York\"` for `nielsen_dma:501`). Strictly informational — never used for equality or routing; the canonical reference is `(system, id, version?)`.", + "description": "Optional human-readable label for UI display (e.g. `\"New York\"` for `nielsen_dma:501`). Strictly informational — never used for equality or routing; the canonical reference is `(system, value, version?)`.", "minLength": 1 } }, - "required": ["system", "id"], + "required": ["system", "value"], "additionalProperties": false, "examples": [ { "system": "nielsen_dma", - "id": "501", + "value": "501", "name": "New York" }, { "system": "nielsen_dma", - "id": "501", + "value": "501", "version": "2024", "name": "New York" }, { "system": "iab_audience", - "id": "4-1-2-3", + "value": "4-1-2-3", "version": "v3", "name": "Affluent Households" }, { "system": "uid2", - "id": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" + "value": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" }, { "system": "comscore_market", - "id": "NYC-100", + "value": "NYC-100", "version": "2025-Q1" } ] diff --git a/static/schemas/source/enums/system-reference-fidelity.json b/static/schemas/source/enums/system-reference-fidelity.json index b6c2a1c299..f1d5ce3781 100644 --- a/static/schemas/source/enums/system-reference-fidelity.json +++ b/static/schemas/source/enums/system-reference-fidelity.json @@ -2,15 +2,17 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/enums/system-reference-fidelity.json", "title": "System Reference Fidelity", - "description": "How faithfully a deployment can honor a value defined against an external reference system (see /schemas/core/system-reference.json). Surfaces the cross-system fidelity question at the deployment layer — buyer agents read this BEFORE activating a signal so they can make an informed decision about geographic, taxonomic, or identity drift introduced by the destination's mapping. Generalizes the `market_fidelity` mechanism proposed in #4475 to all reference-system axes (markets, ID graphs, taxonomies, measurement currencies). Used as a property on `deployments[]` entries or analogous per-destination structures; not used on the signal definition itself (the signal's reference is canonical; fidelity is a function of where it's activated).", + "description": "How faithfully a deployment can honor a value defined against an external reference system (see /schemas/core/system-reference.json). Surfaces the cross-system fidelity question at the deployment layer — buyer agents read this BEFORE activating a signal so they can make an informed decision about geographic, taxonomic, or identity drift introduced by the destination's mapping. Generalizes the `market_fidelity` mechanism proposed in #4475 to all reference-system axes (markets, ID graphs, taxonomies, measurement currencies). Used as a property on `deployments[]` entries or analogous per-destination structures; not used on the signal definition itself (the signal's reference is canonical; fidelity is a function of where it's activated). When fidelity is `converted` or `approximated`, the deployment SHOULD also surface a structured conversion descriptor (see /schemas/core/system-reference-conversion.json) so buyer agents can reason about the source, destination, and method without parsing free text.", "type": "string", "enum": [ "exact", + "converted", "approximated", "unsupported" ], "enumDescriptions": { "exact": "Destination uses the same `system` (and `version` when both are present) — activation preserves the reference frame byte-for-byte.", + "converted": "Destination uses a different `system` from the signal but the seller asserts the conversion is deterministic and lossless — e.g. Nielsen DMA → Comscore Market via the canonical county crosswalk, or UID2 → ID5 via a direct identity-graph link. Functionally equivalent to `exact` for buyers who only need preservation; structurally distinct because the system axis changed. Buyer agents that need to inspect the conversion chain (source / destination / method) read the deployment's `conversion` descriptor.", "approximated": "Destination maps to a different `system` or a different `version` of the same system; the resulting activation is geographically, taxonomically, or identity-wise close but NOT byte-equivalent. Buyer agents SHOULD treat `approximated` as a buyer-decision point — for modeled audiences whose statistical validity depends on the reference frame, `approximated` activation may invalidate the model.", "unsupported": "Destination cannot resolve the `system` at all. Activation against this destination will fail or silently degrade. Buyer agents MUST NOT proceed without an explicit opt-in to whatever fallback behavior the seller advertises." } From ac4c6183ad8aec559d2b22a6fa7146444e28a4ce Mon Sep 17 00:00:00 2001 From: evgen Date: Sat, 16 May 2026 19:25:27 -0400 Subject: [PATCH 3/8] review: add upscaled/inferred/projected methods + method_provider per @lukasz-pubx MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 2 of @lukasz-pubx's feedback on #4616 / #4622 — three new method enum values + an optional method_provider field on the conversion structure. Method enum extends from 4 → 7 values, ordered by semantic grouping: identity-domain: id_graph, name_match structural-mapping: crosswalk, upscaled (new) probabilistic: inferred (new), projected (new) escape hatch: custom New value semantics: upscaled — Deterministic aggregation of finer-grained units into a coarser system (zip→DMA, county→state). Forward- lossless; inverse undefined. For modeled audiences where statistical validity depends on the finer granularity, deployment fidelity SHOULD be `approximated` even when the method is `upscaled`. inferred — ML-predicted or rule-based inference (device-graph → IAB taxonomy node; behavior → demographic membership). Probabilistic; no per-record correctness guarantee. method_provider SHOULD identify the model or vendor. projected — Statistical projection from a sample to a population (panel-projected reach, currency-projected impressions, audience extrapolation). Used primarily for measurement currencies and modeled audiences. method_details SHOULD describe sample size + weighting + confidence intervals where relevant. method_provider (new optional field): Opaque-by-convention string identifying the vendor or organization providing the conversion method (e.g. `LiveRamp`, `ID5`, `IAB`, `Nielsen`). Consumers MAY branch on well-known provider names but the field is not constrained to a closed enum — new providers enter the ecosystem additively. Particularly meaningful for `id_graph` and `inferred` methods; less load-bearing for `crosswalk` where the published mapping is canonical regardless of who hosts it. Updated examples cover all 7 method values with realistic method_provider values. Strawman remains shape-only — no row-level adoption, still draft, still cheap to re-draft. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../4616-strawman-system-reference-d1.md | 2 +- .../core/system-reference-conversion.json | 47 +++++++++++++++++-- 2 files changed, 43 insertions(+), 6 deletions(-) diff --git a/.changeset/4616-strawman-system-reference-d1.md b/.changeset/4616-strawman-system-reference-d1.md index c6e366c3be..5ad1329f30 100644 --- a/.changeset/4616-strawman-system-reference-d1.md +++ b/.changeset/4616-strawman-system-reference-d1.md @@ -8,7 +8,7 @@ Adds three reusable schema primitives that future row-level RFCs (#4472 PBA audi - `core/system-reference.json` — the canonical `{system, value, version?, name?}` shape for a value defined against an external identity, taxonomy, geographic, or measurement system. `system` is intentionally an open string at the primitive level; per-use constraints live in consuming schemas. Field named `value` (not `id`) because the primitive is cross-axis: identity systems issue IDs, taxonomies issue values/terms, measurement systems issue methodology labels. - `enums/system-reference-fidelity.json` — `exact | converted | approximated | unsupported` for deployment-side fidelity. `converted` covers the case where the destination uses a different system but the conversion is deterministic and lossless (e.g. Nielsen DMA → Comscore Market via crosswalk, UID2 → ID5 via identity graph). Generalizes #4475's `market_fidelity` mechanism to all reference-system axes. -- `core/system-reference-conversion.json` — `{from, to, method, method_details?}` structure describing how a deployment converts between systems. REQUIRED when fidelity is `converted`, OPTIONAL when `approximated`. `method` enum covers `id_graph | name_match | crosswalk | custom`. +- `core/system-reference-conversion.json` — `{from, to, method, method_provider?, method_details?}` structure describing how a deployment converts between systems. REQUIRED when fidelity is `converted`, OPTIONAL when `approximated`. `method` enum covers `id_graph | name_match | crosswalk | upscaled | inferred | projected | custom`. `method_provider?` surfaces vendor identity (e.g. `LiveRamp`, `ID5`, `IAB`) as an opaque-by-convention string so buyer agents can branch on well-known providers without parsing free-text. **Non-normative on its own.** None of the primitives is referenced by any existing schema in this PR. Adoption happens row-by-row in the follow-up RFCs (#4472 / #4475 / identity-substrate) against whatever D1 shape the WG settles on. If the WG counter-proposes a different shape, this PR is three files + one changeset — close and re-draft. No sunk cost. diff --git a/static/schemas/source/core/system-reference-conversion.json b/static/schemas/source/core/system-reference-conversion.json index e6116b2d77..cd5436ec56 100644 --- a/static/schemas/source/core/system-reference-conversion.json +++ b/static/schemas/source/core/system-reference-conversion.json @@ -19,18 +19,38 @@ "id_graph", "name_match", "crosswalk", + "upscaled", + "inferred", + "projected", "custom" ], "enumDescriptions": { - "id_graph": "Translation via an identity-graph link (e.g. UID2 → ID5 via LiveRamp, TTD, or similar graph provider). Lossless when the graph link is direct; buyer should inspect graph coverage for edge cases (universe shrinkage, link freshness, regional gaps).", + "id_graph": "Translation via an identity-graph link (e.g. UID2 → ID5 via LiveRamp, TTD, or similar graph provider). Lossless when the graph link is direct; buyer should inspect graph coverage for edge cases (universe shrinkage, link freshness, regional gaps). Vendor SHOULD be surfaced in `method_provider`.", "name_match": "Match based on a shared external identifier present in both systems (e.g. hashed email common to both sides). Match rate is finite — buyer-decision point on accepted match-rate floor.", "crosswalk": "Published 1:1 mapping with no information loss (e.g. Nielsen DMA → Comscore Market via the canonical county crosswalk; IAB Audience Taxonomy v2 → v3 via the published migration mapping). When `method` is `crosswalk`, the seller asserts the conversion is mathematically reversible.", - "custom": "Vendor-specific mapping; buyer should consult the destination's documentation for the conversion's semantics, coverage, and edge cases. Buyer agents SHOULD treat `custom` as the weakest preservation claim and inspect `method_details`." + "upscaled": "Aggregation of finer-grained units into a coarser system (e.g. zip codes → DMA, counties → state, postcodes → MSA). Deterministic and forward-lossless — every fine-grained unit is fully accounted for in the coarser unit. Inverse is undefined: the upscaled value cannot recover the original granularity. For modeled audiences where statistical validity depends on the finer granularity, the deployment's `fidelity` SHOULD be `approximated` even when the method is `upscaled`.", + "inferred": "ML-predicted or rule-based inference (e.g. predicting a buyer's IAB taxonomy node from browsing behavior; predicting demographic membership from device-graph signals). Probabilistic — no deterministic guarantee that any individual value is correctly mapped. Buyer-decision point on accepted confidence floor; `method_provider` SHOULD identify the model or vendor.", + "projected": "Statistical projection from a sample to a population (e.g. panel-projected reach, currency-projected impression counts, audience extrapolation from a measured sub-sample). Used primarily for measurement currencies and modeled audiences. Buyer should inspect `method_details` for sample size, weighting methodology, and confidence intervals where relevant.", + "custom": "Vendor-specific mapping not covered by the named methods. Buyer should consult the destination's documentation for the conversion's semantics, coverage, and edge cases. Buyer agents SHOULD treat `custom` as the weakest preservation claim and inspect `method_provider` + `method_details`." } }, + "method_provider": { + "type": "string", + "description": "Optional vendor or organization providing the conversion method. Opaque string by convention — consumers MAY branch on well-known provider names (e.g. `LiveRamp`, `ID5`, `IAB`, `Nielsen`, `Comscore`) but the field is not constrained to a closed enum, since new providers enter the ecosystem additively. Useful for ID-graph and inference methods where the vendor IS the meaningful identifier; less useful for crosswalk methods where the published mapping is canonical regardless of who hosts it.", + "minLength": 1, + "maxLength": 128, + "examples": [ + "LiveRamp", + "ID5", + "IAB", + "Nielsen", + "Comscore", + "TTD" + ] + }, "method_details": { "type": "string", - "description": "Free-text vendor-specific elaboration on the method (e.g. graph provider name, crosswalk version, match-rate floor). Strictly informational; buyer agents MUST NOT branch on this field. Useful for human triage and audit log surfacing.", + "description": "Free-text vendor-specific elaboration on the method (e.g. graph version, crosswalk publication date, match-rate floor, model name, sample size). Strictly informational; buyer agents MUST NOT branch on this field. Useful for human triage and audit log surfacing.", "maxLength": 1024 } }, @@ -41,25 +61,42 @@ "from": { "system": "nielsen_dma", "value": "501", "version": "2024", "name": "New York" }, "to": { "system": "comscore_market", "value": "NYC-100", "version": "2025-Q1" }, "method": "crosswalk", + "method_provider": "IAB", "method_details": "Canonical county-union mapping per Nielsen DMA 2024 spec + Comscore Market 2025-Q1 definition." }, { "from": { "system": "uid2", "value": "AAAA..." }, "to": { "system": "ramp_id", "value": "RA-9876" }, "method": "id_graph", + "method_provider": "LiveRamp", "method_details": "LiveRamp graph v2026.05. Coverage ~94% in US, ~71% in EU." }, { "from": { "system": "iab_audience", "value": "4-1-2-3", "version": "v2" }, "to": { "system": "iab_audience", "value": "4-1-2-3", "version": "v3" }, "method": "crosswalk", + "method_provider": "IAB", "method_details": "IAB published v2→v3 migration map. Direct 1:1 for this node; some sibling nodes are merged in v3." }, + { + "from": { "system": "us_zip", "value": "10001" }, + "to": { "system": "nielsen_dma", "value": "501", "version": "2024", "name": "New York" }, + "method": "upscaled", + "method_details": "Zip 10001 lies entirely within DMA 501. Aggregation is deterministic forward; inverse is undefined." + }, + { + "from": { "system": "device_graph", "value": "device_abc123" }, + "to": { "system": "iab_audience", "value": "4-1-2-3", "version": "v3" }, + "method": "inferred", + "method_provider": "Lotame", + "method_details": "ML model v2026.04. Predicted membership confidence 0.78." + }, { "from": { "system": "publisher_panel", "value": "panel_A_18-49" }, "to": { "system": "nielsen_p_18_49", "value": "P18-49" }, - "method": "custom", - "method_details": "Publisher's panel reweighted to Nielsen P18-49 distribution. Match-rate floor 80%; see publisher methodology docs." + "method": "projected", + "method_provider": "Nielsen", + "method_details": "Publisher's panel reweighted to Nielsen P18-49 distribution via post-stratification. Sample n=4,800, design effect 1.3." } ] } From 9716bb15ed918fb3d49c439dc95499eb725fc36c Mon Sep 17 00:00:00 2001 From: evgen Date: Sun, 17 May 2026 07:08:16 -0400 Subject: [PATCH 4/8] =?UTF-8?q?review:=20promote=20measurement=20methodolo?= =?UTF-8?q?gy=20=E2=80=94=20third=20sibling=20per=20@bokelley=20umbrella?= =?UTF-8?q?=20update?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 3 of @bokelley's third-sibling framing on #4616 (issuecomment-4470049814): the epistemic-model table maps 1:1 onto delivery measurement, not just signals. The strawman's shape was already cross-axis, but the descriptions buried measurement as a secondary axis. Promoting it to first-class. Changes are description + examples only — no structural change to the primitive, the fidelity enum, or the conversion schema. system-reference.json: - Description rewritten to explicitly enumerate the four primary axes (identity / taxonomy / geographic / measurement). Calls out measurement currencies (Nielsen P18-49) AND methodology sources (panel / set_top_box / ACR / census / server_logs / SDK per the construction-methodology row in #4616) as distinct measurement- side use cases. - `system` examples list extended with `nielsen_p_18_49` and `measurement_source` to surface the measurement axis. - Two new top-level examples: a measurement currency (`nielsen_p_18_49:P18-49:2025`) and a measurement-source row (`measurement_source:set_top_box`) — the latter is the canonical #2041 use case. system-reference-fidelity.json: - Description expanded to name two consumer types: buyer agents activating signals, and measurement consumers accepting delivery reports. Adds "methodology sources" to the generalized axis list. system-reference-conversion.json: - Description expanded to name the two surfaces (signal deployments + delivery measurement). References #3877's `completion_source` qualifier as shipped prior art for the D3 seller-attested vs vendor-attested split. - New example: STB measurement projected to Nielsen P18-49 currency via Samba TV methodology — the measurement-side analog of the existing signal-side panel projection example. Changeset updated to reflect the broadened scope. Strawman remains shape-only — no row-level adoption, no structural change to the primitive. #2041 / #4472 / #4475 / identity-substrate RFCs still adopt independently against whatever shape D1 settles on. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../4616-strawman-system-reference-d1.md | 4 ++-- .../core/system-reference-conversion.json | 9 ++++++++- .../schemas/source/core/system-reference.json | 19 ++++++++++++++++--- .../enums/system-reference-fidelity.json | 2 +- 4 files changed, 27 insertions(+), 7 deletions(-) diff --git a/.changeset/4616-strawman-system-reference-d1.md b/.changeset/4616-strawman-system-reference-d1.md index 5ad1329f30..cd253205d5 100644 --- a/.changeset/4616-strawman-system-reference-d1.md +++ b/.changeset/4616-strawman-system-reference-d1.md @@ -2,9 +2,9 @@ "adcontextprotocol": minor --- -Add `system-reference` primitive + `system-reference-fidelity` enum + `system-reference-conversion` schema — strawman for decision D1 of the signal epistemic-model umbrella (#4616). +Add `system-reference` primitive + `system-reference-fidelity` enum + `system-reference-conversion` schema — strawman for decision D1 of the signal-AND-measurement epistemic-model umbrella (#4616). -Adds three reusable schema primitives that future row-level RFCs (#4472 PBA audience_model, #4475 structured market identifiers, @lszczesiak's pending ID-graph RFC) can adopt without re-litigating the shape per-RFC: +Adds three reusable schema primitives that future row-level RFCs can adopt without re-litigating the shape per-RFC. After @bokelley's 2026-05-17 third-sibling framing (#4616 issuecomment-4470049814), D1 covers BOTH the signals track (#4472 PBA audience_model, #4475 structured market identifiers, @lszczesiak's pending ID-graph RFC) AND the measurement track (#2041 measurement source attribution, with #3885 / #3652 / #3877 as already-shipped publishing/authorization-half prior art). - `core/system-reference.json` — the canonical `{system, value, version?, name?}` shape for a value defined against an external identity, taxonomy, geographic, or measurement system. `system` is intentionally an open string at the primitive level; per-use constraints live in consuming schemas. Field named `value` (not `id`) because the primitive is cross-axis: identity systems issue IDs, taxonomies issue values/terms, measurement systems issue methodology labels. - `enums/system-reference-fidelity.json` — `exact | converted | approximated | unsupported` for deployment-side fidelity. `converted` covers the case where the destination uses a different system but the conversion is deterministic and lossless (e.g. Nielsen DMA → Comscore Market via crosswalk, UID2 → ID5 via identity graph). Generalizes #4475's `market_fidelity` mechanism to all reference-system axes. diff --git a/static/schemas/source/core/system-reference-conversion.json b/static/schemas/source/core/system-reference-conversion.json index cd5436ec56..78de87c996 100644 --- a/static/schemas/source/core/system-reference-conversion.json +++ b/static/schemas/source/core/system-reference-conversion.json @@ -2,7 +2,7 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/core/system-reference-conversion.json", "title": "System Reference Conversion", - "description": "Describes the conversion from a signal's reference system to the destination's native system. REQUIRED when the deployment's fidelity is `converted`; OPTIONAL surfacing when fidelity is `approximated` (lets buyers see WHY a deployment is approximated, not just that it is). Lets buyer agents reason about source, destination, and conversion method without parsing free-text descriptions. The conversion claim itself (`from` system → `to` system, by `method`) is self-reported by the seller; third-party verification of the claim is a future attestation concern (decision D3 in #4616).", + "description": "Describes the conversion from a source reference system to the destination's native system. Used by two surfaces: (1) signal deployments — when a destination remaps a signal's market / identity / taxonomy / measurement reference; (2) delivery measurement — when a measurement vendor projects, infers, or upscales from one methodology source to another (e.g. set-top-box panel → P18-49 currency). REQUIRED when the surrounding fidelity is `converted`; OPTIONAL surfacing when fidelity is `approximated` (lets consumers see WHY a deployment or report is approximated, not just that it is). Lets buyer agents and measurement consumers reason about source, destination, and conversion method without parsing free-text. The conversion claim itself (`from` system → `to` system, by `method`) is self-reported by the seller / measurement vendor; third-party verification of the claim is a future attestation concern (decision D3 in #4616; see #3877 `completion_source` for shipped prior art on the seller-attested vs vendor-attested split).", "type": "object", "properties": { "from": { @@ -97,6 +97,13 @@ "method": "projected", "method_provider": "Nielsen", "method_details": "Publisher's panel reweighted to Nielsen P18-49 distribution via post-stratification. Sample n=4,800, design effect 1.3." + }, + { + "from": { "system": "measurement_source", "value": "set_top_box", "name": "Samba STB universe" }, + "to": { "system": "nielsen_p_18_49", "value": "P18-49", "version": "2025" }, + "method": "projected", + "method_provider": "Samba TV", + "method_details": "STB tuning data projected to Nielsen P18-49 currency via Nielsen-licensed reweighting; methodology blend with census-level demos. Measurement-side analog of the signal-side panel projection — #2041 row use case." } ] } diff --git a/static/schemas/source/core/system-reference.json b/static/schemas/source/core/system-reference.json index 53f9ac64ae..8e2d16db40 100644 --- a/static/schemas/source/core/system-reference.json +++ b/static/schemas/source/core/system-reference.json @@ -2,12 +2,12 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/core/system-reference.json", "title": "System Reference", - "description": "Reference to a value within a named identity, taxonomy, geographic, or measurement system — e.g., a Nielsen DMA market, an IAB Audience Taxonomy node, a UID2 identity, a Comscore audience. Provides a single canonical shape `{system, value, version?, name?}` reused across signals, deployments, buy-terms, and delivery reporting wherever a value is defined against an external reference frame. The `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. Decision D1 from the signal epistemic-model umbrella (see issue #4616).", + "description": "Reference to a value within a named external system. Four primary axes of use: (1) **identity** — UID2, ID5, RampID, hashed-email, MAID; (2) **taxonomy** — IAB Audience Taxonomy, IAB Content Taxonomy, vendor-specific category trees; (3) **geographic** — Nielsen DMA, Comscore Market, OMB MSA, postcode aggregates; (4) **measurement** — measurement currencies (Nielsen P18-49), methodology sources (panel / set_top_box / ACR / census / server_logs / SDK per the construction-methodology row in #4616), vendor measurement schemes. Provides a single canonical shape `{system, value, version?, name?}` reused across signals, deployments, buy-terms, delivery reporting, and measurement attribution wherever a value is defined against an external reference frame. The `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. Decision D1 from the signal-and-measurement epistemic-model umbrella (see issue #4616).", "type": "object", "properties": { "system": { "type": "string", - "description": "Identifier for the reference system. Open string — consumers SHOULD use kebab_case or snake_case stable identifiers (e.g. `nielsen_dma`, `iab_audience`, `uid2`, `ramp_id`, `comscore_market`, `omb_msa`). Custom or vendor-specific values are permitted but interoperate only with parties that recognize the same `system` string.", + "description": "Identifier for the reference system. Open string — consumers SHOULD use kebab_case or snake_case stable identifiers (e.g. `nielsen_dma`, `iab_audience`, `uid2`, `ramp_id`, `comscore_market`, `omb_msa`, `nielsen_p_18_49`, `measurement_source`). Custom or vendor-specific values are permitted but interoperate only with parties that recognize the same `system` string.", "minLength": 1, "examples": [ "nielsen_dma", @@ -15,7 +15,9 @@ "uid2", "ramp_id", "comscore_market", - "omb_msa" + "omb_msa", + "nielsen_p_18_49", + "measurement_source" ] }, "value": { @@ -67,6 +69,17 @@ "system": "comscore_market", "value": "NYC-100", "version": "2025-Q1" + }, + { + "system": "nielsen_p_18_49", + "value": "P18-49", + "version": "2025", + "name": "Persons 18-49" + }, + { + "system": "measurement_source", + "value": "set_top_box", + "name": "Set-top box" } ] } diff --git a/static/schemas/source/enums/system-reference-fidelity.json b/static/schemas/source/enums/system-reference-fidelity.json index f1d5ce3781..9d170f5c1c 100644 --- a/static/schemas/source/enums/system-reference-fidelity.json +++ b/static/schemas/source/enums/system-reference-fidelity.json @@ -2,7 +2,7 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/enums/system-reference-fidelity.json", "title": "System Reference Fidelity", - "description": "How faithfully a deployment can honor a value defined against an external reference system (see /schemas/core/system-reference.json). Surfaces the cross-system fidelity question at the deployment layer — buyer agents read this BEFORE activating a signal so they can make an informed decision about geographic, taxonomic, or identity drift introduced by the destination's mapping. Generalizes the `market_fidelity` mechanism proposed in #4475 to all reference-system axes (markets, ID graphs, taxonomies, measurement currencies). Used as a property on `deployments[]` entries or analogous per-destination structures; not used on the signal definition itself (the signal's reference is canonical; fidelity is a function of where it's activated). When fidelity is `converted` or `approximated`, the deployment SHOULD also surface a structured conversion descriptor (see /schemas/core/system-reference-conversion.json) so buyer agents can reason about the source, destination, and method without parsing free text.", + "description": "How faithfully a deployment (or delivery measurement attribution) can honor a value defined against an external reference system (see /schemas/core/system-reference.json). Surfaces the cross-system fidelity question at the destination layer — buyer agents read this BEFORE activating a signal, and measurement consumers read it BEFORE accepting a delivery report, so they can make an informed decision about geographic, taxonomic, identity, or measurement-methodology drift introduced by the destination's mapping. Generalizes the `market_fidelity` mechanism proposed in #4475 to all reference-system axes — markets, ID graphs, taxonomies, measurement currencies, and methodology sources (the third-sibling measurement track per #4616). Used as a property on `deployments[]` entries or analogous per-destination / per-measurement-row structures; not used on the signal or measurement definition itself (the source's reference is canonical; fidelity is a function of where it's activated or measured). When fidelity is `converted` or `approximated`, the deployment / report SHOULD also surface a structured conversion descriptor (see /schemas/core/system-reference-conversion.json) so consumers can reason about the source, destination, and method without parsing free text.", "type": "string", "enum": [ "exact", From ed382184de23d349092d750f5381c64fcd029e39 Mon Sep 17 00:00:00 2001 From: evgen Date: Wed, 20 May 2026 09:18:04 -0300 Subject: [PATCH 5/8] review: sharpen inferred vs projected per @SimonaNemes (round 4) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @SimonaNemes endorsed round 2 of the strawman on #4616 (issuecomment-4493606229) and offered a sharpening of the inferred / projected distinction. Both methods can use ML; the distinction is the LEVEL at which uncertainty operates: inferred = ENTITY-LEVEL attribution from observed signals. Answers: "given these clues, who/what is this entity?" Uncertainty lies in the correctness of attributes assigned to individual entities. projected = POPULATION-LEVEL estimation from a sample. Answers: "given this sample, what should we expect at scale?" Uncertainty lies in the estimate itself, NOT in any individual entity's attributes. Same underlying data may drive both — the distinction is the question being answered and where the uncertainty lives. Changes: - enumDescriptions for `inferred` + `projected` rewritten with the entity-level vs population-level framing and explicit cross- reference to each other so consumers see the contrast. - New paired example: same device_graph input data routed through `inferred` (entity-level attribution with per-record confidence 0.78) and through `projected` (population-level estimate with 95% CI ±0.4pp). Makes the distinction concrete. - Changeset updated to reflect the round-4 sharpening + cite @SimonaNemes's framing. Strawman remains shape-only — no row-level adoption, no structural change. Sharpening the prose so the distinction survives consumer interpretation across implementer teams. Co-Authored-By: Claude Opus 4.7 (1M context) --- .changeset/4616-strawman-system-reference-d1.md | 2 +- .../source/core/system-reference-conversion.json | 13 ++++++++++--- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/.changeset/4616-strawman-system-reference-d1.md b/.changeset/4616-strawman-system-reference-d1.md index cd253205d5..34a9dff78b 100644 --- a/.changeset/4616-strawman-system-reference-d1.md +++ b/.changeset/4616-strawman-system-reference-d1.md @@ -8,7 +8,7 @@ Adds three reusable schema primitives that future row-level RFCs can adopt witho - `core/system-reference.json` — the canonical `{system, value, version?, name?}` shape for a value defined against an external identity, taxonomy, geographic, or measurement system. `system` is intentionally an open string at the primitive level; per-use constraints live in consuming schemas. Field named `value` (not `id`) because the primitive is cross-axis: identity systems issue IDs, taxonomies issue values/terms, measurement systems issue methodology labels. - `enums/system-reference-fidelity.json` — `exact | converted | approximated | unsupported` for deployment-side fidelity. `converted` covers the case where the destination uses a different system but the conversion is deterministic and lossless (e.g. Nielsen DMA → Comscore Market via crosswalk, UID2 → ID5 via identity graph). Generalizes #4475's `market_fidelity` mechanism to all reference-system axes. -- `core/system-reference-conversion.json` — `{from, to, method, method_provider?, method_details?}` structure describing how a deployment converts between systems. REQUIRED when fidelity is `converted`, OPTIONAL when `approximated`. `method` enum covers `id_graph | name_match | crosswalk | upscaled | inferred | projected | custom`. `method_provider?` surfaces vendor identity (e.g. `LiveRamp`, `ID5`, `IAB`) as an opaque-by-convention string so buyer agents can branch on well-known providers without parsing free-text. +- `core/system-reference-conversion.json` — `{from, to, method, method_provider?, method_details?}` structure describing how a deployment converts between systems. REQUIRED when fidelity is `converted`, OPTIONAL when `approximated`. `method` enum covers `id_graph | name_match | crosswalk | upscaled | inferred | projected | custom`. **`inferred` vs `projected` sharpening per @SimonaNemes (#4616 issuecomment-4493606229):** `inferred` is **entity-level attribution** ("given clues, who/what is this entity?" — uncertainty in per-record correctness); `projected` is **population-level estimation** ("given this sample, what should we expect at scale?" — uncertainty in the estimate, not individuals). Same underlying data can drive both; the distinction is the level at which uncertainty operates. `method_provider?` surfaces vendor identity (e.g. `LiveRamp`, `ID5`, `IAB`) as an opaque-by-convention string so buyer agents can branch on well-known providers without parsing free-text. **Non-normative on its own.** None of the primitives is referenced by any existing schema in this PR. Adoption happens row-by-row in the follow-up RFCs (#4472 / #4475 / identity-substrate) against whatever D1 shape the WG settles on. If the WG counter-proposes a different shape, this PR is three files + one changeset — close and re-draft. No sunk cost. diff --git a/static/schemas/source/core/system-reference-conversion.json b/static/schemas/source/core/system-reference-conversion.json index 78de87c996..0d7a0f3282 100644 --- a/static/schemas/source/core/system-reference-conversion.json +++ b/static/schemas/source/core/system-reference-conversion.json @@ -29,8 +29,8 @@ "name_match": "Match based on a shared external identifier present in both systems (e.g. hashed email common to both sides). Match rate is finite — buyer-decision point on accepted match-rate floor.", "crosswalk": "Published 1:1 mapping with no information loss (e.g. Nielsen DMA → Comscore Market via the canonical county crosswalk; IAB Audience Taxonomy v2 → v3 via the published migration mapping). When `method` is `crosswalk`, the seller asserts the conversion is mathematically reversible.", "upscaled": "Aggregation of finer-grained units into a coarser system (e.g. zip codes → DMA, counties → state, postcodes → MSA). Deterministic and forward-lossless — every fine-grained unit is fully accounted for in the coarser unit. Inverse is undefined: the upscaled value cannot recover the original granularity. For modeled audiences where statistical validity depends on the finer granularity, the deployment's `fidelity` SHOULD be `approximated` even when the method is `upscaled`.", - "inferred": "ML-predicted or rule-based inference (e.g. predicting a buyer's IAB taxonomy node from browsing behavior; predicting demographic membership from device-graph signals). Probabilistic — no deterministic guarantee that any individual value is correctly mapped. Buyer-decision point on accepted confidence floor; `method_provider` SHOULD identify the model or vendor.", - "projected": "Statistical projection from a sample to a population (e.g. panel-projected reach, currency-projected impression counts, audience extrapolation from a measured sub-sample). Used primarily for measurement currencies and modeled audiences. Buyer should inspect `method_details` for sample size, weighting methodology, and confidence intervals where relevant.", + "inferred": "**Entity-level attribution** from observed signals (ML model or rule-based). Answers: *given these clues, who/what is this entity?* Uncertainty lies in the correctness of attributes assigned to individual entities — each `from` value is mapped to a `to` value with finite per-record confidence (e.g. predicting a single buyer's IAB taxonomy node from browsing behavior; predicting demographic membership from device-graph signals). Same input data may also be used for `projected`; the distinction is the level at which uncertainty operates. `method_provider` SHOULD identify the model or vendor; `method_details` SHOULD describe the confidence floor or per-record uncertainty.", + "projected": "**Population-level estimation** from a sample to a population (panel / subset → universe). Answers: *given this sample, what should we expect at scale?* Uncertainty lies in the population-level estimate itself — NOT in any individual entity's attributes (e.g. panel-projected reach, currency-projected impression counts, audience extrapolation from a measured sub-sample). Used primarily for measurement currencies and modeled audiences. Same underlying data may also be used for `inferred`; the distinction is the level at which uncertainty operates. `method_details` SHOULD describe sample size, weighting methodology, and confidence intervals.", "custom": "Vendor-specific mapping not covered by the named methods. Buyer should consult the destination's documentation for the conversion's semantics, coverage, and edge cases. Buyer agents SHOULD treat `custom` as the weakest preservation claim and inspect `method_provider` + `method_details`." } }, @@ -89,7 +89,14 @@ "to": { "system": "iab_audience", "value": "4-1-2-3", "version": "v3" }, "method": "inferred", "method_provider": "Lotame", - "method_details": "ML model v2026.04. Predicted membership confidence 0.78." + "method_details": "ML model v2026.04. Entity-level attribution: this particular device assigned to IAB node 4-1-2-3 with per-record confidence 0.78." + }, + { + "from": { "system": "device_graph", "value": "device_population_sample" }, + "to": { "system": "iab_audience", "value": "4-1-2-3", "version": "v3" }, + "method": "projected", + "method_provider": "Lotame", + "method_details": "Same underlying device-graph data as the `inferred` example above — but operating at population level. Projects sample-level membership counts to a universe estimate for IAB node 4-1-2-3 with 95% CI ±0.4pp. Uncertainty is in the population estimate, not in any individual device's assignment." }, { "from": { "system": "publisher_panel", "value": "panel_A_18-49" }, From 40cedb4fa8212405a9a2688667abfc13185b5323 Mon Sep 17 00:00:00 2001 From: evgen Date: Sun, 24 May 2026 11:23:37 -0400 Subject: [PATCH 6/8] review: address @bokelley WG-acceptance tightening (round 5) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @bokelley posted WG-level acceptance of D1 on #4616 (issuecomment-4526559566): "Yes on D1 as a shared primitive for newly-modeled epistemic rows" — with four substantive tightening notes before he's comfortable merging. All four addressed in this revision; the changes are normative tightening, not structural. 1. VERSION SEMANTICS REVERSED (system-reference.json) Strawman previously said "treat omitted version as a wildcard match against any version." Bokelley: this recreates the fuzzy matching problem D1 was supposed to avoid. Reversed. Omitted `version` now means UNKNOWN / unpinned — NOT a wildcard. Exact equality requires (system, value, version) to all match for versioned systems. Row-level schemas MAY declare a system version-insensitive (UID2 / RampID etc.); otherwise omitted version is a buyer-decision point. 2. CONVERTED FIDELITY TIGHTENED (system-reference-fidelity.json) Reserved for deterministic AND row-semantics-preserving mappings. Deterministic ≠ lossless preservation: a crosswalk may be mathematically deterministic but lose semantic information the row's downstream consumers depend on (DMA vs Comscore Market methodologies; IAB v2 nodes that split or merge in v3). Sellers MUST NOT advertise `converted` when the conversion changes the row's meaningful semantics. 3. UPSCALED + CROSSWALK CAUTIONS (system-reference-conversion.json) `upscaled` typically pairs with `approximated` fidelity — undefined inverse means lost granularity. Only `converted` if the row explicitly declares granularity doesn't matter. Same caution for `crosswalk` — deterministic mapping is not automatically lossless semantic preservation. 4. INTEROP CAVEAT (system-reference.json description) Explicit note: the primitive alone does NOT create interop. Consuming row-level schemas MUST constrain or document which `system` values are meaningful for that row. D1's value is consistent SHAPE across rows, not a universal vocabulary. Plus updated the description to scope D1 explicitly to newly-modeled rows (per bokelley's "do not replace per-dimension schemas that already work" guardrail — geo_metros / geo_postal_areas stay where they are) and added @tescoboy's product-level audience-construction- metadata row as the fifth D1 customer per bokelley's endorsement. Strawman remains shape-only — no row-level adoption. After this round bokelley said he's "comfortable using it as the D1 foundation for #2041, the identity-substrate RFC, and the #4472 split." Co-Authored-By: Claude Opus 4.7 (1M context) --- .changeset/4616-strawman-system-reference-d1.md | 11 ++++++++++- .../source/core/system-reference-conversion.json | 4 ++-- static/schemas/source/core/system-reference.json | 4 ++-- .../source/enums/system-reference-fidelity.json | 2 +- 4 files changed, 15 insertions(+), 6 deletions(-) diff --git a/.changeset/4616-strawman-system-reference-d1.md b/.changeset/4616-strawman-system-reference-d1.md index 34a9dff78b..9ee8ad992a 100644 --- a/.changeset/4616-strawman-system-reference-d1.md +++ b/.changeset/4616-strawman-system-reference-d1.md @@ -4,7 +4,16 @@ Add `system-reference` primitive + `system-reference-fidelity` enum + `system-reference-conversion` schema — strawman for decision D1 of the signal-AND-measurement epistemic-model umbrella (#4616). -Adds three reusable schema primitives that future row-level RFCs can adopt without re-litigating the shape per-RFC. After @bokelley's 2026-05-17 third-sibling framing (#4616 issuecomment-4470049814), D1 covers BOTH the signals track (#4472 PBA audience_model, #4475 structured market identifiers, @lszczesiak's pending ID-graph RFC) AND the measurement track (#2041 measurement source attribution, with #3885 / #3652 / #3877 as already-shipped publishing/authorization-half prior art). +Adds three reusable schema primitives that future row-level RFCs can adopt without re-litigating the shape per-RFC. After @bokelley's 2026-05-17 third-sibling framing (#4616 issuecomment-4470049814), D1 covers BOTH the signals track (#4472 PBA audience_model, @lszczesiak's pending ID-graph RFC) AND the measurement track (#2041 measurement source attribution, with #3885 / #3652 / #3877 as already-shipped publishing/authorization-half prior art) AND the product-level audience-construction-metadata row surfaced by @tescoboy on 2026-05-19. + +**Per @bokelley's 2026-05-24 WG-acceptance comment (issuecomment-4526559566):** D1 is scoped to newly-modeled rows ONLY — it does NOT replace per-dimension schemas that already work (`geo_metros`, `geo_postal_areas`). The primitive shape alone does NOT create interoperability — consuming row-level schemas MUST constrain or document which `system` values are meaningful for that row. + +**Round 5 normative tightening per @bokelley:** + +1. **Version semantics REVERSED:** omitted `version` now means UNKNOWN / unpinned, NOT a wildcard. Exact equality requires `(system, value, version)` to all match for versioned systems. Row-level schemas MAY declare a system version-insensitive; otherwise omitted version is a buyer-decision point. +2. **`converted` fidelity tightened:** reserved for deterministic AND row-semantics-preserving mappings. Deterministic ≠ lossless preservation; sellers MUST NOT advertise `converted` when the conversion changes the row's meaningful semantics even if per-record mapping is deterministic. +3. **`upscaled` and `crosswalk` cautions:** `upscaled` should typically pair with `approximated` fidelity (undefined inverse → lost granularity); only `converted` if the row explicitly says granularity does not matter. Same caution for `crosswalk` — deterministic mapping is not automatically lossless semantic preservation. +4. **Interop caveat:** explicit note in `system-reference.json` description that the primitive alone doesn't create interop; row-level schemas MUST constrain or document recognized systems. - `core/system-reference.json` — the canonical `{system, value, version?, name?}` shape for a value defined against an external identity, taxonomy, geographic, or measurement system. `system` is intentionally an open string at the primitive level; per-use constraints live in consuming schemas. Field named `value` (not `id`) because the primitive is cross-axis: identity systems issue IDs, taxonomies issue values/terms, measurement systems issue methodology labels. - `enums/system-reference-fidelity.json` — `exact | converted | approximated | unsupported` for deployment-side fidelity. `converted` covers the case where the destination uses a different system but the conversion is deterministic and lossless (e.g. Nielsen DMA → Comscore Market via crosswalk, UID2 → ID5 via identity graph). Generalizes #4475's `market_fidelity` mechanism to all reference-system axes. diff --git a/static/schemas/source/core/system-reference-conversion.json b/static/schemas/source/core/system-reference-conversion.json index 0d7a0f3282..7195856e1d 100644 --- a/static/schemas/source/core/system-reference-conversion.json +++ b/static/schemas/source/core/system-reference-conversion.json @@ -27,8 +27,8 @@ "enumDescriptions": { "id_graph": "Translation via an identity-graph link (e.g. UID2 → ID5 via LiveRamp, TTD, or similar graph provider). Lossless when the graph link is direct; buyer should inspect graph coverage for edge cases (universe shrinkage, link freshness, regional gaps). Vendor SHOULD be surfaced in `method_provider`.", "name_match": "Match based on a shared external identifier present in both systems (e.g. hashed email common to both sides). Match rate is finite — buyer-decision point on accepted match-rate floor.", - "crosswalk": "Published 1:1 mapping with no information loss (e.g. Nielsen DMA → Comscore Market via the canonical county crosswalk; IAB Audience Taxonomy v2 → v3 via the published migration mapping). When `method` is `crosswalk`, the seller asserts the conversion is mathematically reversible.", - "upscaled": "Aggregation of finer-grained units into a coarser system (e.g. zip codes → DMA, counties → state, postcodes → MSA). Deterministic and forward-lossless — every fine-grained unit is fully accounted for in the coarser unit. Inverse is undefined: the upscaled value cannot recover the original granularity. For modeled audiences where statistical validity depends on the finer granularity, the deployment's `fidelity` SHOULD be `approximated` even when the method is `upscaled`.", + "crosswalk": "Published mapping between two reference systems (e.g. Nielsen DMA → Comscore Market via the canonical county crosswalk; IAB Audience Taxonomy v2 → v3 via the published migration mapping). Mapping may be 1:1 or many-to-one; sellers SHOULD specify the granularity in `method_details`. **A deterministic crosswalk is NOT automatically lossless semantic preservation:** even when the per-record mapping is deterministic, the source and destination systems may define their values differently in ways that matter for the row's downstream consumers (e.g. DMA vs Comscore Market boundaries share counties but apply different methodologies; IAB v2 → v3 may merge nodes that downstream consumers treated as distinct). Sellers SHOULD pair `crosswalk` with `converted` fidelity only when the row's use case is unaffected by these semantic differences; otherwise pair with `approximated`.", + "upscaled": "Aggregation of finer-grained units into a coarser system (e.g. zip codes → DMA, counties → state, postcodes → MSA). Deterministic forward — every fine-grained unit is fully accounted for in the coarser unit — but **the inverse is undefined**: the upscaled value cannot recover the original granularity. **The deployment's `fidelity` SHOULD typically be `approximated`, NOT `converted`**, because the row's downstream consumers usually depend on the finer granularity. The narrow exception is rows where the consuming schema explicitly declares that the lost granularity does not matter for the row's use case; in those rows, `converted` may be appropriate. Buyers MUST NOT assume `upscaled` + `converted` means semantically lossless without inspecting the row's explicit rule.", "inferred": "**Entity-level attribution** from observed signals (ML model or rule-based). Answers: *given these clues, who/what is this entity?* Uncertainty lies in the correctness of attributes assigned to individual entities — each `from` value is mapped to a `to` value with finite per-record confidence (e.g. predicting a single buyer's IAB taxonomy node from browsing behavior; predicting demographic membership from device-graph signals). Same input data may also be used for `projected`; the distinction is the level at which uncertainty operates. `method_provider` SHOULD identify the model or vendor; `method_details` SHOULD describe the confidence floor or per-record uncertainty.", "projected": "**Population-level estimation** from a sample to a population (panel / subset → universe). Answers: *given this sample, what should we expect at scale?* Uncertainty lies in the population-level estimate itself — NOT in any individual entity's attributes (e.g. panel-projected reach, currency-projected impression counts, audience extrapolation from a measured sub-sample). Used primarily for measurement currencies and modeled audiences. Same underlying data may also be used for `inferred`; the distinction is the level at which uncertainty operates. `method_details` SHOULD describe sample size, weighting methodology, and confidence intervals.", "custom": "Vendor-specific mapping not covered by the named methods. Buyer should consult the destination's documentation for the conversion's semantics, coverage, and edge cases. Buyer agents SHOULD treat `custom` as the weakest preservation claim and inspect `method_provider` + `method_details`." diff --git a/static/schemas/source/core/system-reference.json b/static/schemas/source/core/system-reference.json index 8e2d16db40..7fd3c15a79 100644 --- a/static/schemas/source/core/system-reference.json +++ b/static/schemas/source/core/system-reference.json @@ -2,7 +2,7 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/core/system-reference.json", "title": "System Reference", - "description": "Reference to a value within a named external system. Four primary axes of use: (1) **identity** — UID2, ID5, RampID, hashed-email, MAID; (2) **taxonomy** — IAB Audience Taxonomy, IAB Content Taxonomy, vendor-specific category trees; (3) **geographic** — Nielsen DMA, Comscore Market, OMB MSA, postcode aggregates; (4) **measurement** — measurement currencies (Nielsen P18-49), methodology sources (panel / set_top_box / ACR / census / server_logs / SDK per the construction-methodology row in #4616), vendor measurement schemes. Provides a single canonical shape `{system, value, version?, name?}` reused across signals, deployments, buy-terms, delivery reporting, and measurement attribution wherever a value is defined against an external reference frame. The `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. Decision D1 from the signal-and-measurement epistemic-model umbrella (see issue #4616).", + "description": "Reference to a value within a named external system. Four primary axes of use: (1) **identity** — UID2, ID5, RampID, hashed-email, MAID; (2) **taxonomy** — IAB Audience Taxonomy, IAB Content Taxonomy, vendor-specific category trees; (3) **geographic** — Nielsen DMA, Comscore Market, OMB MSA, postcode aggregates (NOTE: where per-dimension geographic schemas already exist in AdCP — `geo_metros`, `geo_postal_areas` — they are the right shape for current geo targeting; D1 does not replace them. D1 applies to geographic surfaces that lack a per-dimension schema today); (4) **measurement** — measurement currencies (Nielsen P18-49), methodology sources (panel / set_top_box / ACR / census / server_logs / SDK per the construction-methodology row in #4616), vendor measurement schemes. Provides a single canonical shape `{system, value, version?, name?}` reused across signals, deployments, buy-terms, delivery reporting, and measurement attribution wherever a value is defined against an external reference frame. The `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. **Important: the primitive shape alone does NOT create interoperability — consuming row-level schemas MUST constrain or document which `system` values are meaningful for that row.** D1's value is consistent SHAPE across rows, not a universal cross-row vocabulary. Decision D1 from the signal-and-measurement epistemic-model umbrella (see issue #4616); per @bokelley's WG-acceptance comment, D1 is scoped to newly-modeled rows (identity substrate, taxonomy / ranking references, measurement source / methodology, product-level seller-built audience metadata) and does NOT replace per-dimension schemas that already work.", "type": "object", "properties": { "system": { @@ -27,7 +27,7 @@ }, "version": { "type": "string", - "description": "Optional version of the reference system this `value` is defined against. RECOMMENDED whenever the system has a versioned definition history where boundaries, values, or semantics drift between versions (e.g. `nielsen_dma` boundaries differ between 2020 and 2024 releases; IAB Audience Taxonomy v2 ≠ v3). Omitting `version` implies the latest stable release as known to the emitting party. Buyer-side comparators SHOULD treat omitted `version` as a wildcard match against any version.", + "description": "Optional version of the reference system this `value` is defined against. RECOMMENDED whenever the system has a versioned definition history where boundaries, values, or semantics drift between versions (e.g. `nielsen_dma` boundaries differ between 2020 and 2024 releases; IAB Audience Taxonomy v2 ≠ v3). **Omitted `version` means UNKNOWN / unpinned — NOT a wildcard.** Buyer-side comparators MUST NOT treat omitted version as 'matches any version.' For versioned systems, exact equality requires `(system, value, version)` to all match. Consuming row-level schemas MAY declare that a particular `system` is version-insensitive (e.g. version-less identity systems like UID2 / RampID), in which case omitted version is the only valid form for that system and comparators ignore the field. Otherwise omitted version is a buyer-decision point — the buyer MAY accept the reference, reject it, or treat the missing version as a freshness signal. The 'unknown' framing exists specifically to avoid recreating the fuzzy matching problem D1 is supposed to prevent.", "minLength": 1, "examples": [ "2024", diff --git a/static/schemas/source/enums/system-reference-fidelity.json b/static/schemas/source/enums/system-reference-fidelity.json index 9d170f5c1c..5da4875db6 100644 --- a/static/schemas/source/enums/system-reference-fidelity.json +++ b/static/schemas/source/enums/system-reference-fidelity.json @@ -12,7 +12,7 @@ ], "enumDescriptions": { "exact": "Destination uses the same `system` (and `version` when both are present) — activation preserves the reference frame byte-for-byte.", - "converted": "Destination uses a different `system` from the signal but the seller asserts the conversion is deterministic and lossless — e.g. Nielsen DMA → Comscore Market via the canonical county crosswalk, or UID2 → ID5 via a direct identity-graph link. Functionally equivalent to `exact` for buyers who only need preservation; structurally distinct because the system axis changed. Buyer agents that need to inspect the conversion chain (source / destination / method) read the deployment's `conversion` descriptor.", + "converted": "Destination uses a different `system` from the signal AND the seller asserts the conversion is **deterministic AND row-semantics-preserving** — the conversion preserves the meaningful interpretation of the value within the consuming row's use case, not merely that the per-record mapping is mathematically deterministic. **Deterministic ≠ lossless preservation.** A crosswalk may be mathematically deterministic but lose semantic information that the row's downstream consumers depend on (e.g. Nielsen DMA boundaries vs Comscore Market boundaries share counties but apply different cross-market reach methodologies; IAB v2 → v3 may merge nodes that downstream consumers treated as distinct). Sellers MUST NOT advertise `converted` when the conversion changes the row's meaningful semantics, even if the per-record mapping is deterministic — use `approximated` in that case. Examples that typically qualify for `converted`: UID2 → ID5 via direct identity-graph link, IAB v2 → v3 for nodes with explicit 1:1 published migration. Examples that typically do NOT qualify (use `approximated`): geographic upscaling via the `upscaled` method, currency crosswalks where definitions diverge, taxonomy migrations where v2 nodes split or merge in v3. Buyer agents that need to inspect the conversion chain (source / destination / method) read the deployment's `conversion` descriptor.", "approximated": "Destination maps to a different `system` or a different `version` of the same system; the resulting activation is geographically, taxonomically, or identity-wise close but NOT byte-equivalent. Buyer agents SHOULD treat `approximated` as a buyer-decision point — for modeled audiences whose statistical validity depends on the reference frame, `approximated` activation may invalidate the model.", "unsupported": "Destination cannot resolve the `system` at all. Activation against this destination will fail or silently degrade. Buyer agents MUST NOT proceed without an explicit opt-in to whatever fallback behavior the seller advertises." } From b49275923866193301544ec2527c16a92e200085 Mon Sep 17 00:00:00 2001 From: evgen Date: Sun, 24 May 2026 11:32:08 -0400 Subject: [PATCH 7/8] review: add method_doc_url ads.txt-pattern anchor (round 6) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 6 picks up the ads.txt-pattern thread on #4616 between @bokelley, @lukasz-pubx, @SimonaNemes, and Addie. Brian's philosophical question: "Should the protocol carry the weight of fully describing signals and methodology OR should this be something that the signal provider handles separately (thinking about how a media kit carries weight on the inventory side)?" Addie's recommendation in the thread: - Geo identifiers + activation fidelity: KEEP structured (bilaterally verifiable; ads.txt pattern). - Methodology fields: COLLAPSE to a pointer (link out to attested document). One field like methodology_url or audience_model_ref. The strawman primitives (system-reference + fidelity enum + conversion structure) already sit in the "keep structured" half of Addie's recommendation — bilaterally verifiable identifiers and activation- time facts. The conversion structure adds the round-6 anchor field for the "link out" half: method_doc_url (optional URI) — points at the seller's published methodology document (vendor's identity-graph page, published crosswalk specification, IAB v2-to-v3 migration map, etc.). Strictly informational; buyer agents MAY follow out-of-band to verify but MUST NOT branch on its content programmatically. Why at the primitive layer: - Gives downstream row-level RFCs (#4472, #2041, identity-substrate, tescoboy product-level) a canonical place to anchor link-out fields. Without it, each row reinvents the field name. - Picks up Addie's "collapse to a pointer" recommendation as an OPTION at the primitive layer; row-level RFCs that want to require methodology disclosure mark this required in their binding. - Complements existing method_details (free-text inline) — doc_url is the canonical-source pointer; method_details is the triage-time elaboration. Plus a description note that consuming rows MAY add their own row-level last_updated field on the row itself (per @SimonaNemes: signal-record freshness IS verifiable even though underlying methodology freshness isn't). Updated one example (LiveRamp UID2→ID5 conversion) to demonstrate the field in context. Strawman remains shape-only; no row-level adoption. Net delta: +1 optional field, +1 description sentence, +1 example field. Co-Authored-By: Claude Opus 4.7 (1M context) --- .changeset/4616-strawman-system-reference-d1.md | 6 ++++++ .../source/core/system-reference-conversion.json | 15 +++++++++++++-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/.changeset/4616-strawman-system-reference-d1.md b/.changeset/4616-strawman-system-reference-d1.md index 9ee8ad992a..966e6420ae 100644 --- a/.changeset/4616-strawman-system-reference-d1.md +++ b/.changeset/4616-strawman-system-reference-d1.md @@ -15,6 +15,12 @@ Adds three reusable schema primitives that future row-level RFCs can adopt witho 3. **`upscaled` and `crosswalk` cautions:** `upscaled` should typically pair with `approximated` fidelity (undefined inverse → lost granularity); only `converted` if the row explicitly says granularity does not matter. Same caution for `crosswalk` — deterministic mapping is not automatically lossless semantic preservation. 4. **Interop caveat:** explicit note in `system-reference.json` description that the primitive alone doesn't create interop; row-level schemas MUST constrain or document recognized systems. +**Round 6 — ads.txt-pattern anchor added per the @bokelley / @lukasz-pubx / @SimonaNemes / Addie thread on #4616:** + +5. **`method_doc_url?`** added to `system-reference-conversion.json` — optional URI pointing at an attested methodology document (vendor's identity-graph methodology page, published crosswalk specification, etc.). Picks up Addie's "collapse to a pointer" recommendation at the primitive layer so downstream row-level RFCs have a canonical place to anchor link-out fields. Strictly informational on the wire; buyer agents MAY follow the link out-of-band to verify but MUST NOT branch on its content programmatically. Description notes that consuming row-level schemas MAY require this field in their row's binding if methodology disclosure matters for the row. + +Plus a description-only note that consuming row-level schemas adopting this primitive MAY add their own row-level `last_updated` field on the row itself to surface signal-record freshness (which IS verifiable, even if underlying methodology freshness isn't — per Simona's framing). + - `core/system-reference.json` — the canonical `{system, value, version?, name?}` shape for a value defined against an external identity, taxonomy, geographic, or measurement system. `system` is intentionally an open string at the primitive level; per-use constraints live in consuming schemas. Field named `value` (not `id`) because the primitive is cross-axis: identity systems issue IDs, taxonomies issue values/terms, measurement systems issue methodology labels. - `enums/system-reference-fidelity.json` — `exact | converted | approximated | unsupported` for deployment-side fidelity. `converted` covers the case where the destination uses a different system but the conversion is deterministic and lossless (e.g. Nielsen DMA → Comscore Market via crosswalk, UID2 → ID5 via identity graph). Generalizes #4475's `market_fidelity` mechanism to all reference-system axes. - `core/system-reference-conversion.json` — `{from, to, method, method_provider?, method_details?}` structure describing how a deployment converts between systems. REQUIRED when fidelity is `converted`, OPTIONAL when `approximated`. `method` enum covers `id_graph | name_match | crosswalk | upscaled | inferred | projected | custom`. **`inferred` vs `projected` sharpening per @SimonaNemes (#4616 issuecomment-4493606229):** `inferred` is **entity-level attribution** ("given clues, who/what is this entity?" — uncertainty in per-record correctness); `projected` is **population-level estimation** ("given this sample, what should we expect at scale?" — uncertainty in the estimate, not individuals). Same underlying data can drive both; the distinction is the level at which uncertainty operates. `method_provider?` surfaces vendor identity (e.g. `LiveRamp`, `ID5`, `IAB`) as an opaque-by-convention string so buyer agents can branch on well-known providers without parsing free-text. diff --git a/static/schemas/source/core/system-reference-conversion.json b/static/schemas/source/core/system-reference-conversion.json index 7195856e1d..9d28f246f4 100644 --- a/static/schemas/source/core/system-reference-conversion.json +++ b/static/schemas/source/core/system-reference-conversion.json @@ -2,7 +2,7 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/core/system-reference-conversion.json", "title": "System Reference Conversion", - "description": "Describes the conversion from a source reference system to the destination's native system. Used by two surfaces: (1) signal deployments — when a destination remaps a signal's market / identity / taxonomy / measurement reference; (2) delivery measurement — when a measurement vendor projects, infers, or upscales from one methodology source to another (e.g. set-top-box panel → P18-49 currency). REQUIRED when the surrounding fidelity is `converted`; OPTIONAL surfacing when fidelity is `approximated` (lets consumers see WHY a deployment or report is approximated, not just that it is). Lets buyer agents and measurement consumers reason about source, destination, and conversion method without parsing free-text. The conversion claim itself (`from` system → `to` system, by `method`) is self-reported by the seller / measurement vendor; third-party verification of the claim is a future attestation concern (decision D3 in #4616; see #3877 `completion_source` for shipped prior art on the seller-attested vs vendor-attested split).", + "description": "Describes the conversion from a source reference system to the destination's native system. Used by two surfaces: (1) signal deployments — when a destination remaps a signal's market / identity / taxonomy / measurement reference; (2) delivery measurement — when a measurement vendor projects, infers, or upscales from one methodology source to another (e.g. set-top-box panel → P18-49 currency). REQUIRED when the surrounding fidelity is `converted`; OPTIONAL surfacing when fidelity is `approximated` (lets consumers see WHY a deployment or report is approximated, not just that it is). Lets buyer agents and measurement consumers reason about source, destination, and conversion method without parsing free-text. The conversion claim itself (`from` system → `to` system, by `method`) is self-reported by the seller / measurement vendor; third-party verification of the claim is a future attestation concern (decision D3 in #4616; see #3877 `completion_source` for shipped prior art on the seller-attested vs vendor-attested split). **Per the #4616 thread consensus (@bokelley / @lukasz-pubx / @SimonaNemes / Addie):** the protocol carries structured-where-verifiable fields (source / destination / method) and link-out fields (`method_doc_url`) for claims that need external attestation; consuming row-level schemas that adopt this primitive MAY add their own row-level `last_updated` field on the row itself to surface signal-record freshness (which IS verifiable: the seller published this record on this date, even if the underlying methodology freshness isn't).", "type": "object", "properties": { "from": { @@ -48,9 +48,19 @@ "TTD" ] }, + "method_doc_url": { + "type": "string", + "format": "uri", + "description": "Optional URI of an attested document describing the conversion methodology (e.g. the seller's published mapping reference, the vendor's identity-graph methodology page, the published crosswalk specification). The 'link out to attested document' anchor per @bokelley's #4616 framing and the ads.txt-pattern recommendation that emerged from the @bokelley / @lukasz-pubx / @SimonaNemes thread: structured-where-verifiable, link-to-doc-where-it's-a-claim. Strictly informational on the wire — buyer agents MAY treat its presence as a transparency signal and follow the link out-of-band to verify the seller's conversion claim, but MUST NOT branch on its content programmatically. Consuming row-level schemas that want to require methodology disclosure SHOULD mark this field required in their row's binding; the primitive keeps it optional so it doesn't burden adopters who don't need it.", + "examples": [ + "https://docs.liveramp.com/abilitec-graph/methodology", + "https://iabtechlab.com/wp-content/uploads/audience-taxonomy/v2-to-v3-migration.pdf", + "https://www.nielsen.com/measurement/local-tv/dma-comscore-crosswalk-2024.pdf" + ] + }, "method_details": { "type": "string", - "description": "Free-text vendor-specific elaboration on the method (e.g. graph version, crosswalk publication date, match-rate floor, model name, sample size). Strictly informational; buyer agents MUST NOT branch on this field. Useful for human triage and audit log surfacing.", + "description": "Free-text vendor-specific elaboration on the method (e.g. graph version, crosswalk publication date, match-rate floor, model name, sample size). Strictly informational; buyer agents MUST NOT branch on this field. Complements `method_doc_url` — use `method_doc_url` to point at the canonical methodology document; use `method_details` for short inline elaboration buyers can read at triage time without following the link.", "maxLength": 1024 } }, @@ -69,6 +79,7 @@ "to": { "system": "ramp_id", "value": "RA-9876" }, "method": "id_graph", "method_provider": "LiveRamp", + "method_doc_url": "https://docs.liveramp.com/abilitec-graph/methodology", "method_details": "LiveRamp graph v2026.05. Coverage ~94% in US, ~71% in EU." }, { From ad7127958897dc72322b9b2a172889de8a8436d2 Mon Sep 17 00:00:00 2001 From: evgen Date: Mon, 25 May 2026 07:08:26 -0400 Subject: [PATCH 8/8] review: sharpen value prop + single-party scope per @bokelley line reviews (round 7) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @bokelley posted two substantive line-level reviews on 2026-05-25 that went deeper than the round-5 tightening notes — they raised abstraction-level questions about whether the primitive justifies its surface area. 1. system-reference.json:8 — "calling these 'systems' is confusing. RampID is clearly an identifier and is defined elsewhere in the protocol. Taxonomies like are too. Not clear to me how this generalized system helps us." 2. system-reference-conversion.json:78 — "this conversion is not per-signal, it's happening upon signal activation. The problem is that there is not necessarily a signal conversion happening - in the programmatic context, LiveRamp has a graph, the publisher may have a graph, the SSP has a graph, the DSP has a graph. Crap, the agency has a graph. We don't have any idea how many times this conversion is happening, and honestly neither do they." Description-only sharpening (no schema shape changes): system-reference.json — Description rewritten to lead with the UNION-AXIS value proposition. The primitive's purpose is union- axis rows (identity substrate, measurement source, PBA taxonomy) where a single field can carry any of N systems with the same comparator semantics. Per @bokelley's "RampID is defined elsewhere" point: yes, and that's fine — D1 does NOT replace per-dimension schemas where they exist (geo_metros, ramp_id, existing taxonomy schemas). For single-axis rows, inline per- dimension fields are simpler. D1 earns its keep specifically on union-axis rows that don't yet have a shape. Without it, each union-axis row independently reinvents oneOf discriminators across N inline shapes; with it, ONE comparator + ONE extension story + ONE schema slot. Also addresses @lukasz-pubx's system → type rename suggestion in a naming note — keeping `system` because the connotation of "named external reference frame with its own lifecycle" is what we want vs. `type` which is overloaded across AdCP for general type discrimination on discriminated unions. system-reference-conversion.json — Description rewritten to clarify SINGLE-PARTY OBSERVABLE SCOPE. Real programmatic chains have multiple parties (publisher / SSP / DSP / agency / vendor) each potentially performing their own conversions; no single party observes the full chain. The structure describes ONE party's observable conversion (signals seller's in-agent translation in the deployment case; measurement vendor's projection in the reporting case), NOT the multi-hop chain. Downstream conversions are out of scope, observed by other parties; the protocol intentionally doesn't pretend the seller can speak to them. Buyer agents reading this should understand they're seeing one party's view, not the full provenance from origin to activation. Strawman remains shape-only — no row-level adoption, no field additions / removals / renames. The schema is unchanged; descriptions catch up to what the structure actually means. Co-Authored-By: Claude Opus 4.7 (1M context) --- .changeset/4616-strawman-system-reference-d1.md | 8 ++++++++ .../schemas/source/core/system-reference-conversion.json | 2 +- static/schemas/source/core/system-reference.json | 2 +- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/.changeset/4616-strawman-system-reference-d1.md b/.changeset/4616-strawman-system-reference-d1.md index 966e6420ae..8c5e0aeb3e 100644 --- a/.changeset/4616-strawman-system-reference-d1.md +++ b/.changeset/4616-strawman-system-reference-d1.md @@ -21,6 +21,14 @@ Adds three reusable schema primitives that future row-level RFCs can adopt witho Plus a description-only note that consuming row-level schemas adopting this primitive MAY add their own row-level `last_updated` field on the row itself to surface signal-record freshness (which IS verifiable, even if underlying methodology freshness isn't — per Simona's framing). +**Round 7 — value-prop sharpening + single-party scope clarification per @bokelley's 2026-05-25 line reviews on PR #4622:** + +6. **`system-reference.json` description** rewritten to lead with the **union-axis value proposition**: the primitive earns its keep on rows where a single field can carry any of several external systems with the same comparator semantics (identity substrate, measurement source, PBA taxonomy). For single-axis rows where only one system applies, inline per-dimension fields remain simpler — the primitive is overhead. Explicitly does NOT replace working per-dimension schemas like `geo_metros` / `geo_postal_areas` / existing `ramp_id`. Per @bokelley's "RampID is defined elsewhere" comment: yes, and that's fine — the primitive's value is in giving union-axis rows ONE comparator vs. N parallel inline shapes, not in wrapping single-system rows that already have a shape. + +7. **`system-reference-conversion.json` description** rewritten to clarify **single-party observable scope** per @bokelley's comment that real programmatic chains have multiple hops (publisher / SSP / DSP / agency / vendor) and no single party observes the full chain. The structure describes ONE party's observable conversion (signals seller's in-agent translation, measurement vendor's projection), NOT the multi-hop chain. Downstream conversions are out of scope, observed by other parties; the protocol intentionally doesn't pretend the seller can speak to them. + +Plus naming note in the primitive description acknowledging @lukasz-pubx's `system → type` suggestion but keeping `system` — the connotation of "named external reference frame with its own lifecycle" is what we want vs. `type` which is overloaded across AdCP for general type discrimination on discriminated unions. + - `core/system-reference.json` — the canonical `{system, value, version?, name?}` shape for a value defined against an external identity, taxonomy, geographic, or measurement system. `system` is intentionally an open string at the primitive level; per-use constraints live in consuming schemas. Field named `value` (not `id`) because the primitive is cross-axis: identity systems issue IDs, taxonomies issue values/terms, measurement systems issue methodology labels. - `enums/system-reference-fidelity.json` — `exact | converted | approximated | unsupported` for deployment-side fidelity. `converted` covers the case where the destination uses a different system but the conversion is deterministic and lossless (e.g. Nielsen DMA → Comscore Market via crosswalk, UID2 → ID5 via identity graph). Generalizes #4475's `market_fidelity` mechanism to all reference-system axes. - `core/system-reference-conversion.json` — `{from, to, method, method_provider?, method_details?}` structure describing how a deployment converts between systems. REQUIRED when fidelity is `converted`, OPTIONAL when `approximated`. `method` enum covers `id_graph | name_match | crosswalk | upscaled | inferred | projected | custom`. **`inferred` vs `projected` sharpening per @SimonaNemes (#4616 issuecomment-4493606229):** `inferred` is **entity-level attribution** ("given clues, who/what is this entity?" — uncertainty in per-record correctness); `projected` is **population-level estimation** ("given this sample, what should we expect at scale?" — uncertainty in the estimate, not individuals). Same underlying data can drive both; the distinction is the level at which uncertainty operates. `method_provider?` surfaces vendor identity (e.g. `LiveRamp`, `ID5`, `IAB`) as an opaque-by-convention string so buyer agents can branch on well-known providers without parsing free-text. diff --git a/static/schemas/source/core/system-reference-conversion.json b/static/schemas/source/core/system-reference-conversion.json index 9d28f246f4..5aaf452157 100644 --- a/static/schemas/source/core/system-reference-conversion.json +++ b/static/schemas/source/core/system-reference-conversion.json @@ -2,7 +2,7 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/core/system-reference-conversion.json", "title": "System Reference Conversion", - "description": "Describes the conversion from a source reference system to the destination's native system. Used by two surfaces: (1) signal deployments — when a destination remaps a signal's market / identity / taxonomy / measurement reference; (2) delivery measurement — when a measurement vendor projects, infers, or upscales from one methodology source to another (e.g. set-top-box panel → P18-49 currency). REQUIRED when the surrounding fidelity is `converted`; OPTIONAL surfacing when fidelity is `approximated` (lets consumers see WHY a deployment or report is approximated, not just that it is). Lets buyer agents and measurement consumers reason about source, destination, and conversion method without parsing free-text. The conversion claim itself (`from` system → `to` system, by `method`) is self-reported by the seller / measurement vendor; third-party verification of the claim is a future attestation concern (decision D3 in #4616; see #3877 `completion_source` for shipped prior art on the seller-attested vs vendor-attested split). **Per the #4616 thread consensus (@bokelley / @lukasz-pubx / @SimonaNemes / Addie):** the protocol carries structured-where-verifiable fields (source / destination / method) and link-out fields (`method_doc_url`) for claims that need external attestation; consuming row-level schemas that adopt this primitive MAY add their own row-level `last_updated` field on the row itself to surface signal-record freshness (which IS verifiable: the seller published this record on this date, even if the underlying methodology freshness isn't).", + "description": "Describes **one party's observable conversion claim** — typically the signals seller's view (in signal deployments) or the measurement vendor's view (in delivery reports). Per @bokelley's #4616 review: real programmatic chains have multiple parties (publisher / SSP / DSP / agency / vendor) each potentially performing their own identity-graph, taxonomy, or geographic conversions; no single party observes the full chain. This structure does NOT model the multi-hop conversion chain — it models ONE party's observable hop, the one that party can describe truthfully. Buyer agents reading this should understand they're seeing one party's conversion claim, not the full provenance from origin to activation. Downstream conversions (DSP graph, SSP graph, agency graph) are out of scope; they happen, they're real, and the protocol intentionally does not pretend the seller can speak to them.\n\nUsed by two surfaces: (1) signal deployments — when the **signals seller** remaps a signal's market / identity / taxonomy / measurement reference inside its own agent before activation; (2) delivery measurement — when the **measurement vendor** projects, infers, or upscales from one methodology source to another (e.g. set-top-box panel → P18-49 currency). REQUIRED when the surrounding fidelity is `converted`; OPTIONAL surfacing when fidelity is `approximated` (lets consumers see WHY a deployment or report is approximated, not just that it is). Lets buyer agents and measurement consumers reason about source, destination, and conversion method without parsing free-text.\n\nThe conversion claim itself (`from` system → `to` system, by `method`) is self-reported by the seller / measurement vendor; third-party verification of the claim is a future attestation concern (decision D3 in #4616; see #3877 `completion_source` for shipped prior art on the seller-attested vs vendor-attested split). **Per the #4616 thread consensus (@bokelley / @lukasz-pubx / @SimonaNemes / Addie):** the protocol carries structured-where-verifiable fields (source / destination / method) and link-out fields (`method_doc_url`) for claims that need external attestation; consuming row-level schemas that adopt this primitive MAY add their own row-level `last_updated` field on the row itself to surface signal-record freshness (which IS verifiable: the seller published this record on this date, even if the underlying methodology freshness isn't).", "type": "object", "properties": { "from": { diff --git a/static/schemas/source/core/system-reference.json b/static/schemas/source/core/system-reference.json index 7fd3c15a79..a9e2f1c26c 100644 --- a/static/schemas/source/core/system-reference.json +++ b/static/schemas/source/core/system-reference.json @@ -2,7 +2,7 @@ "$schema": "http://json-schema.org/draft-07/schema#", "$id": "/schemas/core/system-reference.json", "title": "System Reference", - "description": "Reference to a value within a named external system. Four primary axes of use: (1) **identity** — UID2, ID5, RampID, hashed-email, MAID; (2) **taxonomy** — IAB Audience Taxonomy, IAB Content Taxonomy, vendor-specific category trees; (3) **geographic** — Nielsen DMA, Comscore Market, OMB MSA, postcode aggregates (NOTE: where per-dimension geographic schemas already exist in AdCP — `geo_metros`, `geo_postal_areas` — they are the right shape for current geo targeting; D1 does not replace them. D1 applies to geographic surfaces that lack a per-dimension schema today); (4) **measurement** — measurement currencies (Nielsen P18-49), methodology sources (panel / set_top_box / ACR / census / server_logs / SDK per the construction-methodology row in #4616), vendor measurement schemes. Provides a single canonical shape `{system, value, version?, name?}` reused across signals, deployments, buy-terms, delivery reporting, and measurement attribution wherever a value is defined against an external reference frame. The `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. **Important: the primitive shape alone does NOT create interoperability — consuming row-level schemas MUST constrain or document which `system` values are meaningful for that row.** D1's value is consistent SHAPE across rows, not a universal cross-row vocabulary. Decision D1 from the signal-and-measurement epistemic-model umbrella (see issue #4616); per @bokelley's WG-acceptance comment, D1 is scoped to newly-modeled rows (identity substrate, taxonomy / ranking references, measurement source / methodology, product-level seller-built audience metadata) and does NOT replace per-dimension schemas that already work.", + "description": "Reference to a value within a named external system. **Primary value: a uniform shape for union-axis rows** — rows where a single field can carry any of several external systems with the same comparator semantics. The motivating cases (per the #4616 thread): an **identity-substrate** field that may carry UID2 OR RampID OR ID5 OR hashed-email OR MAID; a **measurement-source** field (#2041) that may carry set_top_box OR ACR OR census OR panel; a **PBA taxonomy** field (#4472) that may carry IAB Audience Taxonomy v3 OR vendor-X taxonomy OR a custom ranking reference. Without the primitive, each union-axis row independently invents `oneOf` discriminators across N inline shapes; with it, one comparator (`a.system === b.system && a.value === b.value && (a.version ?? null) === (b.version ?? null)`), one extension story, one schema slot.\n\n**Not a replacement for per-dimension schemas.** Per @bokelley's WG framing: where AdCP has working per-dimension schemas (`geo_metros`, `geo_postal_areas`, the existing `ramp_id` / taxonomy schemas), they are the right shape — inline per-dimension fields are simpler than wrapping `RampID` in `{system: \"ramp_id\", value: \"...\"}` for a row that only ever carries RampID. D1 applies to **union-axis rows that don't yet have a row shape** (identity substrate, taxonomy / ranking references, measurement source / methodology, product-level seller-built audience metadata); it does NOT churn surfaces that already work.\n\nThe `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. **Important: the primitive shape alone does NOT create interoperability — consuming row-level schemas MUST constrain or document which `system` values are meaningful for that row.** D1's value is consistent SHAPE across rows, not a universal cross-row vocabulary.\n\n**Naming note:** `system` (not `type`) — `system` carries the connotation of a named external reference frame with its own lifecycle (Nielsen, IAB, LiveRamp), vs. `type` which is overloaded across AdCP for general type discrimination on discriminated unions. Decision D1 from the signal-and-measurement epistemic-model umbrella (see issue #4616).", "type": "object", "properties": { "system": {