draft: D1 strawman — system-reference primitive + fidelity enum (RFC #4616)#4622

Draft
EvgenyAndroid wants to merge 8 commits into adcontextprotocol:main from EvgenyAndroid:4616-strawman-system-reference

Conversation

@EvgenyAndroid EvgenyAndroid commented May 16, 2026

This is a draft / discussion artifact, NOT a merge bid. Filed proactively per the process proposed in #4616 (comment-4468168781) so the WG has something concrete to react to on the shape of decision D1 (shared primitive vs per-dimension types).

Closing this and re-drafting is the expected path if the WG counter-proposes a fundamentally different shape. No sunk cost — three files + a changeset.

Updated 2026-05-16 round 1 (commit 5a0ebfe7): folded @lukasz-pubx's review + #4616 D1 vote:

  • Renamed id → value (cross-axis neutrality: the primitive is used for taxonomies and measurement currencies, not just identity systems)
  • Added converted to the fidelity enum (deterministic & lossless conversion case)
  • Added new core/system-reference-conversion.json schema factoring "how preserved?" (enum) separately from "via what?" ({from, to, method, method_details?})

Updated 2026-05-16 round 2 (commit ac4c6183): @lukasz-pubx's follow-up — three new method values + a vendor-identity field:

  • Method enum extends from 4 → 7: id_graph | name_match | crosswalk | upscaled | inferred | projected | custom
  • Added method_provider? to the conversion structure (opaque-by-convention string for vendor identity: LiveRamp, ID5, IAB, Nielsen, Comscore, etc.)

Updated 2026-05-17 round 3 (commit 9716bb15): @bokelley's third-sibling framing on #4616 (issuecomment-4470049814) — D1 covers measurement AS WELL AS signals; the primitive's design was already cross-axis but descriptions buried measurement as secondary. Promoting to first-class:

  • Schema descriptions rewritten to explicitly enumerate four primary axes (identity / taxonomy / geographic / measurement), with measurement currencies AND methodology sources called out distinctly.
  • New examples: nielsen_p_18_49:P18-49:2025 (measurement currency), measurement_source:set_top_box (the canonical use case of #2041, "Measurement source attribution on delivery reports"), STB-to-currency projection (measurement-side conversion).
  • Description-only / examples-only change. Structural shape unchanged.
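
The compact `system:value:version` strings in the examples above read as shorthand for the primitive's fields. A minimal Python sketch of unpacking them follows. It assumes the colon-separated notation is only a display convention of this description (not a wire format) and that neither `system` nor `value` contains a colon; `parse_compact` is a hypothetical helper, not part of the PR.

```python
def parse_compact(text: str) -> dict:
    """Split 'system:value[:version]' into the primitive's fields."""
    parts = text.split(":")
    # This sketch accepts exactly two or three non-empty segments.
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected system:value[:version], got {text!r}")
    ref = {"system": parts[0], "value": parts[1]}
    if len(parts) == 3:
        ref["version"] = parts[2]  # version is optional in the primitive
    return ref


# The two new top-level examples named in round 3.
currency = parse_compact("nielsen_p_18_49:P18-49:2025")
source = parse_compact("measurement_source:set_top_box")
```

The second example carries no version, so the resulting dict simply omits the key rather than inventing a default.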

Customer-set update 2026-05-18: #4475 closed by @bokelley on 5/17 — "we have geo_metro already, and I think it makes sense to extend this to include MSAs vs creating a new concept here. The only one of these that doesn't fit is zip, but we also have geo_postal_areas which covers that." Per-dimension geo schemas already cover #4475's row. Re-centering D1's customer set on rows where per-dimension schemas DON'T already exist — see Cross-references below. D1's value prop sharpens here: stop dimension-schema proliferation before it starts on newly-modeled rows (ID graphs, measurement methodology, taxonomy versioning), rather than replace per-dimension schemas that already work. No structural change to the strawman.

Updated 2026-05-20 round 4 (commit ed382184): @SimonaNemes endorsed round 2 on #4616 (issuecomment-4493606229) AND offered a sharpening of the inferred vs projected distinction — both can use ML; the distinction is the LEVEL at which uncertainty operates:

  • inferred = entity-level attribution ("given clues, who/what is this entity?" — uncertainty per-record)
  • projected = population-level estimation ("given this sample, what should we expect at scale?" — uncertainty in the estimate, not individuals)

enumDescriptions rewritten with the entity-level vs population-level framing and cross-references. Paired example added: same device_graph input data routed through inferred (per-record IAB attribution confidence 0.78) and projected (population-level membership 95% CI ±0.4pp). Description + example only; no structural change.

Updated 2026-05-24 round 5 (commit 40cedb4f): @bokelley's WG-acceptance comment on #4616 (issuecomment-4526559566) — "Yes on D1 as a shared primitive for newly-modeled epistemic rows" — with four substantive tightening notes addressed:

  • Version semantics REVERSED: omitted version now means UNKNOWN / unpinned, NOT a wildcard. Exact equality requires (system, value, version) match for versioned systems; row-level schemas MAY declare a system version-insensitive (UID2 / RampID etc.). Comparators MUST NOT treat omitted version as "matches any version."
  • converted fidelity tightened — reserved for deterministic AND row-semantics-preserving mappings. Deterministic ≠ lossless preservation; sellers MUST NOT advertise converted when conversion changes the row's meaningful semantics.
  • upscaled and crosswalk cautions — both should typically pair with approximated fidelity (undefined inverse → lost granularity for upscaled; deterministic ≠ semantically lossless for crosswalk). converted only when the row explicitly says the lost granularity / semantic difference does not matter.
  • Interop caveat added to primitive description: "the primitive shape alone does NOT create interoperability — consuming row-level schemas MUST constrain or document which system values are meaningful for that row."
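
The reversed version semantics can be sketched as a comparator. This is an illustration only, under stated assumptions: the three-way result (`EXACT` / `DIFFERENT` / `UNPINNED`) and the `version_insensitive` set are hypothetical names for how a consumer might surface "omitted version is unknown, not a wildcard", and the `uid2` identifier in the usage lines is a placeholder.

```python
from enum import Enum
from typing import FrozenSet, NamedTuple, Optional


class Ref(NamedTuple):
    system: str
    value: str
    version: Optional[str] = None


class Match(Enum):
    EXACT = "exact"
    DIFFERENT = "different"
    UNPINNED = "unpinned"  # hypothetical label: version unknown on at least one side


def compare(a: Ref, b: Ref,
            version_insensitive: FrozenSet[str] = frozenset()) -> Match:
    """Compare two references without treating an omitted version as a wildcard."""
    if (a.system, a.value) != (b.system, b.value):
        return Match.DIFFERENT
    # A row-level schema may declare a system version-insensitive.
    if a.system in version_insensitive:
        return Match.EXACT
    # Omitted version means unknown: never report an exact match here.
    if a.version is None or b.version is None:
        return Match.UNPINNED
    return Match.EXACT if a.version == b.version else Match.DIFFERENT


pinned = Ref("nielsen_p_18_49", "P18-49", "2025")
unpinned = Ref("nielsen_p_18_49", "P18-49")
```

Returning a distinct `UNPINNED` result, rather than a boolean, leaves the decision to the buyer instead of silently matching or rejecting.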

Plus scoping clarification: D1 explicitly applies to newly-modeled rows only and does NOT replace per-dimension schemas that already work (geo_metros, geo_postal_areas). Per bokelley: "After that, I'm comfortable using it as the D1 foundation for #2041, the identity-substrate RFC, and the #4472 split."

Updated 2026-05-24 round 6 (commit b49275923): Picks up the @bokelley / @lukasz-pubx / @SimonaNemes / Addie ads.txt-pattern thread on #4616 — protocol carries structured-where-verifiable, links-to-doc-where-it's-a-claim. The strawman primitives are already in the "keep structured" half; round 6 adds the link-out anchor:

  • method_doc_url? on system-reference-conversion.json — optional URI pointing at the seller's published methodology document (vendor identity-graph page, published crosswalk spec, IAB migration map). Strictly informational on the wire; buyers MAY follow out-of-band to verify but MUST NOT branch on its content programmatically. Consuming row-level schemas MAY require this field in their row's binding if methodology disclosure matters.
  • Description note that consuming rows adopting the primitive MAY add their own row-level last_updated field on the row itself for signal-record freshness (verifiable, per @SimonaNemes — seller published this record on this date, even if underlying methodology freshness isn't).

Net: +1 optional field, +1 description sentence, +1 example field. Gives downstream row-level RFCs (#4472, #2041, identity-substrate, @tescoboy's product-level) a canonical place to anchor link-out fields without each row reinventing.

Updated 2026-05-25 round 7 (commit ad712795): @bokelley's 5/25 line-level reviews went deeper than the round-5 tightening notes — abstraction-level questions about whether the primitive justifies its surface area. Description-only sharpening (no schema shape change):

  • Primitive description now leads with the union-axis value proposition — D1 earns its keep on rows where a single field can carry any of N systems with the same comparator semantics (identity substrate, measurement source, PBA taxonomy). For single-axis rows, inline per-dimension fields remain simpler. Explicitly does NOT replace existing per-dimension schemas (per @bokelley's "RampID is defined elsewhere" point).
  • Conversion structure description clarifies single-party observable scope — the structure describes ONE party's observable conversion (signals seller's in-agent translation, measurement vendor's projection), NOT the multi-hop chain (publisher / SSP / DSP / agency graphs) per @bokelley's review.
  • Naming note added acknowledging @lukasz-pubx's system → type suggestion but keeping system (less overloaded across AdCP).

Net: 3 description rewrites, zero shape changes. The schema is unchanged; descriptions catch up to what the structure actually means.

What's in the PR

Three reusable schema primitives, no row-level adoption:

1. static/schemas/source/core/system-reference.json

The canonical {system, value, version?, name?} shape for any value defined against an external identity / taxonomy / geographic / measurement system. Used wherever a row-level RFC needs to reference a Nielsen DMA market, an IAB Audience Taxonomy node, a UID2 identity, etc.

Design choices spelled out for review:

  • Field named value, not id. The primitive is cross-axis: identity systems issue IDs, taxonomies issue values/terms, measurement systems issue methodology labels. id is identity-axis-coded and collides with AdCP's existing *_id fields that point at AdCP-issued entities with AdCP lifecycle. value is the cross-vocabulary least common denominator.
  • system is an open string at the primitive level. Per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's oneOf or enum, not here. Rationale: a closed enum at the primitive level forces every new system addition through a primitive bump; per-use constraints let row-RFCs add systems independently. Recommended convention: kebab_case or snake_case stable identifiers.
  • version? is RECOMMENDED, not REQUIRED. Some systems (UID2, RampID) are version-less; others (Nielsen DMA, IAB Audience Taxonomy) have boundary/semantic drift between versions. As of round 5, an omitted version means UNKNOWN / unpinned, not a wildcard: buyer-side comparators MUST NOT treat omitted version as matching any version, and exact equality requires (system, value, version) to match for versioned systems. Row-level schemas MAY declare a system version-insensitive. "Required wherever the system has version history" is the kind of constraint that belongs in the row-RFC, not the primitive.
  • name? is informational only. Never used for equality or routing. Strictly for UI display. The canonical reference is (system, value, version?).
  • additionalProperties: false — primitives are tight; extensions belong on the consuming schema, not on the reference itself.
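
The shape described by these design choices can be sketched in Python. This is not the schema itself: the class, `from_dict`, and `reference_key` are hypothetical names, the assumption that `system` and `value` are the required fields comes from the rename discussion in this PR, and the sample value is a placeholder. The sketch covers shape only and does not take a position on how versions compare.

```python
from dataclasses import dataclass, field
from typing import Optional

ALLOWED_KEYS = {"system", "value", "version", "name"}
REQUIRED_KEYS = {"system", "value"}  # assumption: mirrors the schema's required array


@dataclass(frozen=True)
class SystemReference:
    system: str                      # open string at the primitive level
    value: str                       # cross-axis neutral, deliberately not `id`
    version: Optional[str] = None    # recommended, not required
    # Informational only, so it is excluded from dataclass comparison.
    name: Optional[str] = field(default=None, compare=False)

    @classmethod
    def from_dict(cls, raw: dict) -> "SystemReference":
        unknown = set(raw) - ALLOWED_KEYS
        if unknown:
            # Mirrors additionalProperties: false.
            raise ValueError(f"unexpected properties: {sorted(unknown)}")
        missing = REQUIRED_KEYS - set(raw)
        if missing:
            raise ValueError(f"missing properties: {sorted(missing)}")
        return cls(**raw)

    def reference_key(self) -> tuple:
        """The canonical reference: name never participates."""
        return (self.system, self.value, self.version)


ref = SystemReference.from_dict(
    {"system": "omb_msa", "value": "00000", "name": "Placeholder market"}
)
```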

2. static/schemas/source/enums/system-reference-fidelity.json

exact | converted | approximated | unsupported — generalizes the market_fidelity mechanism originally proposed in #4475 (now closed; mechanism preserved here as a reusable primitive for the remaining customer rows) to all reference-system axes. Used as a property on deployments[] entries or analogous per-destination structures.

  • exact — destination uses the same system (and version when both are present); reference frame preserved byte-for-byte.
  • converted — destination uses a DIFFERENT system but the seller asserts the conversion is deterministic and lossless. Examples: Nielsen DMA → Comscore Market via canonical county crosswalk; UID2 → ID5 via direct identity-graph link; IAB v2 → v3 via published migration map. Functionally equivalent to exact for buyers who only need preservation; structurally distinct because the system axis changed. Buyers reasoning about the federation chain read the deployment's conversion block (file 3 below).
  • approximated — destination maps to a different system or different version; close but not byte-equivalent. Buyer-decision point. For modeled audiences whose statistical validity depends on the reference frame, approximated activation may invalidate the model.
  • unsupported — destination cannot resolve the system. Activation will fail or silently degrade. Buyer MUST NOT proceed without explicit opt-in to seller's advertised fallback.

Fidelity is per-deployment, not per-signal — the signal's reference is canonical; fidelity is a function of where it's activated.
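
A buyer-side gate over the four values might look like the following sketch. The enum values come from the list above; `may_activate` and its two flags are hypothetical names for the buyer's own policy inputs, not fields defined by this PR.

```python
from enum import Enum


class Fidelity(str, Enum):
    EXACT = "exact"
    CONVERTED = "converted"
    APPROXIMATED = "approximated"
    UNSUPPORTED = "unsupported"


def may_activate(fidelity: Fidelity,
                 accepts_approximation: bool = False,
                 opted_into_fallback: bool = False) -> bool:
    """Decide whether a buyer agent proceeds with a given deployment."""
    if fidelity in (Fidelity.EXACT, Fidelity.CONVERTED):
        # Reference frame preserved, or asserted preserved by the seller.
        return True
    if fidelity is Fidelity.APPROXIMATED:
        # Buyer-decision point: proceed only if drift is acceptable.
        return accepts_approximation
    # unsupported: requires explicit opt-in to the seller's advertised fallback.
    return opted_into_fallback
```

Because fidelity is per-deployment, a caller would evaluate this once per destination rather than once per signal.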

3. static/schemas/source/core/system-reference-conversion.json

{from, to, method, method_provider?, method_details?} describing how a deployment converts between systems. REQUIRED when fidelity is converted; OPTIONAL when approximated (lets buyers see WHY a deployment is approximated, not just that it is).

  • from — signal's original reference system (a full system-reference)
  • to — system the destination natively uses (a full system-reference)
  • method enum — id_graph | name_match | crosswalk | upscaled | inferred | projected | custom (semantics in the schema's enumDescriptions)
  • method_provider? — optional opaque-by-convention vendor identity (LiveRamp, ID5, IAB, Nielsen, etc.). Consumers MAY branch on well-known providers; not a closed enum
  • method_details? — free-text vendor-specific elaboration; strictly informational; buyer agents MUST NOT branch on this field

Factors the question correctly: the enum says "how preserved?"; the conversion structure says "via what?". Buyers who want the simple signal stop at the enum; buyers who want to reason about the federation chain (e.g. for ID-graph translation correctness) read the conversion block.
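
The pairing rule between the enum and the conversion structure can be sketched as a small check. The property names `fidelity` and `conversion` on a deployment entry are assumptions (the PR deliberately defines no row-level binding), `check_deployment` is a hypothetical helper, and the system identifiers and values in the usage lines are placeholders.

```python
from typing import List

METHODS = {"id_graph", "name_match", "crosswalk", "upscaled",
           "inferred", "projected", "custom"}


def check_deployment(deployment: dict) -> List[str]:
    """Return human-readable problems; an empty list means the pairing is consistent."""
    errors: List[str] = []
    fidelity = deployment.get("fidelity")
    conversion = deployment.get("conversion")

    # The conversion block is required when fidelity is `converted`,
    # optional when `approximated`.
    if fidelity == "converted" and conversion is None:
        errors.append("fidelity 'converted' requires a conversion block")

    if conversion is not None:
        for key in ("from", "to", "method"):
            if key not in conversion:
                errors.append(f"conversion is missing '{key}'")
        method = conversion.get("method")
        if method is not None and method not in METHODS:
            errors.append(f"unknown method '{method}'")
    return errors


ok = {
    "fidelity": "converted",
    "conversion": {
        "from": {"system": "uid2", "value": "placeholder-a"},
        "to": {"system": "id5", "value": "placeholder-b"},
        "method": "id_graph",
        "method_provider": "LiveRamp",
    },
}
```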

What's NOT in this PR (deliberately)

Open questions for WG review

  1. Shape: is {system, value, version?, name?} the right field set? Alternatives raised in adjacent work include {kind, identifier, vocabulary?} and {type, code, scheme?}. Field names matter for downstream codegen.
  2. system as open string vs closed enum at the primitive level. Open string is the strawman's choice (extensibility at row-RFC level); a closed enum at the primitive level is the more restrictive alternative. Trade-off explained above.
  3. method enum scope. Round 2 extended the enum from 4 → 7 values per @lukasz-pubx feedback. Current set: id_graph | name_match | crosswalk | upscaled | inferred | projected | custom. Anything still missing for ID-graph / measurement / probabilistic cases?
  4. Naming of the fidelity enum: system-reference-fidelity is the strawman's choice. system-fidelity, reference-fidelity, or per-dimension domain-prefixed names (market-fidelity, taxonomy-fidelity) are alternatives. Strawman picks the canonical one. (It originally mirrored the market_fidelity naming of #4475, "RFC: Structured geographic market identifiers for signals with market-bounded audiences"; that RFC is now closed but the mechanism generalizes cleanly to the remaining customer rows.)
  5. 3.1.x vs 4.0 landing window. The process proposal in #4616 argues for 3.1.x as foundational schema (additive, patch-eligible, unblocks 4.0 row-RFCs). Open to WG counter-proposal.

Test plan

Cross-references

Signals row-level RFCs (D1 customers, pending):

Closed / handled by existing per-dimension schemas (no longer a D1 customer):

Measurement row-level RFC (D1 customer, pending — added per @bokelley's third-sibling framing 5/17):

Measurement prior art (already shipped, informs D3 direction):

Adjacent (not blocked here):

Precedents (PR shape):

I have read the IPR Policy

…rawman for adcontextprotocol#4616

NOT A MERGE BID — discussion artifact for the WG to react to on the
shape of decision D1 (shared {system, id, version?} primitive vs
per-dimension types) from the signal epistemic-model umbrella issue
adcontextprotocol#4616.

Two files added, no existing schema modified:

  static/schemas/source/core/system-reference.json
    The canonical {system, id, version?, name?} shape for any value
    defined against an external identity / taxonomy / geographic /
    measurement system. `system` is intentionally an open string at
    the primitive level — per-use constraints (closed enums, vendor
    allowlists) belong in the consuming schema's oneOf or enum, not
    here. `version` is optional but recommended whenever the system
    has a versioned definition history. `name` is informational only.

  static/schemas/source/enums/system-reference-fidelity.json
    exact | approximated | unsupported. Generalizes the market_fidelity
    enum proposed in adcontextprotocol#4475 to all reference-system axes (markets, ID
    graphs, taxonomies, measurement currencies). Used as a property on
    deployments[] entries or analogous per-destination structures.

Plus changeset entry following adcontextprotocol#2506 / adcontextprotocol#4271 precedent.

Non-normative on its own: neither primitive is referenced by any
existing schema in this PR. Adoption happens row-by-row in the
follow-up RFCs (adcontextprotocol#4472, adcontextprotocol#4475, identity-substrate) against whatever
shape the WG settles on. If the WG counter-proposes, this PR is one
file + one enum + this changeset — close and re-draft. No sunk cost.

Closes (when decided): the D1 thread of adcontextprotocol#4616.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@EvgenyAndroid
Author

I have read the IPR Policy

"title": "System Reference Fidelity",
"description": "How faithfully a deployment can honor a value defined against an external reference system (see /schemas/core/system-reference.json). Surfaces the cross-system fidelity question at the deployment layer — buyer agents read this BEFORE activating a signal so they can make an informed decision about geographic, taxonomic, or identity drift introduced by the destination's mapping. Generalizes the `market_fidelity` mechanism proposed in #4475 to all reference-system axes (markets, ID graphs, taxonomies, measurement currencies). Used as a property on `deployments[]` entries or analogous per-destination structures; not used on the signal definition itself (the signal's reference is canonical; fidelity is a function of where it's activated).",
"type": "string",
"enum": [

@lukasz-pubx lukasz-pubx May 16, 2026

Should there be a converted (or translated) option, i.e. if Nielsen has been (or can be) converted exactly into Comscore, or UID2 has been translated to ID5, similar to temp from C to F? I'm a bit torn, but I don't think approximated covers this, especially if you can guarantee it or carry a high degree of confidence.

Author

Pushed converted + a new system-reference-conversion.json schema in 5a0ebfe. Factored your two questions ("should there be a converted option?" and "from what to what / how was it done?") as separate concerns:

  • Fidelity enum answers "how preserved?": exact | converted | approximated | unsupported. converted = different system, deterministic & lossless (your C→F analogy).
  • system-reference-conversion.json answers "via what?": {from, to, method, method_details?} with method enum.

The conversion structure is REQUIRED when fidelity is converted, OPTIONAL when approximated (so buyers can see WHY a deployment is approximated, not just that it is). Splitting the two means buyers who only need the simple signal stop at the enum; buyers who need to reason about the federation chain read the conversion block.

One question on the method enum — strawman picks four values:

  • id_graph — translation via identity-graph link (UID2 → ID5 via LiveRamp / TTD / similar)
  • name_match — shared external identifier present in both systems (e.g. hashed email)
  • crosswalk — published 1:1 mapping with no information loss (Nielsen DMA → Comscore Market via the canonical county crosswalk; IAB v2 → v3 via published migration map)
  • custom — vendor-specific mapping

Does that cover the ID-graph cases you have in mind? Anything missing that you'd want represented — e.g. inferred (ML-predicted match), aggregation (geographic upscaling), sampled (panel projection)? Easier to lock the method-enum scope now than after row-level RFCs adopt the primitive.

Thanks @EvgenyAndroid, AdCP never sleeps ;) To answer your follow up questions:

upscaled for geographic aggregation is a good one for sure, and inferred and projected (for sampled data) as well.

For the ID-graph case I would consider adding one more optional field, method_provider?, giving {from, to, method, method_provider?, method_details?}, to allow specifying the vendor.

Author

All four accepted, pushed in ac4c618. Method enum extends to 7 values, ordered by semantic grouping:

  Group               Methods
  Identity-domain     id_graph, name_match
  Structural-mapping  crosswalk, upscaled (new)
  Probabilistic       inferred (new), projected (new)
  Escape hatch        custom

Semantics for the three new values (in the schema's enumDescriptions):

  • upscaled — deterministic aggregation of finer-grained units into a coarser system (zip → DMA, county → state, postcode → MSA). Forward-lossless; inverse undefined. Important nuance I added in the description: for modeled audiences whose statistical validity depends on the finer granularity, deployment fidelity SHOULD be approximated even when the method is upscaled. The method describes the transform; fidelity describes preservation for the use case.
  • inferred — ML-predicted or rule-based inference (device graph → IAB taxonomy node; behavior → demographic membership). Probabilistic, no per-record correctness guarantee. method_provider SHOULD identify the model or vendor.
  • projected — statistical projection from a sample to a population (panel-projected reach, currency-projected impression counts, audience extrapolation). Used primarily for measurement currencies and modeled audiences. method_details SHOULD describe sample size, weighting methodology, and confidence intervals where relevant.
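
The grouping and the upscaled caution above can be expressed as a lookup plus a lint-style check. This is a sketch under assumptions: `METHOD_GROUPS` restates the grouping from this comment, while `upscaled_fidelity_warning` and its `depends_on_finer_granularity` flag are hypothetical, since the PR leaves it to each row to say when granularity matters.

```python
from typing import Optional

METHOD_GROUPS = {
    "id_graph": "identity-domain",
    "name_match": "identity-domain",
    "crosswalk": "structural-mapping",
    "upscaled": "structural-mapping",
    "inferred": "probabilistic",
    "projected": "probabilistic",
    "custom": "escape hatch",
}


def upscaled_fidelity_warning(method: str, fidelity: str,
                              depends_on_finer_granularity: bool) -> Optional[str]:
    """Flag deployments that upscale but do not advertise `approximated`.

    The method describes the transform; fidelity describes preservation
    for the use case, so the two are checked together.
    """
    if (method == "upscaled"
            and depends_on_finer_granularity
            and fidelity != "approximated"):
        return "upscaled method on a granularity-dependent audience SHOULD use 'approximated'"
    return None
```

The check returns a warning rather than raising because the guidance is a SHOULD, not a MUST.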

method_provider? added as an optional opaque-by-convention string. Consumers MAY branch on well-known names (LiveRamp, ID5, IAB, Nielsen, Comscore, TTD) without it being a closed enum — new providers enter additively, same pattern as the top-level system axis. Most meaningful for id_graph / inferred where the vendor IS the meaningful identifier; less load-bearing for crosswalk where the published mapping is canonical regardless of who hosts it.

Updated examples now cover all 7 method values with realistic method_provider values — see commit ac4c6183 for the diff.

Anything else missing? With these additions the strawman is starting to feel covered for the ID-graph + measurement-currency cases; happy to keep iterating if you can think of method types the seven values still don't capture cleanly.

"omb_msa"
]
},
"id": {

I have a feeling that using the word id is confusing, should this be simply value?

Author

Agreed — pushed the rename in 5a0ebfe. Same reasoning at three levels:

  1. AdCP's existing *_id fields all point at AdCP-issued entities (account_id, brand_id, media_buy_id, signal_agent_segment_id) with AdCP lifecycle. Reusing id for external-system values invites buyer-agent code to conflate the two — same word, materially different semantics.
  2. For taxonomies (IAB Audience Taxonomy paths like 4-1-2-3) and measurement currencies (methodology labels), value reads more naturally than id. IAB docs themselves say "values" / "terms."
  3. value is cross-axis-neutral; id is identity-axis-coded and prejudices the primitive's read for non-identity uses.

The (system, value) tuple now replaces (system, id) throughout the schema description, the version description, the name description, the required array, and all examples.

evgen and others added 2 commits May 16, 2026 19:06
Responding to @lukasz-pubx's review on this PR and his D1 vote in adcontextprotocol#4616:

  1. id→value rename — "I have a feeling that using the word `id` is
     confusing" (PR comment r3253676569 + adcontextprotocol#4616 vote).
     Agreed. The primitive is intentionally cross-axis (markets,
     taxonomies, ID graphs, measurement currencies); `id` is
     identity-axis-coded and creates a connotation collision with
     existing AdCP `*_id` fields that point at AdCP-issued entities.
     `value` is system-axis-neutral and matches how taxonomy authors
     (IAB) refer to taxonomy terms.

  2. Add `converted` to the fidelity enum (PR comment r3253674753 +
     adcontextprotocol#4616 vote).
     Covers the case where the destination uses a different system but
     the conversion is deterministic and lossless — Nielsen DMA →
     Comscore Market via canonical crosswalk, UID2 → ID5 via direct
     identity-graph link. Materially different from `approximated`
     (which implies drift) and from `exact` (which requires same
     system). Lukasz's framing: "similar to temp from C to F."

  3. New `system-reference-conversion.json` (response to "I'm not sure
     if in this case it should be indicated somehow from what to what
     was the conversion and/or how was it done i.e. ID Graph / by name
     / custom" in adcontextprotocol#4616).
     Factors the question correctly: the fidelity enum says "how
     preserved?"; the conversion structure says "via what?". Buyers
     who want the simple signal stop at the enum; buyers who want to
     reason about the federation chain read the conversion block.
     Method enum: id_graph | name_match | crosswalk | custom. REQUIRED
     when fidelity is `converted`; OPTIONAL surfacing when
     `approximated` (so buyers can see WHY).

All three changes preserve the strawman's "shape only, no row-level
adoption" framing. Still a draft / discussion artifact, still cheap to
re-draft if the WG counter-proposes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lukasz-pubx

Round 2 of @lukasz-pubx's feedback on adcontextprotocol#4616 / adcontextprotocol#4622 — three new method
enum values + an optional method_provider field on the conversion
structure.

Method enum extends from 4 → 7 values, ordered by semantic grouping:

  identity-domain:        id_graph, name_match
  structural-mapping:     crosswalk, upscaled (new)
  probabilistic:          inferred (new), projected (new)
  escape hatch:           custom

New value semantics:

  upscaled    — Deterministic aggregation of finer-grained units into
                a coarser system (zip→DMA, county→state). Forward-
                lossless; inverse undefined. For modeled audiences
                where statistical validity depends on the finer
                granularity, deployment fidelity SHOULD be
                `approximated` even when the method is `upscaled`.

  inferred    — ML-predicted or rule-based inference (device-graph
                → IAB taxonomy node; behavior → demographic
                membership). Probabilistic; no per-record correctness
                guarantee. method_provider SHOULD identify the model
                or vendor.

  projected   — Statistical projection from a sample to a population
                (panel-projected reach, currency-projected impressions,
                audience extrapolation). Used primarily for
                measurement currencies and modeled audiences.
                method_details SHOULD describe sample size + weighting
                + confidence intervals where relevant.

method_provider (new optional field):

  Opaque-by-convention string identifying the vendor or organization
  providing the conversion method (e.g. `LiveRamp`, `ID5`, `IAB`,
  `Nielsen`). Consumers MAY branch on well-known provider names but
  the field is not constrained to a closed enum — new providers enter
  the ecosystem additively. Particularly meaningful for `id_graph` and
  `inferred` methods; less load-bearing for `crosswalk` where the
  published mapping is canonical regardless of who hosts it.

Updated examples cover all 7 method values with realistic
method_provider values.

Strawman remains shape-only — no row-level adoption, still draft, still
cheap to re-draft.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@lukasz-pubx lukasz-pubx left a comment

LGTM 👍

evgen and others added 4 commits May 17, 2026 07:08
…umbrella update

Round 3 of @bokelley's third-sibling framing on adcontextprotocol#4616
(issuecomment-4470049814): the epistemic-model table maps 1:1 onto
delivery measurement, not just signals. The strawman's shape was
already cross-axis, but the descriptions buried measurement as a
secondary axis. Promoting it to first-class.

Changes are description + examples only — no structural change to the
primitive, the fidelity enum, or the conversion schema.

system-reference.json:

  - Description rewritten to explicitly enumerate the four primary
    axes (identity / taxonomy / geographic / measurement). Calls out
    measurement currencies (Nielsen P18-49) AND methodology sources
    (panel / set_top_box / ACR / census / server_logs / SDK per the
    construction-methodology row in adcontextprotocol#4616) as distinct measurement-
    side use cases.
  - `system` examples list extended with `nielsen_p_18_49` and
    `measurement_source` to surface the measurement axis.
  - Two new top-level examples: a measurement currency
    (`nielsen_p_18_49:P18-49:2025`) and a measurement-source row
    (`measurement_source:set_top_box`) — the latter is the canonical
    adcontextprotocol#2041 use case.

system-reference-fidelity.json:

  - Description expanded to name two consumer types: buyer agents
    activating signals, and measurement consumers accepting delivery
    reports. Adds "methodology sources" to the generalized axis list.

system-reference-conversion.json:

  - Description expanded to name the two surfaces (signal deployments
    + delivery measurement). References adcontextprotocol#3877's `completion_source`
    qualifier as shipped prior art for the D3 seller-attested vs
    vendor-attested split.
  - New example: STB measurement projected to Nielsen P18-49
    currency via Samba TV methodology — the measurement-side analog
    of the existing signal-side panel projection example.

Changeset updated to reflect the broadened scope.

Strawman remains shape-only — no row-level adoption, no structural
change to the primitive. adcontextprotocol#2041 / adcontextprotocol#4472 / adcontextprotocol#4475 / identity-substrate
RFCs still adopt independently against whatever shape D1 settles on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SimonaNemes endorsed round 2 of the strawman on adcontextprotocol#4616
(issuecomment-4493606229) and offered a sharpening of the inferred /
projected distinction. Both methods can use ML; the distinction is
the LEVEL at which uncertainty operates:

  inferred  = ENTITY-LEVEL attribution from observed signals.
              Answers: "given these clues, who/what is this entity?"
              Uncertainty lies in the correctness of attributes
              assigned to individual entities.

  projected = POPULATION-LEVEL estimation from a sample.
              Answers: "given this sample, what should we expect at
              scale?" Uncertainty lies in the estimate itself, NOT
              in any individual entity's attributes.

Same underlying data may drive both — the distinction is the
question being answered and where the uncertainty lives.

Changes:

  - enumDescriptions for `inferred` + `projected` rewritten with the
    entity-level vs population-level framing and explicit cross-
    reference to each other so consumers see the contrast.

  - New paired example: same device_graph input data routed through
    `inferred` (entity-level attribution with per-record confidence
    0.78) and through `projected` (population-level estimate with 95%
    CI ±0.4pp). Makes the distinction concrete.

  - Changeset updated to reflect the round-4 sharpening + cite
    @SimonaNemes's framing.

Strawman remains shape-only — no row-level adoption, no structural
change. Sharpening the prose so the distinction survives consumer
interpretation across implementer teams.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley posted WG-level acceptance of D1 on adcontextprotocol#4616
(issuecomment-4526559566): "Yes on D1 as a shared primitive for
newly-modeled epistemic rows" — with four substantive tightening
notes before he's comfortable merging. All four addressed in this
revision; the changes are normative tightening, not structural.

1. VERSION SEMANTICS REVERSED (system-reference.json)
   Strawman previously said "treat omitted version as a wildcard
   match against any version." Bokelley: this recreates the fuzzy
   matching problem D1 was supposed to avoid. Reversed.

   Omitted `version` now means UNKNOWN / unpinned — NOT a wildcard.
   Exact equality requires (system, value, version) to all match
   for versioned systems. Row-level schemas MAY declare a system
   version-insensitive (UID2 / RampID etc.); otherwise omitted
   version is a buyer-decision point.

2. CONVERTED FIDELITY TIGHTENED (system-reference-fidelity.json)
   Reserved for deterministic AND row-semantics-preserving mappings.
   Deterministic ≠ lossless preservation: a crosswalk may be
   mathematically deterministic but lose semantic information the
   row's downstream consumers depend on (DMA vs Comscore Market
   methodologies; IAB v2 nodes that split or merge in v3). Sellers
   MUST NOT advertise `converted` when the conversion changes the
   row's meaningful semantics.

3. UPSCALED + CROSSWALK CAUTIONS (system-reference-conversion.json)
   `upscaled` typically pairs with `approximated` fidelity — undefined
   inverse means lost granularity. Only `converted` if the row
   explicitly declares granularity doesn't matter. Same caution for
   `crosswalk` — deterministic mapping is not automatically lossless
   semantic preservation.

4. INTEROP CAVEAT (system-reference.json description)
   Explicit note: the primitive alone does NOT create interop.
   Consuming row-level schemas MUST constrain or document which
   `system` values are meaningful for that row. D1's value is
   consistent SHAPE across rows, not a universal vocabulary.
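
The reversed version semantics in item 1 can be sketched as a comparator. This is an illustration under stated assumptions, not the normative rule: the three result labels (`exact`, `undetermined`, `mismatch`), the `match` function, and the sample system/value strings are hypothetical; only the `{system, value, version?, name?}` shape and the "omitted version is unknown, not a wildcard" rule come from the text above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SystemReference:
    system: str
    value: str
    version: Optional[str] = None
    name: Optional[str] = None


def match(a: SystemReference, b: SystemReference,
          version_insensitive: FrozenSet[str] = frozenset()) -> str:
    """Compare two references; an omitted version is UNKNOWN, never a wildcard."""
    if a.system != b.system or a.value != b.value:
        return "mismatch"
    # A row-level schema may declare a system version-insensitive.
    if a.system in version_insensitive:
        return "exact"
    # Unpinned on either side: leave the call to the buyer.
    if a.version is None or b.version is None:
        return "undetermined"
    return "exact" if a.version == b.version else "mismatch"


pinned = SystemReference("iab_content_taxonomy", "123", version="3.0")
unpinned = SystemReference("iab_content_taxonomy", "123")
older = SystemReference("iab_content_taxonomy", "123", version="2.2")
uid_a = SystemReference("uid2", "token-a")
uid_b = SystemReference("uid2", "token-a")

results = (
    match(pinned, pinned),
    match(pinned, unpinned),
    match(pinned, older),
    match(uid_a, uid_b, version_insensitive=frozenset({"uid2"})),
)
```

Returning a distinct `undetermined` outcome, rather than folding it into a match, is one way to surface the buyer-decision point without reintroducing fuzzy matching.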

Plus updated the description to scope D1 explicitly to newly-modeled
rows (per bokelley's "do not replace per-dimension schemas that
already work" guardrail — geo_metros / geo_postal_areas stay where
they are) and added @tescoboy's product-level audience-construction-
metadata row as the fifth D1 customer per bokelley's endorsement.

Strawman remains shape-only — no row-level adoption. After this round
bokelley said he's "comfortable using it as the D1 foundation for
adcontextprotocol#2041, the identity-substrate RFC, and the adcontextprotocol#4472 split."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 6 picks up the ads.txt-pattern thread on adcontextprotocol#4616 between
@bokelley, @lukasz-pubx, @SimonaNemes, and Addie. Brian's
philosophical question: "Should the protocol carry the weight of
fully describing signals and methodology OR should this be something
that the signal provider handles separately (thinking about how a
media kit carries weight on the inventory side)?"

Addie's recommendation in the thread:
  - Geo identifiers + activation fidelity: KEEP structured (bilaterally
    verifiable; ads.txt pattern).
  - Methodology fields: COLLAPSE to a pointer (link out to attested
    document). One field like methodology_url or audience_model_ref.

The strawman primitives (system-reference + fidelity enum + conversion
structure) already sit in the "keep structured" half of Addie's
recommendation — bilaterally verifiable identifiers and activation-
time facts. The conversion structure adds the round-6 anchor field
for the "link out" half:

  method_doc_url (optional URI) — points at the seller's published
  methodology document (vendor's identity-graph page, published
  crosswalk specification, IAB v2-to-v3 migration map, etc.).
  Strictly informational; buyer agents MAY follow out-of-band to
  verify but MUST NOT branch on its content programmatically.

Why at the primitive layer:
  - Gives downstream row-level RFCs (adcontextprotocol#4472, adcontextprotocol#2041, identity-substrate,
    tescoboy product-level) a canonical place to anchor link-out
    fields. Without it, each row reinvents the field name.
  - Picks up Addie's "collapse to a pointer" recommendation as an
    OPTION at the primitive layer; row-level RFCs that want to require
    methodology disclosure mark this required in their binding.
  - Complements existing method_details (free-text inline) — doc_url
    is the canonical-source pointer; method_details is the
    triage-time elaboration.
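
A rough sketch of the conversion structure with the new pointer field, assuming a hypothetical `Conversion` helper class, placeholder token values, and an `example.com` URL; the field names and the seven-value method enum are taken from the rounds described in this PR, everything else is illustrative:

```python
from dataclasses import dataclass
from typing import Optional

# Method enum as listed in round 2 of this PR.
METHODS = frozenset({
    "id_graph", "name_match", "crosswalk", "upscaled",
    "inferred", "projected", "custom",
})


@dataclass(frozen=True)
class Conversion:
    source: dict  # serialized as "from" (a Python keyword, hence the rename)
    target: dict  # serialized as "to"
    method: str
    method_provider: Optional[str] = None
    method_details: Optional[str] = None
    method_doc_url: Optional[str] = None  # informational pointer only

    def __post_init__(self) -> None:
        if self.method not in METHODS:
            raise ValueError(f"unknown method: {self.method!r}")

    def to_json_dict(self) -> dict:
        out = {"from": self.source, "to": self.target, "method": self.method}
        # Optional fields are omitted entirely when unset.
        for key in ("method_provider", "method_details", "method_doc_url"):
            val = getattr(self, key)
            if val is not None:
                out[key] = val
        return out


conversion = Conversion(
    source={"system": "uid2", "value": "<uid2-token>"},  # placeholder value
    target={"system": "id5", "value": "<id5-token>"},    # placeholder value
    method="id_graph",
    method_provider="LiveRamp",
    method_doc_url="https://example.com/methodology/identity-graph",  # hypothetical
)
payload = conversion.to_json_dict()
```

Note that nothing in the helper reads `method_doc_url` to make a decision; it is carried through for a human or out-of-band check, in line with the "MUST NOT branch on its content programmatically" constraint.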

Plus a description note that consuming rows MAY add their own
row-level last_updated field on the row itself (per @SimonaNemes:
signal-record freshness IS verifiable even though underlying
methodology freshness isn't).

Updated one example (LiveRamp UID2→ID5 conversion) to demonstrate
the field in context.

Strawman remains shape-only; no row-level adoption. Net delta:
+1 optional field, +1 description sentence, +1 example field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
"method_details": "Canonical county-union mapping per Nielsen DMA 2024 spec + Comscore Market 2025-Q1 definition."
},
{
"from": { "system": "uid2", "value": "AAAA..." },
Contributor

this conversion is not per-signal, it's happening upon signal activation. The problem is that there is not necessarily a signal conversion happening - in the programmatic context, LiveRamp has a graph, the publisher may have a graph, the SSP has a graph, the DSP has a graph. Crap, the agency has a graph. We don't have any idea how many times this conversion is happening, and honestly neither do the participants.

Main point is that we need to separate "what is this signal" from "how is this signal being connected to the delivery surface".

@lukasz-pubx, May 25, 2026

I'm not sure I understand why it matters that the SSP, DSP, and agency each have a graph. Isn't this information from the signals provider's perspective? We can't control all the hops.

Author

You're right that real chains have multiple parties (LiveRamp graph + publisher graph + SSP graph + DSP graph + agency graph), and no single party sees them all. The strawman as written reads like it's trying to model the full chain — which it can't, and shouldn't pretend to.

Round 7 (ad712795) tightens the description to single-party observable scope explicitly: this structure describes ONE party's observable conversion claim — typically the signals seller's in-agent translation, or the measurement vendor's projection. Downstream conversions are out of scope; the protocol intentionally doesn't pretend the seller can speak to what happens after activation handoff.

Lukasz's framing in this thread is the right one — "we can't control all the hops" is exactly the design constraint. The structure is useful for the conversions one party CAN describe (signal-side identity translation before destination handoff, measurement vendor's panel-to-currency projection). It's not useful for, and shouldn't try to model, the multi-hop chain that follows.

If even single-party conversion descriptions don't earn their keep, then this entire schema collapses to "trust the seller's free-text" and the structure is dead weight. I think the single-party case is genuinely useful (especially for measurement projection and signal-provider in-agent translation), but if you read the multi-hop reality as making even that case suspect, happy to drop the entire conversion structure in round 8 and keep just system-reference + system-reference-fidelity.

"description": "Reference to a value within a named external system. Four primary axes of use: (1) **identity** — UID2, ID5, RampID, hashed-email, MAID; (2) **taxonomy** — IAB Audience Taxonomy, IAB Content Taxonomy, vendor-specific category trees; (3) **geographic** — Nielsen DMA, Comscore Market, OMB MSA, postcode aggregates (NOTE: where per-dimension geographic schemas already exist in AdCP — `geo_metros`, `geo_postal_areas` — they are the right shape for current geo targeting; D1 does not replace them. D1 applies to geographic surfaces that lack a per-dimension schema today); (4) **measurement** — measurement currencies (Nielsen P18-49), methodology sources (panel / set_top_box / ACR / census / server_logs / SDK per the construction-methodology row in #4616), vendor measurement schemes. Provides a single canonical shape `{system, value, version?, name?}` reused across signals, deployments, buy-terms, delivery reporting, and measurement attribution wherever a value is defined against an external reference frame. The `system` axis is intentionally an open string at the primitive level; per-use constraints (closed enums, vendor allowlists) belong in the consuming schema's `oneOf` or `enum`, not here. **Important: the primitive shape alone does NOT create interoperability — consuming row-level schemas MUST constrain or document which `system` values are meaningful for that row.** D1's value is consistent SHAPE across rows, not a universal cross-row vocabulary. Decision D1 from the signal-and-measurement epistemic-model umbrella (see issue #4616); per @bokelley's WG-acceptance comment, D1 is scoped to newly-modeled rows (identity substrate, taxonomy / ranking references, measurement source / methodology, product-level seller-built audience metadata) and does NOT replace per-dimension schemas that already work.",
"type": "object",
"properties": {
"system": {
Contributor

calling these "systems" is confusing. Ramp ID is clearly an identifier and is defined elsewhere in the protocol. Taxonomies like are too. Not clear to me how this generalized system helps us. We do have enums with the various geo metros providers which makes sense: we know that geo metro must be a supported type there.

@lukasz-pubx, May 25, 2026

We could rename system to type? But I think system illustrates the fact that it's "more complicated" than just a label.

Author

Fair pushback. The primitive's value isn't in wrapping RampID — it's in giving union-axis rows a single shape when the row can carry any of N systems.

Round 7 (ad712795) leads the description with that:

The primitive's purpose is union-axis rows (identity substrate, measurement source, PBA taxonomy) where a single field can carry any of N systems with the same comparator semantics. For single-axis rows where only one system applies, inline per-dimension fields remain simpler — the primitive is overhead. The strawman explicitly does NOT replace per-dimension schemas like geo_metros, geo_postal_areas, or existing ramp_id.

Where the primitive earns its keep (and where it doesn't):

| Case | Without primitive | With primitive |
| --- | --- | --- |
| Single-axis row: signal always references RampID → ramp_id field, full stop | inline `ramp_id: "..."` (simple) ✅ | `identity: {system: "ramp_id", value: "..."}` (wrapper overhead) ❌ |
| Union-axis row: identity substrate can be UID2 OR RampID OR ID5 OR custom | `oneOf: [{ uid2: ... }, { ramp_id: ... }, { id5: ... }, { custom: ... }]` (N inline shapes per row) | `identity: SystemReference` (ONE shape) ✅ |
| Union-axis row: measurement source can be set_top_box OR ACR OR census OR panel | Same N-way oneOf repeated in #2041 | Same SystemReference ✅ |
| Union-axis row: PBA ranking reference can be IAB v3 OR vendor-X OR custom audience model | Same N-way oneOf repeated in #4472 | Same SystemReference ✅ |

The motivating cases that opened #4616 (#4472, #2041, identity-substrate, @tescoboy's product-level) are all union-axis. If the union-axis case turns out to be thinner than the thread assumed — i.e., if each row really wants a single dominant system and the union case is hypothetical — then the primitive's surface area isn't justified and the right answer is per-row inline shapes plus the row defining its own enum. In that case I'd close this PR and let row-RFCs proceed independently.
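
To make the union-axis argument concrete, here is a small sketch of one validator serving several rows, each row supplying its own allowed `system` set. The function name `validate_reference`, the two allowlists, and their lowercase spellings are assumptions for illustration; the strawman itself leaves `system` as an open string and expects the consuming row schema to constrain it.

```python
# Hypothetical per-row allowlists; a real row-level schema would define these.
IDENTITY_SYSTEMS = frozenset({"uid2", "ramp_id", "id5", "custom"})
MEASUREMENT_SOURCES = frozenset({"set_top_box", "acr", "census", "panel"})


def validate_reference(ref: dict, allowed_systems: frozenset) -> dict:
    """Check the shared {system, value, version?, name?} shape for one row."""
    missing = {"system", "value"} - ref.keys()
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    unknown = ref.keys() - {"system", "value", "version", "name"}
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    if ref["system"] not in allowed_systems:
        raise ValueError(f"system {ref['system']!r} not meaningful for this row")
    return ref


# Same shape, same validator, two different rows.
identity = validate_reference(
    {"system": "ramp_id", "value": "<ramp-id>"}, IDENTITY_SYSTEMS)
source = validate_reference(
    {"system": "panel", "value": "<panel-id>", "version": "2025"},
    MEASUREMENT_SOURCES)
```

If each row only ever carried one system, the allowlist would collapse to a single entry and the wrapper would be pure overhead, which is the single-axis case in the table.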

But I think the union case is real for at least identity-substrate (no consensus on which graph wins) and measurement-source (the construction-methodology row is genuinely heterogeneous). Curious whether you read those rows as union-axis or as eventually-single-axis when the WG picks winners.

On @lukasz-pubx's system → type rename suggestion: round 7 acknowledges it in a naming note but keeps system, because type is overloaded across AdCP for general type discrimination on discriminated unions, and system carries the connotation we want (named external reference frame with its own lifecycle).

Author

Agreed system carries the "more complicated than a label" connotation. Keeping it — round 7 (ad712795) adds a naming note in the description acknowledging the type alternative was considered but type is overloaded across AdCP for general type discrimination on discriminated unions, while system cleanly says "named external reference frame with its own lifecycle." Thanks for the framing.

…views (round 7)

@bokelley posted two substantive line-level reviews on 2026-05-25
that went deeper than the round-5 tightening notes — they raised
abstraction-level questions about whether the primitive justifies
its surface area.

  1. system-reference.json:8 — "calling these 'systems' is confusing.
     RampID is clearly an identifier and is defined elsewhere in the
     protocol. Taxonomies like are too. Not clear to me how this
     generalized system helps us."

  2. system-reference-conversion.json:78 — "this conversion is not
     per-signal, it's happening upon signal activation. The problem
     is that there is not necessarily a signal conversion happening
     - in the programmatic context, LiveRamp has a graph, the
     publisher may have a graph, the SSP has a graph, the DSP has
     a graph. Crap, the agency has a graph. We don't have any idea
     how many times this conversion is happening, and honestly
     neither do they."

Description-only sharpening (no schema shape changes):

  system-reference.json — Description rewritten to lead with the
  UNION-AXIS value proposition. The primitive's purpose is union-
  axis rows (identity substrate, measurement source, PBA taxonomy)
  where a single field can carry any of N systems with the same
  comparator semantics. Per @bokelley's "RampID is defined
  elsewhere" point: yes, and that's fine — D1 does NOT replace
  per-dimension schemas where they exist (geo_metros, ramp_id,
  existing taxonomy schemas). For single-axis rows, inline per-
  dimension fields are simpler. D1 earns its keep specifically on
  union-axis rows that don't yet have a shape. Without it, each
  union-axis row independently reinvents oneOf discriminators across
  N inline shapes; with it, ONE comparator + ONE extension story +
  ONE schema slot.

  Also addresses @lukasz-pubx's system → type rename suggestion in
  a naming note — keeping `system` because the connotation of
  "named external reference frame with its own lifecycle" is what
  we want vs. `type` which is overloaded across AdCP for general
  type discrimination on discriminated unions.

  system-reference-conversion.json — Description rewritten to
  clarify SINGLE-PARTY OBSERVABLE SCOPE. Real programmatic chains
  have multiple parties (publisher / SSP / DSP / agency / vendor)
  each potentially performing their own conversions; no single
  party observes the full chain. The structure describes ONE party's
  observable conversion (signals seller's in-agent translation in
  the deployment case; measurement vendor's projection in the
  reporting case), NOT the multi-hop chain. Downstream conversions
  are out of scope, observed by other parties; the protocol
  intentionally doesn't pretend the seller can speak to them. Buyer
  agents reading this should understand they're seeing one party's
  view, not the full provenance from origin to activation.

Strawman remains shape-only — no row-level adoption, no field
additions / removals / renames. The schema is unchanged; descriptions
catch up to what the structure actually means.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>