Per-source matcher fixtures convention #94

Description

@koinsaari

Goal

Establish the convention that every ingested source ships its own fixture set under internal/sources/<name>/testdata/, covering both adapter behaviour and matcher realism on data shaped like that source.

Why

The source-agnostic fixtures in internal/identity/testdata/ (see #73) test the matcher algorithm. They cannot test:

  • Whether the source's adapter correctly translates upstream rows into identity.Record values (handling that source's quirks: name conventions, category strings, address-tag patterns, missing fields).
  • Whether the matcher reaches acceptable precision and recall on that source's actual data.
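
As a rough illustration of the aggregate precision/recall measurement mentioned above, the sketch below counts true positives, false positives, and false negatives over labelled pairs. The `LabelledPair` type and `PrecisionRecall` function are hypothetical names chosen for this example, not part of the existing codebase, and the zero-denominator behaviour (reporting 0) is an assumption.

```go
package main

// LabelledPair is a hypothetical corpus entry: the matcher's decision for one
// candidate pair together with the hand-assigned ground-truth label.
type LabelledPair struct {
	Predicted bool // matcher decided "same place"
	Truth     bool // ground truth says "same place"
}

// PrecisionRecall returns aggregate precision and recall over the pairs.
// A metric whose denominator is zero is reported as 0 (an assumption of
// this sketch; a real harness might prefer to flag that case explicitly).
func PrecisionRecall(pairs []LabelledPair) (precision, recall float64) {
	var tp, fp, fn int
	for _, p := range pairs {
		switch {
		case p.Predicted && p.Truth:
			tp++
		case p.Predicted && !p.Truth:
			fp++
		case !p.Predicted && p.Truth:
			fn++
		}
	}
	if tp+fp > 0 {
		precision = float64(tp) / float64(tp+fp)
	}
	if tp+fn > 0 {
		recall = float64(tp) / float64(tp+fn)
	}
	return precision, recall
}
```

A per-source test could then compare both values against thresholds agreed for that source, so that a regression on one source's data fails that source's tests rather than a shared one.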

Centralising those tests under internal/identity would hide source-specific concerns in a shared file. Each source owns the tests for its own quirks.

Scope

Convention (to be applied per source when each source is added):

  • internal/sources/<name>/testdata/adapter_fixtures.json: given this upstream row, the adapter produces this identity.Record.
  • internal/sources/<name>/testdata/match_corpus.json: records sampled from the source, paired with nearby OSM places and ground-truth labels. Used to measure aggregate precision/recall for that source.

Out of scope

  • Building these for any specific source today. This issue documents the expectation; each concrete source's implementation issue picks them up.
  • The source-agnostic algorithm regression file (Expand matcher regression fixtures #73).

Acceptance

  • The first non-OSM source implementation to land establishes the convention by example (it creates the two files for that source).
  • A short note in internal/sources/README.md points to that source as the pattern to follow.
