P5-search-show: animedex search + animedex show (entity infrastructure + multi-source flagship) #19

Description

@narugo1992

Goal

Land Phase 5's flagship entity commands: animedex search <type> <q> and animedex show <type> <prefix:id>. The two commands share one substrate — a prefix-encoded ID parser, a type→backend routing table, a multi-source fan-out helper with per-source fallback, and a common output schema with mandatory source attribution. Both ship in this PR because separating them would duplicate that substrate and force a later reconciliation. After this PR, users have a single command surface that asks "what's this anime called Frieren across every catalogue we cover" and answers it without the user needing to know which backend has what.

Refs #1 §7 (Phase 5 checklist).

Why this slice

search and show are tightly coupled by design:

  • They share the type vocabulary (anime / manga / character / person / studio / publisher).
  • They share the prefix:id schema: search outputs _source and a backend-native ID; show accepts that backend-native ID with a prefix and routes back to the same backend.
  • They share the type → backend routing table — when type is anime, the same set of backends gets queried by search (fan-out) and routed to by show (single-source).
  • They share the multi-source fan-out helper — for search, all type-supporting backends are queried in parallel; for show, the prefix selects exactly one backend.

Splitting them into two PRs would mean each PR re-implementing half of this substrate, then merging two divergent implementations. One PR keeps the design coherent.

The aggregate command group is also the first piece of code that introduces the project's flagship multi-source value proposition: animedex search anime "frieren" returning rows from anilist + jikan + kitsu + shikimori, with cross-source merging on rows the matcher confidently identifies as the same anime (shared external IDs first, then a deterministic fuzzy comparison) and source attribution preserved through a sources / records map on every merged row. This commits the project to "wide source coverage with mandatory attribution and intelligent cross-source matching" as the user-facing shape — the merged row is the value, not a step the user has to assemble themselves with jq. Single-source rows that don't match anything on the other backends remain visible with their own attribution.

The merging contract here is the same one P5-calendar (issue #18) establishes for season. Both commands must agree on the merge rule and the merged-row schema; the abstraction proposal step below coordinates that agreement.

Scope (minimum required)

animedex search <type> <q> [--limit N] [--source <backends>]

  • <type> ∈ {anime, manga, character, person, studio, publisher}. The positional is required — bare animedex search "frieren" exits with a clear "missing TYPE" Click error, not a guess.
  • <q> is the search string, passed verbatim to each upstream's search endpoint.
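  • --limit N is per source (default 10). The merge step runs after the per-source limit, so --limit 5 against 4 sources can return up to 20 merged rows, and as few as 5 when all four sources return the same five entries. To cap the total, slice the rows downstream with --jq, e.g. '.[:N]' on the row array (limit(N; .) emits its single input unchanged, so it caps nothing).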
  • --source <comma-separated> collapses the fan-out to a subset of backends. E.g. --source anilist,jikan skips kitsu / shikimori.

The type → backend mapping (minimum required surface, derived from the high-level helpers already shipped on main):

Type         Backends fanned out by default
anime        anilist, jikan, kitsu, shikimori
manga        anilist, jikan, kitsu, mangadex, shikimori
character    anilist, jikan, kitsu, shikimori
person       anilist, jikan, kitsu, shikimori
studio       anilist, jikan, kitsu, shikimori
publisher    shikimori (only backend with a publisher catalogue today)
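
The table above can be held as a plain mapping. A minimal sketch, assuming the backends_for_type name proposed under "Substrate touch points"; the TYPE_ROUTES constant, the only parameter (standing in for --source), and the choice to drop names outside the table are illustrative assumptions, not settled behaviour:

```python
from typing import Dict, List, Optional

# Hypothetical constant mirroring the type -> backend table in this issue.
TYPE_ROUTES: Dict[str, List[str]] = {
    "anime": ["anilist", "jikan", "kitsu", "shikimori"],
    "manga": ["anilist", "jikan", "kitsu", "mangadex", "shikimori"],
    "character": ["anilist", "jikan", "kitsu", "shikimori"],
    "person": ["anilist", "jikan", "kitsu", "shikimori"],
    "studio": ["anilist", "jikan", "kitsu", "shikimori"],
    "publisher": ["shikimori"],
}


def backends_for_type(type_: str, only: Optional[List[str]] = None) -> List[str]:
    """Return the default fan-out for a type, optionally narrowed by --source."""
    if type_ not in TYPE_ROUTES:
        valid = ", ".join(TYPE_ROUTES)
        raise ValueError(f"unknown type {type_!r}; valid types: {valid}")
    backends = TYPE_ROUTES[type_]
    if only is None:
        return list(backends)
    # Assumption: names outside the table are dropped and table order is kept.
    return [name for name in backends if name in only]


print(backends_for_type("manga"))
print(backends_for_type("anime", only=["jikan", "anilist"]))
```

How --source should treat a backend that does not support the requested type is not specified in this issue; the silent drop here is only one option for the proposal to weigh.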

If a backend in the fan-out fails (rate-limit, 5xx, network), the fallback policy (same as P5-calendar) applies: per-source try/except, the successful sources' rows still come back, the failed sources surface in an envelope sources: {...} map with {status, reason, http_status, message, duration_ms}. Stderr emits one inform line per failed source. Exit code 0 if at least one source succeeded; non-zero only when every source failed.
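
The fallback policy in the paragraph above can be sketched as a sequential loop. This is an illustration under assumptions: fan_out, UpstreamError, the "ok" / "failed" status strings, the use of the exception class name as reason, and the exit code 1 are all hypothetical; only the five envelope keys and the exit-code rule come from this issue.

```python
import sys
import time
from typing import Callable, Dict, List, Tuple


class UpstreamError(Exception):
    """Hypothetical stand-in for whatever typed error a backend raises."""

    def __init__(self, message: str, http_status: int) -> None:
        super().__init__(message)
        self.http_status = http_status


def fan_out(fetchers: Dict[str, Callable[[], List[dict]]]) -> Tuple[dict, int]:
    """Call each source in turn; a failing source never aborts the others."""
    items: List[dict] = []
    sources: Dict[str, dict] = {}
    for name, fetch in fetchers.items():
        started = time.monotonic()
        entry = {"status": "ok", "reason": None, "http_status": None, "message": None}
        try:
            items.extend(fetch())
        except Exception as exc:  # per-source try/except
            entry.update(
                status="failed",
                reason=type(exc).__name__,  # assumption: real reasons are typed strings
                http_status=getattr(exc, "http_status", None),
                message=str(exc),
            )
            # One inform line per failed source on stderr.
            print(f"{name}: source failed ({exc})", file=sys.stderr)
        entry["duration_ms"] = int((time.monotonic() - started) * 1000)
        sources[name] = entry
    any_ok = any(entry["status"] == "ok" for entry in sources.values())
    return {"items": items, "sources": sources}, 0 if any_ok else 1


def rate_limited() -> List[dict]:
    raise UpstreamError("rate limited", 429)


envelope, exit_code = fan_out(
    {"anilist": rate_limited, "jikan": lambda: [{"title": "Sousou no Frieren"}]}
)
print(exit_code, sorted(envelope["sources"]))
```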

animedex show <type> <prefix:id>

  • <type> is required, same vocabulary as search.
  • <prefix:id> is the project's canonical entity reference. Examples:
    • anilist:154587
    • mal:52991 (alias for jikan, because Jikan wraps MyAnimeList)
    • kitsu:43534
    • shikimori:52991
    • mangadex:dc8bbc4c-eb7a-4d27-b96a-9aa8c8db4adb
  • Single-source dispatch: the prefix picks exactly one backend. The command routes to that backend's show() / manga_show() / character() / etc. helper per the type.
  • Bad combinations (e.g. show publisher anilist:1 — anilist has no publisher catalogue) exit with a clear "type 'publisher' is not supported by backend 'anilist'" error and a one-line list of supported backends for that type.
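
The bad-combination check in the last bullet can run before any network call. A rough sketch, assuming the support table is the same as the search fan-out table from the Scope section (abbreviated here to two types); check_supported and UnsupportedCombination are hypothetical names, and the message wording follows the verification checklist:

```python
from typing import Dict, List

# Abbreviated, hypothetical copy of the type -> backend table.
TYPE_ROUTES: Dict[str, List[str]] = {
    "anime": ["anilist", "jikan", "kitsu", "shikimori"],
    "publisher": ["shikimori"],
}


class UnsupportedCombination(Exception):
    """Hypothetical error for a (type, backend) pair no helper can serve."""


def check_supported(type_: str, backend: str) -> None:
    """Raise before any HTTP call if the backend has no catalogue for the type."""
    supported = TYPE_ROUTES[type_]
    if backend not in supported:
        raise UnsupportedCombination(
            f"type {type_!r} is not supported by backend {backend!r}; "
            f"supported backends: {', '.join(supported)}"
        )


check_supported("anime", "anilist")  # passes silently
try:
    check_supported("publisher", "anilist")
except UnsupportedCombination as exc:
    print(exc)
```

The ann prefix appears in the prefix table but not in the type table, so how show should treat it is left for the proposal rather than decided by this sketch.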

Prefix → backend mapping (minimum required)

Prefix       Routes to backend module        Notes
anilist      animedex.backends.anilist       numeric ID
mal          animedex.backends.jikan         numeric ID; alias for jikan
jikan        animedex.backends.jikan         numeric ID; same as mal
kitsu        animedex.backends.kitsu         numeric ID
shikimori    animedex.backends.shikimori     numeric ID
mangadex     animedex.backends.mangadex      UUID string
ann          animedex.backends.ann           numeric ID

Treat anidb: as a documented-but-deferred prefix that raises a typed informative error pointing at Phase 7. Do not silently swallow anidb: references.
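
The prefix table and the anidb: rule above could be expressed roughly as follows. This is a sketch, not the agreed parser: the parse signature follows the one proposed under "Substrate touch points", while PrefixIdError, DeferredPrefixError, and the constant names are hypothetical (the issue itself suggests an ApiError with a reason string still to be settled).

```python
import uuid
from typing import Dict, Tuple

# Prefix -> backend name, taken from the table above.
PREFIX_TO_BACKEND: Dict[str, str] = {
    "anilist": "anilist",
    "mal": "jikan",
    "jikan": "jikan",
    "kitsu": "kitsu",
    "shikimori": "shikimori",
    "mangadex": "mangadex",
    "ann": "ann",
}
UUID_BACKENDS = {"mangadex"}
DEFERRED_PREFIXES = {"anidb"}


class PrefixIdError(ValueError):
    """Hypothetical error for a malformed or unknown prefix:id reference."""


class DeferredPrefixError(PrefixIdError):
    """Hypothetical error for a documented prefix whose backend is not shipped."""


def parse(prefix_id: str) -> Tuple[str, str]:
    """Split 'prefix:id' into (backend, id), validating the ID format."""
    prefix, sep, raw_id = prefix_id.partition(":")
    if not sep or not prefix or not raw_id:
        raise PrefixIdError(f"expected <prefix>:<id>, got {prefix_id!r}")
    if prefix in DEFERRED_PREFIXES:
        raise DeferredPrefixError(f"prefix {prefix!r} is documented but not yet supported")
    backend = PREFIX_TO_BACKEND.get(prefix)
    if backend is None:
        raise PrefixIdError(f"unknown prefix {prefix!r}")
    if backend in UUID_BACKENDS:
        try:
            uuid.UUID(raw_id)  # note: also accepts unhyphenated hex forms
        except ValueError:
            raise PrefixIdError(f"ID is not a UUID for backend {backend!r}") from None
    elif not (raw_id.isascii() and raw_id.isdigit()):
        raise PrefixIdError(f"ID is not numeric for backend {backend!r}")
    return backend, raw_id


print(parse("mal:52991"))
print(parse("mangadex:dc8bbc4c-eb7a-4d27-b96a-9aa8c8db4adb"))
```

Validation happens entirely on the string, so a reference such as anilist:abc is rejected before any HTTP call.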

Encouraged exploration

The minimum surface above is the floor. AGENTS §15.2 ("read-only API surface coverage") asks for everything the upstreams support that fits the pattern, not the cherry-picked subset. Encouraged additions:

  • More type entries: topic (shikimori), club (shikimori), report (ann) — only if the high-level helper exists and the upstream's search returns a clean list shape. Topics / clubs are community resources rather than canonical entities; if they look out of place under search anime, that's the right reason to keep them as future work.
  • Reverse-prefix aliases: myanimelist: as a long-form alias for mal:, etc. Cheap and friendly.
  • A --first flag that returns only the first result row per source (useful for piping into show).
  • Concurrent fan-out (concurrent.futures.ThreadPoolExecutor(max_workers=len(sources))) so a 4-source search anime completes in max-of-4 time rather than sum-of-4 time. Default to concurrent if the abstraction proposal sign-off accepts it.
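
The max-of-4 versus sum-of-4 point in the last bullet can be checked with a small stand-alone sketch. fan_out_concurrent and slow_source are hypothetical names, and the 0.2 s sleep stands in for an upstream round trip:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple, Union

Outcome = Tuple[str, Union[List[dict], BaseException]]


def fan_out_concurrent(fetchers: Dict[str, Callable[[], List[dict]]]) -> Dict[str, Outcome]:
    """Run every fetcher on its own worker thread and collect per-source outcomes."""
    if not fetchers:
        return {}  # ThreadPoolExecutor rejects max_workers=0
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        futures = {name: pool.submit(fetch) for name, fetch in fetchers.items()}
    # Leaving the with-block waits for all workers, so nothing below blocks.
    outcomes: Dict[str, Outcome] = {}
    for name, future in futures.items():
        exc = future.exception()
        outcomes[name] = ("failed", exc) if exc is not None else ("ok", future.result())
    return outcomes


def slow_source(name: str, delay: float = 0.2) -> Callable[[], List[dict]]:
    """Build a fake fetcher that sleeps, imitating network latency."""
    def fetch() -> List[dict]:
        time.sleep(delay)
        return [{"_source": name}]
    return fetch


started = time.monotonic()
outcomes = fan_out_concurrent(
    {name: slow_source(name) for name in ("anilist", "jikan", "kitsu", "shikimori")}
)
elapsed = time.monotonic() - started
print(f"{len(outcomes)} sources in {elapsed:.2f}s")
```

Four sequential calls at 0.2 s each would take about 0.8 s; the concurrent run should finish in roughly the time of the slowest single call.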

API documentation entry points

The high-level helpers this PR composes already exist on main. The relevant module heads:

  • AniList: animedex/backends/anilist/__init__.py exposes search() / show() / character() / staff() / studio().
  • Jikan: animedex/backends/jikan/__init__.py — same shapes. Note Jikan's staff is per-anime; the top-level person endpoint is person().
  • Kitsu: animedex/backends/kitsu/__init__.py.
  • Shikimori: animedex/backends/shikimori/__init__.py already exposes search, show, manga_search, manga_show, ranobe_*, club_*, publishers, people_search, person, characters, staff per PR #15 (dev(narugo1992): add shikimori and ann high-level backends).
  • MangaDex: animedex/backends/mangadex/__init__.py — for type=manga fan-out, this is the manga-native catalogue (uses UUID IDs, not numeric).
  • ANN: animedex/backends/ann/__init__.py, where search() returns multiple anime and show() accepts a numeric ID.

If a helper is missing for a particular type/backend combination listed in the type table above, that's a real gap and ought to be filled in this PR (per §15.2). Most of the gaps were closed in PR #15's shikimori expansion; spot-check before assuming the helper exists.

Substrate touch points (read carefully)

This PR introduces the project's first aggregate-shape layer. Before writing implementation code, post an abstraction proposal as a comment on this issue and wait for an explicit external sign-off — sign-off is a recorded maintainer reply, not a self-reply (AGENTS §15.5; see the merged PR #14 issue-#11 comment and the merged PR #15 issue-#12 comment for the well-formed proposal shapes; both received an explicit maintainer "+1" before implementation began).

The proposal should answer at least:

  1. Where does aggregate code live? Recommended: animedex/agg/ as a new sibling to animedex/backends/. The module split inside:

    • animedex/agg/_prefix_id.py: parse(prefix_id: str) -> (backend: str, id: str) and its inverse; also the prefix → backend module name table.
    • animedex/agg/_type_routes.py: backends_for_type(type: str) -> List[str] and the type vocabulary.
    • animedex/agg/_fanout.py — the multi-source fan-out helper. (If P5-calendar lands first and puts this helper in the same place, this PR reuses it; merge the proposal accordingly.)
    • animedex/agg/search.py, animedex/agg/show.py — the high-level command implementations.
    • animedex/entry/search.py, animedex/entry/show.py — the Click entry modules.
  2. The output schema for search. Each row carries the rich-model dump from its originating backend (e.g. a row from anilist is the full AnilistAnime.model_dump(by_alias=True), not a lowest-common-denominator projection). The row also carries a top-level _source (matching the existing source_tag convention) and a _prefix_id (e.g. anilist:154587) so the user can feed search output directly into show. Confirm the rich-model-preserved approach in the proposal.

  3. Common-model projection for TTY rendering. JSON output keeps the full rich row. TTY output renders a compact table (title / score / status / [src:]) using to_common() projections for display. Confirm the renderer path; this is the only place lossy projection is acceptable.

  4. Fan-out strategy: concurrent vs sequential. Up to 5 backends per search call (manga case: 5 sources). Concurrent (ThreadPoolExecutor) is the natural choice; rate-limit buckets are per-backend so there is no contention; the existing transport stack is thread-safe via requests.Session per call. Propose the default with rationale.

  5. show with unsupported (type, backend) combinations. E.g. show publisher anilist:1. Two options: (a) exit with a Click error before any network call; (b) attempt the upstream and return whatever 404 it gives. (a) is more informative — propose it. The error message should name the type, the offending backend, and the list of backends that do support the type.

  6. Conflict with P5-calendar's _fanout.py. If P5-calendar merges first, this PR consumes its animedex/agg/_fanout.py verbatim. If this PR merges first, P5-calendar consumes ours. The proposal step coordinates which PR's fan-out helper definition is canonical; the helper itself should be generic (no search-specific assumptions baked in).

  7. anidb: prefix handling. The advisory classifier should recognise the anidb: prefix and raise an ApiError with reason="auth-required", or with a new reason like "deferred-feature", and a message such as "ANN ANIDB / AniDB high-level helpers are not yet shipped; use animedex api anidb /... to call the raw passthrough once it lands". Settle the reason string in the proposal.

If the proposal you write diverges materially from these defaults, that is fine; the proposal exists to surface the trade-off in writing before code lands.
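
Items 2 and 3 of the list above (rich rows in JSON, lossy projection only for the TTY) might look like this in miniature. The sketch is illustrative: decorate and tty_line are hypothetical helpers, the flat title / score / status keys stand in for whatever to_common() really yields, and the "[src: ...]" rendering is a guess at the tag format.

```python
import json
from typing import Any, Dict


def decorate(backend: str, native_id: Any, rich_row: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the backend's full row and add the two attribution keys."""
    row = dict(rich_row)  # nothing from the upstream row is dropped
    row["_source"] = backend
    row["_prefix_id"] = f"{backend}:{native_id}"
    return row


def tty_line(row: Dict[str, Any]) -> str:
    """Lossy one-line projection, acceptable for TTY display only."""
    title = row.get("title", "?")
    score = row.get("score", "-")
    status = row.get("status", "-")
    return f"{title}  {score}  {status}  [src: {row['_source']}]"


row = decorate(
    "anilist",
    154587,
    {"title": "Sousou no Frieren", "upstream_only_field": "kept in JSON"},
)
print(json.dumps(row))  # JSON path: every upstream key survives
print(tty_line(row))    # TTY path: compact projection
```

Because _prefix_id is emitted on every row, a search result can be passed to show without the user reassembling the reference by hand.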

Fixture capture

Fixture corpora for each fan-out source already exist on main (see test/fixtures/<backend>/). For search and show:

  • Happy path — reuse existing per-backend search/show fixtures. For search anime "frieren", expect to compose fixtures from test/fixtures/anilist/anime_search/01-frieren.yaml, test/fixtures/jikan/anime_search/01-frieren-52991.yaml (or equivalent), test/fixtures/kitsu/anime_search/01-frieren.yaml, test/fixtures/shikimori/animes_search/01-frieren.yaml. Spot-check each exists; if a probe is missing for a type+backend combo, capture one as part of this PR using tools/fixtures/run_<backend>.py.
  • Partial failure — synthetic 429 / 5xx fixtures using the same hand-edit technique as P5-calendar (edit a captured fixture's response.status and add response.captured_from: synthetic-429 metadata).
  • Total failure — synthetic 5xx across every fan-out source for one type.
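
The hand edit described for synthetic fixtures amounts to the transformation below, shown on an in-memory dict. Only the response.status and response.captured_from keys come from this issue; make_synthetic, the rest of the fixture layout, and the "synthetic-<status>" naming for codes other than 429 are assumptions.

```python
import copy
from typing import Any, Dict


def make_synthetic(fixture: Dict[str, Any], status: int) -> Dict[str, Any]:
    """Return a copy of a captured fixture with its status swapped and disclosed."""
    synthetic = copy.deepcopy(fixture)  # never mutate the captured original
    response = synthetic["response"]
    response["status"] = status
    response["captured_from"] = f"synthetic-{status}"
    return synthetic


# Placeholder fixture; the real files on main may carry more keys.
captured = {
    "request": {"url": "https://example.invalid/search"},
    "response": {"status": 200, "body": "{}"},
}
synthetic = make_synthetic(captured, 429)
print(synthetic["response"]["status"], synthetic["response"]["captured_from"])
```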

show reuses single-source per-backend show fixtures; no fan-out fixture needed except for the prefix-parser tests.

Document capture date and synthetic-fixture notes in the PR body (AGENTS §15.5).

Verification checklist (self-check before requesting review)

Search

  • animedex search anime "frieren" fans out to all 4 anime-supporting backends; envelope contains rows from each, every row has _source and _prefix_id.
  • animedex search manga "berserk" fans out to all 5 manga-supporting backends including mangadex.
  • animedex search character "frieren" fans out to all 4 character-supporting backends.
  • animedex search person "miyazaki" fans out to all 4 person-supporting backends.
  • animedex search studio "ghibli" fans out to all 4 studio-supporting backends.
  • animedex search publisher "kodansha" is single-source on shikimori; envelope's sources map contains only shikimori.
  • animedex search (no type) exits with a Click missing-argument error mentioning the supported types.
  • animedex search badtype "x" exits with a clear "unknown type" error listing valid types.
  • --source anilist,jikan collapses the fan-out to the named subset; other backends do not appear in the sources map.
  • --limit 5 returns at most 5 rows per source.
  • Partial failure: synthetic-429 fixture for anilist + healthy fixtures for jikan/kitsu/shikimori; the command exits 0, returns rows from the 3 healthy sources, stderr names anilist's failure.
  • Total failure: synthetic 5xx for every fan-out source; command exits non-zero, stdout is an empty-items envelope with sources showing every status as failed.

Show

  • animedex show anime anilist:154587 routes to anilist.show and returns the rich anilist record with source attribution.
  • animedex show anime mal:52991 routes to jikan.show(52991).
  • animedex show anime jikan:52991 also routes to jikan (mal/jikan alias).
  • animedex show manga mangadex:<uuid> accepts the UUID format.
  • animedex show character anilist:206439 routes to anilist.character.
  • animedex show person shikimori:1870 routes to shikimori.person.
  • animedex show studio anilist:21 routes to anilist.studio.
  • animedex show publisher anilist:1 exits with a clear "type 'publisher' is not supported by backend 'anilist'; supported backends: shikimori".
  • animedex show anime badprefix:1 exits with a clear "unknown prefix" error.
  • animedex show anime anidb:42 exits with the Phase-7-deferred informative error message.
  • animedex show anime anilist:abc exits with a clear "ID is not numeric for backend 'anilist'" error before any HTTP call.
  • animedex show anime anilist:9999999999 (valid format, 404 upstream) propagates the upstream 404 cleanly.

Cross-cutting

  • CLI tested in both --json and the default TTY path with isatty()=True forced (AGENTS §9bis.6).
  • HTTP is the only mock seam — responses.RequestsMock against captured/synthetic fixtures; no monkeypatch.setattr(animedex.backends.<x>, ...) above-the-wire shortcuts (§9bis.1).
  • No new entry in animedex/transport/read_only.py — these commands compose existing backends, not new ones.
  • _BACKEND_POLICY in animedex/entry/_cli_factory.py gains entries for the two aggregate commands if registered as top-level Click groups; \f cutoff convention from §10 step 6 honoured.
  • make rst_auto regenerates docs/source/api_doc/agg/ and the diff is committed.
  • Tutorial entries at docs/source/tutorials/aggregate.rst (or split per command) with at least one runnable search example, one show example, and one example showing the partial-failure shape.
  • _SELFTEST_TARGETS registers every new animedex.agg.* module.
  • Pre-flight (AGENTS §15.4): full sequence green; grep -rE 'Phase [0-9]|AGENTS[. ]§|Reviewer review' animedex/ tools/ returns zero.

Load-bearing reminders

  • Lossless rich model under merging — when search groups multiple backends' rows into one merged entry, each upstream's full rich-model dump is preserved under the merged row's sources / records map. §13 still applies: no upstream-visible field disappears. The merged row's compact TTY rendering may project a common shape for the eye, but the JSON path keeps each upstream's full rich record. Both score values stay visible under the merged row (Phase 5 contract: do not average; surface both).
  • Cross-source merging is the user-visible default, not an opt-in. Matching is conservative: external IDs first, then a deterministic fuzzy comparison; a single threshold cutoff determines when two upstreams' rows are merged into one entry. Rows that do not meet the threshold remain as their own single-source entries — they are not silently dropped. The merge is what the multi-source search value exists to produce.
  • §0 inform-not-gate on degraded sources — failed sources never crash the command. The user always gets the data the healthy sources returned; the failed sources surface as stderr inform lines and a structured sources envelope entry.
  • §15.2 read-only surface coverage — type → backend mapping covers every backend that genuinely has the entity. Spot-check before deferring a backend; "I didn't have time" is not a valid reason to drop a backend from the table.
  • HTTP-only mock seam — tests load real captured fixtures (and the documented synthetic 429/5xx fixtures) through responses.RequestsMock. No project-internal monkeypatch.setattr.
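
The merge rule in the reminders above (external IDs first, then a deterministic fuzzy comparison against one threshold, unmatched rows kept) can be sketched with the standard library. Everything named here is an assumption for illustration: same_entity, merge_rows, the external_ids key, the 0.9 threshold, the use of difflib.SequenceMatcher as the fuzzy comparison, and the exact layout of the sources / records keys. The sample rows carry no real scores.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List

Row = Dict[str, Any]


def same_entity(a: Row, b: Row, threshold: float) -> bool:
    """External IDs decide when both rows share an ID namespace; else fuzzy titles."""
    a_ids, b_ids = a.get("external_ids", {}), b.get("external_ids", {})
    shared_keys = a_ids.keys() & b_ids.keys()
    if shared_keys:
        return any(a_ids[key] == b_ids[key] for key in shared_keys)
    ratio = SequenceMatcher(None, a["title"].casefold(), b["title"].casefold()).ratio()
    return ratio >= threshold


def merge_rows(rows: List[Row], threshold: float = 0.9) -> List[Row]:
    """Group rows into merged entries; every upstream row is preserved whole."""
    merged: List[Row] = []
    for row in rows:
        source = row["_source"]
        for entry in merged:
            records = entry["records"]
            if source not in records and any(
                same_entity(row, other, threshold) for other in records.values()
            ):
                records[source] = row
                break
        else:
            # No confident match: the row stays visible as its own entry.
            merged.append({"records": {source: row}})
    for entry in merged:
        entry["sources"] = sorted(entry["records"])
    return merged


rows = [
    {"_source": "anilist", "title": "Sousou no Frieren",
     "external_ids": {"anilist": 154587, "mal": 52991}, "score": "anilist-score"},
    {"_source": "jikan", "title": "Sousou no Frieren",
     "external_ids": {"mal": 52991}, "score": "jikan-score"},
    {"_source": "kitsu", "title": "sousou no frieren", "external_ids": {}},
    {"_source": "shikimori", "title": "Berserk", "external_ids": {}},
]
for entry in merge_rows(rows):
    print(entry["sources"])
```

Each merged entry keeps both score values under its records map rather than averaging them, and the unmatched row remains a single-source entry.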

Parallelism

This issue can be picked up by codex while the P5-calendar issue runs in parallel. They share zero source files; the only conflict point is the registration line in animedex/entry/__init__.py (or wherever top-level Click commands are wired) where each PR adds its own subcommand — a trivial textual merge.

The fan-out helper at animedex/agg/_fanout.py is contributed by whichever PR merges first. The abstraction proposal step on both issues should reach agreement on the helper's signature before either PR writes code; if there is uncertainty, the first-to-merge PR's shape is canonical and the second PR refactors to consume it.

The P5-crossref issue depends on this PR's _prefix_id.py parser and waits for this PR to merge.

PR body template (what to include)

When you open the PR, please include:

  • Summary — one-paragraph overview of the new commands, the substrate they introduced (prefix:id parser, type routes, fan-out helper), the fallback contract.
  • Demos — TTY GIFs for both search and show. For search, ideally a captured run that shows multi-source attribution (rows from 4 sources with [src:] tags). For show, demonstrate one anime + one character + one studio command in the same recording.
  • Examples and expected output — copy-pastable command lines + jq-extracted expected output. Include one partial-failure example.
  • Fixture notes — capture date in UTC; explicit synthetic-fixture disclosure; proxy notes if any.
  • Verification — tick the checklist; for each item write done / partial / deferred with a one-line reason.
  • Abstraction proposal link — link to the comment on this issue where you settled the substrate design, plus the reviewer comment that signed off (mandatory per §15.5).

Out of scope

  • --no-merge / inspect-raw-fan-out toggle (not required; merged rows keep per-upstream rich records under the sources / records map, so a caller who needs the raw upstream rows reads them from there).
  • Bare animedex search "frieren" without explicit type — by design.
  • animedex show <prefix:id> without type — by design (same ID space can encode anime or manga on most backends).
  • animedex crossref — that's P5-crossref.
  • AniDB integration for anidb: prefix — Phase 7.
  • Calendar commands (season, schedule) — P5-calendar.
  • Filters like --genre, --year, --studio on search — future extension.
