
P5-followup: animedex enhance + agg merge-anime (agent-composable search layer) #23

Description

@narugo1992

Goal

Two tightly-coupled follow-ups to PR #22 (search/show annotate-only) — kept together because they share the same agent-friendly philosophy ("provide building blocks, let the caller compose") and the same empirical motivation (the 50-seed eval posted under PR #22 issuecomment-4439876211):

  1. animedex enhance <query>: stateless search-query enhancement. A new top-level CLI command that takes any input string and returns a structured list of query-variant suggestions. No network. No side effects. Agents and humans alike call it to get rewrite candidates, then drive further searches with whichever variants they pick.
  2. animedex agg merge-anime: merge-on-search as a separate tool, not a fan-out side effect. A new CLI command that ingests an existing aggregate search result (JSON) and emits a merged version, reusing _merge_season_items from PR #21 (Add calendar aggregate commands) verbatim. It keeps search itself annotate-only (the PR #22 contract) and lets agents opt into merging when they need it.

Both deliberately stay out of animedex search's own code path so that the unmodified-data principle of search (one row per backend, no implicit fusion) stays clean. Composing them is the agent's job.

Refs #1 §7 (Phase 5 closeout — merge-on-search and the search enhancement layer).

Why this slice (and why one issue, not two)

  • They share the design tenet "ship building blocks, not side effects" — both refuse to silently transform inside search. Splitting them into separate issues would force re-stating the empirical motivation twice.
  • They share the empirical evidence already collected on PR #22 (the 50-seed eval, the 1,461-pair merge-fitness study, the 10-seed live alignment test).
  • They are independently implementable as two separate small PRs (respecting AGENTS §15.1); this issue just tracks them as a coordinated slice.

Scope A — animedex enhance <query>

Minimum required surface

  • New top-level Click command animedex enhance <query> [--type anime|manga|character|person|studio|publisher].

  • New Python API animedex.enhance.suggest(query: str, *, entity_type: Optional[str] = None) -> EnhanceResult returning a stable, ordered list of variants.

  • Pure heuristic, fully offline. No HTTP, no fixture lookups, no caches. Output depends only on the input string and the static est_rescue_on annotation table baked into the module.

  • Output shape (one entry per suggestion):

    {
        "input": "Chainsaw Man (Official Colored)",
        "entity_type": "manga",
        "suggestions": [
            {
                "variant_id": "raw",
                "query": "Chainsaw Man (Official Colored)",
                "rationale": "Original input, baseline for the agent's call.",
                "est_rescue_on": ["baseline"]
            },
            {
                "variant_id": "strip_parenthetical",
                "query": "Chainsaw Man",
                "rationale": "Drop trailing (...) / ~...~ / [...] / 【...】 annotation. Specifically helpful on AniList manga.",
                "est_rescue_on": ["manga/anilist (annotation-suffixed titles)", "manga/kitsu (weaker)"]
            },
            { ... }
        ]
    }
  • Variants emitted (already deterministic in the POC at tools/search_eval/enhance.py):
    raw, nfkc, jaconv_norm, kata2hira (CJK only), anyascii (CJK only), unidecode (CJK only), lowercase_strip_punct, first_token, strip_parenthetical, strip_dash_suffix.

  • TTY and --json rendering both honoured per AGENTS §9bis.6.

  • Docstring follows AGENTS §8/§10 with Backend: animedex (local), Rate limit: not applicable, and an --- LLM Agent Guidance --- block describing when to reach for the command.
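
A minimal sketch of how suggest() could assemble the output shape above. It is not the POC from tools/search_eval/enhance.py: the helper names, the regex, and the rationale strings are assumptions for illustration, only four of the listed variants are implemented, the est_rescue_on table is a placeholder (only the strip_parenthetical entry is taken from this issue), and a plain dict stands in for EnhanceResult. It also assumes "distinct" means distinct from every variant already emitted.

```python
import re
import unicodedata
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical pattern: one trailing (...), [...], 【...】 or ~...~ annotation.
_TRAILING_ANNOTATION = re.compile(
    r"\s*(\([^()]*\)|\[[^\[\]]*\]|【[^【】]*】|~[^~]*~)\s*$"
)


def _nfkc(query: str) -> str:
    return unicodedata.normalize("NFKC", query)


def _lowercase_strip_punct(query: str) -> str:
    # Lowercase, turn punctuation into spaces, collapse whitespace runs.
    return " ".join(re.sub(r"[^\w\s]", " ", query.lower()).split())


def _first_token(query: str) -> str:
    parts = query.split()
    return parts[0] if parts else ""


def _strip_parenthetical(query: str) -> str:
    return _TRAILING_ANNOTATION.sub("", query).strip()


# (variant_id, generator, rationale), in the order the issue lists the variants.
_HEURISTICS: List[Tuple[str, Callable[[str], str], str]] = [
    ("nfkc", _nfkc, "Unicode NFKC normalisation."),
    ("lowercase_strip_punct", _lowercase_strip_punct, "Lowercase and drop punctuation."),
    ("first_token", _first_token, "Keep only the first whitespace-separated token."),
    ("strip_parenthetical", _strip_parenthetical, "Drop a trailing annotation."),
]

# Placeholder static table; the real values come from the 50-seed eval.
_EST_RESCUE_ON: Dict[str, List[str]] = {
    "strip_parenthetical": [
        "manga/anilist (annotation-suffixed titles)",
        "manga/kitsu (weaker)",
    ],
}


def suggest(query: str, *, entity_type: Optional[str] = None) -> dict:
    """Return ordered query variants; pure function of the input string."""
    suggestions = [{
        "variant_id": "raw",
        "query": query,
        "rationale": "Original input, baseline for the agent's call.",
        "est_rescue_on": ["baseline"],
    }]
    seen = {query}
    for variant_id, generate, rationale in _HEURISTICS:
        variant = generate(query)
        if not variant or variant in seen:
            continue  # emit only distinct, non-empty variants
        seen.add(variant)
        suggestions.append({
            "variant_id": variant_id,
            "query": variant,
            "rationale": rationale,
            # Informational only: entity_type never filters the list.
            "est_rescue_on": _EST_RESCUE_ON.get(variant_id, []),
        })
    return {"input": query, "entity_type": entity_type, "suggestions": suggestions}
```

Under these assumptions, suggest("Frieren") yields exactly two entries (raw and lowercase_strip_punct), because the other generators return the input unchanged and are skipped as duplicates.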

Out of scope (deliberate)

  • No --probe <backend> mode. A hybrid "make one HTTP call, harvest aliases from the response, suggest them as further queries" would have a side-effect and is naturally an agent-loop workflow ("call search, look at output, call enhance again with the result, call search again"). enhance stays stateless; the agent composes.
  • No automatic execution. enhance does not invoke search itself; it only suggests.
  • No paternalistic filtering. Every heuristic that produces a distinct, non-empty variant is emitted regardless of --type, with est_rescue_on data informing (not gating) the agent's choice.

Empirical motivation

The 50-seed eval (tools/search_eval/runs/full50/variant_gain.md, also in the PR comment) measured every variant's rescue rate. Two cells dominate:

  • anime/ann + alias_english: raw 18% → any-variant 72% (+54pp) — note: alias_english relies on cross-source data and stays an agent workflow, not a heuristic
  • manga/anilist + first_token: raw 55% → any-variant 97% (+42pp) — pure heuristic, the strongest win enhance directly delivers

Scope B — animedex agg merge-anime

Minimum required surface

  • New CLI command animedex agg merge-anime <input-file> (or read from stdin via -) that:
    1. Reads a JSON aggregate result (the shape animedex search anime ... --json produces).
    2. Projects each row to animedex.models.anime.Anime via the backend's to_common().
    3. Calls animedex.agg.calendar._merge_season_items (or a renamed, generalised wrapper) on the rows.
    4. Emits the merged result on stdout with the MergedAnime schema that PR #21 established.
  • New Python API animedex.agg.merge.merge_anime(result: AggregateResult) -> AggregateResult.
  • Anime-only this round. Other entity types stay annotate-only (their per-entity scorers would need their own adjudication corpora; out of scope).
  • Tests mock at the HTTP seam only, against captured fixtures from PR #22's anime aggregate plus the season_matrix candidates corpus.
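
A rough sketch of the input side of the command: reading the aggregate JSON from a file or from stdin via -, and rejecting non-anime input before any merging happens. The function name, the exception class, and the top-level "type" key are assumptions; the actual JSON shape is whatever animedex search anime ... --json produces, which is not shown in this issue.

```python
import json
import sys
from typing import Any, Dict, Optional, TextIO


class MergeInputError(ValueError):
    """Hypothetical error raised for input that merge-anime cannot handle."""


def load_aggregate(source: str, stdin: Optional[TextIO] = None) -> Dict[str, Any]:
    """Load an aggregate search result from a path, or from stdin when source is '-'."""
    if source == "-":
        payload = json.load(stdin if stdin is not None else sys.stdin)
    else:
        with open(source, encoding="utf-8") as handle:
            payload = json.load(handle)

    # Assumption: the aggregate result records its entity type under "type".
    if payload.get("type") != "anime":
        raise MergeInputError(
            "merge-anime only supports type=anime; manga / character / person / "
            "studio / publisher fall through annotate-only"
        )
    return payload
```

Failing early keeps the anime-only restriction visible to the caller instead of silently passing other entity types through.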

Required shikimori to_common patch (ships with this slice, or in a tiny prelude PR)

Shikimori's anime ID namespace is MAL-derived for entries that exist on MAL. The current to_common() only sets ids["mal"] when myanimelist_id is explicitly returned, and the search endpoint does not return it. The patch falls back to the Shikimori ID:

# animedex/backends/shikimori/models.py — ShikimoriAnime.to_common
ids = {"shikimori": str(self.id)}
if self.myanimelist_id:
    ids["mal"] = str(self.myanimelist_id)
else:
    ids["mal"] = str(self.id)

This was measured (tools/search_eval/runs/alignment_small/) to raise the Frieren single-word case from 40% to 60% real-world recall with zero false-positive merges across 10 seeds × 5 backends.
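
The fallback can be pinned with a small regression check. The sketch below uses a stand-in dataclass rather than the real ShikimoriAnime model, so the class name and the common_ids() method are hypothetical; only the ID logic mirrors the patch above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ShikimoriAnimeStub:
    """Stand-in for ShikimoriAnime carrying only the fields the patch touches."""

    id: int
    myanimelist_id: Optional[int] = None

    def common_ids(self) -> Dict[str, str]:
        # Same logic as the patch: prefer an explicit MAL id, else reuse the Shikimori id.
        ids = {"shikimori": str(self.id)}
        if self.myanimelist_id:
            ids["mal"] = str(self.myanimelist_id)
        else:
            ids["mal"] = str(self.id)
        return ids


# Search-endpoint shape: no myanimelist_id, so the Shikimori id doubles as the MAL id.
search_row = ShikimoriAnimeStub(id=123)
# Detail-endpoint shape: an explicit myanimelist_id still takes precedence.
detail_row = ShikimoriAnimeStub(id=123, myanimelist_id=456)
```

With this stub, search_row.common_ids() carries "mal": "123" and detail_row.common_ids() carries "mal": "456".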

Out of scope (deliberate)

  • No automatic merge inside search. The annotate-only contract from PR #22 stays. search returns one row per backend, exactly as today; only agg merge-anime produces merged rows.
  • No threshold tuning. Reuse PR #21's _MERGE_THRESHOLD = 70 as-is. The 10-seed live alignment test confirmed zero false-positive merges even on sequel-adjacent queries ("Naruto" and "Naruto Shippuden" never fuse because the year/episodes mismatch subtracts from the score).
  • No manga / character / person / studio / publisher merging. Each needs its own per-entity scorer + adjudication corpus; that is its own future slice with its own abstraction proposal.

Empirical motivation

The merge-fitness analysis on 1,461 high-confidence adjudicated pairs showed that _anime_match_score reaches the 70 threshold for 100% of pairs when both rows carry external IDs, 82% when only title + year + season is preserved, and 0% on bare title. The 10-seed live alignment test on production search output confirmed 100% precision and 40-100% recall (the recall variability comes entirely from upstream search ranking, not from merge-function behaviour).

Encouraged exploration

  • A --limit-rows-per-source flag on enhance to cap how many variants are emitted (useful for token-tight agent prompts). Default unlimited.
  • A --source <comma-separated> flag on agg merge-anime to restrict which backends participate (matches animedex search's flag).
  • A dry-run mode for agg merge-anime that reports merge_diagnostics (which pairs would fuse, with score breakdown) without actually emitting the merged rows. Useful for agents inspecting merge decisions.
  • Surface the shikimori-patch idea to other backends. Inspect whether Kitsu's anime relationships.mappings could give us the equivalent of idMal without a second HTTP call.
  • Document the agent loop pattern in docs/source/tutorials/ (or a new docs/source/research/): how to compose enhance → search → agg merge-anime from an LLM-agent prompt, with a worked example.
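
One way to picture the agent loop from the last bullet is a small driver that takes the three building blocks as injected callables. This is only a sketch: compose_search, max_variants, and the "source" / "id" row keys are hypothetical, and the callables stand in for whatever the agent uses to invoke enhance, search, and agg merge-anime.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def compose_search(
    query: str,
    *,
    enhance: Callable[[str], Dict[str, Any]],
    search: Callable[[str], List[Row]],
    merge: Callable[[List[Row]], Any],
    max_variants: int = 3,
) -> Any:
    """Drive enhance -> search -> merge; the caller decides how many variants to try."""
    variants = [s["query"] for s in enhance(query)["suggestions"]][:max_variants]

    # Collect rows across variants, keeping the first hit per (source, id) pair
    # so that two variants returning the same backend row do not duplicate it.
    collected: Dict[Tuple[str, str], Row] = {}
    for variant in variants:
        for row in search(variant):
            key = (str(row["source"]), str(row["id"]))
            collected.setdefault(key, row)

    return merge(list(collected.values()))
```

Keeping the composition outside all three commands matches the issue's stance: search stays annotate-only, and the caller opts into rewriting and merging.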

Where the harness already lives

A complete, runnable harness exists under tools/search_eval/ on a working tree (not yet committed to main; was kept out of PR #22 per §15.1):

tools/search_eval/
├── enhance.py              POC of Scope A (already produces the JSON shape above)
├── alignment_small.py      Scope B's live validation harness (10 seeds × 5 backends)
├── merge_fitness.py        1,461-pair re-scoring under search-shaped inputs
├── seeds.py / variants.py / scoring.py / run_eval.py / variant_gain.py
└── runs/full50/, runs/merge_fitness/, runs/alignment_small/   ← committed-fixture-grade artefacts

Either (a) the implementer rebuilds the production version under animedex/ from scratch using tools/search_eval/ as reference, or (b) we file a separate scoped PR to land tools/search_eval/ under tools/ first so it can be re-run during the follow-up. Up to the implementer.

Abstraction questions to settle before coding

  1. Where does enhance live in the source tree? Recommended: animedex/enhance/ as a sibling to animedex/agg/, with _heuristics.py (pure variant generators) and __init__.py (public suggest() + the est_rescue_on static table). Entry under animedex/entry/enhance.py.
  2. Where does merge-anime live? Recommended: animedex/agg/merge.py as a sibling to agg/search.py and agg/show.py; CLI entry at animedex/entry/agg_merge.py or merged into a new animedex agg <subcommand> group.
  3. The shikimori to_common patch — ship inside the merge-anime PR or as a tiny prelude? Recommended: tiny prelude (5-line model patch + one regression test) so the merge-anime PR stays focused on the new code path.
  4. Should enhance register itself as an MCP tool too? Recommended yes; agent friendliness is the whole point.

Verification checklist (self-check before requesting review on the implementer's PR)

animedex enhance

  • animedex enhance "Chainsaw Man (Official Colored)" --type manga --json returns at least raw, strip_parenthetical → "Chainsaw Man", first_token → "Chainsaw", lowercase_strip_punct.
  • animedex enhance "葬送のフリーレン" --type anime --json returns kata2hira, anyascii, unidecode variants.
  • animedex enhance "Frieren" returns at least 2 entries (raw + lowercase_strip_punct).
  • Default rendering and --json rendering both tested (AGENTS §9bis.6).
  • HTTP-mock tests with HTTP disabled at the seam: enhance is offline-only.
  • Docstring policy lint green (Backend: / Rate limit: / --- LLM Agent Guidance --- ... --- End ---).

animedex agg merge-anime

  • On a fixture-derived AggregateResult containing AniList+Jikan+Kitsu+Shikimori rows for Frieren, command emits one merged row with all four backend payloads under records.
  • On a "Naruto" + "Naruto Shippuden" cross-seed result, the command produces two merged rows, not one (precision test).
  • Shikimori-only rows merge correctly thanks to the id → mal ID prelude patch.
  • On non-anime aggregate input, command exits with a clear "merge-anime only supports type=anime; manga / character / person / studio / publisher fall through annotate-only" error.
  • Reproduces the 10-seed alignment_small.py numbers under fixture-replay (3/10 perfect, 7/10 partial, 0/10 false-positive).

Cross-cutting

  • CLI tested in both --json and the default TTY path with isatty()=True forced (AGENTS §9bis.6).
  • HTTP is the only mock seam — responses.RequestsMock against captured fixtures; no monkeypatch.setattr(animedex.*, ...) above the wire (AGENTS §9bis.1).
  • _BACKEND_POLICY in animedex/entry/_cli_factory.py gains entries for both top-level commands; \f cutoff convention honoured.
  • make rst_auto regenerates the API docs under docs/source/api_doc/.
  • _SELFTEST_TARGETS in animedex/diag/selftest.py registers every new animedex.enhance.* and animedex.agg.merge module.
  • grep -rE 'Phase [0-9]|AGENTS[. ]§|Reviewer review' animedex/ tools/ returns zero matches (AGENTS §14).
  • No new entry in animedex/transport/read_only.py (these are derived commands, not new backends).

Load-bearing reminders

  • Search stays annotate-only. enhance and agg merge-anime are deliberate building blocks the agent composes; neither lives inside animedex search's code path. This is the agreement from PR #22's scope clarification.
  • Inform-do-not-gate (AGENTS §0). enhance always emits every applicable heuristic; the est_rescue_on field is informational data the agent reads, not a gate. agg merge-anime always emits the merged shape requested; it does not refuse to merge "low-confidence" pairs by silently dropping them — instead it surfaces them via merge_diagnostics.
  • §13 lossless rich model. When agg merge-anime projects rich rows to Anime for scoring, the original rich payloads stay accessible under MergedAnime.source_payloads (the convention PR #21 established).
  • HTTP-only mock seam (§9bis). Both deliverables must come with responses.RequestsMock-driven tests against captured fixtures; no above-the-wire monkeypatch.

Parallelism

These two scopes share no source files; they can be implemented as two independent PRs:

  • PR A: animedex enhance (~200 lines of source + tests + docs + tutorial).
  • PR B: shikimori to_common prelude + animedex agg merge-anime (~150 lines of source + tests + docs + tutorial).

If the implementer prefers, both can ship in one PR titled e.g. dev(...): add enhance command + merge-anime aggregate tool. Per AGENTS §15.1, one PR with a single coherent theme ("agent-composable search layer") is acceptable, but two narrower PRs are easier to review.
