Port: LitFinder v1.4.6 (bug fixes and features, no plugin system)#2

Merged
spin-drift merged 24 commits into main from port-litfinder-1-4-6
Jun 14, 2026

@spin-drift

Owner

Port: LitFinder v1.4.6 bug fixes and features (excluding rebrand and plugin system)

Ports 22 commits from NemesisHubris/litfinder's v1.4.6 release (AGPL-3.0), preserving original authorship via git cherry-pick. LitFinder is a community fork of calibrain/shelfmark; this PR pulls in the substantive fixes and features while leaving the LitFinder rebranding and the runtime custom-source plugin system out.

Upstream issue fixes

The following commits address open issues on the upstream calibrain/shelfmark tracker:

Additional fixes

  • fix(abb): info hash validation with magnet-link fallback; exact-phrase fallback extended to manual queries; default language to en when ABB listing has none, preventing valid results from being hidden
  • fix: Anna's Archive title parser now handles nested edition spans and filters lgli catalog descriptor entries (e.g. "Book/Online Audio") that were polluting results
  • fix: Python 2 except syntax across 27 files: `except X, Y:` → `except (X, Y):`. These were SyntaxErrors that prevented affected modules from importing at runtime

Deliberately not ported

  • LitFinder rebranding (UI strings, Apprise app ID, logo, README, license switch to AGPL-3.0) — these tie the codebase to the LitFinder identity, which is out of scope for a fork keeping the Shelfmark name.
  • Custom-source plugin system (064f20c and follow-ups) — a significant feature in its own right; deserves a focused PR and review of its own rather than landing as part of a port pass.
  • cbcd9ad (book-languages.json gitignore restore) — addresses a gitignore/build interaction specific to LitFinder's repo layout; the file isn't dropped in this tree.
  • 393f9a4 (destination.py Py2 syntax) — the except in question lives in a LitFinder-specific code path that doesn't exist here; the broader f3689df sweep already covers the cases that do exist in shelfmark.
  • 20f6532 (transmission stopped-state fix) — the same root issue was independently resolved by calibrain/shelfmark#1023 (fix(transmission): handle completion when seed ratio is 0), which is already in main.
  • LitFinder-specific test environment — tailored to LitFinder's CI and local-dev setup; doesn't translate cleanly to this fork.

New features

  • Multi-variant title search — when a title includes a series tag, subtitle, or genre descriptor (e.g. Dune (Dune Chronicles, #1), Project Hail Mary: A Novel), a clean short-form query is generated first and falls back to the full title, significantly improving hit rates on IRC, ABB, Prowlarr, and Anna's Archive
  • Multi-book flat folder grouping — flat audiobook downloads containing multiple books are split into per-book subfolders before ABS delivery
  • Fuzzy text matching utility — tolerant title/author/ISBN comparison for result matching
  • Noop "Leave in Place" output handler — download files without moving or renaming
  • Admin display name in activity feed — admins see which user triggered each download

Verification

  • Backend: 1945 passed, 96 skipped (1 preexisting failure on seleniumbase-dependent test in local venv, unrelated to this port)
  • Frontend: typecheck, lint, format clean; 120/120 unit tests passing
  • Minor follow-up commits in this branch: dropped orphan patches/ files (LitFinder's internal mirror-copy convention, unused without the plugin system), reformatted one file with project's oxfmt rules, and updated test_scraper.py info-hash fixtures to valid hex (the new ABB validation correctly rejects the pre-existing non-hex placeholder strings).

Credit

Per-commit authorship is preserved by cherry-pick. The "fix: three upstream bugs" commit ports the logic but reverts the Apprise app-id/logo strings from "LitFinder" back to "Shelfmark" (noted in the commit body, with a Co-Authored-By trailer). All other ported commits are unmodified from upstream. Thanks to @NemesisHubris for the work.

NemesisHubris and others added 24 commits June 14, 2026 00:02
- Add info hash validation with magnet link fallback in scraper
- Extend exact-phrase fallback to manual queries, not just auto-generated ones
- Default language to 'en' when ABB listing has no language field, preventing
  valid results from being hidden by the language filter
…k label, notification proxy

- calibrain#999: strip query string/fragment from mirror URLs in normalize_http_url
  so appending search paths doesn't produce malformed URLs
- calibrain#1025: add RTORRENT_AUDIOBOOK_LABEL setting; falls back to book label if unset
- calibrain#956: inject configured proxy env vars before Apprise dispatch so
  Telegram/Discord notifications respect the app proxy setting

Cherry-picked from NemesisHubris/litfinder@8e854f9 with the Apprise app_id,
description, and logo-URL strings reverted from "LitFinder" back to "Shelfmark".

Co-Authored-By: NemesisHubris <155838970+NemesisHubris@users.noreply.github.com>
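For reviewers, the calibrain#999 behavior can be pictured with the minimal sketch below. It is an illustration only: `strip_query_and_fragment` is a hypothetical stand-in, not the project's `normalize_http_url`, and the mirror hostname is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query_and_fragment(url: str) -> str:
    """Drop ?query and #fragment so a search path can be appended safely.

    Hypothetical helper for illustration; the real normalize_http_url may differ.
    """
    parts = urlsplit(url)
    # Keep scheme, host, and path; trim a trailing slash to avoid "//" on join.
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Example with a made-up mirror URL that carries a query string and fragment.
base = strip_query_and_fragment("https://mirror.example/?ref=abc#top")
search_url = f"{base}/search?q=dune"
```

Without the stripping step, appending `/search?q=dune` to the raw mirror URL would place the path after the existing query string.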
Cover the specific scenarios that were broken:
- normalize_http_url: 5 cases for query/fragment stripping (calibrain#999)
- RTorrentClient: audiobook label selection and book-label fallback (calibrain#1025)
- TransmissionClient: stopped status treated as complete (seeding ratio fix)
- _apprise_proxy_env: http/socks5 proxy injection and env-var preservation (calibrain#956)

These tests would catch any future regression to the exact failure modes
that motivated each fix.
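The proxy injection and env-var preservation exercised by these tests could look roughly like the sketch below. The function name `proxy_env` and the choice of variable names are assumptions made for illustration; the ported `_apprise_proxy_env` may set different variables or restore them differently.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional

# Assumed set of variables; the real helper may use a different list.
_PROXY_KEYS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY")


@contextmanager
def proxy_env(proxy_url: Optional[str]) -> Iterator[None]:
    """Temporarily export proxy variables, restoring prior values on exit."""
    saved = {key: os.environ.get(key) for key in _PROXY_KEYS}
    try:
        if proxy_url:
            for key in _PROXY_KEYS:
                os.environ[key] = proxy_url
        yield
    finally:
        # Put back whatever was there before, including "not set".
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
```

A notification dispatch would then run inside `with proxy_env(configured_proxy): ...`, so the rest of the process keeps its original environment.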
All instances of `except X, Y:` (Python 2 syntax) replaced with
`except (X, Y):` (Python 3). These were SyntaxErrors that would
prevent the affected modules from importing at runtime.
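As a small illustration of the corrected form (the function here is invented for the example and is not taken from the port):

```python
def parse_port(raw):
    """Return raw as an int, or None when it cannot be converted."""
    try:
        return int(raw)
    # The Python 2 spelling was `except ValueError, TypeError:`; the commit
    # message reports that form failed at import. The tuple form below
    # catches either exception type.
    except (ValueError, TypeError):
        return None
```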
…fails

validate_destination() now tracks whether it created the directory
itself. If the write probe then fails, the empty directory is removed
so nothing is orphaned on disk.
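A minimal sketch of the create, probe, and clean-up sequence described above. `ensure_writable_dir` and the probe filename are hypothetical; the actual `validate_destination()` likely has a different signature and reports errors differently.

```python
import contextlib
from pathlib import Path


def ensure_writable_dir(path: Path) -> bool:
    """Create path if needed, probe it for writes, and undo the mkdir on failure."""
    created_here = not path.exists()
    try:
        path.mkdir(parents=True, exist_ok=True)
        probe = path / ".write_probe"  # assumed probe filename
        probe.write_text("ok")
        probe.unlink()
        return True
    except OSError:
        if created_here:
            # Only remove what this call created; rmdir refuses non-empty dirs.
            with contextlib.suppress(OSError):
                path.rmdir()
        return False
```

This sketch only removes the leaf directory; any parent directories created by `parents=True` are left in place.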
handleCancel only called fetchStatus(), leaving a gap where the item
had left the live queue but not yet appeared in the DB snapshot.
Both are now refreshed in parallel so cancelled items stay visible.
…t infinite loop

_extract_slow_download_url was calling _get_download_url recursively
after each countdown wait with no depth limit. Pages that repeatedly
return a countdown (common through Tor) caused an infinite loop.

Now retries are bounded by _AA_COUNTDOWN_MAX_RETRIES (3). Also fixed
tor.sh rotation_monitor sending pkill -HUP twice in one cycle when
DNS resolution failed.
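The change from unbounded recursion to a capped loop can be sketched as follows. The `fetch` and `wait` callables are placeholders for the page request and the countdown sleep; only the retry limit of 3 comes from the commit message.

```python
from typing import Callable, Optional

MAX_COUNTDOWN_RETRIES = 3  # mirrors the limit named in the commit message


def fetch_with_countdown(
    fetch: Callable[[], Optional[str]],
    wait: Callable[[], None],
) -> Optional[str]:
    """Call fetch until it yields a URL, waiting between tries, up to the cap."""
    for attempt in range(MAX_COUNTDOWN_RETRIES + 1):
        url = fetch()
        if url is not None:
            return url
        if attempt < MAX_COUNTDOWN_RETRIES:
            wait()  # countdown page: wait, then retry
    return None  # give up instead of looping forever
```

With this shape a page that always returns a countdown costs one initial attempt plus three retries, then the caller sees `None`.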
When DIRECT_DOWNLOAD_LANGUAGE_FROM_PATH is enabled, parse the file
path column in AA search results (e.g. lgli/N:\...\[BD FR] Book.cbz)
to infer language when AA's own metadata is missing or unknown.

Detection priority:
1. Explicit bracket tags: [FR], [BD FR], [En]
2. Keyed markers: "BD FR", "language: fr"
3. Full language names: "french", "deutsch"
4. Loose 2-3 char codes (ambiguous ones like "en"/"de" require
   bracket or key context to avoid false positives)

When enabled with a language filter, the server-side &lang= parameter
is suppressed and filtering is done locally so lgli files without AA
language metadata are not excluded before the path can be inspected.

Also relaxes _parse_search_result_row to only require title + format,
allowing sparse lgli rows (missing author/publisher/year) to pass through.
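The detection priority above might be approximated like this. Everything in the sketch is an assumption for illustration: the function name, the small code and name tables, and the regular expressions are not taken from the ported code.

```python
import re
from typing import Optional

# Tiny illustrative tables; a real implementation would cover far more languages.
KNOWN_CODES = {"en", "fr", "de", "es", "it"}
AMBIGUOUS = {"en", "de", "es", "it"}  # also ordinary words, so never matched loosely
LANGUAGE_NAMES = {"english": "en", "french": "fr", "german": "de", "deutsch": "de"}


def language_from_path(path: str) -> Optional[str]:
    """Infer a language code from a file path, most explicit signal first."""
    text = path.lower()
    # 1. Bracket tags such as [FR] or [BD FR]: use the last token inside.
    for inner in re.findall(r"\[([^\]]+)\]", text):
        tokens = inner.split()
        if tokens and tokens[-1] in KNOWN_CODES:
            return tokens[-1]
    # 2. Keyed markers such as "BD FR" or "language: fr".
    keyed = re.search(r"\b(?:bd\s+|language:\s*)([a-z]{2,3})\b", text)
    if keyed and keyed.group(1) in KNOWN_CODES:
        return keyed.group(1)
    # 3. Full language names.
    for name, code in LANGUAGE_NAMES.items():
        if re.search(rf"\b{name}\b", text):
            return code
    # 4. Loose codes, skipping ambiguous ones that need bracket or key context.
    for token in re.findall(r"\b[a-z]{2,3}\b", text):
        if token in KNOWN_CODES and token not in AMBIGUOUS:
            return token
    return None
```

The ordering matters: a path like `de profundis.epub` yields no language because a bare `de` is only trusted inside brackets or after a key.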
Propagates display_name alongside username through the full stack so the
activity sidebar shows friendly names instead of raw usernames. Backend
enriches queue status, download history, and request snapshots. Frontend
uses display_name in meta lines, the user filter dropdown, and filter
tooltips, falling back to username when display_name is absent.

Also adds relative timestamps (e.g. "2h ago") beneath each activity card's
meta line.
Adds a new output mode that skips post-download file moving entirely —
the downloaded file stays wherever it landed in the temp directory.
Useful for testing and for setups where an external process handles
file placement.
…arison

Ported from InfiniteAvenger/shelfmark fork. Provides shared token normalization,
stopword filtering, fuzzy title matching with configurable threshold, author
surname extraction, and ISBN normalization.
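A rough sketch of what such a utility can look like, using only the standard library. The function names, the stopword list, and the 0.85 default threshold are illustrative assumptions and are not taken from text_match.py.

```python
import re
from difflib import SequenceMatcher

STOPWORDS = {"a", "an", "the", "of", "and"}  # assumed, deliberately short


def normalize_tokens(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def titles_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Compare normalized titles with a similarity ratio and a cutoff."""
    left = " ".join(normalize_tokens(a))
    right = " ".join(normalize_tokens(b))
    return SequenceMatcher(None, left, right).ratio() >= threshold


def normalize_isbn(raw: str) -> str:
    """Keep digits and a check-digit X, uppercased, with separators removed."""
    return re.sub(r"[^0-9Xx]", "", raw).upper()
```

Normalizing before comparing is what lets "The Name of the Wind" and "Name of the Wind, The" count as the same title.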
…S delivery

When a torrent drops a flat mix of chapter M4Bs (Book 4 as 33 files) and
standalone M4Bs (other books) into one directory, ABS treats every file as
a separate library item. This detects when files share a common chapter-number
prefix, groups them per-book, and routes each group to its own subfolder so
ABS sees exactly one folder per book.

Falls back to existing part-numbering logic when no chapter-group pattern is
detected (all-standalone or untitled files), preserving prior behavior.
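One way to picture the grouping step is the sketch below. The filename convention (a book title followed by a zero-padded chapter number) and the helper name `group_flat_files` are assumptions; the ported detection logic may key on different patterns.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Assumed naming: "<book title> - 01.m4b", "<book title> 02.m4b", ...
CHAPTER_SUFFIX = re.compile(r"^(?P<book>.+?)[\s._-]+\d{2,3}$")


def group_flat_files(names: list[str]) -> dict[str, list[str]]:
    """Map a per-book subfolder name to the files that belong in it."""
    candidates: dict[str, list[str]] = defaultdict(list)
    for name in names:
        stem = PurePath(name).stem
        match = CHAPTER_SUFFIX.match(stem)
        candidates[match.group("book") if match else stem].append(name)

    groups: dict[str, list[str]] = {}
    for key, files in candidates.items():
        if len(files) > 1:
            groups[key] = sorted(files)  # chapter files of one book
        else:
            # A lone file is treated as a standalone book in its own folder.
            groups[PurePath(files[0]).stem] = files
    return groups
```

This is a heuristic: standalone books whose titles differ only by a trailing two-digit number would be merged, which is one reason a fallback path is kept.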
…atching

Books with subtitles, series info, or marketing fluff in the title
(e.g. "Dune (Dune Chronicles, #1)", "Project Hail Mary: A Novel") were
sending the full string to every source. IRC does near-exact filename
matching and returned nothing. ABB similarly failed.

Add generate_title_search_variants() in text_match.py that strips
common suffix patterns (parenthetical series, volume suffixes, genre
descriptors, long colon subtitles, em-dash subtitles) to produce a clean
short form. The search plan now generates [short_title, full_title] as
variants so sources try the clean title first with the full title as
fallback. HTTP sources (ABB, Prowlarr) try all variants with stop-on-first
or merge-all at no meaningful extra cost. IRC tries at most 2 sequential
queries.
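To make the short-title-first idea concrete, here is a reduced sketch with three suffix patterns. It is not the ported `generate_title_search_variants()`; the helper name `title_variants` and the patterns are simplified assumptions covering only the two examples in the commit message plus dash subtitles.

```python
import re

# Simplified stand-ins for the real pattern list.
_SUFFIX_PATTERNS = [
    re.compile(r"\s*\([^)]*\)\s*$"),                 # "(Dune Chronicles, #1)"
    re.compile(r"\s*:\s*an?\s+novel\s*$", re.I),     # ": A Novel"
    re.compile(r"\s+[\u2014-]\s+.+$"),               # dash-separated subtitle
]


def title_variants(title: str) -> list[str]:
    """Return [short_title, full_title], or just [full_title] if nothing strips."""
    full = title.strip()
    short = full
    changed = True
    while changed:  # repeat so stacked suffixes are all removed
        changed = False
        for pattern in _SUFFIX_PATTERNS:
            stripped = pattern.sub("", short).strip()
            if stripped and stripped != short:
                short, changed = stripped, True
    return [short, full] if short != full else [full]
```

A source that does near-exact matching would try the first element and only fall back to the second when the short form returns nothing.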
Add patterns for:
- (N of M) part indicators, mid-string (e.g. 'Title (1 of 2) Series 2')
- Bare 'Book N' suffix without separator ('The Hunter's Code Book 9')
- Bare '#N' suffix without parens ('Law of the Jungle #13')
- Square bracket content anywhere ('[Dramatized Adaptation]')

Reorder patterns so long colon subtitles are stripped before volume
suffixes, fixing cases like 'Shadows of Sparta: A Greek Mythology
Romantasy The Spartan Flame Trilogy, Book 1' → 'Shadows of Sparta'.
Switch to multi-pass stripping so compound suffixes are fully removed.
… volume suffixes

Add biography/autobiography/collection/anthology to genre subtitle pattern.
Add edition subtitle pattern for '25th Anniversary Edition', 'Revised Edition' etc.
Fix volume suffix pattern to handle 'Part 2 of 3' (was stopping at first token).
…age titles

AA changed their HTML structure — related-edition titles are now nested as
child spans inside the main title span. Using get_text() on the outer span
concatenated everything into a single garbled string. Fix extracts only the
direct NavigableString children. Adds a filter to skip lgli catalog-format
descriptor entries like "Book/Online Audio".
These were pulled in as duplicates of the real source files when
cherry-picking the display-name and noop-output features. Without the
custom-source plugin system (litfinder's bucket 6), the patches/ tree
is unreferenced by any code path and just clutters the repo.
Cherry-picked code from NemesisHubris/litfinder used a slightly different
formatting style; this re-applies the project's oxfmt rules.
The fix(abb) port (f189a5f) added SHA-1/SHA-256 hex validation on info
hashes. The existing test fixtures used 'ABC123DEF456GHI789...' and
'ABC 123 DEF 456', neither of which is valid hex — pre-fix they passed
trivially, post-fix they (correctly) fail validation.

Replaced with valid 40-char hex strings; assertions still match on the
partial 'ABC123DEF456' prefix so the cleanup-behavior assertion is
preserved.
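For context, hex validation of this kind can be sketched as below. The helper name and the whitespace-stripping step are assumptions; only the SHA-1/SHA-256 hex requirement comes from the commit message (40 and 64 hex characters respectively).

```python
import re
from typing import Optional

# 40 hex chars for a SHA-1 info hash, 64 for SHA-256.
_HEX_INFO_HASH = re.compile(r"(?:[0-9a-fA-F]{40}|[0-9a-fA-F]{64})")


def clean_info_hash(raw: str) -> Optional[str]:
    """Return a lowercase hex info hash, or None if raw is not 40 or 64 hex chars."""
    candidate = re.sub(r"\s+", "", raw)  # assumed cleanup of stray whitespace
    if _HEX_INFO_HASH.fullmatch(candidate):
        return candidate.lower()
    return None
```

Under a check like this, placeholder strings containing letters such as G, H, or I are rejected, which matches why the old fixtures had to be replaced.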
Cherry-picked code from NemesisHubris/litfinder used slightly different
ruff settings; this aligns with this repo's stricter config.

Auto-fixed (ruff --fix): import sorting in pipeline.py and transfer.py,
quote style in audiobookbay/scraper.py regex, unused pytest imports in
three test files.

Manual fixes:
- BLE001: narrow blind `except Exception:` to (AttributeError, KeyError,
  TypeError) in activity_routes.py and main.py display-name lookups
- SIM105: replace try/except/pass with contextlib.suppress(OSError) in
  destination.py write-probe cleanup
- TC003: move `from pathlib import Path` into TYPE_CHECKING block in
  postprocess/group.py (only used as annotation, file has
  `from __future__ import annotations`)
- RET504: inline final return in direct_download._normalize_candidate
- S105: noqa on `token == "all"` (language sentinel, not a credential)
- C405: list literal → set literal in test_text_match.py
- C408: dict() call → dict literal in test_postprocess_group_integration.py
- RUF059: prefix unused `final_paths` with `_` in one integration test

All 1942 backend tests still pass.
LitFinder uses slightly different formatting (line length, wrap style)
than this repo. The 31 files that drifted are all touched by the port
commits; `make python-format` was failing on them. Running `ruff format`
brings everything back to this repo's existing config.

No behavior changes; all 1942 backend tests still pass.
@spin-drift spin-drift merged commit 598afdb into main Jun 14, 2026
10 checks passed
2 participants