feat(auth/permissions): privacy-preserving RepoNotFound shape for read denials (#76)#89
Conversation
…d denials Closes #76. Aligns hkub's permission-denial wire shape with HuggingFace's authed-style RepoNotFound contract — anonymous-on-private and authed-no-access both collapse to ``404 + X-Error-Code: RepoNotFound`` with an empty body. ``huggingface_hub.utils._http.hf_raise_for_status`` then dispatches ``RepositoryNotFoundError`` on the client (verified live against hf_hub 1.11.0 in the new interop test), which is what ``transformers``, ``datasets``, etc. key off for actionable error messages. Pre-fix gap (most acute): authed-no-access on a private hkub repo returned raw ``HTTPException(403)`` from ``check_repo_read_permission`` across tree / paths-info / resolve / commits / likes / stats / xet- lookup / branches; FastAPI's default ``{"detail": "..."}`` envelope had no ``X-Error-Code``, so hf_hub clients fell through to generic ``HfHubHTTPError``. ``info.py`` and ``_get_file_metadata`` had ad-hoc ``try/except HTTPException`` translations to ``hf_repo_not_found`` — this PR lifts that into a single global handler so every endpoint through ``check_repo_read_permission`` gets the right shape for free. Empirical anchor (re-probed live 2026-05-05; authed shape from PR #77's 2026-04-30 capture): - Anon → missing/private HF repo: ``401`` bare + ``X-Error-Message: "Invalid username or password."`` (anti-enum). hf_hub raises ``RepositoryNotFoundError`` via the URL-pattern branch. - Authed → missing/no-access HF repo: ``404`` + ``X-Error-Code: RepoNotFound``. Same client exception. Both wire shapes dispatch to ``RepositoryNotFoundError``. Picking the authed shape for hkub (issue #76 Option A): same client exception, privacy-preserving (no enum leak between "missing" and "private"), simpler than mirroring HF's anon-vs-authed branch. Implementation: - ``auth/permissions.py``: new ``RepoReadDeniedError`` exception, raised by ``check_repo_read_permission`` for both anon-on-private and authed-no-access. Inherits ``Exception`` (not ``HTTPException``) precisely so the fallback decorator's ``except HTTPException`` does not catch it — masked private repos must not trigger an upstream chain probe. - ``main.py``: global ``@app.exception_handler(RepoReadDeniedError)`` returning ``hf_repo_not_found(...)``. Centralizes the wire-shape decision in one place. - ``info.py`` / ``files.py``: removed the ad-hoc per-endpoint ``try/except HTTPException`` translations (subsumed by the global handler). - ``repo/utils/hf.py``: added ``hf_disabled_repo()`` helper emitting HF's exact ``X-Error-Message: "Access to this resource is disabled."`` string with no ``X-Error-Code`` (DisabledRepoError dispatch is keyed off the message verbatim — verified in a new round-trip test against ``hf_raise_for_status``). No call site yet; reserved for a future moderation feature. ``HFErrorCode`` docstring labels its KohakuHub-only extensions distinctly from the four HF-recognized codes. - ``fallback/probe_local.py``: chain-tester probe also has to honour the new exception (it bypasses both ``with_repo_fallback`` and the app handler by calling the unwrapped inner directly), so it reproduces the conversion locally. The ``RepoReadDeniedError`` import is intentionally lazy inside the function body: the test harness reloads ``kohakuhub.auth.permissions`` between sessions, freezing a module-level import to a stale class identity such that ``isinstance()`` would silently return False even for legitimate raises (caught live via temporary instrumentation; the comment block in the file documents the failure mode and the lazy-import remedy). Tests: - New ``test/kohakuhub/api/test_repo_read_denial_hf_alignment.py`` — 15 tests: 12 wire-shape RED-first across info / tree / paths-info / resolve HEAD/GET × {anon, outsider}, 2 owner regression guards, 3 ``huggingface_hub`` interop tests proving ``RepositoryNotFoundError`` dispatches end-to-end against a live test server. - New ``hf_disabled_repo`` unit tests in ``test_hf.py`` including a real ``hf_raise_for_status`` round-trip dispatching ``DisabledRepoError``. - Updated ``test_permissions.py`` / ``test_permissions_unit.py`` / ``test_files_unit.py`` to expect ``RepoReadDeniedError`` for read denials (write/delete denials still raise ``HTTPException``; unchanged). Out of scope: write/delete permission denials (genuine 403, no privacy concern); SPA changes (``http-errors.js`` already maps ``RepoNotFound`` to ``NOT_FOUND``); ``hf_disabled_repo`` wiring (helper added but no call site yet, awaiting moderation feature).
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## dev/narugo1992 #89 +/- ##
===============================================
Coverage 94.88% 94.88%
===============================================
Files 139 139
Lines 23715 23724 +9
Branches 2119 2119
===============================================
+ Hits 22501 22511 +10
+ Misses 1172 1171 -1
Partials 42 42
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
CI's hf_hub 0.20.3 matrix lane fails on collection with:
ModuleNotFoundError: No module named 'huggingface_hub.errors'
``huggingface_hub.errors`` landed around v0.22; v0.20.3 (and earlier)
keep these exceptions exposed through ``huggingface_hub.utils`` only.
Switch the two new test modules from this PR to the version-portable
import path ``from huggingface_hub.utils import ...`` — same pattern
the existing ``test_huggingface_hub_deep.py`` already uses.
Additionally, ``DisabledRepoError`` itself was not introduced until
~v0.21; the ``hf_disabled_repo_dispatches_to_disabled_repo_error_in_huggingface_hub``
test now ``pytest.skip``s on the older lanes that don't export the
symbol — the helper's on-the-wire shape (status code, exact message
string, no X-Error-Code) is still pinned by the two unit tests above
that don't depend on the named exception class.
…er loops Two follow-up holes from the #76 refactor surfaced on the full CI matrix (now collecting cleanly post-523b3dd): 1. ``api/likes.py:list_user_likes`` and ``api/stats.py``'s trending builder both filter private-repos-the-caller-can't-see by wrapping ``check_repo_read_permission`` in ``try/except HTTPException``. The helper now raises ``RepoReadDeniedError`` (non-HTTPException, by design — see ``permissions.py`` docstring), so the catch missed and the private-repo permission denial bubbled up to the route, which then 404'd the *whole listing* instead of dropping just the masked row. Reproduced in ``test_list_user_likes_hides_private_repos_without_access`` (asserts 200 with the private repo filtered out; was getting 404). Fix: ``except (HTTPException, RepoReadDeniedError)`` at both sites. ``HTTPException`` stays in the catch for any sibling check that raises through the legacy shape (e.g., the user-not-found 404 raised before the loop). 2. ``test_info_unit.py::test_get_repo_info_covers_invalid_type_not_found_siblings_and_storage_paths`` was still pinning the OLD ad-hoc translation behaviour: it monkey- patched ``check_repo_read_permission`` to raise ``HTTPException(401)`` and asserted ``info.py`` translated it to 404. Post-#76 that translation lives in ``main.py``'s global handler (which the unit call doesn't traverse). Updated the mock to raise ``RepoReadDeniedError`` and assert it propagates, mirroring the same adaptation already done in ``test_files_unit.py``. Wire-shape coverage of the on-the-wire 404 + RepoNotFound is in ``test_repo_read_denial_hf_alignment.py`` integration tests. Targeted regression: 907 passed across ``test/kohakuhub/api/`` + auth (0 fail). The two failures from the prior CI run are gone.
…ch gap Previous skip used ``ImportError`` on ``DisabledRepoError`` — that catches v0.20.3 (no symbol) but not v0.30.2 / v0.36.2, where the class is exported but ``hf_raise_for_status`` does not yet route the ``X-Error-Message == "Access to this resource is disabled."`` branch to it. Those versions raise httpx's generic ``HTTPStatusError`` against our wire shape, so ``pytest.raises(DisabledRepoError)`` fails. Replace with a runtime probe: feed our wire shape through ``hf_raise_for_status`` once and skip if the dispatch isn't wired in the running hf_hub version (catch any non-DisabledRepoError exception). The dispatch branch landed ~v1.0; v1.0.1 / v1.6.0 / latest pass; older matrix lanes skip cleanly. The helper's on-the-wire shape is still pinned by the two unit tests above (independent of any hf_hub version).
narugo1992
left a comment
There was a problem hiding this comment.
Self-review with hostile lens. Strengths: empirical anchoring via live HF probes is solid; the hf_disabled_repo exact-message contract + dispatch-probe skip logic is well-justified; existing test_files_unit.py migration to RepoReadDeniedError is consistent. Critical issue uncovered: the design assumes with_repo_fallback's except HTTPException is the only catch in the way, but the decorator also has a generic except Exception block (decorators.py:363) that swallows the new exception and re-raises as a 500. Tests don't catch this because the harness sets KOHAKU_HUB_FALLBACK_ENABLED=false AND every new integration test additionally passes ?fallback=false — both gates short-circuit the decorator before the bug fires. With cfg.fallback.enabled=true (the production default), every read endpoint wrapped by @with_repo_fallback emits 500 + {"detail":{"error":"Internal server error in local handler"}} for private-repo denials instead of the promised 404 + RepoNotFound. Don't merge until that's fixed and at least one test pins the production shape with fallback enabled. See inline comments for full picture + a separate git-over-HTTP regression on the unwrapped path.
… Exception Critical regression caught in self-review of #89: ``with_repo_fallback`` has a generic ``except Exception as e`` block at line 363 (intended for LakeFS / DB / timeout failures, surfaced as ``LOCAL_OTHER_ERROR`` + 500 + ``{"error": "Internal server error in local handler"}``). The ``RepoReadDeniedError`` introduced for issue #76 is a non- ``HTTPException`` exception class — chosen specifically so the decorator's ``except HTTPException`` does not absorb it — but the generic catch one branch lower swallowed it just the same. The global FastAPI handler in ``main.py`` therefore never ran, and every read endpoint wrapped by ``@with_repo_fallback`` (info / tree / paths-info / resolve HEAD/GET / revision) emitted a ``500`` for private-repo denials in production instead of the promised ``404 + X-Error-Code: RepoNotFound``. The original test suite missed this for two stacked reasons: 1. The harness sets ``KOHAKU_HUB_FALLBACK_ENABLED=false`` (``test/kohakuhub/support/service_bootstrap.py:120``), short- circuiting the wrapper at ``decorators.py:207`` before the buggy ``except`` chain runs. 2. Every wire-shape test in ``test_repo_read_denial_hf_alignment.py`` additionally passes ``?fallback=false`` to scope to the local response. Reproduced by the new ``test_info_anon_on_private_dataset_returns_repo_not_found_with_fallback_enabled`` test below: with ``cfg.fallback.enabled=True`` and no ``?fallback=false`` query param, the response was ``500 + {"detail": {"error": "Internal server error in local handler"}}`` and the backend log carried ``Local handler raised RepoReadDeniedError for dataset/acme-labs/private-dataset`` — exact symptom called out in the self-review comment on PR #89. Fix: insert ``except _resolve_repo_read_denied_error(): raise`` between the cancellation handler and the generic ``except Exception``. Bare ``raise`` sends the exception through to the FastAPI global handler unchanged. The chain probe is intentionally not entered for permission-denial cases — we don't want to leak the existence of a masked private local repo by emitting upstream traffic for it (the strict-consistency contract ``Local namespace wins absolutely`` from the post-#77 README applies even when the local layer is hiding the repo). The class is resolved lazily through ``_resolve_repo_read_denied_error()`` because the test harness reloads ``kohakuhub.auth.permissions`` between sessions, freezing module-level imports to a stale class object such that ``isinstance()`` (which Python's ``except`` uses internally) would silently return False on legitimate raises. Same gotcha ``fallback/probe_local.py`` already documents — same lazy-resolve fix. Test coverage: ``test_info_anon_on_private_dataset_returns_repo_not_found_with_fallback_enabled`` was added as the explicit production-shape regression guard. It runs without ``?fallback=false`` and forces ``cfg.fallback.enabled=True`` via monkeypatch so the wrapper code path is fully exercised. Verified RED before the fix (500), GREEN after (404 + RepoNotFound). Existing fallback + auth + new-tests sweep (441 tests) stays clean.
…r RepoReadDeniedError Self-review on #89 surfaced two doc gaps left over from the original design and not corrected by the critical wrapper fix in 92f358b: 1. **Privacy claim oversells equivalence.** The class docstring asserted "anon and authed-no-access become indistinguishable on the wire" — true at the response-shape level, but the authed-no-access branch issues two extra DB lookups (``get_organization`` + ``get_user_organization``) that the anon-on-private branch skips, leaving a measurable timing-side-channel that could theoretically distinguish the two callers (and probe namespace-is-org). HF's own backend has the same asymmetry so we're not strictly worse, but the docstring read like a stronger guarantee than the code delivers. Reframed as "wire shape identical; sub-millisecond timing side-channel acknowledged and out-of-scope" with rationale for leaving constant-time defenses as future work. 2. **"The decorator only catches HTTPException" is now factually wrong.** Both ``main.py``'s exception-handler docstring and ``probe_local.py``'s catch comment claimed that propagation past ``with_repo_fallback`` is preserved solely by the inheritance choice (``Exception``, not ``HTTPException``). After the critical fix in 92f358b that's only half the story — the decorator now *also* has an explicit ``except RepoReadDeniedError: raise`` clause sitting between the cancellation handler and the generic ``except Exception`` block, and *both* pieces are load-bearing (without the explicit re-raise the generic catch would have collapsed the exception into a 500). Updated both notes so the next person reading either site sees the full propagation story rather than a partial one that could mislead a future refactor of the catch chain. The git-over-HTTP UX regression for anon clones (the third "important" finding in the same self-review) is tracked separately as #90. Pure-doc commit; no behavior change. Verified the test suite still passes (63/63 across permission + fallback + new tests).
…rror Self-review minor on PR #89 flagged the dispatch-probe block's ``except Exception: pytest.skip(...)`` as over-broad. If a future ``huggingface_hub`` version dispatches the disabled-repo message to a different specific exception (e.g. a hypothetical ``ContentDisabledError`` introduced alongside or in place of ``DisabledRepoError``), the skip catches it and silences the divergence — defeating the test's purpose. Narrow to the two exception families actually observed across the CI matrix: - ``httpx.HTTPStatusError`` — raised by v0.30.x / v0.36.x where ``hf_raise_for_status`` doesn't yet wire the disabled-message dispatch and the response falls through to httpx's raw 4xx default. - ``HfHubHTTPError`` — the base class for every dispatched hf_hub error; catches any future hf_hub-native fall-through too. Anything outside those two families now propagates as a real ``pytest.fail`` so the regression is visible. Verified locally against hf_hub 1.11.0; same skip semantics on the older lanes.
Summary
Closes #76. Aligns hkub's permission-denial wire shape with HuggingFace's authed-style
RepoNotFoundcontract — anonymous-on-private and authed-no-access both collapse to404 + X-Error-Code: RepoNotFoundwith an empty body.huggingface_hub.utils._http.hf_raise_for_statusthen dispatchesRepositoryNotFoundErroron the client, which is whattransformers,datasets,huggingface_hubitself, etc. key off for actionable error messages.Empirical anchor
Re-probed live against
https://huggingface.co2026-05-05 (anon; authed shape from PR #77's 2026-04-30 capture, well-anchored), plus a synthetic round-trip throughhf_raise_for_status(hf_hub 1.11.0) for every wire shape:Invalid username or password.(HF anon anti-enum)RepositoryNotFoundErrorX-Error-Code: RepoNotFound(HF authed)RepositoryNotFoundErrorX-Error-Code: GatedRepoGatedRepoErrorX-Error-Code: EntryNotFoundEntryNotFoundErrorX-Error-Code: RevisionNotFoundRevisionNotFoundErrorX-Error-Message: "Access to this resource is disabled."DisabledRepoErrorHfHubHTTPError← what hkub did wrong on outsider-on-privateBoth anon-anti-enum and authed-RepoNotFound dispatch to
RepositoryNotFoundErroron the client. hkub picks the authed shape for emit (issue #76 Option A): same client exception, privacy-preserving (no enum leak between "missing" and "private"), simpler than mirroring HF's branch.Pre-fix gap (most acute)
RepositoryNotFoundError(accidental — URL-pattern branch){"detail": "..."}HfHubHTTPError← gapRepositoryNotFoundError✓The ad-hoc translation pattern was duplicated in
info.pyand_get_file_metadata's missing-repo path, but missing from tree / paths-info / resolve / commits / likes / stats / xet-lookup / branches. This PR lifts that translation into a global FastAPI exception handler, removing all per-endpoint duplication.Implementation
auth/permissions.py— newRepoReadDeniedErrorexception.check_repo_read_permissionraises it for both anon-on-private and authed-no-access. InheritsException(notHTTPException) precisely so the fallback decorator'sexcept HTTPExceptiondoes not catch it: masked private repos must not trigger an upstream chain probe.main.py— global@app.exception_handler(RepoReadDeniedError)that returnshf_repo_not_found(...). Centralizes the wire-shape decision in one place; every read endpoint that callscheck_repo_read_permissiongets the privacy-preserving translation for free.info.py/files.py— removed the per-endpoint ad-hoctry/except HTTPExceptiontranslations (now subsumed by the global handler). Net deletion + comments pointing to the new flow.repo/utils/hf.py— addedhf_disabled_repo()helper that emits HF's exactX-Error-Message: "Access to this resource is disabled."string with noX-Error-Code(DisabledRepoError dispatch is keyed off the message string verbatim). No call site yet; reserved for a future moderation feature. PlusHFErrorCodedocstring labels its KohakuHub-only extensions distinctly from the four HF-recognized codes.fallback/probe_local.py— chain-tester probe also has to honour the new exception (it bypasses bothwith_repo_fallbackand the FastAPI app handler by calling the unwrapped inner directly), so it reproduces the conversion locally. TheRepoReadDeniedErrorimport is intentionally lazy inside the function: the test harness reloadskohakuhub.auth.permissionsbetween sessions, so a module-level import would freeze the old class identity andisinstance()would silently return False even for legitimate raises (caught live via temporary instrumentation; the comment block in the file documents the failure mode).Tests
New:
test/kohakuhub/api/test_repo_read_denial_hf_alignment.py— 15 tests:404 + X-Error-Code: RepoNotFound + X-Error-Message + empty body.HfApi.dataset_info(anon + outsider) andhf_hub_download(anon) against a live test server, asserting the dispatched exception class isRepositoryNotFoundError.?fallback=falseto isolate the local wire shape — fallback chain interaction is a separate concern covered by the fallback test suite.New helper tests in
test_hf.pyforhf_disabled_repo():X-Error-Code, empty body.hf_raise_for_statusraisingDisabledRepoError(the proof that the wire shape works).Updates to
test_permissions.py,test_permissions_unit.py,test_files_unit.py— expectRepoReadDeniedErrorfor read denials (write/delete denials still raiseHTTPException, unchanged).Test plan
pytest test/kohakuhub/api/test_repo_read_denial_hf_alignment.py— 15/15)pytest test/kohakuhub/api/repo/utils/test_hf.py— added 3, all pass)pytest test/kohakuhub/auth/)Out of scope (deferred)
403, no privacy concern (you can see the public repo, just can't modify it). Continue to raiseHTTPException(403); not part of errors: align hkub error response shape with HuggingFace contract (X-Error-Code on permission denials) #76.http-errors.jsalready handles bothRepoNotFoundand bare 401 asNOT_FOUND(the privacy-preserving outsider UX shifts from "Forbidden" to "Not found", which is the design intent of Option A).hf_disabled_repo()wiring — helper added but no call site yet; reserved for a future moderation feature.