Scope annotation-studio access per language (+ persistence, readiness & storage hardening) #85
Merged
joaocarvoli merged 7 commits on Jun 9, 2026
Introduce as_language_members to scope annotation-studio facilitators to specific languages (admins and platform admins bypass it). The Alembic migration creates the table, seeds memberships for every current AS user across every currently-active language so no one is locked out on deploy, and adds composite (FK, upload_status) indexes on the tier recording/clip tables used by readiness and export filters. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
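The no-lockout seed described above amounts to one `INSERT ... SELECT` over a cross join of current AS users and active languages, plus a composite index. Below is a minimal sketch that uses in-memory SQLite as a stand-in for Postgres. Apart from `as_language_members`, `language_id` and `upload_status`, the tables, columns and index name (`as_users`, `languages.is_active`, `as_tier_a_recordings.word_id`, `ix_as_tier_a_recordings_word_status`) are hypothetical simplifications, not the actual migration. `INSERT OR IGNORE` is SQLite syntax; a Postgres migration would more likely use `ON CONFLICT DO NOTHING`.

```python
import sqlite3

# Hypothetical, simplified tables; the real tripod/annotation-studio schema differs.
SCHEMA = """
CREATE TABLE languages (id INTEGER PRIMARY KEY, is_active INTEGER NOT NULL);
CREATE TABLE as_users (user_id INTEGER PRIMARY KEY);
CREATE TABLE as_language_members (
    user_id     INTEGER NOT NULL,
    language_id INTEGER NOT NULL REFERENCES languages(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, language_id)
);
CREATE TABLE as_tier_a_recordings (
    id            INTEGER PRIMARY KEY,
    word_id       INTEGER NOT NULL,
    upload_status TEXT NOT NULL
);
"""

# Every current AS user x every currently active language. The composite
# primary key plus OR IGNORE makes re-running the seed a no-op.
SEED_MEMBERSHIPS = """
INSERT OR IGNORE INTO as_language_members (user_id, language_id)
SELECT u.user_id, l.id
FROM as_users AS u CROSS JOIN languages AS l
WHERE l.is_active = 1
"""

# Composite (FK, upload_status) index matching "stored takes per word" filters.
CREATE_INDEX = """
CREATE INDEX IF NOT EXISTS ix_as_tier_a_recordings_word_status
ON as_tier_a_recordings (word_id, upload_status)
"""


def upgrade(conn: sqlite3.Connection) -> None:
    """Apply the seed and index steps to an already-created schema."""
    conn.execute(SEED_MEMBERSHIPS)
    conn.execute(CREATE_INDEX)
    conn.commit()


def membership_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM as_language_members").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO as_users VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO languages VALUES (?, ?)", [(10, 1), (11, 1), (12, 0)])
    upgrade(conn)
    upgrade(conn)  # second run inserts nothing new
    print(membership_count(conn))  # 2 users x 2 active languages -> 4
```

Re-running the seed adds nothing because duplicate `(user_id, language_id)` pairs are ignored, which keeps the step safe to retry; the inactive language gets no members.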
Add access helpers (assert_language_access, accessible_language_ids and the language_id resolvers) and call them on every annotation-studio data route: path routes check the URL language, by-id routes resolve the owning language from the resource. The dashboard now lists only languages the user may see. Closes the gap where any approved AS user could read or modify any language's data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
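A rough model of the access rule: admins and platform admins bypass scoping, everyone else needs a membership row, path routes check the URL's language directly, and by-id routes first resolve the resource's owning language. This is an illustrative sketch only. `assert_language_access` and `accessible_language_ids` are names from this PR, but their signatures here, plus `User`, `AccessData`, `Forbidden` and `assert_recording_access`, are hypothetical, and plain Python sets stand in for database queries.

```python
from dataclasses import dataclass, field


class Forbidden(Exception):
    """Stands in for the HTTP 403 a real route would return."""


@dataclass(frozen=True)
class User:
    id: int
    is_platform_admin: bool = False
    studio_role: str = "facilitator"  # hypothetical field: "admin" or "facilitator"


@dataclass
class AccessData:
    """In-memory stand-in for as_language_members and the languages table."""

    memberships: set[tuple[int, int]] = field(default_factory=set)  # (user_id, language_id)
    language_ids: set[int] = field(default_factory=set)
    recording_language: dict[int, int] = field(default_factory=dict)  # recording_id -> language_id


def _bypasses_scoping(user: User) -> bool:
    # Studio admins and platform admins are not limited by membership.
    return user.is_platform_admin or user.studio_role == "admin"


def accessible_language_ids(user: User, data: AccessData) -> set[int]:
    """Languages the dashboard may list for this user."""
    if _bypasses_scoping(user):
        return set(data.language_ids)
    return {lang for uid, lang in data.memberships if uid == user.id}


def assert_language_access(user: User, language_id: int, data: AccessData) -> None:
    """Path routes: check the language id taken from the URL."""
    if _bypasses_scoping(user):
        return
    if (user.id, language_id) not in data.memberships:
        raise Forbidden(f"user {user.id} may not access language {language_id}")


def assert_recording_access(user: User, recording_id: int, data: AccessData) -> None:
    """By-id routes: resolve the owning language from the resource, then check it."""
    language_id = data.recording_language[recording_id]  # a real resolver would 404 here
    assert_language_access(user, language_id, data)
```

For example, a facilitator who belongs only to language 10 sees `{10}` on the dashboard and gets `Forbidden` when requesting a recording owned by language 11, even though the request never names that language.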
Add admin-only endpoints to list, add (by email) and remove per-language facilitators, backed by member_service and the LanguageMember schemas, and mount the members router. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
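The add/list/remove semantics of the members endpoints can be sketched as follows. This is not `member_service`'s real code: the `MemberStore` class, the `UnknownEmail` error and the choice to trim and lowercase emails before lookup are assumptions made for illustration.

```python
from dataclasses import dataclass, field


class UnknownEmail(LookupError):
    """Stands in for whatever error response the real endpoint returns (assumption)."""


@dataclass
class MemberStore:
    """Hypothetical in-memory stand-in for member_service's database access."""

    user_ids_by_email: dict[str, int]  # keys assumed to be stored lowercase
    memberships: set[tuple[int, int]] = field(default_factory=set)  # (user_id, language_id)

    def list_members(self, language_id: int) -> list[int]:
        return sorted(uid for uid, lang in self.memberships if lang == language_id)

    def add_by_email(self, language_id: int, email: str) -> int:
        # Normalising case/whitespace is an assumption about how emails are stored.
        user_id = self.user_ids_by_email.get(email.strip().lower())
        if user_id is None:
            raise UnknownEmail(email)
        self.memberships.add((user_id, language_id))  # adding twice is a no-op
        return user_id

    def remove(self, language_id: int, user_id: int) -> bool:
        """Return True if a membership was actually removed."""
        key = (user_id, language_id)
        if key in self.memberships:
            self.memberships.remove(key)
            return True
        return False
```

Making `add_by_email` idempotent means an admin re-adding an existing facilitator gets the same result rather than an error; whether the real endpoint behaves that way is not stated in this PR.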
Replace the in-Python Counter/row-loading readiness computation with SQL COUNT/GROUP BY/HAVING aggregates. This avoids loading every recording, clip and sort row into memory and removes the per-language cost the dashboard multiplied across languages. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
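The idea behind the readiness rewrite, as a sketch: let the database group takes per word and keep only words with enough stored takes, instead of loading rows into a Python `Counter`. SQLite stands in for Postgres, the schema is simplified and hypothetical, and `min_takes` is a placeholder because the Tier A threshold is not given here. Grouping by language as well answers the whole dashboard in one query instead of one per language.

```python
import sqlite3

# Hypothetical, simplified tables.
SCHEMA = """
CREATE TABLE as_tier_a_words (id INTEGER PRIMARY KEY, language_id INTEGER NOT NULL);
CREATE TABLE as_tier_a_recordings (
    id            INTEGER PRIMARY KEY,
    word_id       INTEGER NOT NULL REFERENCES as_tier_a_words(id),
    upload_status TEXT NOT NULL
);
"""

# Inner query: words with enough stored takes (GROUP BY word + HAVING).
# Outer query: count those words per language, in one round trip.
WORDS_READY_BY_LANGUAGE = """
SELECT language_id, COUNT(*) AS words_ready
FROM (
    SELECT w.language_id AS language_id, w.id AS word_id
    FROM as_tier_a_words AS w
    JOIN as_tier_a_recordings AS r ON r.word_id = w.id
    WHERE r.upload_status = 'stored'
    GROUP BY w.language_id, w.id
    HAVING COUNT(r.id) >= ?
)
GROUP BY language_id
ORDER BY language_id
"""


def words_ready_by_language(conn: sqlite3.Connection, min_takes: int) -> dict[int, int]:
    rows = conn.execute(WORDS_READY_BY_LANGUAGE, (min_takes,)).fetchall()
    return {language_id: count for language_id, count in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO as_tier_a_words VALUES (?, ?)", [(1, 10), (2, 10), (3, 11)])
    # Word 1: 5 stored; word 2: 4 stored + 1 pending; word 3: 6 stored.
    takes = [(1, "stored")] * 5 + [(2, "stored")] * 4 + [(2, "pending")] + [(3, "stored")] * 6
    conn.executemany(
        "INSERT INTO as_tier_a_recordings (word_id, upload_status) VALUES (?, ?)", takes
    )
    print(words_ready_by_language(conn, min_takes=5))  # {10: 1, 11: 1}
```

Languages with no ready words simply do not appear in the result, so a caller would default missing keys to 0.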
Delete storage objects only after the database commit succeeds, so a failed commit can no longer leave a row pointing at a deleted file (delete word, recording, clip and export, plus the reference set/clear flows). Also enforce a maximum audio size at the upload "complete" step, deleting oversized objects instead of marking them stored. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
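The ordering rule is easiest to see with fakes: stage the row deletion, commit, and only then delete the object; and at the `complete` step, compare the stored object's size against a cap before marking it stored. Everything below (`FakeStorage`, `FakeSession`, `delete_recording`, `complete_upload`, `UploadTooLarge`) is a hypothetical sketch, with `max_bytes` standing in for `MAX_AUDIO_BYTES`, whose value is not given here.

```python
from dataclasses import dataclass, field


class UploadTooLarge(Exception):
    pass


@dataclass
class FakeStorage:
    """Stand-in for the bucket: object key -> size in bytes."""

    objects: dict[str, int] = field(default_factory=dict)

    def size(self, key: str) -> int:
        return self.objects[key]

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


@dataclass
class FakeSession:
    """Stand-in for a DB session whose commit can be made to fail."""

    rows: dict[int, dict[str, str]] = field(default_factory=dict)  # id -> {"key", "status"}
    fail_commit: bool = False
    _pending_deletes: list[int] = field(default_factory=list)

    def delete(self, rec_id: int) -> None:
        self._pending_deletes.append(rec_id)

    def commit(self) -> None:
        if self.fail_commit:
            self._pending_deletes.clear()  # behave like a rollback: nothing applied
            raise RuntimeError("commit failed")
        for rec_id in self._pending_deletes:
            self.rows.pop(rec_id, None)
        self._pending_deletes.clear()


def delete_recording(session: FakeSession, storage: FakeStorage, rec_id: int) -> None:
    key = session.rows[rec_id]["key"]
    session.delete(rec_id)
    session.commit()  # if this raises, the object is still in the bucket
    storage.delete(key)  # reached only after the row is gone for good


def complete_upload(
    session: FakeSession, storage: FakeStorage, rec_id: int, max_bytes: int
) -> None:
    row = session.rows[rec_id]
    if storage.size(row["key"]) > max_bytes:
        storage.delete(row["key"])  # drop the oversized object; row stays "pending"
        raise UploadTooLarge(f"{row['key']} exceeds {max_bytes} bytes")
    row["status"] = "stored"
    session.commit()
```

The trade-off: if `storage.delete()` itself fails after a successful commit, the result is an orphaned object in the bucket rather than a row pointing at a missing file, which is the easier failure to clean up later.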
Add service-level tests for assert_language_access, the audio storage-key resolver, readiness aggregation and member management, plus an end-to-end HTTP suite that drives the real ASGI stack to prove cross-language requests are rejected with 403 (and guards against a route forgetting the check). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Run `ruff format` on the new annotation-studio modules and tests to satisfy the CI format check (`ruff format --check`). Formatting only; no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
joaocarvoli added a commit that referenced this pull request on Jun 9, 2026
Resolve conflicts for PR #86 (feat/annotation-studio → main). The annotation-studio module existed independently on both branches with byte-identical content at this branch's pre-#85 base, so every conflicted module file was resolved to this branch's version — the superset that adds the #85 per-language access control, persistence, readiness and storage changes without dropping anything from main (which only equalled the base for these files). main's unrelated changes (oral-collector, bhsa, storage, scripts, tests) merge in cleanly. Verified on the merged tree: single alembic head (20260608_0001), ruff format/check clean, mypy clean (429 files), pytest 677 passed / 2 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
joaocarvoli added a commit that referenced this pull request on Jun 9, 2026
* Add annotation-studio enums and SQLAlchemy models
  Nine as_* tables, every language_id FK retargeted to tripod languages.id (CASCADE). As* enums kept isolated from app.core.enums so the studio's pending/stored upload lifecycle never clashes with the oral-collector UploadStatus.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Add annotation-studio Pydantic schemas
  Ported from the studio interface schemas; reuses tripod LanguageResponse for the picker and drops is_seed.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Add annotation-studio service layer
  Collection engine ported into tripod's async-SQLAlchemy style: speakers, Tier A/B/C, export bundler, readiness, results, dashboard. naming.py and export_plan.py are pure and verbatim to preserve the CSV/zip contract. storage.py reuses the oral-collector GCS presign pattern, keys prefixed annotation-studio/ in the shared bucket.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Add annotation-studio API routers
  Eight routers (speakers, Tier A/B/C, export, results, languages, audio) + _deps.py guards (require_app_access / require_role). Mounted at /api/annotation-studio (40 routes).
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Add annotation-studio migration and app/role seed
  Migration creates only the as_* tables and seeds the App row + admin/facilitator roles idempotently. Default access-request role is facilitator.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Add annotation-studio CORS origin
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Use a dedicated GCS bucket for annotation-studio audio
  Isolate experiment audio in its own bucket (annotation-studio-audio) instead of sharing oral-collector's tripod-image-uploads. Its CORS is managed independently. Drops the now-redundant annotation-studio/ key prefix so object names are the logical naming.py keys directly.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Fix ruff and mypy on the annotation-studio module
  Wrap long lines + ruff-format; contextlib.suppress for the GCS delete; type annotations on the export helpers (_config_hint/_assemble_zip/_gather); type:ignore for the GCS-stub Any returns and the generic get_or_404 model.id; Row→tuple comprehension in tier_c; avoid rebinding the CurrentUser _ in set_reference. No behavior change.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Add tier_b.pairs_ready to readiness (balanced, not just created)
  readiness.tier_b reported only `pairs` (total created) + `recordings`, so the studio UI marked Tier B complete/export-ready even on empty pairs. Add `pairs_ready` = pairs where both sides have >= REPS_PER_SIDE (5) stored takes, mirroring Tier A's `words_ready`. The frontend gates its ready/percent on this field instead of `pairs`.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Scope annotation-studio access per language (+ persistence, readiness & storage hardening) (#85)
  * feat(db): add per-language membership table and indexes
    Introduce as_language_members to scope annotation-studio facilitators to specific languages (admins and platform admins bypass it). The Alembic migration creates the table, seeds memberships for every current AS user across every currently-active language so no one is locked out on deploy, and adds composite (FK, upload_status) indexes on the tier recording/clip tables used by readiness and export filters.
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  * feat(access): enforce per-language access across all routers
    Add access helpers (assert_language_access, accessible_language_ids and the language_id resolvers) and call them on every annotation-studio data route: path routes check the URL language, by-id routes resolve the owning language from the resource. The dashboard now lists only languages the user may see. Closes the gap where any approved AS user could read or modify any language's data.
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  * feat(members): admin endpoints to manage language members
    Add admin-only endpoints to list, add (by email) and remove per-language facilitators, backed by member_service and the LanguageMember schemas, and mount the members router.
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  * perf(readiness): compute readiness with SQL aggregation
    Replace the in-Python Counter/row-loading readiness computation with SQL COUNT/GROUP BY/HAVING aggregates. This avoids loading every recording, clip and sort row into memory and removes the per-language cost the dashboard multiplied across languages.
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  * fix(storage): commit before deleting objects and cap upload size
    Delete storage objects only after the database commit succeeds, so a failed commit can no longer leave a row pointing at a deleted file (delete word, recording, clip and export, plus the reference set/clear flows). Also enforce a maximum audio size at the upload "complete" step, deleting oversized objects instead of marking them stored.
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  * test(annotation-studio): cover access control, readiness and members
    Add service-level tests for assert_language_access, the audio storage-key resolver, readiness aggregation and member management, plus an end-to-end HTTP suite that drives the real ASGI stack to prove cross-language requests are rejected with 403 (and guards against a route forgetting the check).
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  * style(annotation-studio): apply ruff format
    Run `ruff format` on the new annotation-studio modules and tests to satisfy the CI format check (`ruff format --check`). Formatting only; no behavior change.
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  ---------
  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
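The `pairs_ready` rule ("both sides have >= REPS_PER_SIDE stored takes") maps naturally onto the SQL `COUNT`/`GROUP BY`/`HAVING` style that #85 later adopted for readiness. A sketch under assumptions: SQLite stands in for Postgres, and the table layout, the `side` column and its `'a'`/`'b'` values are hypothetical; only `pairs`, `pairs_ready` and `REPS_PER_SIDE = 5` come from the commit message.

```python
import sqlite3

# Hypothetical, simplified tables; "side" is 'a' or 'b' purely for illustration.
SCHEMA = """
CREATE TABLE as_tier_b_pairs (id INTEGER PRIMARY KEY, language_id INTEGER NOT NULL);
CREATE TABLE as_tier_b_recordings (
    id            INTEGER PRIMARY KEY,
    pair_id       INTEGER NOT NULL REFERENCES as_tier_b_pairs(id),
    side          TEXT NOT NULL CHECK (side IN ('a', 'b')),
    upload_status TEXT NOT NULL
);
"""

REPS_PER_SIDE = 5  # value quoted in the commit message

# A pair is ready only if *each* side has enough stored takes, so the HAVING
# clause checks one conditional count per side.
PAIRS_READY = """
SELECT COUNT(*) FROM (
    SELECT p.id
    FROM as_tier_b_pairs AS p
    JOIN as_tier_b_recordings AS r
      ON r.pair_id = p.id AND r.upload_status = 'stored'
    WHERE p.language_id = ?
    GROUP BY p.id
    HAVING SUM(CASE WHEN r.side = 'a' THEN 1 ELSE 0 END) >= ?
       AND SUM(CASE WHEN r.side = 'b' THEN 1 ELSE 0 END) >= ?
)
"""


def tier_b_readiness(
    conn: sqlite3.Connection, language_id: int, reps: int = REPS_PER_SIDE
) -> dict[str, int]:
    pairs = conn.execute(
        "SELECT COUNT(*) FROM as_tier_b_pairs WHERE language_id = ?", (language_id,)
    ).fetchone()[0]
    ready = conn.execute(PAIRS_READY, (language_id, reps, reps)).fetchone()[0]
    return {"pairs": pairs, "pairs_ready": ready}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO as_tier_b_pairs VALUES (?, ?)", [(1, 10), (2, 10), (3, 10)])
    # Pair 1: 5 + 5 stored; pair 2: 5 + 4 stored (+1 pending); pair 3: no takes.
    takes = (
        [(1, "a", "stored")] * 5 + [(1, "b", "stored")] * 5
        + [(2, "a", "stored")] * 5 + [(2, "b", "stored")] * 4 + [(2, "b", "pending")]
    )
    conn.executemany(
        "INSERT INTO as_tier_b_recordings (pair_id, side, upload_status) VALUES (?, ?, ?)",
        takes,
    )
    print(tier_b_readiness(conn, 10))  # {'pairs': 3, 'pairs_ready': 1}
```

Note that `pairs` still counts pair 3, which has no takes at all; that is exactly the case where reporting `pairs` alone made Tier B look complete.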
Summary
Adds per-language access control to the annotation-studio module so only admin-assigned facilitators (plus admins / platform admins) can work on or manage a given language, closing a gap where any approved AS user could read or modify any language's data. Also fixes a storage/DB write-ordering hazard, rewrites readiness as SQL aggregation, and adds upload size limits. All changes are confined to the `annotation_studio` feature; shared tripod auth/RBAC is reused, not modified.
Changes
* `as_language_members` table + Alembic migration (`20260608_0001`) that seeds existing AS users into all currently-active languages (no lockout) and adds composite `(fk, upload_status)` indexes; model registered in `app/db/models/__init__.py`.
* `access.py` helpers (`assert_language_access`, `accessible_language_ids`, language_id resolvers) enforced on every data route (path + by-id); dashboard scoped via `dashboard_service.py`; `/audio/url` now resolves the key→language and re-checks.
* Admin-only member endpoints (`members.py`, `member_service.py`), schemas in `models/annotation_studio.py`, router mounted.
* `readiness_service.py` rewritten with SQL `COUNT`/`GROUP BY`/`HAVING` (no row loading; defangs the dashboard N+1).
* Database commit now happens before `storage.delete()` in all delete/clear flows; `MAX_AUDIO_BYTES` enforced at the `complete` step (`tier_a/b/c_service.py`, `export_service.py`, `common.py`, `constants.py`, `storage.py`).
Type of Change
Primarily a new feature (per-language scoping) plus a persistence fix and perf/hardening.
Testing
* `uv run pytest`: full suite 655 passed, 2 skipped; the new AS suite is 22 tests (12 service + 10 HTTP).
* `ruff` and `mypy` clean on all changed modules.
* The HTTP test was verified to have teeth: removing a route guard flips its case to failure.
Deploy note: run `alembic upgrade head` on Postgres. The migration seeds all current AS users into all active languages to avoid lockout; an admin should then prune memberships to least privilege via the new Members panel.
Related Issues
Follow-up to the annotation-studio persistence/query/security review.
🤖 Generated with Claude Code