
Scope annotation-studio access per language (+ persistence, readiness & storage hardening) #85

Merged
joaocarvoli merged 7 commits into feat/annotation-studio from feat/annotation-studio-access-control
Jun 9, 2026


@joaocarvoli

Member

Summary

Adds per-language access control to the annotation-studio module so only admin-assigned facilitators (plus admins / platform admins) can work on or manage a given language — closing a gap where any approved AS user could read or modify any language's data. Also fixes a storage/DB write-ordering hazard, rewrites readiness as SQL aggregation, and adds upload size limits. All changes are confined to the annotation_studio feature; shared tripod auth/RBAC is reused, not modified.

Changes

  • feat(db): new as_language_members table + Alembic migration (20260608_0001) that seeds existing AS users into all currently-active languages (no-lockout) and adds composite (fk, upload_status) indexes; model registered in app/db/models/__init__.py.
  • feat(access): access.py helpers (assert_language_access, accessible_language_ids, language_id resolvers) enforced on every data route (path + by-id); dashboard scoped via dashboard_service.py; /audio/url now resolves the key→language and re-checks.
  • feat(members): admin-only list/add(by email)/remove member endpoints (members.py, member_service.py), schemas in models/annotation_studio.py, router mounted.
  • perf(readiness): readiness_service.py rewritten with SQL COUNT/GROUP BY/HAVING (no row loading; defangs the dashboard N+1).
  • fix(storage): commit DB before storage.delete() in all delete/clear flows; enforce MAX_AUDIO_BYTES at the complete step (tier_a/b/c_service.py, export_service.py, common.py, constants.py, storage.py).
  • test(annotation-studio): service tests (access, audio-key, readiness, members) + an end-to-end HTTP suite proving cross-language 403 through the real ASGI stack.
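The access rule described in the list above can be sketched as follows. This is a minimal in-memory illustration, not the code in access.py: only the names `assert_language_access` and `accessible_language_ids` come from the PR, while the `User` dataclass, the `is_admin` flag, the membership set, and `LanguageAccessDenied` (standing in for an HTTP 403) are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical user shape; is_admin stands in for admin / platform admin."""

    id: int
    is_admin: bool = False


class LanguageAccessDenied(Exception):
    """Stands in for the HTTP 403 the real routes would return."""


def assert_language_access(
    user: User, language_id: int, memberships: set[tuple[int, int]]
) -> None:
    """Raise unless the user is an admin or a member of the language.

    `memberships` holds (user_id, language_id) pairs, standing in for
    rows of the as_language_members table.
    """
    if user.is_admin:
        return  # admins bypass per-language membership
    if (user.id, language_id) not in memberships:
        raise LanguageAccessDenied(f"user {user.id} has no access to language {language_id}")


def accessible_language_ids(
    user: User, all_language_ids: set[int], memberships: set[tuple[int, int]]
) -> set[int]:
    """Return the language ids the user may see (used to scope a dashboard)."""
    if user.is_admin:
        return set(all_language_ids)
    return {
        lang_id
        for (user_id, lang_id) in memberships
        if user_id == user.id and lang_id in all_language_ids
    }
```

A by-id route would first resolve the owning language of the resource and then call the same check, so both route styles share one decision point.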

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Bug fix (non-breaking change that fixes an issue)

Primarily a new feature (per-language scoping) plus a persistence fix and perf/hardening.

Testing

uv run pytest — full suite 655 passed, 2 skipped; new AS suite is 22 tests (12 service + 10 HTTP). ruff and mypy clean on all changed modules. The HTTP test was verified to have teeth: removing a route guard flips its case to failure.

Deploy note: run alembic upgrade head on Postgres. The migration seeds all current AS users into all active languages to avoid lockout — an admin should then prune memberships to least-privilege via the new Members panel.
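The no-lockout seed amounts to inserting the cross product of current AS users and active languages. The sketch below shows that idea against an in-memory SQLite database so it runs without Postgres; it is not the migration itself. Only `as_language_members` is named in the PR. The `as_users` and `languages` tables, their columns, and the `is_active` flag are assumptions, and on Postgres the idempotent form would typically be `ON CONFLICT DO NOTHING` rather than SQLite's `INSERT OR IGNORE`.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny hypothetical schema: two AS users, two active languages, one inactive."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE as_users (user_id INTEGER PRIMARY KEY);
        CREATE TABLE languages (id INTEGER PRIMARY KEY, is_active INTEGER NOT NULL);
        CREATE TABLE as_language_members (
            user_id INTEGER NOT NULL,
            language_id INTEGER NOT NULL,
            PRIMARY KEY (user_id, language_id)
        );
        INSERT INTO as_users (user_id) VALUES (1), (2);
        INSERT INTO languages (id, is_active) VALUES (10, 1), (11, 1), (12, 0);
        """
    )
    return conn


def seed_memberships(conn: sqlite3.Connection) -> None:
    """Give every AS user a membership in every active language (safe to re-run)."""
    conn.execute(
        """
        INSERT OR IGNORE INTO as_language_members (user_id, language_id)
        SELECT u.user_id, l.id
        FROM as_users AS u
        CROSS JOIN languages AS l
        WHERE l.is_active = 1
        """
    )
    conn.commit()


def member_pairs(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return all (user_id, language_id) memberships in a stable order."""
    return conn.execute(
        "SELECT user_id, language_id FROM as_language_members "
        "ORDER BY user_id, language_id"
    ).fetchall()
```

Because the seed grants the widest possible access, pruning afterwards is what actually restores least-privilege.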

Related Issues

Follow-up to the annotation-studio persistence/query/security review.


🤖 Generated with Claude Code

joaocarvoli and others added 7 commits June 8, 2026 17:55
feat(db): add per-language membership table and indexes

Introduce as_language_members to scope annotation-studio facilitators to
specific languages (admins and platform admins bypass it). The Alembic
migration creates the table, seeds memberships for every current AS user
across every currently-active language so no one is locked out on deploy,
and adds composite (FK, upload_status) indexes on the tier recording/clip
tables used by readiness and export filters.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat(access): enforce per-language access across all routers

Add access helpers (assert_language_access, accessible_language_ids and the
language_id resolvers) and call them on every annotation-studio data route:
path routes check the URL language, by-id routes resolve the owning language
from the resource. The dashboard now lists only languages the user may see.
Closes the gap where any approved AS user could read or modify any language's
data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat(members): admin endpoints to manage language members

Add admin-only endpoints to list, add (by email) and remove per-language
facilitators, backed by member_service and the LanguageMember schemas, and
mount the members router.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
perf(readiness): compute readiness with SQL aggregation

Replace the in-Python Counter/row-loading readiness computation with SQL
COUNT/GROUP BY/HAVING aggregates. This avoids loading every recording, clip
and sort row into memory and removes the per-language cost the dashboard
multiplied across languages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
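A readiness count of the kind this commit describes can be expressed as one aggregate query instead of a Python loop over loaded rows. The following is a sketch under stated assumptions, run against in-memory SQLite: the `recordings` table, its columns, the `'stored'` status literal, and the `MIN_STORED_TAKES` threshold are all hypothetical and chosen only to show the COUNT/GROUP BY/HAVING shape.

```python
import sqlite3

MIN_STORED_TAKES = 5  # assumed threshold for a word to count as ready


def make_demo_db() -> sqlite3.Connection:
    """Create a hypothetical recordings table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recordings ("
        "id INTEGER PRIMARY KEY, language_id INTEGER, word_id INTEGER, upload_status TEXT)"
    )
    rows = (
        [(1, 1, "stored")] * 5  # language 1, word 1: five stored takes (ready)
        + [(1, 2, "stored")] * 4  # language 1, word 2: only four stored takes
        + [(1, 2, "pending")]  # a pending take must not count
        + [(2, 3, "stored")] * 5  # language 2, word 3: ready, but another language
    )
    conn.executemany(
        "INSERT INTO recordings (language_id, word_id, upload_status) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def words_ready(conn: sqlite3.Connection, language_id: int) -> int:
    """Count words with enough stored takes, without loading any recording rows."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT word_id
            FROM recordings
            WHERE language_id = ? AND upload_status = 'stored'
            GROUP BY word_id
            HAVING COUNT(*) >= ?
        ) AS ready
        """,
        (language_id, MIN_STORED_TAKES),
    ).fetchone()
    return row[0]
```

The filter on a foreign key plus `upload_status` is the access pattern that a composite (FK, upload_status) index is meant to serve.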
fix(storage): commit before deleting objects and cap upload size

Delete storage objects only after the database commit succeeds, so a failed
commit can no longer leave a row pointing at a deleted file (delete word,
recording, clip and export, plus the reference set/clear flows). Also enforce
a maximum audio size at the upload "complete" step, deleting oversized objects
instead of marking them stored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
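The ordering and the size cap from this commit can be illustrated with small fakes. This is a sketch, not the service code: `FakeStorage`, `FakeSession`, `delete_recording`, and `complete_upload` are hypothetical names, and the value given to `MAX_AUDIO_BYTES` is an assumption (the PR names the constant but not its size).

```python
from __future__ import annotations

MAX_AUDIO_BYTES = 50 * 1024 * 1024  # assumed value for illustration only


class FakeStorage:
    """In-memory stand-in for the object store."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


class FakeSession:
    """Stand-in for a DB session: deletes only take effect on commit()."""

    def __init__(self, rows: dict[int, str]) -> None:
        self.rows = rows  # row id -> storage key (committed state)
        self.fail_commit = False
        self._pending: list[int] = []

    def delete(self, row_id: int) -> None:
        self._pending.append(row_id)

    def commit(self) -> None:
        pending, self._pending = self._pending, []
        if self.fail_commit:
            raise RuntimeError("commit failed")  # pending deletes are discarded
        for row_id in pending:
            self.rows.pop(row_id, None)


def delete_recording(session: FakeSession, storage: FakeStorage, row_id: int) -> None:
    """Remove the row first; touch storage only once the commit has succeeded."""
    key = session.rows[row_id]
    session.delete(row_id)
    session.commit()  # if this raises, the file is still there for the surviving row
    storage.delete(key)


def complete_upload(storage: FakeStorage, key: str, max_bytes: int = MAX_AUDIO_BYTES) -> str:
    """Reject and delete oversized objects instead of marking them stored."""
    size = len(storage.objects[key])
    if size > max_bytes:
        storage.delete(key)
        raise ValueError(f"{key} is {size} bytes, limit is {max_bytes}")
    return "stored"
```

With this order, the worst case after a storage failure is an orphaned object with no row, which is recoverable, rather than a row pointing at a missing file.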
test(annotation-studio): cover access control, readiness and members

Add service-level tests for assert_language_access, the audio storage-key
resolver, readiness aggregation and member management, plus an end-to-end
HTTP suite that drives the real ASGI stack to prove cross-language requests
are rejected with 403 (and guards against a route forgetting the check).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
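The idea of proving a 403 through the ASGI interface, rather than by calling a service function, can be shown without any web framework. The sketch below is a dependency-free stand-in and not the PR's test suite: the tiny `app`, the `x-user` header, the `/languages/{id}/words` path, and `MEMBERSHIPS` are all hypothetical, and the real suite presumably drives the actual application instead.

```python
import asyncio

# Hypothetical memberships: (user name, language id)
MEMBERSHIPS = {("alice", 1)}


async def app(scope, receive, send):
    """Minimal ASGI app: 200 for members of the language in the path, else 403."""
    headers = dict(scope["headers"])
    user = headers.get(b"x-user", b"").decode()
    language_id = int(scope["path"].split("/")[2])  # /languages/{id}/words
    status = 200 if (user, language_id) in MEMBERSHIPS else 403
    await send({"type": "http.response.start", "status": status, "headers": []})
    await send({"type": "http.response.body", "body": b""})


async def request_status(asgi_app, path: str, user: str) -> int:
    """Send one GET through the ASGI callable and return the response status."""
    scope = {
        "type": "http",
        "method": "GET",
        "path": path,
        "headers": [(b"x-user", user.encode())],
    }
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await asgi_app(scope, receive, send)
    return messages[0]["status"]


def check_cross_language_is_forbidden() -> None:
    """A member reaches their own language; another language is rejected."""
    assert asyncio.run(request_status(app, "/languages/1/words", "alice")) == 200
    assert asyncio.run(request_status(app, "/languages/2/words", "alice")) == 403
```

Testing at this level is what lets a missing guard on a single route show up as a failing case.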
style(annotation-studio): apply ruff format

Run `ruff format` on the new annotation-studio modules and tests to satisfy
the CI format check (`ruff format --check`). Formatting only; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
joaocarvoli merged commit c8ea6a8 into feat/annotation-studio on Jun 9, 2026
4 checks passed
joaocarvoli added a commit that referenced this pull request Jun 9, 2026
Resolve conflicts for PR #86 (feat/annotation-studio → main). The
annotation-studio module existed independently on both branches with
byte-identical content at this branch's pre-#85 base, so every conflicted
module file was resolved to this branch's version — the superset that adds
the #85 per-language access control, persistence, readiness and storage
changes without dropping anything from main (which only equalled the base
for these files). main's unrelated changes (oral-collector, bhsa, storage,
scripts, tests) merge in cleanly.

Verified on the merged tree: single alembic head (20260608_0001), ruff
format/check clean, mypy clean (429 files), pytest 677 passed / 2 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
joaocarvoli added a commit that referenced this pull request Jun 9, 2026
* Add annotation-studio enums and SQLAlchemy models

Nine as_* tables, every language_id FK retargeted to tripod languages.id (CASCADE). As* enums kept isolated from app.core.enums so the studio's pending/stored upload lifecycle never clashes with the oral-collector UploadStatus.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add annotation-studio Pydantic schemas

Ported from the studio interface schemas; reuses tripod LanguageResponse for the picker and drops is_seed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add annotation-studio service layer

Collection engine ported into tripod's async-SQLAlchemy style: speakers, Tier A/B/C, export bundler, readiness, results, dashboard. naming.py and export_plan.py are pure and verbatim to preserve the CSV/zip contract. storage.py reuses the oral-collector GCS presign pattern, keys prefixed annotation-studio/ in the shared bucket.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add annotation-studio API routers

Eight routers (speakers, Tier A/B/C, export, results, languages, audio) + _deps.py guards (require_app_access / require_role). Mounted at /api/annotation-studio (40 routes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add annotation-studio migration and app/role seed

Migration creates only the as_* tables and seeds the App row + admin/facilitator roles idempotently. Default access-request role is facilitator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add annotation-studio CORS origin

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Use a dedicated GCS bucket for annotation-studio audio

Isolate experiment audio in its own bucket (annotation-studio-audio) instead of sharing oral-collector's tripod-image-uploads. Its CORS is managed independently. Drops the now-redundant annotation-studio/ key prefix so object names are the logical naming.py keys directly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Fix ruff and mypy on the annotation-studio module

Wrap long lines + ruff-format; contextlib.suppress for the GCS delete; type annotations on the export helpers (_config_hint/_assemble_zip/_gather); type:ignore for the GCS-stub Any returns and the generic get_or_404 model.id; Row→tuple comprehension in tier_c; avoid rebinding the CurrentUser _ in set_reference. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add tier_b.pairs_ready to readiness (balanced, not just created)

readiness.tier_b reported only `pairs` (total created) + `recordings`, so the studio UI marked Tier B complete/export-ready even on empty pairs. Add `pairs_ready` = pairs where both sides have >= REPS_PER_SIDE (5) stored takes, mirroring Tier A's `words_ready`. The frontend gates its ready/percent on this field instead of `pairs`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Scope annotation-studio access per language (+ persistence, readiness & storage hardening) (#85)

* feat(db): add per-language membership table and indexes

Introduce as_language_members to scope annotation-studio facilitators to
specific languages (admins and platform admins bypass it). The Alembic
migration creates the table, seeds memberships for every current AS user
across every currently-active language so no one is locked out on deploy,
and adds composite (FK, upload_status) indexes on the tier recording/clip
tables used by readiness and export filters.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(access): enforce per-language access across all routers

Add access helpers (assert_language_access, accessible_language_ids and the
language_id resolvers) and call them on every annotation-studio data route:
path routes check the URL language, by-id routes resolve the owning language
from the resource. The dashboard now lists only languages the user may see.
Closes the gap where any approved AS user could read or modify any language's
data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(members): admin endpoints to manage language members

Add admin-only endpoints to list, add (by email) and remove per-language
facilitators, backed by member_service and the LanguageMember schemas, and
mount the members router.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* perf(readiness): compute readiness with SQL aggregation

Replace the in-Python Counter/row-loading readiness computation with SQL
COUNT/GROUP BY/HAVING aggregates. This avoids loading every recording, clip
and sort row into memory and removes the per-language cost the dashboard
multiplied across languages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(storage): commit before deleting objects and cap upload size

Delete storage objects only after the database commit succeeds, so a failed
commit can no longer leave a row pointing at a deleted file (delete word,
recording, clip and export, plus the reference set/clear flows). Also enforce
a maximum audio size at the upload "complete" step, deleting oversized objects
instead of marking them stored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(annotation-studio): cover access control, readiness and members

Add service-level tests for assert_language_access, the audio storage-key
resolver, readiness aggregation and member management, plus an end-to-end
HTTP suite that drives the real ASGI stack to prove cross-language requests
are rejected with 403 (and guards against a route forgetting the check).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* style(annotation-studio): apply ruff format

Run `ruff format` on the new annotation-studio modules and tests to satisfy
the CI format check (`ruff format --check`). Formatting only; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>