Feat/annotation studio #86

Merged: joaocarvoli merged 11 commits into main from feat/annotation-studio on Jun 9, 2026

@joaocarvoli

Member

No description provided.

joaocarvoli and others added 10 commits June 3, 2026 17:59
Nine as_* tables, every language_id FK retargeted to tripod languages.id (CASCADE). As* enums kept isolated from app.core.enums so the studio's pending/stored upload lifecycle never clashes with the oral-collector UploadStatus.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ported from the studio interface schemas; reuses tripod LanguageResponse for the picker and drops is_seed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collection engine ported into tripod's async-SQLAlchemy style: speakers, Tier A/B/C, export bundler, readiness, results, dashboard. naming.py and export_plan.py are pure and verbatim to preserve the CSV/zip contract. storage.py reuses the oral-collector GCS presign pattern, keys prefixed annotation-studio/ in the shared bucket.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Eight routers (speakers, Tier A/B/C, export, results, languages, audio) + _deps.py guards (require_app_access / require_role). Mounted at /api/annotation-studio (40 routes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Migration creates only the as_* tables and seeds the App row + admin/facilitator roles idempotently. Default access-request role is facilitator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Isolate experiment audio in its own bucket (annotation-studio-audio) instead of sharing oral-collector's tripod-image-uploads. Its CORS is managed independently. Drops the now-redundant annotation-studio/ key prefix so object names are the logical naming.py keys directly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wrap long lines + ruff-format; contextlib.suppress for the GCS delete; type annotations on the export helpers (_config_hint/_assemble_zip/_gather); type:ignore for the GCS-stub Any returns and the generic get_or_404 model.id; Row→tuple comprehension in tier_c; avoid rebinding the CurrentUser _ in set_reference. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
readiness.tier_b reported only `pairs` (total created) + `recordings`, so the studio UI marked Tier B complete/export-ready even on empty pairs. Add `pairs_ready` = pairs where both sides have >= REPS_PER_SIDE (5) stored takes, mirroring Tier A's `words_ready`. The frontend gates its ready/percent on this field instead of `pairs`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
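
The `pairs_ready` rule from the readiness fix above can be sketched in plain Python. This is only an illustration of the counting rule, not the project's code: the `Take` record, the `"a"`/`"b"` side labels and the `tier_b_readiness` function are hypothetical names, and only `REPS_PER_SIDE = 5` and the `pending`/`stored` statuses come from the commit messages.

```python
from collections import Counter
from typing import Iterable, NamedTuple

REPS_PER_SIDE = 5  # threshold stated in the commit message


class Take(NamedTuple):
    """Hypothetical flattened view of one Tier B recording."""

    pair_id: int
    side: str  # "a" or "b" (assumed side labels)
    upload_status: str  # "pending" or "stored"


def tier_b_readiness(pair_ids: Iterable[int], takes: Iterable[Take]) -> dict[str, int]:
    """Return the total pair count and the count of pairs that are actually ready."""
    pair_ids = list(pair_ids)
    # Count only stored takes, keyed by (pair, side); pending uploads do not count.
    stored = Counter(
        (t.pair_id, t.side) for t in takes if t.upload_status == "stored"
    )
    pairs_ready = sum(
        1
        for pid in pair_ids
        if stored[(pid, "a")] >= REPS_PER_SIDE and stored[(pid, "b")] >= REPS_PER_SIDE
    )
    return {"pairs": len(pair_ids), "pairs_ready": pairs_ready}
```

With three pairs where only one has five stored takes on both sides, `pairs` is 3 while `pairs_ready` is 1, which is the gap the UI previously could not see.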
… & storage hardening) (#85)

* feat(db): add per-language membership table and indexes

Introduce as_language_members to scope annotation-studio facilitators to
specific languages (admins and platform admins bypass it). The Alembic
migration creates the table, seeds memberships for every current AS user
across every currently-active language so no one is locked out on deploy,
and adds composite (FK, upload_status) indexes on the tier recording/clip
tables used by readiness and export filters.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(access): enforce per-language access across all routers

Add access helpers (assert_language_access, accessible_language_ids and the
language_id resolvers) and call them on every annotation-studio data route:
path routes check the URL language, by-id routes resolve the owning language
from the resource. The dashboard now lists only languages the user may see.
Closes the gap where any approved AS user could read or modify any language's
data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
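
A rough sketch of how the two kinds of check described above might fit together, using an in-memory stand-in for `as_language_members`. The helper names `assert_language_access` and `accessible_language_ids` appear in the commit message, but their signatures here are assumptions, and `Store`, `User`, `Forbidden`, `language_id_for_word` and `get_word` are hypothetical.

```python
from dataclasses import dataclass, field


class Forbidden(Exception):
    """Stand-in for the HTTP 403 the real routes return."""


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False  # assumed flag covering AS admins and platform admins


@dataclass
class Store:
    """In-memory stand-in for the membership table and resource ownership."""

    members: set[tuple[int, int]] = field(default_factory=set)  # (user_id, language_id)
    active_languages: set[int] = field(default_factory=set)
    word_language: dict[int, int] = field(default_factory=dict)  # word_id -> language_id


def accessible_language_ids(store: Store, user: User) -> set[int]:
    """Languages the user may see; admins see every active language."""
    if user.is_admin:
        return set(store.active_languages)
    return {lang for (uid, lang) in store.members if uid == user.id}


def assert_language_access(store: Store, user: User, language_id: int) -> None:
    """Raise Forbidden unless the user is an admin or a member of the language."""
    if user.is_admin:
        return
    if (user.id, language_id) not in store.members:
        raise Forbidden(f"user {user.id} may not access language {language_id}")


def language_id_for_word(store: Store, word_id: int) -> int:
    """Hypothetical resolver: find the language that owns a resource."""
    return store.word_language[word_id]


def get_word(store: Store, user: User, word_id: int) -> dict[str, int]:
    """By-id route: resolve the owning language first, then check access."""
    assert_language_access(store, user, language_id_for_word(store, word_id))
    return {"id": word_id}
```

A path route would call `assert_language_access` with the language from the URL directly; a by-id route goes through a resolver first, as `get_word` does.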

* feat(members): admin endpoints to manage language members

Add admin-only endpoints to list, add (by email) and remove per-language
facilitators, backed by member_service and the LanguageMember schemas, and
mount the members router.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* perf(readiness): compute readiness with SQL aggregation

Replace the in-Python Counter/row-loading readiness computation with SQL
COUNT/GROUP BY/HAVING aggregates. This avoids loading every recording, clip
and sort row into memory and removes the per-language cost the dashboard
multiplied across languages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
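
The aggregation approach above can be illustrated with the standard-library `sqlite3` module. The real code uses tripod's async SQLAlchemy, so this is only a sketch of the query shape: the `recordings` table, its columns, the index name and the threshold of 5 for words are assumptions chosen for the example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like a (hypothetical) tier recording table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recordings ("
        " id INTEGER PRIMARY KEY,"
        " word_id INTEGER NOT NULL,"
        " upload_status TEXT NOT NULL)"
    )
    # Composite (FK, upload_status) index, in the spirit of the migration above.
    conn.execute(
        "CREATE INDEX ix_recordings_word_status ON recordings (word_id, upload_status)"
    )
    return conn


def words_ready(conn: sqlite3.Connection, min_takes: int = 5) -> int:
    """Count words with at least `min_takes` stored recordings, entirely in SQL."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT word_id
            FROM recordings
            WHERE upload_status = 'stored'
            GROUP BY word_id
            HAVING COUNT(*) >= ?
        ) AS ready
        """,
        (min_takes,),
    ).fetchone()
    return row[0]


def add_takes(conn: sqlite3.Connection, word_id: int, status: str, n: int) -> None:
    """Insert n recordings for a word with the given upload status."""
    conn.executemany(
        "INSERT INTO recordings (word_id, upload_status) VALUES (?, ?)",
        [(word_id, status)] * n,
    )
```

Only a single integer crosses the database boundary, so the cost no longer grows with the number of recording rows loaded into Python.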

* fix(storage): commit before deleting objects and cap upload size

Delete storage objects only after the database commit succeeds, so a failed
commit can no longer leave a row pointing at a deleted file (delete word,
recording, clip and export, plus the reference set/clear flows). Also enforce
a maximum audio size at the upload "complete" step, deleting oversized objects
instead of marking them stored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
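
A minimal sketch of the ordering and the size cap described above, with fakes standing in for the database session and the GCS bucket. `FakeSession`, `FakeBucket`, `delete_recording`, `complete_upload` and the `MAX_AUDIO_BYTES` value are all hypothetical; the commit message does not state the real limit.

```python
import contextlib

MAX_AUDIO_BYTES = 50 * 1024 * 1024  # hypothetical cap; the real limit is not stated


class FakeBucket:
    """Stand-in for a storage bucket: object key -> size in bytes."""

    def __init__(self, objects: dict[str, int]):
        self.objects = dict(objects)

    def delete(self, key: str) -> None:
        del self.objects[key]  # raises KeyError if the object is already gone


class FakeSession:
    """Stand-in for a DB session that stages deletes until commit."""

    def __init__(self, rows: dict[int, str], fail_commit: bool = False):
        self.rows = dict(rows)  # recording id -> storage key
        self.fail_commit = fail_commit
        self._pending: list[int] = []

    def delete(self, rec_id: int) -> None:
        self._pending.append(rec_id)

    def commit(self) -> None:
        if self.fail_commit:
            self._pending.clear()  # behaves like a rollback
            raise RuntimeError("commit failed")
        for rec_id in self._pending:
            self.rows.pop(rec_id, None)
        self._pending.clear()


def delete_recording(session: FakeSession, bucket: FakeBucket, rec_id: int) -> None:
    """Delete the row first; touch storage only once the commit has succeeded."""
    key = session.rows[rec_id]
    session.delete(rec_id)
    session.commit()  # if this raises, the object below is never deleted
    with contextlib.suppress(KeyError):
        bucket.delete(key)


def complete_upload(bucket: FakeBucket, status: dict[int, str], rec_id: int, key: str) -> bool:
    """Mark an upload stored, or delete the object if it exceeds the cap."""
    if bucket.objects[key] > MAX_AUDIO_BYTES:
        bucket.delete(key)
        return False  # the row is left in its previous (pending) state
    status[rec_id] = "stored"
    return True
```

The worst case after a storage failure in this ordering is an orphaned object, which is recoverable, rather than a row that points at a missing file.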

* test(annotation-studio): cover access control, readiness and members

Add service-level tests for assert_language_access, the audio storage-key
resolver, readiness aggregation and member management, plus an end-to-end
HTTP suite that drives the real ASGI stack to prove cross-language requests
are rejected with 403 (and guards against a route forgetting the check).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* style(annotation-studio): apply ruff format

Run `ruff format` on the new annotation-studio modules and tests to satisfy
the CI format check (`ruff format --check`). Formatting only; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@joaocarvoli joaocarvoli self-assigned this Jun 9, 2026
Resolve conflicts for PR #86 (feat/annotation-studio → main). The
annotation-studio module existed independently on both branches with
byte-identical content at this branch's pre-#85 base, so every conflicted
module file was resolved to this branch's version — the superset that adds
the #85 per-language access control, persistence, readiness and storage
changes without dropping anything from main (which only equalled the base
for these files). main's unrelated changes (oral-collector, bhsa, storage,
scripts, tests) merge in cleanly.

Verified on the merged tree: single alembic head (20260608_0001), ruff
format/check clean, mypy clean (429 files), pytest 677 passed / 2 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@joaocarvoli joaocarvoli merged commit 8dd5061 into main Jun 9, 2026
4 checks passed