Sbp 250 #91

Open
vtnphan wants to merge 9 commits into main from sbp-250

vtnphan commented Jun 26, 2026

Collaborator

Pull Request

Summary

SBP-250: Replace Seqera Platform dataset uploads with direct S3 uploads for workflow samplesheets. This removes Seqera datasets as an intermediary and simplifies the data flow: CSV samplesheets are now uploaded directly to S3, and the resulting S3 key is passed to the workflow launchers.

Changes

  • Removed seqera_dataset_id column from the WorkflowRun model and added an Alembic migration (drop_seqera_dataset_id_from_workflow_runs) to drop it from the DB
  • Replaced Seqera dataset service (create_seqera_dataset, upload_dataset_to_seqera, upload_interaction_screening_dataset) with S3-backed equivalents (upload_csv_to_s3, upload_interaction_screening_csv_to_s3) in app/services/datasets.py
  • Changed launch payload: WorkflowLaunchPayload.datasetId → s3InputKey; all three workflow executors (bindflow, wisps, proteinfold) now receive s3_input_key: str instead of dataset_id
  • Updated response schemas: DatasetUploadResponse / InteractionScreeningDatasetUploadResponse → S3DatasetUploadResponse / InteractionScreeningS3UploadResponse
  • Added RunInput tracking: on workflow launch, the samplesheet S3 object is recorded as a RunInput linked to the WorkflowRun, enabling a new presigned-URL endpoint to download the input samplesheet
  • Updated schema diagram to reflect the removed column
  • All tests updated across 10+ test files; coverage maintained at ≥90%
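
For reviewers who want a concrete picture of the new data flow, here is a minimal sketch of a direct CSV upload followed by a presigned download URL for the same object. It is not the code in app/services/datasets.py: the function names `upload_samplesheet` and `presign_samplesheet_download`, the `samplesheets/` key prefix, and the UUID-based key layout are all assumptions made for illustration. The client is injected and only needs the boto3 S3 client methods `put_object` and `generate_presigned_url`, so the sketch can be exercised with a fake in tests.

```python
import csv
import io
import uuid
from typing import Any, Iterable, Mapping, Protocol, Sequence


class S3Client(Protocol):
    """The two boto3 S3 client methods this sketch relies on."""

    def put_object(self, **kwargs: Any) -> Any: ...

    def generate_presigned_url(
        self, ClientMethod: str, Params: Mapping[str, Any], ExpiresIn: int
    ) -> str: ...


def upload_samplesheet(
    client: S3Client,
    bucket: str,
    rows: Iterable[Mapping[str, str]],
    fieldnames: Sequence[str],
    prefix: str = "samplesheets",  # hypothetical prefix, not taken from the PR
) -> str:
    """Serialize rows to CSV, upload them to S3, and return the object key."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames))
    writer.writeheader()
    writer.writerows(rows)

    # Hypothetical key layout: a random UUID avoids collisions between uploads.
    key = f"{prefix}/{uuid.uuid4()}.csv"
    client.put_object(
        Bucket=bucket,
        Key=key,
        Body=buffer.getvalue().encode("utf-8"),
        ContentType="text/csv",
    )
    return key


def presign_samplesheet_download(
    client: S3Client, bucket: str, key: str, expires_in: int = 900
) -> str:
    """Return a time-limited GET URL for a previously uploaded samplesheet."""
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

With a real client this would be called as `upload_samplesheet(boto3.client("s3"), bucket, rows, fieldnames)`, and the returned key is the kind of value that would go into `s3InputKey` and be recorded as a RunInput.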

How to Test

  1. Run the Alembic migration: uv run alembic upgrade head
  2. Upload a samplesheet via the dataset upload endpoint and confirm an S3 key is returned (no Seqera dataset ID)
  3. Launch a workflow with s3InputKey in the payload and confirm it succeeds
  4. Verify old datasetId payloads are rejected with 422
  5. Run the full test suite: uv run pytest --cov=app --cov-fail-under=90
  6. Run linters: uv run ruff check app tests && uv run black --check app tests && uv run mypy app
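
Steps 3 and 4 can be pictured with a small standalone check. This is only a stdlib stand-in for whatever schema validation the API actually performs: `LaunchPayload`, `PayloadError`, and `parse_launch_payload` are hypothetical names, and the PR does not say whether old payloads fail because `datasetId` is an unknown field or because `s3InputKey` is missing, so the sketch rejects both cases.

```python
from dataclasses import dataclass
from typing import Any, Mapping


class PayloadError(ValueError):
    """Raised when a launch payload is invalid (the API would answer 422)."""


@dataclass(frozen=True)
class LaunchPayload:
    """Hypothetical, reduced stand-in for WorkflowLaunchPayload."""

    s3_input_key: str


def parse_launch_payload(raw: Mapping[str, Any]) -> LaunchPayload:
    """Accept the new s3InputKey field and reject the legacy datasetId field."""
    if "datasetId" in raw:
        raise PayloadError("datasetId is no longer accepted; send s3InputKey")

    key = raw.get("s3InputKey")
    if not isinstance(key, str) or not key:
        raise PayloadError("s3InputKey is required and must be a non-empty string")

    return LaunchPayload(s3_input_key=key)
```

A test along these lines would assert that `parse_launch_payload({"s3InputKey": "samplesheets/example.csv"})` succeeds and that `parse_launch_payload({"datasetId": "abc"})` raises `PayloadError`.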

Type of change

  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Breaking: WorkflowLaunchPayload.datasetId is replaced by s3InputKey. Frontend clients must update the launch payload field name. The seqera_dataset_id column is dropped from workflow_runs.

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • I have added or updated documentation where necessary
  • I have run linting and unit tests locally
  • The code follows the project's style guidelines

vtnphan marked this pull request as ready for review June 26, 2026 05:20