STT Audio Pipeline

Scalable audio-splitting pipeline for Speech-to-Text (STT) data preparation. It downloads audio from MinIO, splits it with Silero VAD into segments of at most 30 seconds, and uploads the segments to AWS S3.

Key Features:

AWS SQS job queue for distributed processing
EC2 Spot Auto Scaling (cost-effective for large batches)
Silero VAD for accurate speech detection
Supports 75,000+ hours of audio processing
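
To illustrate the 30-second cap, here is a minimal sketch of how detected speech regions could be grouped into segments. It is not the repository's actual splitting code. The function name `group_speech_regions`, the 16 kHz sample rate, and the greedy merge strategy are all assumptions; the input shape (a list of dicts with `start` and `end` in samples) follows what Silero VAD's `get_speech_timestamps` returns by default.

```python
from typing import Dict, List, Tuple

SAMPLE_RATE = 16_000        # assumed sample rate in Hz
MAX_SEGMENT_SECONDS = 30    # segment cap stated in the description


def group_speech_regions(
    regions: List[Dict[str, int]],
    sample_rate: int = SAMPLE_RATE,
    max_seconds: float = MAX_SEGMENT_SECONDS,
) -> List[Tuple[int, int]]:
    """Greedily merge consecutive speech regions into segments of at most max_seconds.

    `regions` must be sorted and non-overlapping, with sample offsets.
    Returns (start, end) tuples in samples.
    """
    max_len = int(max_seconds * sample_rate)
    segments: List[Tuple[int, int]] = []
    cur_start = cur_end = None

    for region in regions:
        start, end = region["start"], region["end"]

        # A single region longer than the cap is hard-cut into cap-sized pieces.
        while end - start > max_len:
            if cur_start is not None:
                segments.append((cur_start, cur_end))
                cur_start = cur_end = None
            segments.append((start, start + max_len))
            start += max_len

        if cur_start is None:
            cur_start, cur_end = start, end
        elif end - cur_start <= max_len:
            # Extend the open segment across the silence gap.
            cur_end = end
        else:
            # Adding this region would exceed the cap: close and start anew.
            segments.append((cur_start, cur_end))
            cur_start, cur_end = start, end

    if cur_start is not None:
        segments.append((cur_start, cur_end))
    return segments


if __name__ == "__main__":
    sr = SAMPLE_RATE
    demo = [
        {"start": 0 * sr, "end": 10 * sr},
        {"start": 12 * sr, "end": 25 * sr},
        {"start": 27 * sr, "end": 40 * sr},
    ]
    for seg_start, seg_end in group_speech_regions(demo):
        print(f"{seg_start / sr:.1f}s -> {seg_end / sr:.1f}s")
```

Cutting at silence gaps wherever possible keeps words intact; the hard cut only applies when a single uninterrupted speech region exceeds the cap.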

Quick Start

# 1. Configure environment
cp deploy/.env.example deploy/.env
# Edit deploy/.env with your credentials

# 2. Deploy infrastructure
python deploy/scripts/deploy_stack.py --action deploy

# 3. Enqueue jobs
python deploy/scripts/enqueue_batch.py --limit 1000 --collection amdo

# 4. Monitor progress
python deploy/scripts/monitor_progress.py

Architecture

MinIO (source) -> SQS Queue -> EC2 Spot Workers -> AWS S3 (output)
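
The flow above can be sketched as a worker loop that pulls a job, processes it, and deletes the message only on success, so that an interrupted Spot instance leaves the job available for another worker. This is an illustrative sketch, not the repository's worker code: `InMemoryQueue`, `run_worker`, the `object_key` message field, and the retry limit are hypothetical. In a real deployment the queue operations would presumably map to boto3's SQS `receive_message` and `delete_message` calls, with the visibility timeout playing the role of `release`.

```python
import json
from collections import deque
from typing import Callable, Deque, Dict, List, Optional


class InMemoryQueue:
    """Hypothetical stand-in for SQS: a received message stays in flight until deleted."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._in_flight: Dict[str, str] = {}
        self._next_id = 0

    def send(self, body: dict) -> None:
        self._pending.append(json.dumps(body))

    def receive(self) -> Optional[dict]:
        if not self._pending:
            return None
        body = self._pending.popleft()
        handle = f"rh-{self._next_id}"
        self._next_id += 1
        self._in_flight[handle] = body
        return {"ReceiptHandle": handle, "Body": body}

    def delete(self, handle: str) -> None:
        self._in_flight.pop(handle, None)

    def release(self, handle: str) -> None:
        """Put an unacknowledged message back (models visibility-timeout expiry)."""
        body = self._in_flight.pop(handle, None)
        if body is not None:
            self._pending.append(body)


def run_worker(
    queue: InMemoryQueue,
    process_job: Callable[[dict], None],
    max_attempts: int = 3,
) -> List[str]:
    """Drain the queue; acknowledge each job only after process_job succeeds."""
    done: List[str] = []
    attempts: Dict[str, int] = {}
    while (msg := queue.receive()) is not None:
        job = json.loads(msg["Body"])
        key = job["object_key"]  # hypothetical field naming the source audio object
        try:
            # In the pipeline this step would download, split, and upload.
            process_job(job)
        except Exception:
            attempts[key] = attempts.get(key, 0) + 1
            if attempts[key] < max_attempts:
                queue.release(msg["ReceiptHandle"])  # retry later
            else:
                queue.delete(msg["ReceiptHandle"])   # give up on this job
            continue
        queue.delete(msg["ReceiptHandle"])
        done.append(key)
    return done
```

Deleting after processing rather than on receipt gives at-least-once delivery, so the processing step should be safe to repeat for the same input.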

License

MIT License
