chenyangkang/Mockingbird


Mockingbird

Mockingbird is a minimum viable bird-call imitation app with separate Streamlit entrypoints for prediction, admin, and public contribution:

  • a public prediction app for end users
  • a private collection/training app for admins
  • an optional public contribution app for citizen-science uploads backed by Supabase

This first version is intentionally narrow:

  • 10 common North American species
  • Streamlit front end
  • separate public and private apps
  • hierarchical PyTorch sequence model that preserves phrase order instead of relying on blind clipping
  • trained on bird-reference audio plus human mimic audio
  • evaluated on held-out human mimic recordings only

Disclaimer

This is a vibe-coding project currently under development.

Included species

  • American Robin
  • Northern Cardinal
  • Blue Jay
  • Mourning Dove
  • Black-capped Chickadee
  • Carolina Wren
  • Red-winged Blackbird
  • American Crow
  • House Sparrow
  • Downy Woodpecker

Repo layout

  • streamlit_app.py: public prediction app
  • contribute_app.py: public citizen-science contribution app
  • collector_app.py: private collection + training app
  • train.py: trains the sequence model and writes a PyTorch checkpoint
  • scripts/download_xeno_canto.py: downloads a small, attribution-aware dataset
  • src/mockingbird/: reusable app, audio, feature, and inference code
  • data/species.csv: the MVP species list and UI hints
  • artifacts/: trained PyTorch checkpoint + metrics
  • supabase/mimic_submissions.sql: one-time schema setup for the public contribution app

Quickstart

Create a virtual environment and install the package:

python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .

Download a compact dataset:

python -m mockingbird.cli set-xc-key your_key_here
python scripts/download_xeno_canto.py --per-species 10

Xeno-Canto's current metadata API requires an API key for search. Audio download links are still public, but the repo's downloader now expects one of:

  • a saved local key in .mockingbird/secrets.toml
  • XENO_CANTO_API_KEY in your environment
  • --api-key passed directly

By default, the downloader now accepts BY, BY-SA, CC0, BY-NC, and BY-NC-SA recordings for local experimentation. If you want the stricter open-license-only set, use:

python scripts/download_xeno_canto.py --per-species 10 --strict-licenses
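The license gate described above can be sketched as a small predicate. This is illustrative only: the exact license strings and flag handling inside scripts/download_xeno_canto.py may differ, and the function name here is an assumption.

```python
# Sketch of the downloader's license filter; the actual implementation in
# scripts/download_xeno_canto.py may use different names and parsing.
DEFAULT_LICENSES = {"by", "by-sa", "cc0", "by-nc", "by-nc-sa"}
STRICT_LICENSES = {"by", "by-sa", "cc0"}  # the open-license-only subset

def license_allowed(license_url: str, strict: bool = False) -> bool:
    """Check a Creative Commons license URL, e.g.
    '//creativecommons.org/licenses/by-nc-sa/4.0/', against the allowed set."""
    allowed = STRICT_LICENSES if strict else DEFAULT_LICENSES
    parts = [p for p in license_url.lower().split("/") if p]
    return any(p in allowed for p in parts)
```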

Train the sequence model:

python train.py

If you collect human mimic examples in the private admin app, they are automatically saved under data/human_mimics/ and the training script will include them on the next run.

If you have not downloaded bird-reference audio yet, the trainer can still learn from your own mimic dataset once you have examples for at least two species.

Run the public prediction app:

streamlit run streamlit_app.py

Run the private collection/training app:

streamlit run collector_app.py

Run the public citizen-science contribution app:

streamlit run contribute_app.py

Streamlit deployment

This repo is set up for Streamlit Community Cloud:

  1. Push the repository to GitHub.
  2. Confirm artifacts/mockingbird_sequence.pt exists, or train the model in advance and commit the small artifact.
  3. In Streamlit Community Cloud, deploy streamlit_app.py.
  4. If you want larger uploads, adjust .streamlit/config.toml.
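The upload limit lives under Streamlit's `server.maxUploadSize` option (a per-file limit in MB; Streamlit's default is 200). The value below is only an example, not the repo's actual setting:

```toml
# .streamlit/config.toml — raise or lower the per-file upload limit (MB)
[server]
maxUploadSize = 50
```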

Keep collector_app.py private and run it only locally or in an internal deployment.

If you want a public contribution portal as well, deploy a second Streamlit app from the same repo using contribute_app.py as the entrypoint.

Collecting a human imitation dataset

The private admin app includes a Collect Data workspace and a Train Model workspace.

Recommended format for each example:

  1. Choose the target species.
  2. If available, pick a specific downloaded reference clip for that species.
  3. Record one short imitation from your microphone.
  4. Save the example.

Each saved row stores:

  • target species
  • your recorded mimic path
  • optional paired reference clip path / URL
  • performer alias
  • notes

That makes it possible to fine-tune the model with your own imitation data without needing a separate annotation tool.
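The saved-row format above can be sketched as a CSV append. The column names here ("species", "mimic_path", "reference_path", "performer", "notes") are assumptions; the real schema used by the admin app lives in src/mockingbird/.

```python
import csv
from pathlib import Path

# Illustrative only: column names are assumed, not the app's actual schema.
def append_example(metadata_csv: Path, row: dict) -> None:
    """Append one mimic example to a metadata CSV, creating it if needed."""
    fields = ["species", "mimic_path", "reference_path", "performer", "notes"]
    is_new = not metadata_csv.exists()
    metadata_csv.parent.mkdir(parents=True, exist_ok=True)
    with metadata_csv.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        if is_new:
            writer.writeheader()
        writer.writerow({k: row.get(k, "") for k in fields})
```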

If you do not have a Xeno-Canto API key yet, you can still use the private collection app right away by:

  1. choosing the target species manually
  2. optionally uploading the exact bird reference clip you want to imitate
  3. recording your mimic from the browser microphone
  4. saving the paired example locally

Local secrets

To store the Xeno-Canto key once for this repo without exporting it every session:

python -m mockingbird.cli set-xc-key your_key_here

This writes a gitignored file at .mockingbird/secrets.toml.

If you prefer, the package will also read .streamlit/secrets.toml and the XENO_CANTO_API_KEY environment variable.

Supabase contribution backend

To accept public imitation uploads on Streamlit Community Cloud, use contribute_app.py with Supabase.

  1. Create a Supabase project.
  2. Run the SQL in supabase/mimic_submissions.sql.
  3. In the Streamlit app settings for contribute_app.py, add secrets like:
[supabase]
supabase_url = "https://YOUR_PROJECT.supabase.co"
supabase_secret_key = "YOUR_SECRET_KEY"
supabase_bucket = "mimic-audio"
supabase_table = "mimic_submissions"

prediction_app_url = "https://YOUR-PREDICTION-APP.streamlit.app"
contribution_app_url = "https://YOUR-CONTRIBUTION-APP.streamlit.app"

The contribution app stores uploaded audio files in Supabase Storage and submission metadata in the mimic_submissions table. Keep collector_app.py private for training and local admin work.
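A submission record for that table might be built like this. The field names below are assumptions (the real schema is defined in supabase/mimic_submissions.sql), and the commented supabase-py calls are a sketch, not the app's exact code.

```python
from datetime import datetime, timezone

# Field names are assumed; the authoritative schema is in
# supabase/mimic_submissions.sql.
def build_submission(species: str, storage_path: str, consent: bool) -> dict:
    return {
        "species": species,
        "audio_path": storage_path,
        "consent": consent,
        "review_status": "pending",
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }

# With supabase-py, the upload would look roughly like:
#   from supabase import create_client
#   client = create_client(url, secret_key)
#   client.storage.from_("mimic-audio").upload(storage_path, audio_bytes)
#   client.table("mimic_submissions").insert(build_submission(...)).execute()
```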

To pull approved public contributions back into your local training folder:

python scripts/sync_supabase_contributions.py --review-status approved

That command downloads consented submissions into data/human_mimics/supabase_imports/ and writes a training-ready CSV at data/human_mimics/supabase_imports.csv.

After that, python train.py will automatically include both:

  • your local/private data/human_mimics/metadata.csv
  • synced public data/human_mimics/supabase_imports.csv
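Pooling the two CSVs can be sketched as a simple concatenation of rows, skipping whichever file is absent. How train.py actually merges them (and its column handling) may differ.

```python
import csv
from pathlib import Path

# Sketch only: train.py's real merge logic may deduplicate or reconcile
# columns differently.
def load_mimic_rows(*csv_paths: str) -> list[dict]:
    """Concatenate rows from every metadata CSV that exists."""
    rows: list[dict] = []
    for path in csv_paths:
        p = Path(path)
        if p.exists():  # either file may be missing on a fresh checkout
            with p.open(newline="") as f:
                rows.extend(csv.DictReader(f))
    return rows
```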

Model approach

The current model is a hierarchical PyTorch sequence classifier designed around your real task:

  • input: human imitation
  • output: bird species

Key design choices:

  • recordings are split into phrases by silence, not blind fixed clips
  • each phrase is represented by a log-mel sequence over the whole phrase
  • the model also extracts a note-sequence branch using pitch tracking, note durations, gaps, onset peaks, and relative pitch intervals
  • a dual-branch phrase encoder fuses:
    • time-compressed spectrogram sequence
    • note-token sequence
  • a second recording-level Transformer models the ordered phrase sequence, including phrase timing features
  • training uses:
    • all downloaded bird-reference recordings
    • human mimic training split
  • validation uses:
    • held-out human mimic recordings only, scored at the whole-recording level

This makes the headline metrics much closer to the real product question: can a human imitation be mapped back to the intended bird?
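The silence-based phrase splitting above can be sketched with a toy energy gate. The real pipeline in src/mockingbird/ presumably uses tuned thresholds and log-mel features rather than this minimal version; the function name and parameters are assumptions.

```python
import numpy as np

# Toy sketch of splitting a recording into phrases at silences; the actual
# segmentation in src/mockingbird/ is likely more robust than this.
def split_phrases(y: np.ndarray, sr: int, frame_ms: int = 25,
                  threshold: float = 0.01) -> list[tuple[int, int]]:
    """Return (start, end) sample intervals of non-silent phrases."""
    hop = int(sr * frame_ms / 1000)
    n_frames = len(y) // hop
    # Per-frame RMS energy
    energy = np.array([np.sqrt(np.mean(y[i * hop:(i + 1) * hop] ** 2))
                       for i in range(n_frames)])
    voiced = energy > threshold
    phrases, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i                      # phrase begins
        elif not v and start is not None:
            phrases.append((start * hop, i * hop))  # phrase ends at silence
            start = None
    if start is not None:                  # recording ends mid-phrase
        phrases.append((start * hop, n_frames * hop))
    return phrases
```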

Open-source and data licensing

  • Code license: MIT
  • Raw audio is not committed to git
  • The downloader accepts CC BY 4.0, CC BY-SA 4.0, CC0, CC BY-NC 4.0, and CC BY-NC-SA 4.0 by default
  • Use --strict-licenses if you only want the more permissive open-license subset
  • Keep attribution metadata from data/raw/downloads.csv if you reuse the downloaded clips
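Attribution strings could be generated from that CSV along these lines. The column names ("species", "recordist", "license", "url") are assumptions; use whatever columns the downloader actually writes.

```python
import csv

# Illustrative only: column names are assumed, not the downloader's
# confirmed output schema.
def attribution_lines(csv_path: str) -> list[str]:
    """Build one human-readable attribution line per downloaded clip."""
    with open(csv_path, newline="") as f:
        return [f'{r["species"]} recording by {r["recordist"]} '
                f'({r["license"]}), {r["url"]}'
                for r in csv.DictReader(f)]
```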

Limitations

  • Human mimic evaluation is only as good as the amount and quality of your collected mimic dataset
  • Pitch tracking on noisy or very breathy recordings can still be unstable
  • Some file formats may depend on host audio codecs; WAV works best for the MVP

Next steps

  • collect a paired human imitation dataset
  • add top matched reference clips and attribution links in the UI
  • move to an embedding-retrieval model for better mimic handling
  • expose a small API if the app outgrows Streamlit-only hosting
