Mockingbird is a minimum viable bird-call imitation app with separate Streamlit entrypoints for prediction, admin, and public contribution:
- a public prediction app for end users
- a private collection/training app for admins
- an optional public contribution app for citizen-science uploads backed by Supabase
This first version is intentionally narrow:
- 10 common North American species
- Streamlit front end
- separate public and private apps
- hierarchical PyTorch sequence model that preserves phrase order instead of relying on blind clipping
- trained on bird-reference audio plus human mimic audio
- evaluated on held-out human mimic recordings only
This is a vibe-coding project currently under development.
The 10 MVP species:
- American Robin
- Northern Cardinal
- Blue Jay
- Mourning Dove
- Black-capped Chickadee
- Carolina Wren
- Red-winged Blackbird
- American Crow
- House Sparrow
- Downy Woodpecker
Repository layout:

- `streamlit_app.py`: public prediction app
- `contribute_app.py`: public citizen-science contribution app
- `collector_app.py`: private collection + training app
- `train.py`: trains the sequence model and writes a PyTorch checkpoint
- `scripts/download_xeno_canto.py`: downloads a small, attribution-aware dataset
- `src/mockingbird/`: reusable app, audio, feature, and inference code
- `data/species.csv`: the MVP species list and UI hints
- `artifacts/`: trained PyTorch checkpoint + metrics
- `supabase/mimic_submissions.sql`: one-time schema setup for the public contribution app
Create a virtual environment and install the package:
```
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .
```

Download a compact dataset:
```
python -m mockingbird.cli set-xc-key your_key_here
python scripts/download_xeno_canto.py --per-species 10
```

Xeno-Canto's current metadata API requires an API key for search. Audio download links are still public, but the repo's downloader now expects one of:
- a saved local key in `.mockingbird/secrets.toml`
- `XENO_CANTO_API_KEY` in your environment
- `--api-key` passed directly
By default, the downloader now accepts BY, BY-SA, CC0, BY-NC, and BY-NC-SA recordings for local experimentation. If you want the stricter open-license-only set, use:
```
python scripts/download_xeno_canto.py --per-species 10 --strict-licenses
```

Train the sequence model:
```
python train.py
```

If you collect human mimic examples in the private admin app, they are automatically saved under `data/human_mimics/` and the training script will include them on the next run.
If you have not downloaded bird-reference audio yet, the trainer can still learn from your own mimic dataset once you have examples for at least two species.
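That two-species floor could be checked with a small helper before kicking off a run. This is a sketch, not the repo's actual code, and the `species` column name is an assumption about `data/human_mimics/metadata.csv`.

```python
import csv
from pathlib import Path


def trainable_species(metadata_csv: Path) -> set[str]:
    """Collect the distinct target species in a mimic metadata CSV."""
    if not metadata_csv.exists():
        return set()
    with metadata_csv.open(newline="") as fh:
        return {row["species"] for row in csv.DictReader(fh) if row.get("species")}


def can_train_on_mimics_only(metadata_csv: Path) -> bool:
    # A classifier needs at least two classes, hence two species.
    return len(trainable_species(metadata_csv)) >= 2
```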
Run the public prediction app:
```
streamlit run streamlit_app.py
```

Run the private collection/training app:

```
streamlit run collector_app.py
```

Run the public citizen-science contribution app:

```
streamlit run contribute_app.py
```

This repo is set up for Streamlit Community Cloud:
- Push the repository to GitHub.
- Confirm `artifacts/mockingbird_sequence.pt` exists, or train the model in advance and commit the small artifact.
- In Streamlit Community Cloud, deploy `streamlit_app.py`.
- If you want larger uploads, adjust `.streamlit/config.toml`.
Keep `collector_app.py` private and run it only locally or in an internal deployment.
If you want a public contribution portal as well, deploy a second Streamlit app from the same repo using `contribute_app.py` as the entrypoint.
The private admin app includes a Collect Data workspace and a Train Model workspace.
Recommended workflow for each example:
- Choose the target species.
- If available, pick a specific downloaded reference clip for that species.
- Record one short imitation from your microphone.
- Save the example.
Each saved row stores:
- target species
- your recorded mimic path
- optional paired reference clip path / URL
- performer alias
- notes
That makes it possible to fine-tune the model with your own imitation data without needing a separate annotation tool.
If you do not have a Xeno-Canto API key yet, you can still use the private collection app right away by:
- choosing the target species manually
- optionally uploading the exact bird reference clip you want to imitate
- recording your mimic from the browser microphone
- saving the paired example locally
To store the Xeno-Canto key once for this repo without exporting it every session:
```
python -m mockingbird.cli set-xc-key your_key_here
```

This writes a gitignored file at `.mockingbird/secrets.toml`.
If you prefer, the package will also read `.streamlit/secrets.toml` and the `XENO_CANTO_API_KEY` environment variable.
To accept public imitation uploads on Streamlit Community Cloud, use contribute_app.py with Supabase.
- Create a Supabase project.
- Run the SQL in `supabase/mimic_submissions.sql`.
- In the Streamlit app settings for `contribute_app.py`, add secrets like:
```toml
[supabase]
supabase_url = "https://YOUR_PROJECT.supabase.co"
supabase_secret_key = "YOUR_SECRET_KEY"
supabase_bucket = "mimic-audio"
supabase_table = "mimic_submissions"
prediction_app_url = "https://YOUR-PREDICTION-APP.streamlit.app"
contribution_app_url = "https://YOUR-CONTRIBUTION-APP.streamlit.app"
```

The contribution app stores uploaded audio files in Supabase Storage and submission metadata in the `mimic_submissions` table. Keep `collector_app.py` private for training and local admin work.
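The submission step could be sketched like this; the helper and its column names are assumptions, with `supabase/mimic_submissions.sql` defining the real schema.

```python
import mimetypes
import uuid


def build_submission(species: str, audio_filename: str, consent: bool,
                     alias: str = "", notes: str = "") -> tuple[str, dict]:
    """Build a storage object path and a metadata row for one contribution.

    Column names here are illustrative, not the actual schema.
    """
    object_path = (
        f"{species.replace(' ', '_').lower()}/"
        f"{uuid.uuid4().hex}_{audio_filename}"
    )
    row = {
        "species": species,
        "audio_path": object_path,
        "content_type": mimetypes.guess_type(audio_filename)[0] or "audio/wav",
        "performer_alias": alias,
        "notes": notes,
        "consent": consent,
        "review_status": "pending",
    }
    return object_path, row


# With the supabase-py client, the upload + insert would then look roughly like:
# client.storage.from_("mimic-audio").upload(object_path, audio_bytes)
# client.table("mimic_submissions").insert(row).execute()
```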
To pull approved public contributions back into your local training folder:
```
python scripts/sync_supabase_contributions.py --review-status approved
```

That command downloads consented submissions into `data/human_mimics/supabase_imports/` and writes a training-ready CSV at `data/human_mimics/supabase_imports.csv`.
After that, `python train.py` will automatically include both:

- your local/private `data/human_mimics/metadata.csv`
- synced public `data/human_mimics/supabase_imports.csv`
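The merge of those two CSVs could be sketched as below; this is a minimal illustration of the described behavior, not the trainer's actual loading code.

```python
import csv
from pathlib import Path


def load_mimic_rows(*csv_paths: Path) -> list[dict]:
    """Concatenate rows from whichever metadata CSVs exist on disk.

    Mirrors the behavior described above: missing files are simply
    skipped, so private and synced public data combine when present.
    """
    rows: list[dict] = []
    for path in csv_paths:
        if path.exists():
            with path.open(newline="") as fh:
                rows.extend(csv.DictReader(fh))
    return rows
```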
The current model is a hierarchical PyTorch sequence classifier designed around your real task:
- input: human imitation
- output: bird species
Key design choices:
- recordings are split into phrases by silence, not blind fixed clips
- each phrase is represented by a log-mel sequence over the whole phrase
- the model also extracts a note-sequence branch using pitch tracking, note durations, gaps, onset peaks, and relative pitch intervals
- a dual-branch phrase encoder fuses:
- time-compressed spectrogram sequence
- note-token sequence
- a second recording-level Transformer keeps the ordered phrase sequence connected, including phrase timing features
- training uses:
- all downloaded bird-reference recordings
- human mimic training split
- validation uses:
- held-out human mimic recordings only, scored at the whole-recording level
This makes the headline metrics much closer to the real product question: can a human imitation be mapped back to the intended bird?
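The silence-based phrase splitting above can be sketched with a simple energy gate; the frame size and thresholds here are illustrative defaults, not the repo's tuned values.

```python
import numpy as np


def split_into_phrases(audio: np.ndarray, sr: int,
                       frame_ms: int = 25, silence_db: float = -40.0,
                       min_silence_frames: int = 8) -> list[tuple[int, int]]:
    """Split a recording into (start, end) sample ranges at silences.

    Frames whose RMS energy stays below a relative dB threshold for
    long enough become phrase boundaries.
    """
    hop = int(sr * frame_ms / 1000)
    n_frames = max(1, len(audio) // hop)
    frames = audio[: n_frames * hop].reshape(n_frames, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1) + 1e-10)
    # dB relative to the loudest frame in the recording
    db = 20 * np.log10(rms / (rms.max() + 1e-10) + 1e-10)
    voiced = db > silence_db

    phrases: list[tuple[int, int]] = []
    start = None
    silent_run = 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                phrases.append((start * hop, (i - silent_run + 1) * hop))
                start, silent_run = None, 0
    if start is not None:
        phrases.append((start * hop, n_frames * hop))
    return phrases
```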
- Code license: MIT
- Raw audio is not committed to git
- The downloader accepts `CC BY 4.0`, `CC BY-SA 4.0`, `CC0`, `CC BY-NC 4.0`, and `CC BY-NC-SA 4.0` by default
- Use `--strict-licenses` if you only want the more permissive open-license subset
- Keep attribution metadata from `data/raw/downloads.csv` if you reuse the downloaded clips
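The default versus strict license sets could be expressed as a small filter; this is a hypothetical helper, and the real downloader's matching logic may differ.

```python
# License names normalized to lowercase for comparison.
DEFAULT_LICENSES = {
    "cc by 4.0", "cc by-sa 4.0", "cc0", "cc by-nc 4.0", "cc by-nc-sa 4.0",
}
STRICT_LICENSES = {"cc by 4.0", "cc by-sa 4.0", "cc0"}


def license_allowed(license_name: str, strict: bool = False) -> bool:
    """Check a recording's license against the accepted set.

    With strict=True only the permissive open licenses pass, matching
    the intent of the --strict-licenses flag described above.
    """
    allowed = STRICT_LICENSES if strict else DEFAULT_LICENSES
    return license_name.strip().lower() in allowed
```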
- Human mimic evaluation is only as good as the amount and quality of your collected mimic dataset
- Pitch tracking on noisy or very breathy recordings can still be unstable
- Some file formats may depend on host audio codecs; WAV works best for the MVP
Planned next steps:
- collect a paired human imitation dataset
- add top matched reference clips and attribution links in the UI
- move to an embedding-retrieval model for better mimic handling
- expose a small API if the app outgrows Streamlit-only hosting