Add AudioSkillsXL and F0Bioacoustic datasets by david-rx · Pull Request #259 · earthspecies/esp-data

david-rx · 2026-04-01T05:15:26Z

Summary

AudioSkillsXL: Multi-task audio understanding dataset combining conversation-style QA from nvidia/AudioSkills with AudioSet audio. Six sources (counting_qa, wavcaps, fsd50k, clotho_v2, audioset, audioset_sl) loadable individually or as a union via the "all" split. Handles structured messages column (JSON chat turns), dual data roots for local vs AudioSet-backed audio, and pre-resampled 16k/32k audio paths.
F0Bioacoustic: ~250k non-human vocalizations from 13 taxa with ground-truth F0 contours (Musikhin et al. 2025). Supports taxa filtering, pre-resampled 16k/32k audio, inline TSV F0 contour parsing into DataFrames, and Nyquist flagging. CC0-1.0 licensed.

Both follow existing esp-data conventions: @register_dataset, DatasetInfo, from_config, full numpy-style docstrings, _process/__getitem__/__iter__ protocol.

Test plan

test_audio_skills_xl.py (17 tests): info, data, columns, splits, sources, length, getitem, iteration, invalid split/sources, consistency, transforms from config, output mapping, audio processing, str repr, index error, messages parsing
test_f0_bioacoustic.py (20 tests): info, data, columns, splits, taxa, length, getitem, F0 contour parsing, iteration, invalid split/taxa, taxa filtering, consistency, transforms from config, output mapping, data root, audio processing, str repr, index error, get_available_labels
All 37 tests pass locally
All pre-commit hooks pass (ruff lint + format, codespell, trailing whitespace, etc.)

Made with Cursor

AudioSkillsXL: Multi-task audio understanding dataset combining conversation-style QA from nvidia/AudioSkills with AudioSet audio. Six sources (counting_qa, wavcaps, fsd50k, clotho_v2, audioset, audioset_sl) loadable individually or as a union via the "all" split. Includes structured messages column and pre-resampled audio paths. F0Bioacoustic: ~250k non-human vocalizations from 13 taxa with ground-truth fundamental frequency contours. Supports taxa filtering, pre-resampled 16k/32k audio, and F0 contour parsing into DataFrames. Both datasets follow existing esp-data conventions with full test suites (37 tests total). Made-with: Cursor

david-rx requested a review from a team as a code owner April 1, 2026 05:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add AudioSkillsXL and F0Bioacoustic datasets#259

Add AudioSkillsXL and F0Bioacoustic datasets#259
david-rx wants to merge 1 commit into
mainfrom
david/audio-skills-f0

david-rx commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

david-rx commented Apr 1, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant