Skip to content

Add AudioSkillsXL and F0Bioacoustic datasets#259

Open
david-rx wants to merge 1 commit into
mainfrom
david/audio-skills-f0
Open

Add AudioSkillsXL and F0Bioacoustic datasets#259
david-rx wants to merge 1 commit into
mainfrom
david/audio-skills-f0

Conversation

@david-rx
Copy link
Copy Markdown
Contributor

@david-rx david-rx commented Apr 1, 2026

Summary

  • AudioSkillsXL: Multi-task audio understanding dataset combining conversation-style QA from nvidia/AudioSkills with AudioSet audio. Six sources (counting_qa, wavcaps, fsd50k, clotho_v2, audioset, audioset_sl) loadable individually or as a union via the "all" split. Handles structured messages column (JSON chat turns), dual data roots for local vs AudioSet-backed audio, and pre-resampled 16k/32k audio paths.

  • F0Bioacoustic: ~250k non-human vocalizations from 13 taxa with ground-truth F0 contours (Musikhin et al. 2025). Supports taxa filtering, pre-resampled 16k/32k audio, inline TSV F0 contour parsing into DataFrames, and Nyquist flagging. CC0-1.0 licensed.

Both follow existing esp-data conventions: @register_dataset, DatasetInfo, from_config, full numpy-style docstrings, _process/__getitem__/__iter__ protocol.

Test plan

  • test_audio_skills_xl.py (17 tests): info, data, columns, splits, sources, length, getitem, iteration, invalid split/sources, consistency, transforms from config, output mapping, audio processing, str repr, index error, messages parsing
  • test_f0_bioacoustic.py (20 tests): info, data, columns, splits, taxa, length, getitem, F0 contour parsing, iteration, invalid split/taxa, taxa filtering, consistency, transforms from config, output mapping, data root, audio processing, str repr, index error, get_available_labels
  • All 37 tests pass locally
  • All pre-commit hooks pass (ruff lint + format, codespell, trailing whitespace, etc.)

Made with Cursor

AudioSkillsXL: Multi-task audio understanding dataset combining
conversation-style QA from nvidia/AudioSkills with AudioSet audio.
Six sources (counting_qa, wavcaps, fsd50k, clotho_v2, audioset,
audioset_sl) loadable individually or as a union via the "all" split.
Includes structured messages column and pre-resampled audio paths.

F0Bioacoustic: ~250k non-human vocalizations from 13 taxa with
ground-truth fundamental frequency contours. Supports taxa filtering,
pre-resampled 16k/32k audio, and F0 contour parsing into DataFrames.

Both datasets follow existing esp-data conventions with full test
suites (37 tests total).

Made-with: Cursor
@david-rx david-rx requested a review from a team as a code owner April 1, 2026 05:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant