Add spanish carrion crows dataset by mcusi · Pull Request #255 · earthspecies/esp-data

mcusi · 2026-03-17T04:33:46Z

I'm adding the unsynchronized, audio-only version of the carrion crow biologger dataset. It is a detection dataset (Voxaboxen bounding boxes). I followed WABAD/ArcticBirdSounds as templates.

GaganNarula · 2026-03-17T20:20:10Z

+
+        audio, sr = read_audio(audio_path)
+        # Should all be mono
+        # audio = audio_stereo_to_mono(audio, mono_method="average").astype(np.float32)


You can just remove the line if it's already mono (maybe verify that with an assert in your tests).
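A minimal sketch of such a check, assuming the loaded audio is a NumPy array and that mono means one-dimensional (the actual return shape of `read_audio` is not shown in this thread). `assert_mono` is a hypothetical helper name, not part of esp_data.

```python
import numpy as np


def assert_mono(audio: np.ndarray) -> None:
    """Raise AssertionError unless `audio` is a 1-D (single-channel) array."""
    # Assumption: mono clips are returned with shape (n_samples,).
    assert audio.ndim == 1, f"expected mono audio, got shape {audio.shape}"


# Hypothetical stand-ins for loaded clips.
mono_clip = np.zeros(16_000, dtype=np.float32)
stereo_clip = np.zeros((2, 16_000), dtype=np.float32)

assert_mono(mono_clip)  # passes silently
```

Calling `assert_mono(stereo_clip)` would raise, which is the behaviour a dataset test would rely on.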

OK, I removed it and added a test. I realized that I took out the .astype(np.float32) as well when I did this. Is it important to keep float32 for any reason? (The tests check for float64, so it's consistent with itself right now.)

Well, float32 saves memory (especially on the GPU), and float64 doesn't really offer much more useful precision for audio.
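To put rough numbers on this, here is a small sketch that assumes 16-bit PCM source audio (the bit depth of the crow recordings is not stated in this thread). It checks that every 16-bit sample value survives a round trip through float32 and compares the memory footprint of the two dtypes.

```python
import numpy as np

# Assumption: 16-bit PCM input. This array covers every possible sample value.
pcm = np.arange(-32768, 32768, dtype=np.int16)

as32 = pcm.astype(np.float32) / 32768.0
as64 = pcm.astype(np.float64) / 32768.0

# float32 has a 24-bit significand, so 16-bit samples are represented exactly.
roundtrip = (as32 * 32768.0).astype(np.int16)
lossless = bool(np.array_equal(roundtrip, pcm))

# Same number of samples, half the bytes in float32.
bytes32, bytes64 = as32.nbytes, as64.nbytes
```

Under that assumption `lossless` is true and `bytes64` is twice `bytes32`, so float64 doubles the storage without recovering any extra information from the source samples.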

The other thing is that every dataset in esp_data (I believe) enforces float32, so having float64 in just this one will break experiments where this dataset is concatenated or chained with others in a training loop. What do you think?
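One way the mismatch can surface, sketched with plain NumPy rather than the actual esp_data loaders (the two arrays below are hypothetical stand-ins for clips from two datasets): concatenating mixed dtypes silently promotes the result to float64, so downstream code that expects float32 sees a different dtype.

```python
import numpy as np

clip_other = np.zeros(4, dtype=np.float32)  # stands in for an existing float32 dataset
clip_crows = np.zeros(4, dtype=np.float64)  # stands in for this dataset without conversion

# Mixed dtypes are promoted to the wider type.
mixed = np.concatenate([clip_other, clip_crows])

# Converting at load time keeps the combined stream in float32.
fixed = np.concatenate([clip_other, clip_crows.astype(np.float32)])
```

Here `mixed.dtype` is float64 while `fixed.dtype` stays float32, which is why keeping the conversion in the dataset itself is the simpler contract.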

Okay, I'll leave in the conversion to float32.

GaganNarula

Some comments to discuss, but in principle it looks good!

mcusi · 2026-03-18T17:13:56Z

I think I've addressed all the comments so far. Jules said it would be useful to add call types; I will do that in a second version.

mcusi added 2 commits March 16, 2026 23:58

Add Spanish carrion crows dataset and passing tests

2a7af2f

Remove file without selection table

1810ddc

mcusi requested a review from a team as a code owner March 17, 2026 04:33