Moving GBIFConverter to esp-data #226

Merged
benjaminsshoffman merged 6 commits into main from gagan/add-gbif-converter on Jan 22, 2026

Conversation

@GaganNarula
Collaborator

  • Port GBIFConverter from the taxonomy repo to esp-data as part of the new data discovery module (calling it discover for now; the idea is to start with a CLI tool, e.g. uv run discover .. some queries ...). Having taxonomy in this module will help build a more comprehensive DatasetInfo (cf. Data discovery 177 #189) and support taxonomy validation.
  • Added an AddTaxonomy transform that can help users add GBIF taxonomy to their datasets.
  • Tests
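
To illustrate the idea behind a taxonomy-adding transform, here is a minimal standard-library sketch. It is not the AddTaxonomy implementation from this PR: the function names `load_taxonomy` and `add_taxonomy`, the `species` key column, and the sample row are all assumptions made for illustration. Only the rank list is taken from the `gbif_taxonomy.py` excerpt in this thread.

```python
import csv
import io

# Rank names as listed in the gbif_taxonomy.py excerpt from this PR.
TAXONOMY_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def load_taxonomy(tsv_text, key_column="species"):
    """Build a {lowercased species name: {rank: value}} lookup from TSV text.

    The `species` key column is an assumption about the file layout.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lookup = {}
    for row in reader:
        key = row[key_column].strip().lower()
        lookup[key] = {rank: row.get(rank, "") for rank in TAXONOMY_RANKS}
    return lookup


def add_taxonomy(records, lookup, species_field="species"):
    """Return copies of `records` with one extra field per taxonomy rank.

    Species that are missing from the lookup get None for every rank.
    """
    enriched_records = []
    for record in records:
        key = str(record.get(species_field, "")).strip().lower()
        ranks = lookup.get(key, {})
        enriched = dict(record)  # copy so the input is not mutated
        for rank in TAXONOMY_RANKS:
            enriched[rank] = ranks.get(rank)
        enriched_records.append(enriched)
    return enriched_records


# Hypothetical one-row taxonomy table used only for this example.
SAMPLE_TSV = (
    "species\tkingdom\tphylum\tclass\torder\tfamily\tgenus\n"
    "Turdus merula\tAnimalia\tChordata\tAves\tPasseriformes\tTurdidae\tTurdus\n"
)
```

Matching here is a case-insensitive exact match on the species name; a real transform would likely also need to handle synonyms and unmatched names more carefully.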

Contributor

@benjaminsshoffman benjaminsshoffman left a comment

App looks good, but should have the preprocessing steps included for the future.

Also, adding some versioning would be really helpful. Could have a new folder in esp-ml-datasets for taxonomy, with gbif_animals_0_0_1.tsv as the first version? I'm guessing there's a more comprehensive way of doing versioning in the future, but this would be good until then
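
As a rough sketch of the filename-based versioning suggested above, the helpers below build and parse names of the form `gbif_animals_0_0_1.tsv`. The helper names and the three-part `major_minor_patch` interpretation are assumptions for illustration, not something defined in esp-data.

```python
import re

# Matches names such as "gbif_animals_0_0_1.tsv" (stem, then three numeric parts).
_VERSIONED_NAME = re.compile(
    r"^(?P<stem>.+)_(?P<major>\d+)_(?P<minor>\d+)_(?P<patch>\d+)\.tsv$"
)


def versioned_filename(stem, version):
    """Build a filename like 'gbif_animals_0_0_1.tsv' from a (major, minor, patch) tuple."""
    major, minor, patch = version
    return f"{stem}_{major}_{minor}_{patch}.tsv"


def parse_version(filename):
    """Split a versioned filename into (stem, (major, minor, patch))."""
    match = _VERSIONED_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a versioned taxonomy filename: {filename!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return match["stem"], version


def latest(filenames):
    """Pick the filename with the highest version, comparing parts numerically."""
    return max(filenames, key=lambda name: parse_version(name)[1])
```

Comparing versions as integer tuples rather than strings means `0_0_10` sorts after `0_0_2`, which a plain lexicographic sort of filenames would get wrong.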

Comment thread esp_data/discover/gbif_taxonomy.py Outdated

TAXONOMY_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]
# TODO: need a more managed location for this file
DEFAULT_LOCATION = "gs://sound-event-detection/taxonomy/gbif_animals.tsv"
Contributor

Do we want to include the preprocessing script as well? It's probably good to have in case we need to make changes in the future. It's located here, with an explanation in the top docstring (currently, it requires downloading the Darwin Core Archive locally):

https://github.com/earthspecies/taxonomy/blob/main/scripts/v2_source_to_tsv.py

@GaganNarula GaganNarula marked this pull request as ready for review January 22, 2026 09:00
@benjaminsshoffman benjaminsshoffman self-requested a review January 22, 2026 16:28
Contributor

@benjaminsshoffman benjaminsshoffman left a comment

lgtm

@benjaminsshoffman benjaminsshoffman merged commit d00401e into main Jan 22, 2026
8 checks passed
@benjaminsshoffman benjaminsshoffman deleted the gagan/add-gbif-converter branch January 22, 2026 17:25