Moving GBIFConverter to esp-data #226
benjaminsshoffman
left a comment
App looks good, but the preprocessing steps should be included for future reference.
Also, adding some versioning would be really helpful. We could have a new folder in esp-ml-datasets for taxonomy, with gbif_animals_0_0_1.tsv as the first version. I'm guessing there will be a more comprehensive way of doing versioning in the future, but this would be good until then.
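To make the naming suggestion concrete, here is a minimal sketch of how a versioned file name could be derived. The helper `versioned_taxonomy_path` is hypothetical, and the base location `gs://esp-ml-datasets/taxonomy` is an assumption (the comment only names the esp-ml-datasets folder, not its full URI).

```python
import re

# Assumption: esp-ml-datasets is reachable as a GCS bucket with a taxonomy/ folder.
ASSUMED_BASE = "gs://esp-ml-datasets/taxonomy"


def versioned_taxonomy_path(version: str, base: str = ASSUMED_BASE) -> str:
    """Build a path such as .../gbif_animals_0_0_1.tsv from a version like "0.0.1".

    Hypothetical helper, not part of esp-data.
    """
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    # Dots become underscores to match the proposed file naming.
    return f"{base}/gbif_animals_{version.replace('.', '_')}.tsv"


if __name__ == "__main__":
    print(versioned_taxonomy_path("0.0.1"))
```

Keeping the version in the file name means older files stay addressable when a new taxonomy export is added, until a more managed versioning scheme exists.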
TAXONOMY_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]
# TODO: need a more managed location for this file
DEFAULT_LOCATION = "gs://sound-event-detection/taxonomy/gbif_animals.tsv"
Do we want to include the preprocessing script as well? It's probably good to have in case we need to make changes in the future. It's located here, with an explanation in the top docstring (currently, it requires downloading the Darwin Core Archive locally):
https://github.com/earthspecies/taxonomy/blob/main/scripts/v2_source_to_tsv.py
discover: for now, the idea would be to have a CLI tool first, like uv run discover .. some queries ...
Having taxonomy in this module will help to build a more comprehensive DatasetInfo (c.f. Data discovery 177 #189) and do taxonomy validation.
AddTaxonomy: a transform that can help users add GBIF taxonomy to their datasets.
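As a rough illustration of what such a transform might look like, here is a minimal sketch. It is not the implementation in this PR: the `load_taxonomy` helper, the `AddTaxonomy` call signature, the `species` lookup column, and the sample row are all assumptions made for the example. Only `TAXONOMY_RANKS` is taken from the diff above.

```python
import csv
import io

TAXONOMY_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]

# Illustrative sample only; the real TSV layout may differ.
SAMPLE_TSV = (
    "species\tkingdom\tphylum\tclass\torder\tfamily\tgenus\n"
    "Corvus corax\tAnimalia\tChordata\tAves\tPasseriformes\tCorvidae\tCorvus\n"
)


def load_taxonomy(tsv_file, key="species"):
    """Read a taxonomy TSV into {species: {rank: value}} (hypothetical helper)."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {row[key]: {rank: row[rank] for rank in TAXONOMY_RANKS} for row in reader}


class AddTaxonomy:
    """Hypothetical transform: adds rank fields to a record keyed by species."""

    def __init__(self, taxonomy, key="species"):
        self.taxonomy = taxonomy
        self.key = key

    def __call__(self, record):
        ranks = self.taxonomy.get(record.get(self.key))
        out = dict(record)  # do not mutate the caller's record
        for rank in TAXONOMY_RANKS:
            # Unknown species get None so missing matches are easy to spot.
            out[rank] = ranks[rank] if ranks else None
        return out


if __name__ == "__main__":
    transform = AddTaxonomy(load_taxonomy(io.StringIO(SAMPLE_TSV)))
    print(transform({"species": "Corvus corax", "file": "a.wav"}))
```

Filling unmatched records with None rather than raising keeps the transform usable on partially labeled datasets, and the same lookup could back the taxonomy validation mentioned above.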