Concat datasets are slower than they need to be. #15

Description

@HesitantlyHuman

When you concatenate multiple datasets one after another, each concat wraps the previous result, so you end up with a deeply nested structure.
Something like this:

concat_dataset/
├─ concat_dataset/
│  ├─ concat_dataset/
│  │  ├─ concat_dataset/
│  │  │  ├─ dataset_0
│  │  │  ├─ dataset_1
│  │  ├─ dataset_2
│  ├─ dataset_3
├─ dataset_4

This is inefficient. Instead, we could flatten nested concat datasets into:

concat_dataset/
├─ dataset_0
├─ dataset_1
├─ dataset_2
├─ dataset_3
├─ dataset_4

I don't have numbers on the actual performance impact, but the overhead will likely become significant if a user does many splits and concats.
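
To make the proposal concrete, here is a minimal sketch of flattening at construction time. It does not use this library's actual classes: `ConcatDataset`, its `datasets` and `offsets` attributes, and the lookup strategy are all hypothetical stand-ins, and it assumes each child dataset supports `len()` and integer indexing.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatDataset:
    """Hypothetical stand-in for the library's concat dataset."""

    def __init__(self, datasets):
        self.datasets = []
        for dataset in datasets:
            if isinstance(dataset, ConcatDataset):
                # A nested concat was itself flattened when it was built,
                # so splicing in its children keeps this one flat too.
                self.datasets.extend(dataset.datasets)
            else:
                self.datasets.append(dataset)
        # Cumulative end offsets, e.g. child lengths [2, 3] -> [2, 5].
        self.offsets = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        # One binary search finds the child, regardless of how many
        # concat calls produced this dataset.
        which = bisect_right(self.offsets, index)
        start = self.offsets[which - 1] if which > 0 else 0
        return self.datasets[which][index - start]


# Plain lists stand in for real datasets in this example.
inner = ConcatDataset([ConcatDataset([[0, 1], [2]]), [3, 4]])
outer = ConcatDataset([inner, [5]])
```

With this approach, `outer.datasets` holds the four leaf datasets directly, so a lookup never has to walk through intermediate concat layers.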

Metadata

Labels

enhancement (New feature or request)
