
Update LeRobot datasource to support multiple LeRobot datasets #49

@shorbaji


Summary

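Add multi-root support to LeRobotDatasource and read_lerobot. Both accept either a single root or a list of roots; the result reads as one concatenated dataset with an added dataset_index column that identifies the root each row came from. episode_index and index keep their per-root local values, so they are unique only within a root; across roots, an episode is identified by the pair (dataset_index, episode_index).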
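Accepting "a single root or a list of roots" usually comes down to normalizing the argument once, up front. A minimal sketch, assuming the widened str | Path | list[str | Path] signature; normalize_roots is a hypothetical helper name, not code from this issue.

```python
from pathlib import Path
from typing import Union

# Accepted forms of a single root, per the proposed signature.
RootLike = Union[str, Path]


def normalize_roots(root: Union[RootLike, list[RootLike]]) -> list[Path]:
    """Turn a single root or a list of roots into a non-empty list of Paths.

    The position of each root in the returned list is the value that would be
    written to the dataset_index column for rows read from that root.
    """
    # Check str/Path first: a str is itself iterable and must not be split.
    if isinstance(root, (str, Path)):
        roots = [root]
    else:
        roots = list(root)
    if not roots:
        raise ValueError("at least one LeRobot dataset root is required")
    return [Path(r) for r in roots]
```

With this shape, the single-root path is simply the one-element case, which is what keeps dataset_index equal to 0 for single-root reads.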

Key changes

  • root parameter widens to str | Path | list[str | Path]
  • One LeRobotDatasourceMetadata per root; validate that video_keys, fps, and feature names match across roots
  • _slice tags each range with a root_index; slicers themselves are unchanged
  • LeRobotReadTask takes segments: list[tuple[int, int, int]] of (root_idx, start, end), plus the list of per-root metas
  • _read_fn loops over segments; each segment runs the existing pipeline independently
  • _build_batch appends a dataset_index column (int32)
  • Single-root calls keep the same signature and behavior; the only visible change is that the output schema always includes dataset_index (0 for single-root reads)
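The key changes above can be sketched end to end in plain Python. This is not the PR's code: RootMeta, validate_metas, tag_ranges, and read_segments are hypothetical names; RootMeta stands in for the few LeRobotDatasourceMetadata fields involved; and tag_ranges uses fixed-size blocks where the real _slice reuses the existing slicers. read_rows is assumed to be the existing single-root pipeline, returning a dict of equal-length NumPy arrays that includes index.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

import numpy as np


@dataclass(frozen=True)
class RootMeta:
    # Hypothetical stand-in for the LeRobotDatasourceMetadata fields used here.
    video_keys: tuple[str, ...]
    fps: int
    feature_names: tuple[str, ...]
    num_rows: int


# Fields that must agree across roots; key names are compared as sets.
_CHECKS = {
    "video_keys": lambda m: frozenset(m.video_keys),
    "fps": lambda m: m.fps,
    "feature_names": lambda m: frozenset(m.feature_names),
}


def validate_metas(metas: list[RootMeta]) -> None:
    """Raise ValueError if any root disagrees with root 0."""
    ref = metas[0]
    for i, m in enumerate(metas[1:], start=1):
        for name, key in _CHECKS.items():
            if key(m) != key(ref):
                raise ValueError(f"root {i}: {name} does not match root 0")


def tag_ranges(metas: list[RootMeta], rows_per_block: int) -> list[tuple[int, int, int]]:
    """Split each root's local rows [0, num_rows) into (root_idx, start, end).

    Fixed-size blocks are an assumption of this sketch; the proposal keeps the
    existing slicers and only tags their output with the root index.
    """
    segments = []
    for root_idx, m in enumerate(metas):
        for start in range(0, m.num_rows, rows_per_block):
            segments.append((root_idx, start, min(start + rows_per_block, m.num_rows)))
    return segments


def read_segments(
    segments: list[tuple[int, int, int]],
    read_rows: Callable[[int, int, int], dict[str, np.ndarray]],
) -> Iterator[dict[str, np.ndarray]]:
    """Read each segment independently and append an int32 dataset_index column."""
    for root_idx, start, end in segments:
        batch = read_rows(root_idx, start, end)  # assumed single-root pipeline
        n = len(batch["index"])  # rows actually produced for this segment
        batch["dataset_index"] = np.full(n, root_idx, dtype=np.int32)
        yield batch
```

Because each segment carries its own root_idx, one read task can mix segments from different roots, and dataset_index comes from the segment rather than from task-level state.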

Out of scope

No stats merging, no episode/row offset remapping, no new classes.
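Since episode and row offsets are not remapped, two roots can both contain an episode 0, so downstream code that groups by episode has to key on the pair (dataset_index, episode_index). A minimal sketch, assuming rows arrive as dicts (for example from ray.data.Dataset.iter_rows()); episode_row_counts is a hypothetical helper, not part of the proposal.

```python
from collections import Counter
from typing import Iterable, Mapping


def episode_row_counts(rows: Iterable[Mapping[str, int]]) -> Counter:
    """Count rows per episode across all roots.

    episode_index keeps per-root local values, so it is paired with
    dataset_index to keep same-numbered episodes from different roots apart.
    """
    return Counter((int(r["dataset_index"]), int(r["episode_index"])) for r in rows)


rows = [
    {"dataset_index": 0, "episode_index": 0},
    {"dataset_index": 0, "episode_index": 0},
    {"dataset_index": 1, "episode_index": 0},  # same local episode id, different root
]
counts = episode_row_counts(rows)
```

Grouping on episode_index alone would merge the two episode 0s into one group of three rows.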

Acceptance criteria

  • read_lerobot(["/data/ds1", "/data/ds2"]) returns a single ray.data.Dataset
  • Each row has a dataset_index (int32) column
  • Single-root API unchanged
  • Tests cover a multi-dataset round-trip

Metadata

  • Assignees: No one assigned
  • Labels: No labels
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests