LeRobot export

A replayed trajectory can be converted into the Hugging Face LeRobot dataset format, which is the input to the two verified training paths, LeRobot and StarVLA. utils/convert_to_lerobot.py reads the observation-augmented .h5 file produced by replay and writes a LeRobot dataset directory.

Convert a replayed trajectory

Point --traj-path at the replayed file, not the raw recording. The replayed file is the one whose name encodes obs / control / backend, e.g. merged.rgb.pd_joint_pos.physx_cpu.h5; substitute the control mode your recording used. --task-name is the natural-language instruction stored with every frame, so keep it consistent with the instruction you will pass at eval time.

$ python utils/convert_to_lerobot.py \
    --traj-path demos/grasp_part/merged.rgb.pd_joint_pos.physx_cpu.h5 \
    --output-dir demos/grasp_part/lerobot_grasp_part \
    --task-name "Grasp the cap of the bottle." \
    --fps 30 \
    --robot-type panda
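
Passing the raw recording instead of the replayed file is an easy mistake, so a quick check on the file name can help before running the conversion. The sketch below is not part of the repository. It assumes the name layout is exactly `<stem>.<obs>.<control>.<backend>.h5`, as in the example above, and the helper name `describe_replayed_name` is hypothetical.

```python
from pathlib import Path


def describe_replayed_name(path):
    """Split a replayed file name into its fields, or return None.

    Assumption: replayed files are named <stem>.<obs>.<control>.<backend>.h5,
    as in merged.rgb.pd_joint_pos.physx_cpu.h5. Anything else is rejected.
    """
    # Split from the right so a stem that itself contains dots stays intact.
    parts = Path(path).name.rsplit(".", 4)
    if len(parts) != 5 or parts[-1] != "h5":
        return None
    stem, obs, control, backend, _ = parts
    return {"stem": stem, "obs": obs, "control": control, "backend": backend}


fields = describe_replayed_name(
    "demos/grasp_part/merged.rgb.pd_joint_pos.physx_cpu.h5"
)
print(fields)
```

A `None` result suggests the path is not a replayed file under this naming assumption. The check only inspects the name, not the contents of the .h5 file.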

Flags

Flag	Default	Notes
`--traj-path`	—	The replayed (obs-augmented) `.h5`
`--output-dir`	—	LeRobot dataset directory to write
`--task-name`	`None`	Natural-language instruction stored per frame
`--fps`	30	Dataset frame rate
`--robot-type`	`None`	e.g. `panda`
`--chunks-size`	default	Parquet chunk size
`--image-size`	`None`	Optional resize, e.g. `224x224`
`--max-episodes`	`None`	Cap episodes converted (debugging)
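
When converting several trajectories, it can be convenient to assemble the command in Python. The following is a minimal sketch that uses only the flags listed in the table. The helper `build_convert_command` and the output directory `lerobot_debug` are hypothetical, and flags left as `None` are omitted so the script's own defaults apply.

```python
import shlex


def build_convert_command(traj_path, output_dir, task_name=None, fps=30,
                          robot_type=None, image_size=None, max_episodes=None):
    """Build the argv list for utils/convert_to_lerobot.py."""
    cmd = [
        "python", "utils/convert_to_lerobot.py",
        "--traj-path", str(traj_path),
        "--output-dir", str(output_dir),
        "--fps", str(fps),
    ]
    optional = {
        "--task-name": task_name,
        "--robot-type": robot_type,
        "--image-size": image_size,
        "--max-episodes": max_episodes,
    }
    # Only pass flags that were explicitly set.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# A small debugging run: two episodes, images resized to 224x224.
cmd = build_convert_command(
    "demos/grasp_part/merged.rgb.pd_joint_pos.physx_cpu.h5",
    "demos/grasp_part/lerobot_debug",
    task_name="Grasp the cap of the bottle.",
    image_size="224x224",
    max_episodes=2,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Building a list rather than a single string avoids quoting problems with task names that contain spaces.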

Feeding training

The output directory is a standard LeRobot dataset. Two verified training paths consume it:

LeRobot — point the standard LeRobot training pipeline at --output-dir.
StarVLA — copy the dataset under core/policies/starvla/starVLA/playground/Dataset/. StarVLA expects LeRobot 2.1; if you produced a 3.0 dataset, convert it with lerobot_v30_to_v21 and add a modality.json to meta/. See core/policies/starvla/train/README.md.
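
Before copying a dataset into the StarVLA tree, the two requirements above (LeRobot 2.1 and a modality.json in meta/) can be checked with a short script. This is a sketch under assumptions, not a tool shipped with the repository: it assumes the dataset records its format version in meta/info.json under the key `codebase_version` with a value such as `v2.1`, and the helper name `starvla_readiness` is hypothetical.

```python
import json
from pathlib import Path


def starvla_readiness(dataset_dir):
    """Return a list of problems found; an empty list means none were found."""
    meta = Path(dataset_dir) / "meta"
    problems = []

    info_path = meta / "info.json"
    if not info_path.is_file():
        problems.append("meta/info.json is missing")
    else:
        # Assumption: the format version is stored under "codebase_version".
        version = json.loads(info_path.read_text()).get("codebase_version")
        if version != "v2.1":
            problems.append(f"dataset version is {version!r}, expected 'v2.1'")

    if not (meta / "modality.json").is_file():
        problems.append("meta/modality.json is missing")

    return problems
```

Calling `starvla_readiness("demos/grasp_part/lerobot_grasp_part")` on a freshly converted dataset would typically report the missing modality.json, and a version mismatch if the dataset was written as 3.0. The check does not validate the contents of modality.json; see core/policies/starvla/train/README.md for what that file must contain.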

The canonical reference for flags and defaults is the script's own help output: python utils/convert_to_lerobot.py --help.