LeRobot export

A replayed trajectory can be converted into the Hugging Face LeRobot dataset format, which is the input to the verified LeRobot and StarVLA training paths. utils/convert_to_lerobot.py reads the observation-augmented .h5 produced by replay and writes a LeRobot dataset directory.

Convert a replayed trajectory

Point --traj-path at the replayed file, not the raw recording. The replayed file is the one whose name encodes obs / control / backend, e.g. merged.rgb.pd_joint_pos.physx_cpu.h5; the control segment should be the recording's control mode. --task-name is the natural-language instruction baked into every frame; keep it consistent with what you'll pass at eval time.

$ python utils/convert_to_lerobot.py \
    --traj-path demos/grasp_part/merged.rgb.pd_joint_pos.physx_cpu.h5 \
    --output-dir demos/grasp_part/lerobot_grasp_part \
    --task-name "Grasp the cap of the bottle." \
    --fps 30 \
    --robot-type panda
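Because it is easy to pass the raw recording by mistake, a small check on the filename can catch that before the conversion starts. The sketch below is illustrative only: parse_replay_name and ReplayName are hypothetical helpers, and the layout <stem>.<obs>.<control>.<backend>.h5 is an assumption inferred from the example filename above, not taken from the converter's source.

```python
from pathlib import Path
from typing import NamedTuple


class ReplayName(NamedTuple):
    stem: str     # e.g. "merged"
    obs: str      # observation mode, e.g. "rgb"
    control: str  # control mode, e.g. "pd_joint_pos"
    backend: str  # simulation backend, e.g. "physx_cpu"


def parse_replay_name(path: str) -> ReplayName:
    """Split a replayed-trajectory filename into its encoded fields.

    Assumes the layout <stem>.<obs>.<control>.<backend>.h5, which is
    inferred from the guide's example and may not cover every file.
    """
    name = Path(path).name
    parts = name.split(".")
    if len(parts) != 5 or parts[-1] != "h5":
        raise ValueError(
            f"{name!r} does not look like a replayed trajectory "
            "(expected <stem>.<obs>.<control>.<backend>.h5)"
        )
    return ReplayName(*parts[:4])


info = parse_replay_name("demos/grasp_part/merged.rgb.pd_joint_pos.physx_cpu.h5")
print(info.obs, info.control, info.backend)
```

A stem that itself contains dots would be rejected by this check, so treat it as a guard against obvious mistakes rather than a full validator.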

Flags

Flag            Default   Notes
--traj-path               The replayed (obs-augmented) .h5
--output-dir              LeRobot dataset directory to write
--task-name     None      Natural-language instruction stored per frame
--fps           30        Dataset frame rate
--robot-type    None      e.g. panda
--chunks-size   default   Parquet chunk size
--image-size    None      Optional resize, e.g. 224x224
--max-episodes  None      Cap episodes converted (debugging)
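When converting several trajectories from a script, it can help to assemble the command line programmatically. The following is a minimal sketch, assuming only the flags listed above; build_convert_command is a hypothetical helper, not part of the repository, and --chunks-size is left out because its default is not spelled out here.

```python
import sys
from typing import List, Optional


def build_convert_command(
    traj_path: str,
    output_dir: str,
    task_name: Optional[str] = None,
    fps: int = 30,
    robot_type: Optional[str] = None,
    image_size: Optional[str] = None,   # e.g. "224x224"
    max_episodes: Optional[int] = None,
) -> List[str]:
    """Return an argv list for utils/convert_to_lerobot.py.

    Flags whose value is None are omitted so the script's own
    defaults apply.
    """
    cmd = [
        sys.executable, "utils/convert_to_lerobot.py",
        "--traj-path", traj_path,
        "--output-dir", output_dir,
        "--fps", str(fps),
    ]
    optional = {
        "--task-name": task_name,
        "--robot-type": robot_type,
        "--image-size": image_size,
        "--max-episodes": max_episodes,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_convert_command(
    "demos/grasp_part/merged.rgb.pd_joint_pos.physx_cpu.h5",
    "demos/grasp_part/lerobot_grasp_part",
    task_name="Grasp the cap of the bottle.",
    robot_type="panda",
)
print(" ".join(cmd[1:]))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root. Passing a list rather than a shell string keeps the quoted task name intact.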

Feeding training

The output directory is a standard LeRobot dataset. Two verified training paths consume it:

  • LeRobot — point the standard LeRobot training pipeline at --output-dir.
  • StarVLA — copy the dataset under core/policies/starvla/starVLA/playground/Dataset/. StarVLA expects LeRobot 2.1; if you produced a 3.0 dataset, convert it with lerobot_v30_to_v21 and add a modality.json to meta/. See core/policies/starvla/train/README.md.
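The StarVLA copy step can be scripted. This is a rough sketch under stated assumptions: stage_for_starvla and has_modality_json are hypothetical helpers, the destination path is the one quoted above (relative to the repository root), and meta/ is assumed to sit at the top level of the dataset directory. It does not run lerobot_v30_to_v21 and does not generate modality.json; it only copies the dataset and reports whether that file is present.

```python
import shutil
from pathlib import Path

# Destination named in the guide, relative to the repository root.
STARVLA_DATASET_ROOT = Path("core/policies/starvla/starVLA/playground/Dataset")


def stage_for_starvla(dataset_dir, dest_root=STARVLA_DATASET_ROOT):
    """Copy a converted dataset under dest_root and return the new path."""
    src = Path(dataset_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no dataset directory at {src}")
    dest = Path(dest_root) / src.name
    # copytree creates missing parent directories and raises
    # FileExistsError if dest already exists, so an earlier copy is
    # never silently overwritten.
    shutil.copytree(src, dest)
    return dest


def has_modality_json(dataset_dir):
    """True if meta/modality.json exists in the dataset (assumed layout)."""
    return (Path(dataset_dir) / "meta" / "modality.json").is_file()
```

A typical call would be dest = stage_for_starvla("demos/grasp_part/lerobot_grasp_part"), followed by has_modality_json(dest) to see whether the file still needs to be added as described in core/policies/starvla/train/README.md.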

Canonical source for the flags and their defaults: python utils/convert_to_lerobot.py --help.