
Quickstart

The end-to-end MetaFine pipeline: record demonstrations, merge & replay them, convert to a training format, train a policy, then evaluate it closed-loop in the simulator. Every command below is the real, verified invocation.

Step 1 — Record demonstrations

The recorder drives a Franka arm with a motion-planning solver, so every saved trial is a successful expert demonstration.

Single skill — pick a registered env id, an asset, and a part:

$ python record.py -e grasp_part --object-name 100221 --part-name cap -n 5 --only-count-success
# → demos/grasp_part/trial_0001/trajectory.h5  (one dir per trial)

Composite task — drive a multi-stage task graph (a multi-task env) instead of a single skill:

$ python record.py --task-graph configs/example_grasp_cap.yaml -n 5 --only-count-success

Env ids include grasp_part, peg_in_hole, toggle_switch, lid_opening, slide_along, plug_charger, stack_pyramid, draw_triangle, … (full list in the Task catalogue).

Output layout is <record-dir>/<env>/trial_NNNN/{trajectory.h5, trajectory.json}, with one sub-directory per trial (override the root with --record-dir; the default is demos/). With --only-count-success the recorder retries until -n trials succeed; failed attempts leave empty trial dirs, and successful ones also carry a .mp4.
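
To check a recording run before merging, the layout above can be walked with a few lines of Python. This is a minimal sketch that assumes only the directory structure described here (trial_NNNN/ sub-directories, with trajectory.h5 missing from failed attempts); the helper name summarize_trials is hypothetical and not part of MetaFine.

```python
from pathlib import Path


def summarize_trials(env_dir):
    """Split trial_* sub-directories into recorded and empty ones.

    Assumption: a trial counts as recorded when trajectory.h5 exists in it.
    """
    recorded, empty = [], []
    for trial in sorted(Path(env_dir).glob("trial_*")):
        if not trial.is_dir():
            continue
        if (trial / "trajectory.h5").is_file():
            recorded.append(trial.name)
        else:
            empty.append(trial.name)
    return recorded, empty
```

For example, recorded, empty = summarize_trials("demos/grasp_part") returns two lists of trial directory names, and len(recorded) can be compared against the -n value passed to the recorder.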

Step 2 — Merge trajectory shards

Recording produces one trajectory.h5 per trial_NNNN/ sub-dir. Point -i at the env directory; merge recurses into the trial dirs and writes one combined HDF5:

$ python utils/merge_trajectory.py \
    -i demos/grasp_part \
    -o demos/grasp_part/merged.h5 \
    -p trajectory.h5

-p is the filename pattern to collect; -s True (default) keeps only successful trajectories — pass -s False to keep all.

Step 3 — Replay (render observations)

Recording stores actions + env states only. Replay re-runs each trajectory to render observations, writing a new .h5 whose name encodes the obs/control/backend choice.

Match the recording's control mode. Each env records in its own control mode (check env_kwargs.control_mode in trajectory.json; e.g. grasp_part records pd_joint_pos). Pass that same mode to -c and use --use-env-states for a deterministic, faithful replay. Converting to a different control mode is allowed, but it replays actions instead of env states (no --use-env-states) and noticeably lowers reproduction success.
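
The control-mode lookup can be scripted. The sketch below is illustrative only: it assumes trajectory.json is ordinary JSON containing an env_kwargs object somewhere in its tree, and it searches for that key instead of hard-coding a path, because the exact nesting is not specified here. The helper names find_key and recorded_control_mode are hypothetical.

```python
import json
from pathlib import Path


def find_key(node, key):
    """Depth-first search for the first occurrence of `key` in nested JSON."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def recorded_control_mode(json_path):
    """Return env_kwargs.control_mode from a trajectory.json, or None."""
    meta = json.loads(Path(json_path).read_text())
    env_kwargs = find_key(meta, "env_kwargs")
    if isinstance(env_kwargs, dict):
        return env_kwargs.get("control_mode")
    return None
```

Calling recorded_control_mode("demos/grasp_part/trial_0001/trajectory.json") would then give the value to pass to -c, or None if the file has no such entry.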

# -c must be the same control mode the recording used
$ python utils/replay_trajectory.py \
    --traj-path demos/grasp_part/merged.h5 \
    -o rgb \
    -c pd_joint_pos \
    -b physx_cpu \
    --use-env-states \
    --save-traj \
    --save-video \
    --shader default
# → demos/grasp_part/merged.rgb.pd_joint_pos.physx_cpu.h5
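
When scripting later steps it helps to predict the replayed file's name. The following sketch infers the naming pattern (<stem>.<obs>.<control>.<backend>.h5) from the single example above; treat that pattern as an assumption, not a documented contract. The function replayed_path is hypothetical.

```python
from pathlib import Path


def replayed_path(traj_path, obs_mode, control_mode, backend):
    """Guess the replay output path next to the input file.

    Assumption: the name is <stem>.<obs>.<control>.<backend>.h5, as in the
    merged.rgb.pd_joint_pos.physx_cpu.h5 example.
    """
    src = Path(traj_path)
    return src.with_name(f"{src.stem}.{obs_mode}.{control_mode}.{backend}.h5")
```

The result for ("demos/grasp_part/merged.h5", "rgb", "pd_joint_pos", "physx_cpu") matches the file shown above and is the value Step 4 expects for --traj-path.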

For task-graph data also pass --allow-failure: replay is task-graph-agnostic (it can't re-evaluate the goal predicate), so success is decided at record time — replay must not re-filter on its own success check.

$ python utils/replay_trajectory.py --traj-path <merged>.h5 \
    -o rgb -c pd_joint_pos -b physx_cpu --use-env-states --allow-failure --save-traj
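
Steps 2 and 3 can be chained from a small driver. The sketch below only assembles argument lists from the flags shown in this guide; it does not add any options, and the function name build_commands and its task_graph switch are hypothetical conveniences.

```python
import sys


def build_commands(env_dir, control_mode, task_graph=False):
    """Return the merge and replay commands as argv lists."""
    merged = f"{env_dir}/merged.h5"
    merge = [
        sys.executable, "utils/merge_trajectory.py",
        "-i", env_dir, "-o", merged, "-p", "trajectory.h5",
    ]
    replay = [
        sys.executable, "utils/replay_trajectory.py",
        "--traj-path", merged,
        "-o", "rgb", "-c", control_mode, "-b", "physx_cpu",
        "--use-env-states", "--save-traj",
    ]
    if task_graph:
        # Task-graph data: success was decided at record time.
        replay.append("--allow-failure")
    return [merge, replay]
```

Each list can be passed to subprocess.run(cmd, check=True) from the repository root, so a failed merge stops the run before replay starts.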

Step 4 — Convert for training

Convert the replayed trajectory into a training format. Two targets:

LeRobot (for LeRobot / StarVLA training):

$ python utils/convert_to_lerobot.py \
    --traj-path demos/grasp_part/merged.rgb.pd_joint_pos.physx_cpu.h5 \
    --output-dir demos/grasp_part/lerobot_grasp_part \
    --task-name "Grasp the cap of the bottle." \
    --fps 30 \
    --robot-type panda

RLDS (TFDS-style, for OpenVLA):

$ python utils/convert_to_rlds.py \
    -i demos/grasp_part/100221 \
    -o demos/datasets/rlds \
    --dataset-name grasp_part_rlds \
    --image-size 256

Step 5 — Train

Training runs inside the policy's own framework. Two verified paths:

  • LeRobot — point the standard LeRobot training pipeline at the convert_to_lerobot output directory.
  • StarVLA — place the LeRobot dataset under core/policies/starvla/starVLA/playground/Dataset/ and register it in dataloader/gr00t_lerobot/mixtures.py. StarVLA expects LeRobot 2.1, so convert a 3.0 dataset with lerobot_v30_to_v21 (add modality.json to meta/), then run bash run_libero_train.sh. Full steps in core/policies/starvla/train/README.md.

Step 6 — Evaluate / infer in the platform

Run the trained checkpoint closed-loop in the MetaFine simulator. Example with π0.5 (per-policy flags are documented in each core/policies/<name>/README.md):

$ python core/policies/pi05/evaluate.py \
    --policy-path /path/to/pretrained_model \
    --env-id grasp_part \
    --object-name 100221 \
    --part-name cap \
    --obs-mode rgb \
    --control-mode pd_joint_delta_pos \
    --n-episodes 50 \
    --device cuda \
    --task "Grasp the cap of the bottle." \
    --record-dir eval_out --save-video

For the three-stage diagnostic eval (semantic-intervention / object-swap — the protocol behind the Understanding axis), use the wrapper:

$ bash core/policies/pi05/run_eval_three_stage.sh --policy-path ... --env-id ... --task "..."

Next steps