Quickstart
The end-to-end MetaFine pipeline: record demonstrations, merge & replay them, convert to a training format, train a policy, then evaluate it closed-loop in the simulator. Every command below is the real, verified invocation.
Step 1 — Record demonstrations
The recorder drives a Franka arm with a motion-planning solver, so every saved trial is a successful expert demonstration.
Single skill — pick a registered env id, an asset, and a part:
$ python record.py -e grasp_part --object-name 100221 --part-name cap -n 5 --only-count-success # → demos/grasp_part/trial_0001/trajectory.h5 (one dir per trial)
Composite task — drive a multi-stage task graph (a multi-task env) instead of a single skill:
$ python record.py --task-graph configs/example_grasp_cap.yaml -n 5 --only-count-success
Env ids include grasp_part, peg_in_hole, toggle_switch, lid_opening, slide_along, plug_charger, stack_pyramid, draw_triangle, … (full list in the Task catalogue).
Output layout is <record-dir>/<env>/trial_NNNN/{trajectory.h5, trajectory.json}, with one sub-directory per trial (override with --record-dir; default demos/). With --only-count-success, the recorder keeps retrying until -n trials have succeeded. Failed attempts leave empty trial dirs, and successful ones also carry a .mp4 video.
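You can sanity-check a recording run by walking that layout. The sketch below relies only on the conventions stated above: trial dirs are named trial_NNNN under <record-dir>/<env>/, every saved trial holds a trajectory.h5, and failed attempts leave empty dirs under --only-count-success. The helper name summarize_trials is hypothetical and not part of MetaFine.

```python
from pathlib import Path


def summarize_trials(record_dir: str, env: str) -> dict:
    """Split trial_NNNN/ dirs into saved ones (with trajectory.h5) and empty ones.

    Hypothetical helper: assumes the <record-dir>/<env>/trial_NNNN/ layout
    described in this guide; it is not a MetaFine API.
    """
    env_dir = Path(record_dir) / env
    saved, empty = [], []
    for trial in sorted(env_dir.glob("trial_*")):
        if not trial.is_dir():
            continue
        if (trial / "trajectory.h5").is_file():
            saved.append(trial.name)  # a successful, saved demonstration
        elif not any(trial.iterdir()):
            empty.append(trial.name)  # left behind by a failed attempt
    return {"saved": saved, "empty": empty}
```

For example, summarize_trials("demos", "grasp_part") after the Step 1 command lists which trials were saved and which were left empty.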
Step 2 — Merge trajectory shards
Recording produces one trajectory.h5 per trial_NNNN/ sub-dir. Point -i at the env directory; merge recurses into the trial dirs and writes one combined HDF5:
$ python utils/merge_trajectory.py \
    -i demos/grasp_part \
    -o demos/grasp_part/merged.h5 \
    -p trajectory.h5
-p is the filename pattern to collect; -s True (default) keeps only successful trajectories — pass -s False to keep all.
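The collection half of the merge step works like a recursive filename match. The sketch below illustrates what -i and -p describe. It is not the code of merge_trajectory.py, and collect_shards is a hypothetical name.

```python
from pathlib import Path


def collect_shards(env_dir: str, pattern: str = "trajectory.h5") -> list[Path]:
    """Recursively gather files under env_dir whose name matches pattern.

    Illustrative sketch of the -i/-p behaviour (recurse into trial dirs,
    match a filename pattern); not the actual merge implementation.
    """
    return sorted(p for p in Path(env_dir).rglob(pattern) if p.is_file())
```

In this sketch, a broad pattern such as *.h5 would also match an earlier merged.h5 in the same env directory. That is one reason to keep -p as specific as trajectory.h5.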
Step 3 — Replay (render observations)
Recording stores actions + env states only. Replay re-runs each trajectory to render observations, writing a new .h5 whose name encodes the obs/control/backend choice.
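Scripts that chain Steps 3 and 4 need to know the replayed filename in advance. The helper below reproduces the naming in this guide's example, where merged.h5 becomes merged.rgb.pd_joint_pos.physx_cpu.h5. The general rule <stem>.<obs>.<control>.<backend>.h5 is inferred from that single example, so check it against your own output. The name replayed_path is hypothetical.

```python
from pathlib import Path


def replayed_path(traj_path: str, obs_mode: str, control_mode: str, backend: str) -> str:
    """Predict the replay output path as <stem>.<obs>.<control>.<backend>.h5.

    Assumption: pattern inferred from one example in this guide
    (merged.h5 -> merged.rgb.pd_joint_pos.physx_cpu.h5).
    """
    src = Path(traj_path)
    return str(src.with_name(f"{src.stem}.{obs_mode}.{control_mode}.{backend}.h5"))
```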
Match the recording's control mode. Each env records in its own control mode (check trajectory.json → env_kwargs.control_mode; e.g. grasp_part records pd_joint_pos). Pass that same mode to -c and use --use-env-states for a deterministic, faithful replay. Converting to a different control mode is possible, but it falls back to action replay (without --use-env-states) and noticeably lowers reproduction success.
$ python utils/replay_trajectory.py \
    --traj-path demos/grasp_part/merged.h5 \
    -o rgb \
    -c pd_joint_pos \
    -b physx_cpu \
    --use-env-states \
    --save-traj \
    --save-video \
    --shader default
# ← -c is the same control mode the recording used (pd_joint_pos for grasp_part)
# → demos/grasp_part/merged.rgb.pd_joint_pos.physx_cpu.h5
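Reading the control mode from the recording, instead of typing it by hand, avoids an accidental action replay. The minimal sketch below assumes env_kwargs sits either at the top level of trajectory.json or under an env_info key, which is the layout ManiSkill trajectory files use. Both recorded_control_mode and replay_flags are hypothetical helpers.

```python
import json
from pathlib import Path


def recorded_control_mode(json_path: str) -> str:
    """Return env_kwargs.control_mode from a recording's trajectory.json.

    Assumption: env_kwargs is either top-level or nested under "env_info";
    adjust the lookup if your files differ.
    """
    meta = json.loads(Path(json_path).read_text())
    env_kwargs = meta.get("env_kwargs") or meta.get("env_info", {}).get("env_kwargs", {})
    return env_kwargs["control_mode"]


def replay_flags(recorded: str, target: str) -> list[str]:
    """Matching mode: faithful state replay. Different mode: action replay only."""
    if target == recorded:
        return ["-c", target, "--use-env-states"]
    return ["-c", target]  # action replay; expect lower reproduction success
```

replay_flags encodes the rule above: only a control mode that matches the recording gets --use-env-states.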
For task-graph data, also pass --allow-failure. Replay is task-graph-agnostic (it cannot re-evaluate the goal predicate), so success is decided at record time, and replay must not re-filter trajectories on its own success check.
$ python utils/replay_trajectory.py --traj-path <merged>.h5 \
    -o rgb -c pd_joint_pos -b physx_cpu --use-env-states --allow-failure --save-traj
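When you script replay over many datasets, building the argument list in one place keeps the control mode and the --allow-failure choice explicit. The sketch below uses only the flags shown in this step. The helper name replay_command and its defaults are illustrative assumptions.

```python
def replay_command(traj_path: str, control_mode: str, task_graph: bool = False,
                   obs_mode: str = "rgb", backend: str = "physx_cpu") -> list[str]:
    """Assemble the Step 3 replay invocation from the documented flags.

    Hypothetical helper. task_graph=True appends --allow-failure so that
    replay does not re-filter task-graph data on its own success check.
    """
    cmd = [
        "python", "utils/replay_trajectory.py",
        "--traj-path", traj_path,
        "-o", obs_mode,
        "-c", control_mode,  # must match the recording's control mode
        "-b", backend,
        "--use-env-states",
        "--save-traj",
    ]
    if task_graph:
        cmd.append("--allow-failure")
    return cmd
```

To run it, pass the result to subprocess.run(cmd, check=True) from the repository root.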
Step 4 — Convert for training
Convert the replayed trajectory into a training format. Two targets:
LeRobot (for LeRobot / StarVLA training):
$ python utils/convert_to_lerobot.py \
    --traj-path demos/grasp_part/merged.rgb.pd_joint_pos.physx_cpu.h5 \
    --output-dir demos/grasp_part/lerobot_grasp_part \
    --task-name "Grasp the cap of the bottle." \
    --fps 30 \
    --robot-type panda
RLDS (TFDS-style, for OpenVLA):
$ python utils/convert_to_rlds.py \
    -i demos/grasp_part/100221 \
    -o demos/datasets/rlds \
    --dataset-name grasp_part_rlds \
    --image-size 256
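The two converters take different inputs. convert_to_lerobot.py reads the replayed .h5 file via --traj-path, while convert_to_rlds.py takes a directory via -i. The sketch below builds either invocation from the flags above. conversion_command is a hypothetical helper, and the fixed --fps, --robot-type and --image-size values are copied from this guide's examples.

```python
def conversion_command(target: str, src: str, out: str, *,
                       task_name: str = "", dataset_name: str = "") -> list[str]:
    """Build a Step 4 conversion command for target "lerobot" or "rlds".

    Hypothetical helper: flag names come from this guide; the fixed values
    (--fps 30, --robot-type panda, --image-size 256) copy its examples.
    """
    if target == "lerobot":
        # LeRobot reads the replayed .h5 file and takes a language instruction.
        return ["python", "utils/convert_to_lerobot.py",
                "--traj-path", src, "--output-dir", out,
                "--task-name", task_name, "--fps", "30", "--robot-type", "panda"]
    if target == "rlds":
        # RLDS reads a directory and writes a named TFDS-style dataset.
        return ["python", "utils/convert_to_rlds.py",
                "-i", src, "-o", out,
                "--dataset-name", dataset_name, "--image-size", "256"]
    raise ValueError(f"unknown target {target!r}; expected 'lerobot' or 'rlds'")
```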
Step 5 — Train
Training runs inside the policy's own framework. Two verified paths:
- LeRobot — point the standard LeRobot training pipeline at the convert_to_lerobot output directory.
- StarVLA — place the LeRobot dataset under core/policies/starvla/starVLA/playground/Dataset/ and register it in dataloader/gr00t_lerobot/mixtures.py. StarVLA expects LeRobot 2.1, so convert a 3.0 dataset with lerobot_v30_to_v21 (add modality.json to meta/), then run bash run_libero_train.sh. Full steps in core/policies/starvla/train/README.md.
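Before you launch StarVLA training, it is worth checking the two format requirements above. LeRobot datasets record their format version as codebase_version in meta/info.json, so a pre-flight check can read that field and look for meta/modality.json. starvla_preflight is a hypothetical name, and the exact version string "v2.1" is an assumption to verify against your own dataset.

```python
import json
from pathlib import Path


def starvla_preflight(dataset_dir: str) -> list[str]:
    """List problems that would block StarVLA training (empty list = ready).

    Hypothetical check: assumes StarVLA wants codebase_version "v2.1"
    and a meta/modality.json file, as described in this guide.
    """
    meta = Path(dataset_dir) / "meta"
    info = meta / "info.json"
    if not info.is_file():
        return ["meta/info.json not found"]
    problems = []
    version = json.loads(info.read_text()).get("codebase_version", "unknown")
    if version != "v2.1":
        problems.append(f"codebase_version is {version}; convert with lerobot_v30_to_v21")
    if not (meta / "modality.json").is_file():
        problems.append("meta/modality.json is missing")
    return problems
```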
Step 6 — Evaluate / infer in the platform
Run the trained checkpoint closed-loop in the MetaFine simulator. Example with π0.5 (per-policy flags are documented in each core/policies/<name>/README.md):
$ python core/policies/pi05/evaluate.py \
    --policy-path /path/to/pretrained_model \
    --env-id grasp_part \
    --object-name 100221 \
    --part-name cap \
    --obs-mode rgb \
    --control-mode pd_joint_delta_pos \
    --n-episodes 50 \
    --device cuda \
    --task "Grasp the cap of the bottle." \
    --record-dir eval_out --save-video
For the three-stage diagnostic eval (semantic-intervention / object-swap — the protocol behind the Understanding axis), use the wrapper:
$ bash core/policies/pi05/run_eval_three_stage.sh --policy-path ... --env-id ... --task "..."
Next steps
- Architecture — how skills, task graphs, and the eval pipeline fit together.
- Task graphs — compose multi-step tasks from atomic skills.
- Understanding the three-dimension protocol — what each axis means and how to read it.
- Onboarding a new URDF — bring your own asset into MetaFine.