Action Chunking with Transformers (ACT)

This page documents the ACT baseline used in FGManip. The implementation follows the paper "Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware" (Zhao et al., 2023) and is adapted from the original ACT codebase.

Integration status: ACT currently runs as a dedicated policy pipeline. Its command-line interface and config layout are not yet fully unified with the other policy backends.

Installation

We recommend using conda or mamba to isolate the environment:

conda create -n act-ms python=3.9
conda activate act-ms
pip install -e .

Setup

Before training, read the imitation learning setup documentation. It covers demo download, preprocessing, the fair evaluation protocol, and common failure modes.

Training

ACT learns from expert trajectories and is sensitive to the episode horizon. For slower demonstrations (e.g., those produced by motion planning or teleoperation), increase --max_episode_steps so that policy rollouts have enough time to complete the task.

Practical guideline: set --max_episode_steps to about 2x the mean trajectory length of your training demos.
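The guideline above can be sketched in a few lines of Python. This is a minimal illustration, not part of the ACT pipeline: the function names are hypothetical, and the optional loader assumes that each top-level group in the demo file holds an `actions` dataset with one row per step, which you should verify against your own file before relying on it.

```python
import math
from statistics import mean


def recommended_max_episode_steps(traj_lengths, factor=2.0):
    """Return ceil(factor * mean trajectory length).

    `factor=2.0` mirrors the "about 2x" guideline; it is a rule of
    thumb, not a tuned value.
    """
    if not traj_lengths:
        raise ValueError("need at least one trajectory length")
    return math.ceil(factor * mean(traj_lengths))


def load_traj_lengths(demo_path):
    """Collect per-trajectory step counts from an HDF5 demo file.

    ASSUMPTION: every top-level group contains an `actions` dataset
    whose first dimension is the number of steps. Groups without an
    `actions` dataset are skipped.
    """
    import h5py  # imported lazily so the helper above works without h5py

    lengths = []
    with h5py.File(demo_path, "r") as f:
        for key in f.keys():
            group = f[key]
            if isinstance(group, h5py.Group) and "actions" in group:
                lengths.append(group["actions"].shape[0])
    return lengths
```

For example, demos with lengths 42, 50, and 58 steps have a mean of 50, so `recommended_max_episode_steps([42, 50, 58])` suggests 100 steps.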

seed=1
demos=100
python train.py --env-id PickCube-v1 \
  --demo-path ~/.maniskill/demos/PickCube-v1/motionplanning/trajectory.state.pd_ee_delta_pos.physx_cpu.h5 \
  --control-mode "pd_ee_delta_pos" --sim-backend "physx_cpu" --num_demos $demos --max_episode_steps 100 \
  --total_iters 30000 --log_freq 100 --eval_freq 5000 \
  --exp-name=act-PickCube-v1-state-${demos}_motionplanning_demos-$seed \
  --track

Project-specific example:

python core/policies/act/train.py \
  --env-id grasp_part \
  --object-name bottle \
  --part-name cap \
  --demo-path demos/output.state+rgb.pd_ee_delta_pos.physx_cpu.h5 \
  --control-mode pd_ee_delta_pos \
  --num_demos 40 \
  --total_iters 100000 \
  --exp-name act-grasp-bottle-cap \
  --max_episode_steps 100
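Both example demo paths end in a name of the form `<prefix>.<obs_mode>.<control_mode>.<sim_backend>.h5`. Assuming that pattern holds for your files (it is inferred from the two paths on this page, not from a documented specification), a small check like the following can catch a mismatch between the demo file and the --control-mode flag before training starts. The helper names are hypothetical.

```python
from pathlib import Path


def parse_demo_filename(demo_path):
    """Split `<prefix>.<obs_mode>.<control_mode>.<sim_backend>.h5` into fields.

    ASSUMPTION: the filename has exactly five dot-separated parts,
    as in the example paths on this page.
    """
    name = Path(demo_path).name
    parts = name.split(".")
    if len(parts) != 5 or parts[-1] != "h5":
        raise ValueError(f"unexpected demo filename: {name}")
    _, obs_mode, control_mode, sim_backend, _ = parts
    return {
        "obs_mode": obs_mode,
        "control_mode": control_mode,
        "sim_backend": sim_backend,
    }


def check_demo_matches_flags(demo_path, control_mode, sim_backend=None):
    """Raise ValueError if the filename disagrees with the CLI flag values."""
    info = parse_demo_filename(demo_path)
    if info["control_mode"] != control_mode:
        raise ValueError(
            f"demo uses control mode {info['control_mode']!r}, "
            f"but --control-mode is {control_mode!r}"
        )
    if sim_backend is not None and info["sim_backend"] != sim_backend:
        raise ValueError(
            f"demo uses sim backend {info['sim_backend']!r}, "
            f"but --sim-backend is {sim_backend!r}"
        )
    return info
```

Calling `check_demo_matches_flags("demos/output.state+rgb.pd_ee_delta_pos.physx_cpu.h5", "pd_ee_delta_pos")` returns the parsed fields, while passing a different control mode raises a `ValueError`.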

Maintenance Checklist

Use this checklist to keep ACT documentation consistent across contributors.

  • Task + Data: environment ID, control mode, demo source, number of demonstrations, train/val split.

  • Core Hyperparameters: total iters, batch size, chunk size, learning rate, max episode steps.

  • Compute + Runtime: GPU type, wall-clock training time, average iteration speed.

  • Evaluation Protocol: number of eval episodes, seeds, success metric definition, termination rules.

  • Results Table: success rate mean/std over seeds, failure mode notes, demo videos.

  • Reproducibility: exact command, commit hash, dataset checksum/path, config snapshot.
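The reproducibility and results items in the checklist above could be captured with a small script. The following is one possible sketch, not an existing FGManip utility: all function names and the record layout are hypothetical, the commit hash is read with `git rev-parse HEAD` (and left as `None` when git is unavailable), and the standard deviation is the sample standard deviation over seeds.

```python
import hashlib
import subprocess
from statistics import mean, stdev


def file_sha256(path, chunk_bytes=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_bytes), b""):
            digest.update(block)
    return digest.hexdigest()


def git_commit_hash():
    """Return the current commit hash, or None outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


def summarize_success(rates_by_seed):
    """Mean and sample std of success rates, keyed by seed."""
    rates = list(rates_by_seed.values())
    return {
        "num_seeds": len(rates),
        "mean": mean(rates),
        # stdev needs at least two values; report 0.0 for a single seed
        "std": stdev(rates) if len(rates) > 1 else 0.0,
    }


def build_run_record(command, demo_path, rates_by_seed):
    """Bundle the checklist's reproducibility fields into one dict."""
    return {
        "command": command,
        "commit": git_commit_hash(),
        "demo_path": str(demo_path),
        "demo_sha256": file_sha256(demo_path),
        "success_rate": summarize_success(rates_by_seed),
    }
```

The returned dictionary contains only strings, numbers, and `None`, so it can be written next to the experiment logs with `json.dump`.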

Citation

If you use this baseline, please cite:

@inproceedings{DBLP:conf/rss/ZhaoKLF23,
  author       = {Tony Z. Zhao and
                  Vikash Kumar and
                  Sergey Levine and
                  Chelsea Finn},
  editor       = {Kostas E. Bekris and
                  Kris Hauser and
                  Sylvia L. Herbert and
                  Jingjin Yu},
  title        = {Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  booktitle    = {Robotics: Science and Systems XIX, Daegu, Republic of Korea, July
                  10-14, 2023},
  year         = {2023},
  url          = {https://doi.org/10.15607/RSS.2023.XIX.016},
  doi          = {10.15607/RSS.2023.XIX.016},
  timestamp    = {Thu, 20 Jul 2023 15:37:49 +0200},
  biburl       = {https://dblp.org/rec/conf/rss/ZhaoKLF23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}