Action Chunking with Transformers (ACT)#
This page documents the ACT baseline used in FGManip. The implementation follows the paper "Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware" (Zhao et al., RSS 2023) and is adapted from the original ACT codebase.
Integration status: ACT currently runs as a dedicated policy pipeline. Command interfaces and config layout are not yet fully unified with all other policy backends.
Installation#
The recommended setup uses conda or mamba for environment isolation.
conda create -n act-ms python=3.9
conda activate act-ms
pip install -e .
Setup#
Before training, read the imitation learning setup documentation. It covers demo download, preprocessing, the fair evaluation protocol, and common failure modes.
Training#
ACT learns from expert trajectories and is sensitive to episode horizon.
For slower demonstrations (e.g., motion planning or teleoperation), increase --max-episode-steps so policy rollouts have enough time to complete the task.
Practical guideline: set --max-episode-steps to about 2x the mean trajectory length of your training demos.
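The 2x guideline can be turned into a small calculation. The sketch below is illustrative only: the helper name suggest_max_episode_steps and the example step counts are hypothetical, and it assumes you have already extracted the per-demo step counts from your demo file.

```python
import math
from statistics import mean


def suggest_max_episode_steps(trajectory_lengths, factor=2.0):
    """Return a step budget of roughly `factor` times the mean demo length.

    `trajectory_lengths` is a list of per-demo step counts (hypothetical input;
    how you read them from the demo file depends on your dataset format).
    """
    if not trajectory_lengths:
        raise ValueError("need at least one trajectory length")
    # Round up so the budget never falls below the guideline.
    return math.ceil(factor * mean(trajectory_lengths))


# Made-up step counts for five demonstrations.
lengths = [42, 55, 48, 61, 50]
budget = suggest_max_episode_steps(lengths)
print(f"--max-episode-steps {budget}")
```

Treat the result as a starting point and raise it if rollouts are still cut off before the task completes.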
seed=1
demos=100
python train.py --env-id PickCube-v1 \
--demo-path ~/.maniskill/demos/PickCube-v1/motionplanning/trajectory.state.pd_ee_delta_pos.physx_cpu.h5 \
--control-mode "pd_ee_delta_pos" --sim-backend "physx_cpu" --num_demos $demos --max_episode_steps 100 \
--total_iters 30000 --log_freq 100 --eval_freq 5000 \
--exp-name=act-PickCube-v1-state-${demos}_motionplanning_demos-$seed \
--track
Project-specific example:
python core/policies/act/train.py \
--env-id grasp_part \
--object-name bottle \
--part-name cap \
--demo-path demos/output.state+rgb.pd_ee_delta_pos.physx_cpu.h5 \
--control-mode pd_ee_delta_pos \
--num_demos 40 \
--total_iters 100000 \
--exp-name act-grasp-cabinet-handle \
--max_episode_steps 100
Maintenance Checklist#
Use this checklist to keep ACT documentation consistent across contributors.
Task + Data: environment ID, control mode, demo source, number of demonstrations, train/val split.
Core Hyperparameters: total iters, batch size, chunk size, learning rate, max episode steps.
Compute + Runtime: GPU type, wall-clock training time, average iteration speed.
Evaluation Protocol: number of eval episodes, seeds, success metric definition, termination rules.
Results Table: success rate mean/std over seeds, failure mode notes, demo videos.
Reproducibility: exact command, commit hash, dataset checksum/path, config snapshot.
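For the Results Table item, one possible way to compute the mean and standard deviation of success rate over seeds is sketched below. The function name summarize_success and the per-seed numbers are hypothetical placeholders, not measured results; it assumes one final success rate per seed and uses the sample standard deviation.

```python
from statistics import mean, stdev


def summarize_success(success_by_seed):
    """Return (mean, sample std) of per-seed success rates.

    `success_by_seed` maps a seed to its final success rate in [0, 1].
    With a single seed the spread is reported as 0.0.
    """
    rates = list(success_by_seed.values())
    if not rates:
        raise ValueError("need at least one seed")
    avg = mean(rates)
    spread = stdev(rates) if len(rates) > 1 else 0.0
    return avg, spread


# Placeholder values for illustration only, not real results.
results = {1: 0.80, 2: 0.90, 3: 0.70}
avg, spread = summarize_success(results)
print(f"success rate: {avg:.2f} +/- {spread:.2f} over {len(results)} seeds")
```

Reporting the number of seeds alongside the mean and std makes it easier to compare rows across contributors.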
Citation#
If you use this baseline, please cite:
@inproceedings{DBLP:conf/rss/ZhaoKLF23,
author = {Tony Z. Zhao and
Vikash Kumar and
Sergey Levine and
Chelsea Finn},
editor = {Kostas E. Bekris and
Kris Hauser and
Sylvia L. Herbert and
Jingjin Yu},
title = {Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
booktitle = {Robotics: Science and Systems XIX, Daegu, Republic of Korea, July
10-14, 2023},
year = {2023},
url = {https://doi.org/10.15607/RSS.2023.XIX.016},
doi = {10.15607/RSS.2023.XIX.016},
timestamp = {Thu, 20 Jul 2023 15:37:49 +0200},
biburl = {https://dblp.org/rec/conf/rss/ZhaoKLF23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}