Baselines

We provide a number of different baselines that learn from rewards via online reinforcement learning.

As part of these baselines, we establish standardized reinforcement learning benchmarks that span a wide range of difficulty (easy enough to solve for verification, but not saturated) and a diverse set of robotics tasks, including, but not limited to, classic control, dexterous manipulation, table-top manipulation, and mobile manipulation.

Online Reinforcement Learning Baselines

The table below lists the online reinforcement learning baselines that are already implemented and tested. Each Results link points to the corresponding Weights & Biases (wandb) page, where you can change the filters and views in the workspace to see results for other settings (e.g. state-based or RGB-based training). Note that there are also learning baselines that leverage demonstrations (offline RL, online imitation learning); see the learning from demos page for more information.

Baseline | Code | Results | Paper
Proximal Policy Optimization (PPO) | Link | Link | Link
Soft Actor Critic (SAC) | Link | WIP | Link
Temporal Difference Learning for Model Predictive Control (TD-MPC2) | WIP | WIP | Link

Standard Benchmark

The standard benchmark for RL in ManiSkill consists of two groups: a small set of 8 tasks and a large set of 50 tasks, each with state-based and vision-based settings. All standard benchmark tasks come with normalized dense reward functions. The small set is recommended so that researchers without access to large amounts of compute can still reasonably benchmark and compare their work. The large set is still being developed and tested.

These tasks span an extremely wide range of problems in robotics and reinforcement learning, including high-dimensional observations and actions, large initial state distributions, articulated object manipulation, generalizable manipulation, mobile manipulation, and locomotion.
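As a rough illustration of what a normalized dense reward means, the sketch below assumes a task's dense reward has a known upper bound and rescales it into [0, 1] by dividing by that bound. The `HypotheticalDenseReward` class, its staged shaping terms, and the bound of 3.0 are all hypothetical and are not taken from any specific ManiSkill task.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalDenseReward:
    """Hypothetical staged dense reward; not an actual ManiSkill task reward."""

    # Assumed upper bound of the dense reward for this illustrative task.
    max_reward: float = 3.0

    def dense(self, reach: float, push: float, success: bool) -> float:
        # reach and push are assumed shaping terms, each already in [0, 1].
        # A successful step receives the full max_reward.
        if success:
            return self.max_reward
        return reach + push

    def normalized_dense(self, reach: float, push: float, success: bool) -> float:
        # Dividing by the known maximum keeps rewards in [0, 1] across tasks,
        # so one set of RL hyperparameters sees comparably scaled rewards.
        return self.dense(reach, push, success) / self.max_reward


reward = HypotheticalDenseReward()
print(reward.normalized_dense(reach=1.0, push=0.5, success=False))  # 0.5
print(reward.normalized_dense(reach=1.0, push=1.0, success=True))   # 1.0
```

Keeping every task's reward on the same [0, 1] scale is one plausible reason a shared benchmark would standardize on normalized rewards: reward magnitude then does not vary from task to task.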

Small Set Environment IDs:

PushCube-v1, PickCube-v1, PegInsertionSide-v1, PushT-v1, HumanoidPlaceAppleInBowl-v1, AnymalC-Reach-v1, OpenCabinetDrawer-v1
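For bookkeeping, the small set can be expanded into a grid of runs, one per task, observation setting, and seed. The sketch below assumes the two settings are spelled `"state"` and `"rgb"` and that three seeds are run per configuration. The seed count, `BenchmarkRun`, and `build_runs` are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from itertools import product

# Small-set environment IDs, as listed above.
SMALL_SET = [
    "PushCube-v1",
    "PickCube-v1",
    "PegInsertionSide-v1",
    "PushT-v1",
    "HumanoidPlaceAppleInBowl-v1",
    "AnymalC-Reach-v1",
    "OpenCabinetDrawer-v1",
]

# Assumed observation settings: state-based and RGB-based training.
OBS_MODES = ["state", "rgb"]

# Hypothetical number of seeds per configuration.
SEEDS = [0, 1, 2]


@dataclass(frozen=True)
class BenchmarkRun:
    """One hypothetical benchmark configuration."""

    env_id: str
    obs_mode: str
    seed: int


def build_runs(env_ids, obs_modes, seeds):
    """Return one run per (task, observation mode, seed) combination."""
    return [BenchmarkRun(e, o, s) for e, o, s in product(env_ids, obs_modes, seeds)]


runs = build_runs(SMALL_SET, OBS_MODES, SEEDS)
print(len(runs))  # 42
```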

Evaluation

For proper evaluation of RL policies, see how the evaluation code is set up in the evaluation section of the RL setup page. All results linked above follow the same evaluation setup.
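The RL setup page remains the authority on how evaluation is configured. As a sketch under stated assumptions, the snippet below shows two common ways to turn per-step success flags from several evaluation episodes into a success rate. The first counts an episode as successful if it succeeded at any step ("success once"). The second counts it only if it is successful at the final step ("success at end"). The function name, metric keys, and data layout are illustrative and are not the baseline code's actual interface.

```python
from typing import Dict, Sequence


def success_rates(episodes: Sequence[Sequence[bool]]) -> Dict[str, float]:
    """Compute success rates from per-step success flags.

    Each inner sequence holds the success flag at every step of one
    evaluation episode (a hypothetical layout chosen for illustration).
    """
    if not episodes:
        raise ValueError("need at least one evaluation episode")
    n = len(episodes)
    # Success once: the episode reached a success state at any step.
    once = sum(any(steps) for steps in episodes) / n
    # Success at end: the episode is in a success state at its final step.
    at_end = sum(bool(steps[-1]) for steps in episodes if steps) / n
    return {"success_once": once, "success_at_end": at_end}


# Three hypothetical episodes: one succeeds and stays successful,
# one succeeds briefly and then fails, one never succeeds.
episodes = [
    [False, True, True],
    [False, True, False],
    [False, False, False],
]
print(success_rates(episodes))
```

The two metrics can diverge noticeably. A policy that touches the goal but cannot hold it scores well on success once and poorly on success at end, so results are only comparable when both sides use the same definition.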