Learning contact-rich whole-body manipulation with example-guided reinforcement learning

Zhang, Mengchao1; Barreiros, Jose1; Önol, Aykut1; Creasey, Sam1; Goncalves, Aimee1; Beaulieu, Andrew1; Bhat, Aditya1; Tsui, Kate1; Alspach, Alex1

Published Aug 18, 2025 on Dryad. https://doi.org/10.5061/dryad.ncjsxkt80

Data files

Aug 18, 2025 version files (25.40 MB total):
- Punyo_EGRL.zip (25.39 MB)
- README.md (8.22 KB)

Abstract

Humans employ a diversity of skills and strategies to effectively manipulate various objects, ranging from dexterous in-hand manipulation (fine motor skills) to complex whole-body manipulation (gross motor skills). The latter involves full-body engagement and extensive contact with various body parts beyond just the hands, where the compliance of our skin and muscles plays a crucial role in increasing contact stability and mitigating uncertainty. For robots, synthesizing such contact-rich behaviors poses fundamental challenges due to the rapidly growing combinatorics inherent to this amount of contact, making explicit reasoning about all contact interactions intractable. We explore the use of example-guided reinforcement learning to generate robust whole-body skills for the manipulation of large and unwieldy objects. Our method’s effectiveness is demonstrated on Toyota Research Institute’s Punyo robot, an upper-body humanoid with highly deformable, pressure-sensing skin. Training is conducted in simulation with only a single example motion per object manipulation task, and policies are easily transferred to hardware owing to domain randomization and the robot’s compliance. The resulting agent can manipulate various everyday objects, such as a water jug and large boxes, in a similar fashion to the example motion. Additionally, we show blind dexterous whole-body manipulation, relying solely on proprioceptive and tactile feedback without object pose tracking. Our analysis highlights the critical role of compliance in facilitating whole-body manipulation with humanoid robots.

This dataset contains CC0-licensed components authored by us for the Punyo_EGRL project. NVIDIA and other third‑party software are not redistributed here; instead, they are linked through a companion Zenodo software record, published at the same time, under a BSD 3-Clause ‘New’ or ‘Revised’ license, per upstream terms. This README describes the included CC0 data and code, their file structure and relationships, and guidance for reuse.

Description of the data and file structure

Top-level layout in this dataset (Punyo_EGRL.zip):

assets/
- punyo/ — URDF and mesh assets for the Punyo robot we authored:
  - punyo_v2.urdf
  - meshes/*.obj
code/
- isaacgymenvs/
  - tasks/
    - punyo_amp.py — task implementation for Punyo AMP
    - amp/punyo_amp_base.py — base task utilities used by punyo_amp.py
    - punyo_helpers/ — utilities for visualization and helpers
  - cfg/
    - task/
      - PunyoV2AMP.yaml — training/eval configuration for Punyo AMP
      - PunyoV2AMP_test.yaml — test configuration for Punyo AMP
    - train/
      - PunyoV2AMPPPO.yaml — training hyperparameters
      - PunyoV2AMP_testPPO.yaml — testing/training variant hyperparameters
data/
- models/
  - task_over_shoulder_lift_jug.pth — trained checkpoint produced by our PunyoV2AMP method

Relationships:

code/isaacgymenvs/tasks/punyo_amp.py consumes assets/punyo/punyo_v2.urdf and associated meshes.
code/isaacgymenvs/cfg/task/PunyoV2AMP*.yaml configure training/evaluation for the Punyo AMP task provided by punyo_amp.py.
code/isaacgymenvs/cfg/train/PunyoV2AMP*.yaml specify PPO/AMP hyperparameters for training runs that produced the included checkpoint(s).

Units, conventions:

Geometry meshes use meters; densities/materials follow Isaac Gym defaults unless specified in the URDF.
Config YAMLs follow standard IsaacGymEnvs fields (documented in their repository).
Checkpoints are PyTorch state dicts loadable with torch.load(path, map_location='cpu').
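To see what the included checkpoint contains before wiring it into a task, a short inspection script can help. This is a minimal sketch: summarize and load_checkpoint are hypothetical helper names, and the key layout is an assumption (checkpoints written by rl_games, which IsaacGymEnvs trains with, usually nest the network weights under a "model" key next to training metadata), so print the keys rather than relying on them.

```python
from typing import Any, Dict


def summarize(obj: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten a (possibly nested) checkpoint dict into {dotted.key: description}.

    Tensors (anything with a .shape) are described by shape; other values by type name.
    """
    out: Dict[str, str] = {}
    if isinstance(obj, dict):  # also covers OrderedDict state dicts
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            out.update(summarize(value, name))
    else:
        shape = getattr(obj, "shape", None)
        out[prefix] = f"shape={tuple(shape)}" if shape is not None else type(obj).__name__
    return out


def load_checkpoint(path: str) -> Any:
    # Imported lazily so summarize() can be used without PyTorch installed.
    import torch

    # map_location="cpu" lets the file load on machines without the training GPU.
    # On PyTorch >= 2.6, torch.load defaults to weights_only=True; if unpickling fails
    # because the file stores extra Python objects, pass weights_only=False, but only
    # for files you trust.
    return torch.load(path, map_location="cpu")
```

For example, from the IsaacGymEnvs root, summarize(load_checkpoint("isaacgymenvs/data/task_over_shoulder_lift_jug.pth")) returns one entry per tensor or metadata field.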

Sharing/Access information

Non‑CC0 software dependency (not redistributed here):
- NVIDIA Isaac Gym + IsaacGymEnvs and related upstream code.
Our CC0 materials (this Dryad dataset) are intended to be combined with the Zenodo software to reproduce results and run Punyo tasks.

Data was derived from:

Our authored Punyo URDF and meshes; task, base, helper code; training and task configuration YAMLs; and trained checkpoints we produced.
Execution requires the upstream NVIDIA frameworks.

Code/Software

Where to place files in the NVIDIA IsaacGymEnvs repo

Place the following paths relative to the root of your local IsaacGymEnvs checkout:

Robot assets
- From this dataset: assets/punyo/ → Place into: assets/urdf/punyo/
  - Resulting paths:
    - assets/urdf/punyo/punyo_v2.urdf
    - assets/urdf/punyo/meshes/*.obj
Task code and helpers
- From this dataset: code/isaacgymenvs/tasks/punyo_amp.py → Place into: isaacgymenvs/tasks/punyo_amp.py
- From this dataset: code/isaacgymenvs/tasks/amp/punyo_amp_base.py → Place into: isaacgymenvs/tasks/amp/punyo_amp_base.py
- From this dataset: code/isaacgymenvs/tasks/punyo_helpers/ → Place into: isaacgymenvs/tasks/punyo_helpers/
Task and training configs
- From this dataset: code/isaacgymenvs/cfg/task/PunyoV2AMP.yaml → Place into: isaacgymenvs/cfg/task/PunyoV2AMP.yaml
- From this dataset: code/isaacgymenvs/cfg/task/PunyoV2AMP_test.yaml → Place into: isaacgymenvs/cfg/task/PunyoV2AMP_test.yaml
- From this dataset: code/isaacgymenvs/cfg/train/PunyoV2AMPPPO.yaml → Place into: isaacgymenvs/cfg/train/PunyoV2AMPPPO.yaml
- From this dataset: code/isaacgymenvs/cfg/train/PunyoV2AMP_testPPO.yaml → Place into: isaacgymenvs/cfg/train/PunyoV2AMP_testPPO.yaml
Trained checkpoint
- From this dataset: data/models/task_over_shoulder_lift_jug.pth → Place into: isaacgymenvs/data/task_over_shoulder_lift_jug.pth
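The copy steps above can be scripted. This is a hypothetical helper, not part of the dataset or IsaacGymEnvs; it assumes Punyo_EGRL.zip has been extracted to a folder whose top level holds assets/, code/, and data/, and it simply transcribes the placement list.

```python
import shutil
from pathlib import Path
from typing import List, Sequence, Tuple

# (path inside the extracted dataset, destination relative to the IsaacGymEnvs root),
# transcribed from the placement list above.
PLACEMENTS: List[Tuple[str, str]] = [
    ("assets/punyo", "assets/urdf/punyo"),
    ("code/isaacgymenvs/tasks/punyo_amp.py", "isaacgymenvs/tasks/punyo_amp.py"),
    ("code/isaacgymenvs/tasks/amp/punyo_amp_base.py", "isaacgymenvs/tasks/amp/punyo_amp_base.py"),
    ("code/isaacgymenvs/tasks/punyo_helpers", "isaacgymenvs/tasks/punyo_helpers"),
    ("code/isaacgymenvs/cfg/task/PunyoV2AMP.yaml", "isaacgymenvs/cfg/task/PunyoV2AMP.yaml"),
    ("code/isaacgymenvs/cfg/task/PunyoV2AMP_test.yaml", "isaacgymenvs/cfg/task/PunyoV2AMP_test.yaml"),
    ("code/isaacgymenvs/cfg/train/PunyoV2AMPPPO.yaml", "isaacgymenvs/cfg/train/PunyoV2AMPPPO.yaml"),
    ("code/isaacgymenvs/cfg/train/PunyoV2AMP_testPPO.yaml", "isaacgymenvs/cfg/train/PunyoV2AMP_testPPO.yaml"),
    ("data/models/task_over_shoulder_lift_jug.pth", "isaacgymenvs/data/task_over_shoulder_lift_jug.pth"),
]


def _copy_tree(src: Path, dst: Path) -> None:
    # Manual recursive copy; avoids shutil.copytree(dirs_exist_ok=True), which needs Python 3.8+.
    for item in src.rglob("*"):
        if item.is_file():
            target = dst / item.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)


def place_files(dataset_root: str, repo_root: str,
                placements: Sequence[Tuple[str, str]] = PLACEMENTS) -> List[Path]:
    """Copy each dataset path to its IsaacGymEnvs location and return the destinations."""
    src_root, dst_root = Path(dataset_root), Path(repo_root)
    placed = []
    for src_rel, dst_rel in placements:
        src, dst = src_root / src_rel, dst_root / dst_rel
        if not src.exists():
            raise FileNotFoundError(f"missing from dataset: {src}")
        if src.is_dir():
            _copy_tree(src, dst)
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        placed.append(dst)
    return placed
```

Usage: place_files("path/to/extracted/Punyo_EGRL", "path/to/IsaacGymEnvs"). Existing files at the destinations are overwritten.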

Set up

Install Anaconda by following its official installation instructions.

Tip: If you don’t want the Anaconda installer to modify your shell startup script, choose “no” at step 8 (the default choice).
In that case, whenever you want to use Anaconda later, run:
source path_to_anaconda/bin/activate  # activates conda
conda activate your_environment_name

For more conda-related commands, see the conda cheat sheet.

Add channels to conda:
```
conda config --add channels conda-forge
```

Fork the punyo_rl repository on GitHub and clone your fork:

mkdir punyo_rl_isaac
cd punyo_rl_isaac
git clone {your_fork_of_punyo_rl}

Install IsaacGym

Download IsaacGym (Preview 4) from NVIDIA and, from inside punyo_rl_isaac, extract it:

tar -xvzf ~/Downloads/IsaacGym_Preview_4_Package.tar.gz

Run the following commands:

conda env create -f rlgpu.yml
conda activate rlgpu
cd isaacgym/python
pip install -e .
export LD_LIBRARY_PATH=~/anaconda3/envs/rlgpu/lib:$LD_LIBRARY_PATH
export ISAAC_GYM_PATH=path_to_punyo_rl_isaac/isaacgym  # the isaacgym folder extracted above
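Before running the example, the two exports can be sanity-checked from Python. This is a hypothetical helper, not part of Isaac Gym or this dataset; it assumes the rlgpu env is active (so conda has set CONDA_PREFIX) and that ISAAC_GYM_PATH should point at the extracted isaacgym folder, which contains python/.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def check_isaac_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems with the exports above (an empty list means OK)."""
    env = os.environ if env is None else env
    problems: List[str] = []

    gym_path = env.get("ISAAC_GYM_PATH", "")
    if not gym_path:
        problems.append("ISAAC_GYM_PATH is not set")
    elif not (Path(gym_path).expanduser() / "python").is_dir():
        problems.append(f"ISAAC_GYM_PATH={gym_path} has no python/ subfolder")

    # conda sets CONDA_PREFIX to the active env (e.g. .../envs/rlgpu);
    # its lib/ directory should appear on LD_LIBRARY_PATH.
    prefix = env.get("CONDA_PREFIX", "")
    if prefix:
        ld_dirs = {
            str(Path(p).expanduser())
            for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
            if p
        }
        if str(Path(prefix) / "lib") not in ld_dirs:
            problems.append(f"{prefix}/lib is not on LD_LIBRARY_PATH")
    return problems
```

Running print(check_isaac_env()) in the activated env should print an empty list.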

Try to run an example:

cd examples  # run from isaacgym/python, where the previous step left you
python joint_monkey.py  # You should see a bunch of humanoids.

Install IsaacGymEnvs

cd ../../../punyo_rl  # from isaacgym/python/examples, back to punyo_rl_isaac/ and into your clone
pip install -e .

Try the example:

cd isaacgymenvs
python train.py task=PunyoV2AMP

Run the code

To train a policy:

Set up the robot and object for your environment, e.g., in PunyoV2AMP.yaml:

asset:
  assetFilePunyo: "urdf/punyo/punyo_v2.urdf"

task:
  task_name:
    assetFileBox: "urdf/objects/jug_5_gallon_onshape.urdf"

Set up the demonstration file for your training, e.g., in PunyoV2AMP.yaml:

asset:
  motion_file_path: 'data/paper/task_over_shoulder_lift_jug/teleop_eigen/original/'

The program collects all .pkl files in the specified directory to build the motion library; the folder may contain one or more .pkl files.
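The directory scan described above can be sketched as follows. This only illustrates the "every .pkl in the folder" behavior: load_motion_library is a hypothetical name, the per-file contents are not documented in this README, and the actual loader in punyo_amp.py may order, parse, or validate files differently.

```python
import pickle
from pathlib import Path
from typing import Any, List, Tuple


def collect_motion_files(motion_dir: str) -> List[Path]:
    """All .pkl files directly inside motion_dir (non-recursive), sorted by name."""
    files = sorted(Path(motion_dir).glob("*.pkl"))
    if not files:
        raise FileNotFoundError(f"no .pkl files found in {motion_dir}")
    return files


def load_motion_library(motion_dir: str) -> List[Tuple[str, Any]]:
    """Unpickle every demonstration; returns (file stem, payload) pairs."""
    library = []
    for path in collect_motion_files(motion_dir):
        with open(path, "rb") as fh:
            library.append((path.stem, pickle.load(fh)))
    return library
```

Files with other extensions in the folder are ignored, so a single demonstration and several demonstrations are handled the same way.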

Set the randomization range for the manipuland's initial pose, for example:
```
object:
   xRange: 0.05
   yRange: 0.05
   yawRange: 0.05
```

These parameters are used for domain randomization.
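One common reading of these ranges is a uniform perturbation of the nominal object pose at each reset. The sketch below assumes symmetric ±range sampling with x/y in meters and yaw in radians; neither the distribution nor the units are stated in this README, and sample_initial_pose is a hypothetical name, not a function from punyo_amp.py.

```python
import random
from typing import Dict, Tuple


def sample_initial_pose(nominal: Tuple[float, float, float],
                        ranges: Dict[str, float],
                        rng: random.Random) -> Tuple[float, float, float]:
    """Perturb a nominal (x, y, yaw) uniformly within +/- each configured range.

    Assumption: ranges are half-widths (x, y in meters; yaw in radians).
    """
    x, y, yaw = nominal
    return (
        x + rng.uniform(-ranges["xRange"], ranges["xRange"]),
        y + rng.uniform(-ranges["yRange"], ranges["yRange"]),
        yaw + rng.uniform(-ranges["yawRange"], ranges["yawRange"]),
    )
```

In a vectorized simulator this would be done per environment with batched tensors; the scalar version just makes the sampling rule explicit.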

Set up the target pose for the manipuland, for example:

task:
   target_x: 0.13
   target_y: 0.3
   target_z: 0.64
   target_roll: 0.0
   target_pitch: -1.5708
   target_yaw: 0.0
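Since box_pose is observed as position plus quaternion, it can help to see which quaternion the example target corresponds to. The sketch assumes the angles are radians in the usual roll-pitch-yaw (Z-Y-X) convention with (x, y, z, w) ordering; it mirrors the formula of quat_from_euler_xyz in Isaac Gym's torch_utils as we understand it, but whether punyo_amp.py converts the target this way is an assumption.

```python
import math
from typing import Tuple


def quat_from_euler_xyz(roll: float, pitch: float, yaw: float) -> Tuple[float, float, float, float]:
    """Roll-pitch-yaw in radians -> unit quaternion (x, y, z, w), i.e. q = qz(yaw) * qy(pitch) * qx(roll)."""
    cr, sr = math.cos(roll * 0.5), math.sin(roll * 0.5)
    cp, sp = math.cos(pitch * 0.5), math.sin(pitch * 0.5)
    cy, sy = math.cos(yaw * 0.5), math.sin(yaw * 0.5)
    qw = cy * cr * cp + sy * sr * sp
    qx = cy * sr * cp - sy * cr * sp
    qy = cy * cr * sp + sy * sr * cp
    qz = sy * cr * cp - cy * sr * sp
    return (qx, qy, qz, qw)
```

For the example target (roll 0, pitch -1.5708, yaw 0), this gives approximately (0, -0.7071, 0, 0.7071), a -90° rotation about the y axis.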

Set up the observations for the discriminator (ampObservation), the policy (policyObservation), and the critic (criticObservation), for example:
```
env:
   ampObservation: [robot_dof]
   policyObservation: [robot_dof, box_pose, previous_actions]
   criticObservation: [robot_dof, box_pose, previous_actions]
```
All possible options are robot_dof (14), robot_vel (14), box_pose (7: position + quaternion), ee_pose (7 × 2: left p+q, right p+q), ee_binary_contact (2: left, right), and floatie_binary_contact (7 × 2: left shoulder to hand, right shoulder to hand). The policy and critic examples above additionally use previous_actions.
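Assuming the listed terms are concatenated into one vector per network (as is usual for IsaacGymEnvs tasks), each input size follows from the per-term sizes. In this sketch OBS_DIMS and obs_dim are hypothetical names, ee_pose and floatie_binary_contact are read as 7 × 2 = 14, and previous_actions is assumed to contribute one entry per action dimension, which this README does not state.

```python
from typing import Dict, List, Optional

# Per-term sizes from the option list (ee_pose and floatie_binary_contact read as 7 x 2).
OBS_DIMS: Dict[str, int] = {
    "robot_dof": 14,
    "robot_vel": 14,
    "box_pose": 7,
    "ee_pose": 14,
    "ee_binary_contact": 2,
    "floatie_binary_contact": 14,
}


def obs_dim(terms: List[str], action_dim: Optional[int] = None) -> int:
    """Total size of a concatenated observation built from the given terms."""
    total = 0
    for term in terms:
        if term == "previous_actions":
            # Assumption: one value per action dimension.
            if action_dim is None:
                raise ValueError("previous_actions requires action_dim")
            total += action_dim
        elif term in OBS_DIMS:
            total += OBS_DIMS[term]
        else:
            raise KeyError(f"unknown observation term: {term}")
    return total
```

With a hypothetical 14-dimensional action space, the example policyObservation [robot_dof, box_pose, previous_actions] would have 14 + 7 + 14 = 35 entries, while the example ampObservation [robot_dof] has 14.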
Set up the initialization for the robot and the box, for example:
```
env:
  stateInit: "Default"
```
The available options are:
a. Default: Set the robot to its default state (if any), and set the box to its default state (if any) plus the specified disturbance.
b. Start: Set the robot and the box to the start state of the demonstration.
c. Random: Set the robot and the box to a random state of the demonstration.
d. Hybrid: A combination of Default and Random.
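The four modes can be read as a per-environment choice of reset source. In IsaacGymEnvs' HumanoidAMP task, Hybrid assigns each environment to Random with probability hybridInitProb and to Default otherwise; the sketch below assumes Punyo's Hybrid works the same way, which this README does not confirm. StateInit and choose_reset_states are illustrative names, and the sketch only picks which demonstration time (if any) to copy, not how the state is written into the simulator.

```python
import random
from enum import Enum
from typing import List, Optional, Tuple


class StateInit(Enum):
    Default = "Default"
    Start = "Start"
    Random = "Random"
    Hybrid = "Hybrid"


def choose_reset_states(mode: StateInit, num_envs: int, motion_length: float,
                        hybrid_init_prob: float = 0.5,
                        rng: Optional[random.Random] = None) -> List[Tuple[str, Optional[float]]]:
    """Per env: ("default", None) or ("demo", t), where t is the demonstration time to copy."""
    if rng is None:
        rng = random.Random()
    states: List[Tuple[str, Optional[float]]] = []
    for _ in range(num_envs):
        m = mode
        if m is StateInit.Hybrid:
            # Assumed split: Random with probability hybrid_init_prob, otherwise Default.
            m = StateInit.Random if rng.random() < hybrid_init_prob else StateInit.Default
        if m is StateInit.Default:
            states.append(("default", None))  # default state plus any configured disturbance
        elif m is StateInit.Start:
            states.append(("demo", 0.0))  # first frame of the demonstration
        else:
            states.append(("demo", rng.uniform(0.0, motion_length)))  # random frame
    return states
```

The YAML string maps directly onto the enum, e.g. StateInit("Default").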

Start your training:

python train.py task=PunyoV2AMP wandb_activate=True wandb_project=YOUR_PROJECT_NAME wandb_logcode_dir=ABSOLUTE_PUNYO_RL_PATH
# The visualization can be toggled with a "v" key press.

# The GPU to be used can be specified by adding the flags sim_device and rl_device
# (e.g. sim_device=cuda:1 rl_device=cuda:1).
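The training command is a list of key=value overrides passed to train.py. When launching runs from a script (for example, one run per GPU), a small hypothetical helper can assemble it; it uses only the overrides shown above and assumes it is run with the isaacgymenvs directory as the working directory.

```python
import subprocess
from typing import List, Optional


def build_train_command(task: str = "PunyoV2AMP", device: Optional[str] = None,
                        wandb_project: Optional[str] = None,
                        logcode_dir: Optional[str] = None) -> List[str]:
    """Assemble the train.py invocation from the overrides documented above."""
    cmd = ["python", "train.py", f"task={task}"]
    if device is not None:
        # Keep simulation and learning on the same GPU, as in the example above.
        cmd += [f"sim_device={device}", f"rl_device={device}"]
    if wandb_project is not None:
        cmd += ["wandb_activate=True", f"wandb_project={wandb_project}"]
        if logcode_dir is not None:
            cmd.append(f"wandb_logcode_dir={logcode_dir}")  # should be an absolute path
    return cmd


def launch(cmd: List[str], cwd: str = "isaacgymenvs") -> None:
    # Raises subprocess.CalledProcessError if training exits with a non-zero status.
    subprocess.run(cmd, cwd=cwd, check=True)
```

For example, launch(build_train_command(device="cuda:1")) runs the same command as adding sim_device=cuda:1 rl_device=cuda:1 by hand.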

Test your policy:

python play_policy.py --checkpoint_file data/task_over_shoulder_lift_jug.pth