Learning contact-rich whole-body manipulation with example-guided reinforcement learning
Data files
Aug 18, 2025 version files (25.40 MB):
- Punyo_EGRL.zip (25.39 MB)
- README.md (8.22 KB)
Abstract
Humans employ a diversity of skills and strategies to effectively manipulate various objects, ranging from dexterous in-hand manipulation (fine motor skills) to complex whole-body manipulation (gross motor skills). The latter involves full-body engagement and extensive contact with various body parts beyond just the hands, where the compliance of our skin and muscles plays a crucial role in increasing contact stability and mitigating uncertainty. For robots, synthesizing such contact-rich behaviors has fundamental challenges due to the rapidly growing combinatorics inherent to this amount of contact, making explicit reasoning about all contact interactions intractable. We explore the use of example-guided reinforcement learning to generate robust whole-body skills for the manipulation of large and unwieldy objects. Our method’s effectiveness is demonstrated on Toyota Research Institute’s Punyo robot, a humanoid upper-body with highly deformable, pressure-sensing skin. Training is conducted in simulation with only a single example motion per object manipulation task, and policies are easily transferred to hardware owing to domain randomization and the robot’s compliance. The resulting agent can manipulate various everyday objects, such as a water jug and large boxes, in a similar fashion to the example motion. Additionally, we show blind dexterous whole-body manipulation, relying solely on proprioceptive and tactile feedback without object pose tracking. Our analysis highlights the critical role of compliance in facilitating whole-body manipulation with humanoid robots.
This dataset contains CC0-licensed components authored by us for the Punyo_EGRL project. NVIDIA and other third‑party software are not redistributed here; instead, they are linked as a simultaneous Zenodo software record under a BSD 3-Clause ‘New’ or ‘Revised’ license, per upstream terms. This README describes the CC0 data and code included, file structure, relationships, and reuse guidance.
Description of the data and file structure
Top-level layout in this dataset (Punyo_EGRL.zip):
- assets/
  - punyo/: URDF and mesh assets for the Punyo robot we authored
    - punyo_v2.urdf
    - meshes/*.obj
- code/
  - isaacgymenvs/
    - tasks/
      - punyo_amp.py: task implementation for Punyo AMP
      - amp/punyo_amp_base.py: base task utilities used by punyo_amp.py
      - punyo_helpers/: utilities for visualization and helpers
    - cfg/
      - task/
        - PunyoV2AMP.yaml: training/eval configuration for Punyo AMP
        - PunyoV2AMP_test.yaml: test configuration for Punyo AMP
      - train/
        - PunyoV2AMPPPO.yaml: training hyperparameters
        - PunyoV2AMP_testPPO.yaml: testing/training variant hyperparameters
- data/
  - models/
    - task_over_shoulder_lift_jug.pth: trained checkpoint produced by our PunyoV2AMP method
Relationships:
- code/isaacgymenvs/tasks/punyo_amp.py consumes assets/punyo/punyo_v2.urdf and associated meshes.
- code/isaacgymenvs/cfg/task/PunyoV2AMP*.yaml configure training/evaluation for the Punyo AMP task provided by punyo_amp.py.
- code/isaacgymenvs/cfg/train/PunyoV2AMP*.yaml specify PPO/AMP hyperparameters for training runs that produced the included checkpoint(s).
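Because the task code depends on the URDF and the URDF depends on its meshes, it can help to confirm that every referenced mesh is present after extracting the archive. The following is a minimal sketch using only the Python standard library. The helper name `missing_meshes` is hypothetical, and it assumes that mesh paths in the URDF are relative to the URDF's own directory, which this README does not state.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_meshes(urdf_path):
    """Return mesh files referenced by a URDF that do not exist on disk.

    Assumption: each <mesh filename="..."> path is relative to the
    directory containing the URDF. package:// URIs are skipped because
    resolving them requires a ROS package index.
    """
    urdf_path = Path(urdf_path)
    root = ET.parse(urdf_path).getroot()
    missing = set()
    for mesh in root.iter("mesh"):
        filename = mesh.get("filename", "")
        if not filename or filename.startswith("package://"):
            continue
        if not (urdf_path.parent / filename).exists():
            missing.add(filename)
    return sorted(missing)
```

For example, `missing_meshes("assets/punyo/punyo_v2.urdf")` returns an empty list when all relative mesh references resolve.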
Units, conventions:
- Geometry meshes use meters; densities/materials follow Isaac Gym defaults unless specified in the URDF.
- Config YAMLs follow standard IsaacGymEnvs fields (documented in their repository).
- Checkpoints are PyTorch state dicts loadable with torch.load(path, map_location='cpu').
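To see what a checkpoint contains before wiring it into a training or evaluation run, one option is to load it on the CPU and list its entries. This is a sketch, not part of the released code: `load_checkpoint` and `summarize` are hypothetical helper names, and the internal key layout of the checkpoint is not documented here, so the code only walks whatever nested dictionaries it finds.

```python
def load_checkpoint(path):
    """Load a checkpoint on the CPU, as described in this README."""
    import torch  # imported lazily so summarize() works without PyTorch

    return torch.load(path, map_location="cpu")


def summarize(obj, prefix=""):
    """Flatten nested dicts into (key_path, description) pairs.

    Tensor-like values (anything with a .shape) are described by their
    shape; everything else is described by its type name.
    """
    rows = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            rows.extend(summarize(value, f"{prefix}{key}."))
    else:
        shape = getattr(obj, "shape", None)
        if shape is not None:
            desc = f"shape={tuple(shape)}"
        else:
            desc = type(obj).__name__
        rows.append((prefix.rstrip("."), desc))
    return rows


# Usage (path as placed in the IsaacGymEnvs checkout):
# for name, desc in summarize(load_checkpoint("data/task_over_shoulder_lift_jug.pth")):
#     print(name, desc)
```

On recent PyTorch releases, `torch.load` defaults to `weights_only=True`, so a checkpoint that stores non-tensor Python objects may need `weights_only=False`; only do that for files you trust.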
Sharing/Access information
- Non‑CC0 software dependency (not redistributed here):
- NVIDIA Isaac Gym + IsaacGymEnvs and related upstream code.
- Our CC0 materials (this Dryad dataset) are intended to be combined with the Zenodo software to reproduce results and run Punyo tasks.
Data was derived from:
- Our authored Punyo URDF and meshes; task, base, helper code; training and task configuration YAMLs; and trained checkpoints we produced.
- Execution requires the upstream NVIDIA frameworks.
Code/Software
Where to place files in the NVIDIA IsaacGymEnvs repo
Place the following paths relative to the root of your local IsaacGymEnvs checkout:
- Robot assets
  - From this dataset: assets/punyo/ → Place into: assets/urdf/punyo/
  - Resulting paths:
    - assets/urdf/punyo/punyo_v2.urdf
    - assets/urdf/punyo/meshes/*.obj
- Task code and helpers
  - From this dataset: code/isaacgymenvs/tasks/punyo_amp.py → Place into: isaacgymenvs/tasks/punyo_amp.py
  - From this dataset: code/isaacgymenvs/tasks/amp/punyo_amp_base.py → Place into: isaacgymenvs/tasks/amp/punyo_amp_base.py
  - From this dataset: code/isaacgymenvs/tasks/punyo_helpers/ → Place into: isaacgymenvs/tasks/punyo_helpers/
- Task and training configs
  - From this dataset: code/isaacgymenvs/cfg/task/PunyoV2AMP.yaml → Place into: isaacgymenvs/cfg/task/PunyoV2AMP.yaml
  - From this dataset: code/isaacgymenvs/cfg/task/PunyoV2AMP_test.yaml → Place into: isaacgymenvs/cfg/task/PunyoV2AMP_test.yaml
  - From this dataset: code/isaacgymenvs/cfg/train/PunyoV2AMPPPO.yaml → Place into: isaacgymenvs/cfg/train/PunyoV2AMPPPO.yaml
  - From this dataset: code/isaacgymenvs/cfg/train/PunyoV2AMP_testPPO.yaml → Place into: isaacgymenvs/cfg/train/PunyoV2AMP_testPPO.yaml
- Trained checkpoint
  - From this dataset: data/models/task_over_shoulder_lift_jug.pth → Place into: isaacgymenvs/data/task_over_shoulder_lift_jug.pth
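The placement above can also be scripted. The sketch below copies each item from the extracted dataset into an IsaacGymEnvs checkout using the source and destination paths listed in this section. The function name `place_files` and the dry-run behavior are our own illustration, not part of the released code, and `dirs_exist_ok` requires Python 3.8 or newer.

```python
import shutil
from pathlib import Path

# (source inside the extracted dataset, destination inside the checkout),
# copied from the placement list in this README.
PLACEMENT = [
    ("assets/punyo", "assets/urdf/punyo"),
    ("code/isaacgymenvs/tasks/punyo_amp.py", "isaacgymenvs/tasks/punyo_amp.py"),
    ("code/isaacgymenvs/tasks/amp/punyo_amp_base.py",
     "isaacgymenvs/tasks/amp/punyo_amp_base.py"),
    ("code/isaacgymenvs/tasks/punyo_helpers", "isaacgymenvs/tasks/punyo_helpers"),
    ("code/isaacgymenvs/cfg/task/PunyoV2AMP.yaml",
     "isaacgymenvs/cfg/task/PunyoV2AMP.yaml"),
    ("code/isaacgymenvs/cfg/task/PunyoV2AMP_test.yaml",
     "isaacgymenvs/cfg/task/PunyoV2AMP_test.yaml"),
    ("code/isaacgymenvs/cfg/train/PunyoV2AMPPPO.yaml",
     "isaacgymenvs/cfg/train/PunyoV2AMPPPO.yaml"),
    ("code/isaacgymenvs/cfg/train/PunyoV2AMP_testPPO.yaml",
     "isaacgymenvs/cfg/train/PunyoV2AMP_testPPO.yaml"),
    ("data/models/task_over_shoulder_lift_jug.pth",
     "isaacgymenvs/data/task_over_shoulder_lift_jug.pth"),
]


def place_files(dataset_root, repo_root, dry_run=True):
    """Copy dataset items into the checkout; return a list of actions.

    Each action is ("copy" | "missing", source, destination). With
    dry_run=True nothing is written, so the plan can be reviewed first.
    """
    dataset_root, repo_root = Path(dataset_root), Path(repo_root)
    actions = []
    for src_rel, dst_rel in PLACEMENT:
        src, dst = dataset_root / src_rel, repo_root / dst_rel
        if not src.exists():
            actions.append(("missing", src_rel, dst_rel))
            continue
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            if src.is_dir():
                shutil.copytree(src, dst, dirs_exist_ok=True)  # Python 3.8+
            else:
                shutil.copy2(src, dst)
        actions.append(("copy", src_rel, dst_rel))
    return actions
```

Calling `place_files("Punyo_EGRL", "IsaacGymEnvs")` first (both directory names are placeholders) prints nothing and changes nothing; inspect the returned plan, then rerun with `dry_run=False`.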
Set up
- Install Anaconda by following the instructions.

  Tip: If you don't want Anaconda to modify your shell script, choose "no" for step 8 (which is the default choice). If you do so, then whenever you want to use Anaconda in the future, you will need to run:

      source path_to_anaconda/bin/activate   # activate conda
      conda activate your_environment_name

  For more conda-related commands, check out the conda cheat sheet.

  Add channels to conda:

      conda config --add channels conda-forge

- Fork this repo on GitHub and clone it:

      mkdir punyo_rl_isaac
      cd punyo_rl_isaac
      git clone {your_fork_of_punyo_rl}

- Install IsaacGym.

  Download IsaacGym and extract the file in punyo_rl_isaac via:

      tar -xvzf ~/Downloads/IsaacGym_Preview_4_Package.tar.gz

  Run the following commands:

      conda env create -f rlgpu.yml
      conda activate rlgpu
      cd isaacgym/python
      pip install -e .
      export LD_LIBRARY_PATH=~/anaconda3/envs/rlgpu/lib:$LD_LIBRARY_PATH
      export ISAAC_GYM_PATH=~/IsaacGymEnvs/isaacgym

  Try to run an example:

      cd python/examples   # path relative to the isaacgym directory
      python joint_monkey.py   # You should see a bunch of humanoids.

- Install IsaacGymEnvs:

      cd punyo_rl
      pip install -e .

  Try the example:

      cd isaacgymenvs
      python train.py task=PunyoV2AMP
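If the example fails to start, a common cause is that the two environment variables exported during the IsaacGym step are missing from the current shell. The following sketch checks for them. The function name `check_environment` is hypothetical, and the test for `envs/rlgpu/lib` assumes the conda environment is named rlgpu as in the commands above.

```python
import os


def check_environment(env=None):
    """Return a list of human-readable problems with the shell environment.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    problems = []

    gym_path = env.get("ISAAC_GYM_PATH", "")
    if not gym_path:
        problems.append("ISAAC_GYM_PATH is not set")
    elif not os.path.isdir(os.path.expanduser(gym_path)):
        problems.append(f"ISAAC_GYM_PATH does not point to a directory: {gym_path}")

    # Assumption: the conda environment is called "rlgpu", as in the setup steps.
    lib_dirs = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if not any("envs/rlgpu/lib" in p for p in lib_dirs):
        problems.append("LD_LIBRARY_PATH does not include the rlgpu environment's lib directory")

    return problems


# Usage:
# for problem in check_environment():
#     print("WARNING:", problem)
```

An empty list means both variables look plausible; it does not verify that IsaacGym itself imports correctly.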
Run the code
To train a policy:
- Set up the robot and object for your environment, e.g., in PunyoV2AMP.yaml:

      asset:
        assetFilePunyo: "urdf/punyo/punyo_v2.urdf"
      task:
        task_name:
          assetFileBox: "urdf/objects/jug_5_gallon_onshape.urdf"

- Set up the demonstration file for your training, e.g., in PunyoV2AMP.yaml:

      asset:
        motion_file_path: 'data/paper/task_over_shoulder_lift_jug/teleop_eigen/original/'

  The program collects all the .pkl files in the specified directory to form the motion library. The folder may contain one or more .pkl files.

- Set up the range over which the manipuland's initial position is varied, for example:

      object:
        xRange: 0.05
        yRange: 0.05
        yawRange: 0.05

  These parameters are used for domain randomization.

- Set up the target pose for the manipuland, for example:

      task:
        target_x: 0.13
        target_y: 0.3
        target_z: 0.64
        target_roll: 0.0
        target_pitch: -1.5708
        target_yaw: 0.0

- Set up the observation for the discriminator and the policy, for example:

      env:
        ampObservation: [robot_dof]
        policyObservation: [robot_dof, box_pose, previous_actions]
        criticObservation: [robot_dof, box_pose, previous_actions]

  All the possible options are robot_dof (14), robot_vel (14), box_pose (7: position + quaternion), ee_pose (7*2: left p+q, right p+q), ee_binary_contact (2: left, right), floatie_binary_contact (7*2: left shoulder to hand, right shoulder to hand).

- Set up the initialization for the robot and the box, for example:

      env:
        stateInit: "Default"

  All the possible options are:
  a. Default: Set the robot to the default state (if any), and set the box to the default state (if any) plus the specified disturbance.
  b. Start: Set the robot and the box to the start state of the demonstration.
  c. Random: Set the robot and the box to a random state of the demonstration.
  d. Hybrid: A combination of Default and Random.

- Start your training:

      python train.py task=PunyoV2AMP wandb_activate=True wandb_project=YOUR_PROJECT_NAME wandb_logcode_dir=ABSOLUTE_PUNYO_RL_PATH
      # The visualization can be toggled with a "v" key press.
      # The GPU to be used can be specified by adding the flags sim_device and rl_device
      # (e.g. sim_device=cuda:1 rl_device=cuda:1).

- Test your policy:

      python play_policy.py --checkpoint_file data/task_over_shoulder_lift_jug.pth
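When choosing entries for ampObservation, policyObservation, and criticObservation, it can be useful to know the resulting per-step vector length. The sketch below sums the sizes given in the observation step above, reading the ee_pose and floatie_binary_contact entries as 7*2 = 14 values each, which is our interpretation. The function name `observation_size` is hypothetical. The size of previous_actions is not stated in this README, so it must be supplied by the caller; the value 14 in the usage comment is an assumption of one action per robot degree of freedom.

```python
# Per-step sizes as listed in this README (ee_pose and
# floatie_binary_contact interpreted as 7 * 2 = 14).
OBS_DIMS = {
    "robot_dof": 14,
    "robot_vel": 14,
    "box_pose": 7,                 # position (3) + quaternion (4)
    "ee_pose": 14,                 # left and right, 7 values each
    "ee_binary_contact": 2,        # left, right
    "floatie_binary_contact": 14,  # 7 per arm, shoulder to hand
}


def observation_size(names, extra_dims=None):
    """Sum the sizes of the named observation components.

    `extra_dims` supplies sizes for components that are not listed in
    OBS_DIMS (for example previous_actions). Unknown names raise
    ValueError rather than being silently ignored.
    """
    dims = dict(OBS_DIMS)
    dims.update(extra_dims or {})
    unknown = [name for name in names if name not in dims]
    if unknown:
        raise ValueError(f"no size known for: {', '.join(unknown)}")
    return sum(dims[name] for name in names)


# Usage (previous_actions=14 is an assumption, not taken from this README):
# observation_size(["robot_dof", "box_pose", "previous_actions"],
#                  extra_dims={"previous_actions": 14})
```

This counts a single time step only; if the task stacks several steps for the discriminator, the actual input is correspondingly larger.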
