Dryad

Robotic manipulation datasets for offline compositional reinforcement learning

Abstract

Offline reinforcement learning (RL) is a promising direction that allows RL agents to be pre-trained on large datasets, avoiding repeated, expensive data collection. Advancing the field therefore requires large-scale datasets. Compositional RL is particularly appealing for generating such datasets, since 1) it permits creating many tasks from few components, and 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components. This submission provides four offline RL datasets for simulated robotic manipulation, created using the 256 tasks from CompoSuite [Mendez et al., 2022] (https://github.com/Lifelong-ML/CompoSuite). In every CompoSuite task, a *robot* arm manipulates an *object* to achieve an *objective* while avoiding an *obstacle*. Each of these four axes has four components, and the components can be combined arbitrarily, giving a total of 4 × 4 × 4 × 4 = 256 tasks. The component choices are:

* Robot: IIWA, Jaco, Kinova3, Panda
* Object: Hollow box, box, dumbbell, plate
* Objective: Push, pick and place, put in shelf, put in trashcan
* Obstacle: None, wall between robot and object, wall between goal and object, door between goal and object
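The task count follows directly from the Cartesian product of the four axes. The snippet below is a minimal sketch of that enumeration; the variable names and string labels are illustrative and are not the identifiers used by CompoSuite itself.

```python
from itertools import product

# Component choices as listed above (labels are illustrative, not CompoSuite identifiers).
ROBOTS = ["IIWA", "Jaco", "Kinova3", "Panda"]
OBJECTS = ["Hollow box", "Box", "Dumbbell", "Plate"]
OBJECTIVES = ["Push", "Pick and place", "Put in shelf", "Put in trashcan"]
OBSTACLES = [
    "None",
    "Wall between robot and object",
    "Wall between goal and object",
    "Door between goal and object",
]

# Every task is one (robot, object, objective, obstacle) combination.
tasks = list(product(ROBOTS, OBJECTS, OBJECTIVES, OBSTACLES))

print(len(tasks))  # 4 * 4 * 4 * 4 = 256
print(tasks[0])    # ('IIWA', 'Hollow box', 'Push', 'None')
```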

The four datasets were collected using separate agents, each trained to a different level of performance, and each dataset consists of 256 million transitions. The four datasets are:

* Expert dataset: Transitions from an expert agent that was trained to achieve 90% success on every task.
* Medium dataset: Transitions from a medium agent that was trained to achieve 30% success on every task.
* Warmstart dataset: Transitions from a Soft Actor-Critic (SAC) agent trained for a fixed duration of one million steps.
* Medium-replay-subsampled dataset: Transitions that were stored during the training of a medium agent up to 30% success.
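As a rough sense of scale, the arithmetic below works out the totals implied by the figures above. The per-task number assumes that each dataset's transitions are split evenly across the 256 tasks, which this description does not state; the dataset labels are hypothetical shorthand, not file or directory names.

```python
# Figures taken from the description above.
NUM_TASKS = 4 ** 4                      # 256 tasks
TRANSITIONS_PER_DATASET = 256_000_000   # 256 million transitions per dataset

# Hypothetical shorthand labels for the four datasets.
DATASETS = ["expert", "medium", "warmstart", "medium-replay-subsampled"]

# ASSUMPTION: transitions are divided evenly across tasks (not confirmed by the source).
transitions_per_task = TRANSITIONS_PER_DATASET // NUM_TASKS

# Total across all four datasets.
total_transitions = TRANSITIONS_PER_DATASET * len(DATASETS)

print(f"{transitions_per_task:,} transitions per task (if evenly split)")
print(f"{total_transitions:,} transitions across all four datasets")
```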

These datasets are intended for the combined study of compositional generalization and offline reinforcement learning.