Sequential human assembly and disassembly motions in human-robot coexisting environments
Data files
Sep 17, 2025 version files (31.79 GB total)
- CAD_model.zip (670.27 KB)
- README.md (6.99 KB)
- Scenario-A-Assembly.zip (5.32 GB)
- Scenario-A-Disassembly.zip (5.04 GB)
- Scenario-B-Assembly.zip (10.78 GB)
- Scenario-B-Disassembly.zip (10.66 GB)
Abstract
As human-robot systems and autonomous robots become increasingly prevalent, the need for task-oriented datasets to study human behaviors in shared spaces has grown significantly. We present a novel dataset focusing on sequential human assembly and disassembly motions in human-robot coexisting environments. It contains over 10,000 samples recorded with multi-view camera setups, each comprising synchronized RGB videos and 2D and 3D human skeletons. Data were collected from 33 participants with diverse physical characteristics and behavior preferences. The dataset highlights practical challenges such as partial occlusions, similar repetitive motions, and varying human behaviors, which are often overlooked in existing datasets and research. Technical validation through benchmarking with state-of-the-art deep learning models demonstrates the dataset's potential for practical applications. To support diverse research applications, the dataset provides raw and processed data with detailed annotations, including precise timestamps, procedure annotations, and Python code for reproducibility. It aims to advance research in human motion prediction, task-oriented robotic sequential decision-making, motion and task planning of autonomous robots, and human-robot collaborative policies.
https://doi.org/10.5061/dryad.ncjsxkt6f
This dataset captures sequential human assembly and disassembly motions in a human-robot coexisting environment. It provides raw videos, video clips, human skeleton data, task-procedure frame information, and the CAD model of the gear system used in the tasks.
Description of the data and file structure
The dataset consists of the five .zip files listed below, which can be downloaded separately. Please refer to the Scientific Data paper of the same title for further details, such as the file-naming rules.
Files and variables
1. File: Scenario-A-Assembly.zip
Description: Data collected in the Scenario A assembly task.
Scenario-A-Assembly.zip
|- Assembly/
| |- raw video/ (The raw videos for this scenario and task.)
| |- raw video frames/ (Frame information for the raw videos.)
| |- procedural video/ (Video clips generated from the raw videos and the task procedure anchors by the Python scripts.)
| |- skeleton frames/ (Unmerged skeleton data generated by OpenPose (2D) and MMPose (3D), taking the video clips as input.)
| |- procedural skeleton sequence/ (A procedural skeleton sequence is the sequence of skeleton data from a single camera within one procedure of a task. This folder contains the 2D sequences.)
| |- procedural skeleton sequence 3D/ (This folder contains the 3D sequences.)
| |- procedural skeleton sequence normalized/ (This folder contains the normalized sequences.)
| |- procedure_anchor.csv (Task procedure anchors are the keyframe numbers that annotate the start and end of each sequential human motion in the assembly and disassembly tasks; a usage sketch follows this listing.)
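As a quick-start illustration, the minimal Python sketch below cuts one raw video into per-procedure clips using the procedure anchors. The CSV column names (`procedure`, `start_frame`, `end_frame`) and the file paths are hypothetical placeholders; the actual file layout and the official processing scripts are in the GitHub repository listed under Code/software.

```python
import cv2
import pandas as pd

# Hypothetical paths; the real naming rules are described in the paper.
ANCHOR_CSV = "Assembly/procedure_anchor.csv"
RAW_VIDEO = "Assembly/raw video/example_camera.mp4"

# Assumed layout: one row per procedure with start/end keyframe numbers.
anchors = pd.read_csv(ANCHOR_CSV)

cap = cv2.VideoCapture(RAW_VIDEO)
fps = cap.get(cv2.CAP_PROP_FPS)
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
fourcc = cv2.VideoWriter_fourcc(*"mp4v")

for _, row in anchors.iterrows():
    start, end = int(row["start_frame"]), int(row["end_frame"])
    writer = cv2.VideoWriter(f"procedure_{row['procedure']}.mp4", fourcc, fps, size)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)  # jump to the procedure's first keyframe
    for _ in range(start, end + 1):
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(frame)
    writer.release()

cap.release()
```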
2. File: Scenario-A-Disassembly.zip
Description: Data collected in the Scenario A disassembly task.
Scenario-A-Disassembly.zip
|- Disassembly/
| |- raw video/ (The raw videos for this scenario and task.)
| |- raw video frames/ (Frame information for the raw videos.)
| |- procedural video/ (Video clips generated from the raw videos and the task procedure anchors by the Python scripts.)
| |- skeleton frames/ (Unmerged skeleton data generated by OpenPose (2D) and MMPose (3D), taking the video clips as input.)
| |- procedural skeleton sequence/ (A procedural skeleton sequence is the sequence of skeleton data from a single camera within one procedure of a task. This folder contains the 2D sequences.)
| |- procedural skeleton sequence 3D/ (This folder contains the 3D sequences.)
| |- procedural skeleton sequence normalized/ (This folder contains the normalized sequences.)
| |- procedure_anchor.csv (Task procedure anchors are the keyframe numbers that annotate the start and end of each sequential human motion in the assembly and disassembly tasks.)
3. File: Scenario-B-Assembly.zip
Description: Data collected in the Scenario B assembly task.
Scenario-B-Assembly.zip
|- Assembly/
| |- raw video/ (The raw videos for this scenario and task.)
| |- raw video frames/ (Frame information for the raw videos.)
| |- procedural video/ (Video clips generated from the raw videos and the task procedure anchors by the Python scripts.)
| |- skeleton frames/ (Unmerged skeleton data generated by OpenPose (2D) and MMPose (3D), taking the video clips as input.)
| |- procedural skeleton sequence/ (A procedural skeleton sequence is the sequence of skeleton data from a single camera within one procedure of a task. This folder contains the 2D sequences.)
| |- procedural skeleton sequence 3D/ (This folder contains the 3D sequences.)
| |- procedural skeleton sequence normalized/ (This folder contains the normalized sequences.)
| |- procedure_anchor.csv (Task procedure anchors are the keyframe numbers that annotate the start and end of each sequential human motion in the assembly and disassembly tasks.)
4. File: Scenario-B-Disassembly.zip
Description: Data collected in the Scenario B disassembly task.
Scenario-B-Disassembly.zip
|- Disassembly/
| |- raw video/ (The raw videos for this scenario and task.)
| |- raw video frames/ (Frame information for the raw videos.)
| |- procedural video/ (Video clips generated from the raw videos and the task procedure anchors by the Python scripts.)
| |- skeleton frames/ (Unmerged skeleton data generated by OpenPose (2D) and MMPose (3D), taking the video clips as input; a loading sketch follows this listing.)
| |- procedural skeleton sequence/ (A procedural skeleton sequence is the sequence of skeleton data from a single camera within one procedure of a task. This folder contains the 2D sequences.)
| |- procedural skeleton sequence 3D/ (This folder contains the 3D sequences.)
| |- procedural skeleton sequence normalized/ (This folder contains the normalized sequences.)
| |- procedure_anchor.csv (Task procedure anchors are the keyframe numbers that annotate the start and end of each sequential human motion in the assembly and disassembly tasks.)
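The sketch below shows one way to merge per-frame skeleton files into a sequence array and apply a simple normalization. It assumes OpenPose-style per-frame JSON output (a flat `pose_keypoints_2d` list of x, y, confidence values per person) and a hypothetical clip folder name; the dataset's actual sequence format and the normalization used for the provided normalized sequences are defined in the paper and the GitHub repository.

```python
import glob
import json
import numpy as np

# Assumption: OpenPose-style per-frame JSON files for one video clip;
# the folder name below is a hypothetical placeholder.
frame_files = sorted(glob.glob("Assembly/skeleton frames/example_clip/*.json"))

sequence = []
for path in frame_files:
    with open(path) as f:
        data = json.load(f)
    if not data.get("people"):
        continue  # skip frames where no person was detected
    keypoints = np.array(data["people"][0]["pose_keypoints_2d"]).reshape(-1, 3)
    sequence.append(keypoints[:, :2])   # keep (x, y), drop confidence

sequence = np.stack(sequence)           # shape: (frames, joints, 2)

# Example normalization only: center on joint 1 (neck in the BODY_25 layout)
# and scale by the largest coordinate spread in the sequence.
centered = sequence - sequence[:, 1:2, :]
scale = np.abs(centered).max()
normalized = centered / scale if scale > 0 else centered
```

The 3D sequences produced by MMPose can be handled analogously, with a third coordinate per joint.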
5. File: CAD_model.zip
Description: CAD models of the gear-system product, from a previous SIEMENS robot learning challenge. The models can be 3D printed.
Access information
Other publicly accessible locations of the data:
- N/A.
Data was derived from the following sources:
- N/A.
Code/software
The accompanying code is available in the GitHub repository: https://github.com/KTH-IPS/SD-Dataset
Please refer to the Scientific Data paper for more details.
Human subjects data
All participants were informed about the types of data to be collected, how those data would be stored and processed, and the overall purpose of the study. All participants provided consent for data collection and public release of de-identified videos. Participants had the option to wear face masks during data collection. To ensure privacy, all raw videos in the public dataset have been anonymized by automatically blurring faces.
