Development of compositionality through interactive learning of language and action of robots
Data files
Jan 03, 2025 version (12.35 GB total)
- groupA1testcomp.h5 (395.24 MB)
- groupA1train.h5 (1.58 GB)
- groupA2testcomp.h5 (790.47 MB)
- groupA2testpos.h5 (395.24 MB)
- groupA2train.h5 (1.19 GB)
- groupB1testcomp.h5 (296.43 MB)
- groupB1testpos.h5 (1.19 GB)
- groupB1train.h5 (1.19 GB)
- groupB2testcomp.h5 (592.86 MB)
- groupB2testpos.h5 (296.43 MB)
- groupB2train.h5 (889.28 MB)
- groupC1testcomp.h5 (148.22 MB)
- groupC1testpos.h5 (592.86 MB)
- groupC1train.h5 (592.86 MB)
- groupC2testcomp.h5 (296.43 MB)
- groupC2testpos.h5 (148.22 MB)
- groupC2train.h5 (444.64 MB)
- groupD1testcomp.h5 (148.22 MB)
- groupD1testpos.h5 (355.72 MB)
- groupD1train.h5 (296.43 MB)
- groupD2testcomp.h5 (148.22 MB)
- groupD2testpos.h5 (88.93 MB)
- groupD2train.h5 (296.43 MB)
- README.md (1.93 KB)
Abstract
Humans excel at applying learned behavior to unlearned situations. A crucial component of this generalization behavior is our ability to compose/decompose a whole into reusable parts, an attribute known as compositionality. One of the fundamental questions in robotics concerns this characteristic: how can linguistic compositionality be developed concomitantly with sensorimotor skills through associative learning, particularly when individuals only learn partial linguistic compositions and their corresponding sensorimotor patterns? To address this question, we propose a brain-inspired neural network model that integrates vision, proprioception, and language into a framework of predictive coding and active inference based on the free-energy principle. The effectiveness and capabilities of this model were assessed through various simulation experiments conducted with a robot arm. Our results show that generalization to unlearned verb-noun compositions is significantly enhanced when the variation of task compositions in training is increased. We attribute this to self-organized compositional structures in the linguistic latent state space, which are significantly shaped by sensorimotor learning. Ablation studies show that visual attention and working memory are essential for accurately generating visuo-motor sequences to achieve linguistically represented goals. These insights advance our understanding of the mechanisms underlying the development of compositionality through interactions between linguistic and sensorimotor experience.
README: Development of compositionality through interactive learning of language and action of robots
https://doi.org/10.5061/dryad.xsj3tx9qc
This is the dataset used for training and evaluating the model described in the article "Development of compositionality through interactive learning of language and action of robots". The data were collected from a robotic arm (Torobo Arm) and an external RGB camera.
Description of the data and file structure
The dataset contains individual HDF5 files of synchronized visuo-proprioceptive sequences. Each image is a 64x64x3 array, and the joint angles used as proprioceptive data are 60-dimensional vectors. The images were preprocessed with OpenCV in Python, and the synchronized visuo-proprioceptive and language sequences were written to HDF5 files with the 'h5py' Python library. Each file also contains the language corresponding to the action performed by the robot, encoded in one-hot-vector format.
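As a minimal sketch of how to inspect one of these files with 'h5py' (the internal dataset keys are not listed here, so the snippet simply prints whatever datasets a file contains; the file name is only an example):

```python
import h5py

# Print the name, shape, and dtype of every dataset stored in one HDF5 file.
# The file name below is an example; any of the group*.h5 files can be used.
with h5py.File("groupA1train.h5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    f.visititems(show)
```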
'groupA1testcomp.h5' is the test dataset for unlearned compositions (U-C) and 'groupA1testpos.h5' is the test dataset for unlearned positions (U-P) for group A1 (5x8 compositions, 80%); 'groupA1train.h5' is the dataset used for training group A1.
The same naming convention is followed for all groups. Details on how to use the dataset for training and evaluation are provided in the GitHub repository below.
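For illustration, a small helper that maps a group label to its expected train / U-C / U-P files; the directory layout and group labels are assumptions based on the file list above, not part of the dataset itself:

```python
from pathlib import Path

DATA_DIR = Path(".")  # folder holding the downloaded .h5 files (adjust as needed)

def group_files(group: str) -> dict:
    """Expected file paths for one group label, e.g. 'A1' or 'C2'."""
    return {
        "train": DATA_DIR / f"group{group}train.h5",
        "test_unlearned_compositions": DATA_DIR / f"group{group}testcomp.h5",
        "test_unlearned_positions": DATA_DIR / f"group{group}testpos.h5",
    }

# Check which files are present for every group in the Jan 03, 2025 version.
for g in ["A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2"]:
    for split, path in group_files(g).items():
        status = "found" if path.exists() else "missing"
        print(f"{g:>2} {split:<28} {path.name:<24} {status}")
```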
Sharing/Access information
This repository is the only way to publicly access the dataset. Please contact the dataset authors if you cannot access the data.
Code/Software
The scripts for training the model are written in Python. The code can be found in the GitHub repository https://github.com/oist-cnru/FEP-based-model-of-Embodied-Language. Details of the workflow are also available in the repository.
Methods
The data were collected with a robotic arm (Torobo Arm) and an external RGB camera. The vision data were preprocessed with OpenCV in Python. More details can be found in the GitHub repository https://github.com/oist-cnru/FEP-based-model-of-Embodied-Language.
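The repository contains the authoritative preprocessing code; as a hedged sketch of the kind of step involved in producing the 64x64x3 images described above (the BGR-to-RGB conversion and pixel scaling here are illustrative assumptions):

```python
import cv2
import numpy as np

def preprocess_frame(bgr_frame: np.ndarray) -> np.ndarray:
    """Convert a raw camera frame to the 64x64x3 format used in the dataset.

    The BGR-to-RGB conversion and [0, 1] scaling are illustrative assumptions;
    see the GitHub repository for the exact preprocessing pipeline.
    """
    rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)  # OpenCV captures frames as BGR
    small = cv2.resize(rgb, (64, 64), interpolation=cv2.INTER_AREA)
    return small.astype(np.float32) / 255.0
```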