Data from: Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers
Data files
Jan 23, 2024 version files (804.43 MB)
- data-v1.0.0.zip
- README.md
Feb 22, 2024 version files (804.83 MB)
- data-v1.0.1.zip
- README.md
Abstract
- Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area.
- To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures.
- Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features.
- Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.
This abstract is quoted from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024).
Please see README for the details of the datasets.
README: Data from: Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers
https://doi.org/10.5061/dryad.2ngf1vhwk
This repository contains the datasets of two seabird species (streaked shearwaters and black-tailed gulls) used in the following paper (Otsuka et al., 2024).
Otsuka, R., Yoshimura, N., Tanigaki, K., Koyama, S., Mizutani, Y., Yoda, K., & Maekawa, T. (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers. Methods in Ecology and Evolution.
The paper aimed to classify the behaviour of these two seabird species using tri-axial acceleration data and deep learning. It explored the effectiveness of deep learning models and related training techniques, such as data augmentation.
Description of the data and file structure
The directory structure of the data is as follows.
(After unzipping the data-v1.0.1.zip file, you will see the following directories and files.)
data
├─id-files/*.csv
├─raw-data/
│  ├─omizunagidori/*.csv
│  └─umineko/*.csv
└─supplementary-materials/*.pdf
The names of directories and files in the datasets often include the term "omizunagidori" or "umineko". These terms are the romanized Japanese names for streaked shearwaters (Calonectris leucomelas) and black-tailed gulls (Larus crassirostris), respectively.
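As a rough illustration of this layout, the sketch below collects the raw data CSV files for each species directory. It assumes the archive has been unzipped and that you pass the path of the resulting data directory; the names list_raw_csvs and SPECIES_DIRS are illustrative only and are not part of the dataset or of the authors' source code.

```python
from pathlib import Path
from typing import Dict, List

# Directory names from the README, mapped to the species they hold.
SPECIES_DIRS = {
    "omizunagidori": "streaked shearwater",
    "umineko": "black-tailed gull",
}


def list_raw_csvs(data_root: str) -> Dict[str, List[Path]]:
    """Return the CSV files under <data_root>/raw-data, keyed by species directory."""
    root = Path(data_root)
    return {
        name: sorted((root / "raw-data" / name).glob("*.csv"))
        for name in SPECIES_DIRS
    }
```

For example, list_raw_csvs("data") returns a dictionary with the keys "omizunagidori" and "umineko". A species directory that is missing yields an empty list rather than an error.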
The raw-data/omizunagidori and raw-data/umineko directories include raw data CSV files from streaked shearwaters and black-tailed gulls, respectively. Each raw data CSV file includes "timestamp", tri-axial acceleration data ("acc_x", "acc_y", and "acc_z"), and behaviour class label ("label").
These acceleration data were collected using bio-logging devices attached to the seabirds from 2018 to 2022. Acceleration data were recorded at sampling rates of 25 or 31 Hz. Please note that most rows in the "label" column are empty because only a limited portion of the data has been labelled. The data were labelled primarily using video footage captured by animal-borne cameras.
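Because most rows are unlabelled, a first step when inspecting a file is often to count only the labelled rows. The following is a minimal sketch using Python's standard csv module. It relies only on the "label" column described above; the function name count_labels is hypothetical, and the sample rows (including the timestamp placeholders and acceleration values) are made up for illustration.

```python
import csv
from collections import Counter
from typing import Iterable


def count_labels(lines: Iterable[str]) -> Counter:
    """Count behaviour labels in one raw data CSV, skipping unlabelled rows."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        label = (row.get("label") or "").strip()
        if label:  # most rows are unlabelled, so this field is usually empty
            counts[label] += 1
    return counts


# Made-up sample rows; only the column names and "surface_seizing" come from the README.
sample = [
    "timestamp,acc_x,acc_y,acc_z,label\n",
    "t0,0.01,-0.02,0.98,\n",
    "t1,0.03,-0.01,0.97,surface_seizing\n",
    "t2,0.02,0.00,0.99,surface_seizing\n",
]
print(count_labels(sample))  # Counter({'surface_seizing': 2})
```

To run this on a real file, pass an open file object instead of the sample list, for example one opened with open(path, newline="").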
Please refer to id-files/animal_id.csv for metadata on each CSV file. The "animal_id" column contains unique identifiers, such as "OM2214" or "UM1913", which are used in both the paper and the source code to distinguish the data for each individual bird (or the data acquired through each attachment of a data logger). The "animal_tag" column contains the ID numbers used during raw data collection; each raw data file name includes this tag, as shown in the "csv_file_name" column. The "correct_timestamp" column takes the value 0 or 1, where 1 indicates that the timestamps of the raw data need correction. The "back" column also takes the value 0 or 1, where 1 indicates that the data logger was attached to the back and 0 that it was attached to the abdomen. For information on the acceleration sensor devices and the orientation of their axes, please refer to the "acc_sensor" column in id-files/animal_id.csv and to supplementary-materials/acceleration_sensor_orientation_birds.pdf.
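The metadata columns described above can be decoded along the following lines. This is a sketch, not the authors' data preparation code: the function name read_animal_metadata and the output keys "attachment" and "needs_timestamp_correction" are hypothetical, and it assumes the 0/1 flags are stored as the plain text "0" and "1".

```python
import csv
from typing import Dict, Iterable


def read_animal_metadata(lines: Iterable[str]) -> Dict[str, dict]:
    """Index the rows of animal_id.csv by "animal_id" and decode the 0/1 flags."""
    metadata = {}
    for row in csv.DictReader(lines):
        metadata[row["animal_id"]] = {
            "csv_file_name": row["csv_file_name"],
            "acc_sensor": row["acc_sensor"],
            # back == 1: logger on the back; back == 0: logger on the abdomen
            "attachment": "back" if row["back"].strip() == "1" else "abdomen",
            # correct_timestamp == 1: timestamps of the raw data need correction
            "needs_timestamp_correction": row["correct_timestamp"].strip() == "1",
        }
    return metadata
```

With a file object opened on id-files/animal_id.csv, read_animal_metadata(f)["OM2214"] would then give the file name, sensor, attachment position and timestamp flag for that deployment, provided the flag encoding assumption holds.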
For the correspondence between the behaviour class labels in the CSV files and the six behaviour classes used in the paper, please see label_id_omizunagidori.csv and label_id_umineko.csv for streaked shearwaters and black-tailed gulls, respectively. For example, the label "surface_seizing" in the raw data CSV files for streaked shearwaters corresponds to the class "dipping" in the paper.
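Applying this correspondence amounts to a dictionary lookup, as sketched below. The README does not state the column names of the label_id CSV files, so the header names "raw_label" and "paper_label" used here are hypothetical placeholders, as are the function names; inspect the real header and pass the actual column names. Only the "surface_seizing" to "dipping" pair comes from the README.

```python
import csv
from typing import Dict, Iterable, Optional


def build_label_map(lines: Iterable[str], raw_col: str, paper_col: str) -> Dict[str, str]:
    """Map raw labels to paper classes using two columns of a label_id CSV."""
    return {
        row[raw_col].strip(): row[paper_col].strip()
        for row in csv.DictReader(lines)
    }


def to_paper_class(raw_label: str, label_map: Dict[str, str]) -> Optional[str]:
    """Return the paper class for a raw label, or None if unlabelled or unmapped."""
    return label_map.get(raw_label.strip())


# Hypothetical header names; check the real header of label_id_omizunagidori.csv.
sample = [
    "raw_label,paper_label\n",
    "surface_seizing,dipping\n",
]
label_map = build_label_map(sample, raw_col="raw_label", paper_col="paper_label")
print(to_paper_class("surface_seizing", label_map))  # dipping
print(to_paper_class("", label_map))                 # None
```

Returning None for empty or unmapped labels keeps unlabelled rows distinguishable from the six behaviour classes.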
For more details about the datasets, please refer to the "Datasets" subsection in the "Materials and Methods" section of the paper, as well as Table S2-3 and Figure S1-3 in the Supporting Information file of the paper. Please refer to the source code for data preparation.
Code/Software
The source code used in this study is available at the link below.
https://github.com/ryoma-otsuka/dl-wabc