Data from: Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers

Otsuka, Ryoma¹; Yoshimura, Naoya¹; Tanigaki, Kei¹; Koyama, Shiho²; Mizutani, Yuichi²; Yoda, Ken²; Maekawa, Takuya¹

Published Jan 23, 2024; Updated Feb 22, 2024 on Dryad. https://doi.org/10.5061/dryad.2ngf1vhwk

Data files

Jan 23, 2024 version files (804.43 MB)
  data-v1.0.0.zip  804.42 MB
  README.md  3.69 KB

Feb 22, 2024 version files (804.83 MB)
  data-v1.0.1.zip  804.83 MB
  README.md  3.96 KB

Feb 22, 2024 version files (804.83 MB)
  data-v1.0.1.zip  804.83 MB
  README.md  4.92 KB

Abstract

Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features. 
Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.

https://doi.org/10.5061/dryad.2ngf1vhwk

This repository contains the datasets of two seabird species (streaked shearwaters and black-tailed gulls) used in the following paper (Otsuka et al., 2024).

Otsuka, R., Yoshimura, N., Tanigaki, K., Koyama, S., Mizutani, Y., Yoda, K., & Maekawa, T. (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers. Methods in Ecology and Evolution.

The paper aimed to classify the behaviour of these two seabird species using tri-axial acceleration data and deep learning. It explored the effectiveness of deep learning models and related training techniques, such as data augmentation.

⚠️ WARNING (2025-07-02)
We found that the data collected using the BMX-055 sensor were likely not sampled consistently at the intended frequency of 31 Hz. Additionally, due to the FIFO buffering mechanism, some samples may have been dropped during data acquisition. The dataset contains both 31 Hz and 25 Hz acceleration data for both species. However, since most of the streaked shearwater data are at 31 Hz and most of the black-tailed gull data are at 25 Hz, we believe this issue has limited impact on experiments or validations conducted within this dataset. That said, if you train a model on the 31 Hz data and apply it to other datasets, please be aware that discrepancies may arise. Fine-tuning or other corrective measures may be necessary depending on your use case. We sincerely apologize for any inconvenience this may have caused.
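One way to gauge how much a given recording is affected is to estimate its effective sampling rate from the timestamps. The sketch below is not part of the dataset or the authors' code. It assumes the timestamps have already been converted to seconds as floats (the README does not specify the timestamp format), and the function name `sampling_summary` and the `gap_factor` threshold are hypothetical choices.

```python
from statistics import median


def sampling_summary(times_s, gap_factor=1.5):
    """Summarise the sampling behaviour of one recording.

    times_s: timestamps in seconds, in recording order (assumed format).
    gap_factor: an interval longer than gap_factor * median interval is
                flagged as a possible dropped-sample gap (hypothetical threshold).
    """
    if len(times_s) < 2:
        raise ValueError("need at least two timestamps")

    # Time difference between each pair of consecutive samples.
    intervals = [later - earlier for earlier, later in zip(times_s, times_s[1:])]
    typical = median(intervals)

    # Positions of intervals that are much longer than the typical one.
    gap_indices = [i for i, dt in enumerate(intervals) if dt > gap_factor * typical]

    return {
        # Rate implied by the typical spacing between samples.
        "median_rate_hz": 1.0 / typical,
        # Rate over the whole recording; lower than the median rate if samples were dropped.
        "mean_rate_hz": len(intervals) / (times_s[-1] - times_s[0]),
        "gap_indices": gap_indices,
    }
```

A mean rate noticeably below the median rate, or a non-empty `gap_indices` list, would suggest that samples are missing from that recording.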

Description of the data and file structure

The directory structure of the data is as follows:
(After unzipping the data-v1.0.1.zip file, you will see the following directories and files.)

data
  ├─id-files/*.csv
  ├─raw-data/
  │   ├─omizunagidori/*.csv
  │   └─umineko/*.csv
  └─supplementary-materials/*.pdf
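
Given this layout, the raw data files for each species can be collected with a few lines of Python. This is a minimal sketch rather than the authors' data preparation code; the function name `list_raw_csv_files` is hypothetical, and `data_root` is assumed to point at the unzipped `data` directory.

```python
from pathlib import Path

# Species directory names as they appear under raw-data/.
SPECIES_DIRS = ("omizunagidori", "umineko")


def list_raw_csv_files(data_root):
    """Return {species_dir: sorted list of CSV paths} found under raw-data/."""
    root = Path(data_root)
    return {
        species: sorted((root / "raw-data" / species).glob("*.csv"))
        for species in SPECIES_DIRS
    }
```

For example, `list_raw_csv_files("data")["umineko"]` would return the paths of the black-tailed gull CSV files.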

The names of directories and files in the datasets often include the term "omizunagidori" or "umineko". These are the Japanese names for streaked shearwaters (Calonectris leucomelas) and black-tailed gulls (Larus crassirostris), respectively, written in the Latin alphabet.

The raw-data/omizunagidori and raw-data/umineko directories include raw data CSV files from streaked shearwaters and black-tailed gulls, respectively. Each raw data CSV file includes "timestamp", tri-axial acceleration data ("acc_x", "acc_y", and "acc_z"), and behaviour class label ("label").
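
A raw data file with these five columns could be read as sketched below. The column names come from the description above; everything else is an assumption made for illustration: the function name `read_raw_csv` is hypothetical, timestamps are kept as plain strings because their format is not specified here, and unlabelled rows are assumed to have an empty "label" cell.

```python
import csv


def read_raw_csv(file_obj):
    """Parse one raw data CSV into a list of row dictionaries.

    file_obj: an open text file (or any iterable of CSV lines) with the
              columns timestamp, acc_x, acc_y, acc_z and label.
    """
    rows = []
    for record in csv.DictReader(file_obj):
        rows.append({
            # Kept as text; the timestamp format is not assumed here.
            "timestamp": record["timestamp"],
            # Tri-axial acceleration as a tuple of floats.
            "acc": (
                float(record["acc_x"]),
                float(record["acc_y"]),
                float(record["acc_z"]),
            ),
            # Empty label cells (assumed to mean "unlabelled") become None.
            "label": record["label"] or None,
        })
    return rows
```

Typical use would be `with open(path, newline="") as f: rows = read_raw_csv(f)`.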

These acceleration data were collected using bio-logging devices attached to the seabirds from 2018 to 2022. Acceleration data were recorded at sampling rates of 25 or 31 Hz. Please note that most rows in the "label" column are empty because only a limited portion of the data has been labelled. The data were labelled primarily using video footage captured by animal-borne cameras.
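
Because labels are sparse, it is often convenient to locate the labelled stretches first. The following sketch finds runs of consecutive rows that share the same non-empty label. It is an illustration only, not the segmentation used in the paper; the function name `labelled_segments` is hypothetical, and unlabelled rows are assumed to be represented as `None` or an empty string.

```python
from itertools import groupby


def labelled_segments(labels):
    """Return (label, start, end) for each run of identical non-empty labels.

    labels: the per-row label values of one recording, in order.
    start is inclusive and end is exclusive, as in Python slicing.
    """
    segments = []
    index = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label:  # skip unlabelled runs (None or "")
            segments.append((label, index, index + length))
        index += length
    return segments
```

Each returned `(label, start, end)` tuple can then be used to slice the matching acceleration rows, for example `acc_rows[start:end]`.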

Please refer to id-files/animal_id.csv for metadata on each CSV file. The "animal_id" column includes unique identification numbers, such as "OM2214" or "UM1913", which are used in both the paper and the source code to differentiate data for each individual bird (or data acquired through each attachment of a data logger). The "animal_tag" column consists of ID numbers used during raw data collection, and each raw data file name includes this information, as shown in the "csv_file_name" column. The "correct_timestamp" column contains values of 0 or 1, where 1 indicates that the timestamps of the raw data need correction. The "back" column has values of 0 or 1, where 1 indicates that the data logger was attached to the back and 0 that it was attached to the abdomen. For information regarding the acceleration sensor devices and the orientations of their axes, please refer to the "acc_sensor" column in id-files/animal_id.csv and to supplementary-materials/acceleration_sensor_orientation_birds.pdf.
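
The metadata columns described above could be read as in the following sketch. The column names are taken from this README. The function name `load_animal_metadata` and the output keys are hypothetical, and the 0/1 flags are assumed to be stored as the plain text "0" or "1".

```python
import csv


def load_animal_metadata(file_obj):
    """Parse id-files/animal_id.csv into a list of per-file metadata records."""
    records = []
    for row in csv.DictReader(file_obj):
        records.append({
            "animal_id": row["animal_id"],
            "animal_tag": row["animal_tag"],
            "csv_file_name": row["csv_file_name"],
            "acc_sensor": row["acc_sensor"],
            # 1 means the raw timestamps need correction.
            "needs_timestamp_correction": row["correct_timestamp"].strip() == "1",
            # 1 means back attachment, 0 means abdomen attachment.
            "back_mounted": row["back"].strip() == "1",
        })
    return records
```

A list comprehension such as `[r["csv_file_name"] for r in records if r["back_mounted"]]` would then select the files recorded with back-mounted loggers.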

For the correspondence between the behaviour class labels within the CSV files and the six behaviour classes used in the paper, please see the following CSV files: label_id_omizunagidori.csv and label_id_umineko.csv for streaked shearwaters and black-tailed gulls, respectively. For example, the label "surface_seizing" in the raw data CSV files for streaked shearwaters refers to the label "dipping" in the paper.
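
In code, this correspondence amounts to a lookup from raw labels to paper labels. The sketch below contains only the single pair stated above; the remaining pairs would have to be filled in from the label_id CSV files, whose column names are not described here. The names `RAW_TO_PAPER_LABEL` and `to_paper_label` are hypothetical.

```python
# Only the one correspondence given in this README is included;
# the rest must be added from label_id_omizunagidori.csv / label_id_umineko.csv.
RAW_TO_PAPER_LABEL = {
    "surface_seizing": "dipping",
}


def to_paper_label(raw_label, mapping=RAW_TO_PAPER_LABEL):
    """Translate a raw CSV label into the class name used in the paper.

    Unlabelled rows (None or "") return None. An unknown label raises
    KeyError so that a missing mapping entry is not silently ignored.
    """
    if not raw_label:
        return None
    return mapping[raw_label]
```

Raising on unknown labels is a deliberate choice in this sketch: it surfaces incomplete mappings early instead of passing raw labels through unchanged.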

For more details about the datasets, please refer to the "Datasets" subsection in the "Materials and Methods" section of the paper, as well as Table S2-3 and Figure S1-3 in the Supporting Information file of the paper. Please refer to the source code for data preparation.

Code/Software

The source code used in this study is available from the link below.
https://github.com/ryoma-otsuka/dl-wabc
