Systematic review of validation of supervised machine learning models in accelerometer-based animal behaviour classification literature
Data files (Jun 24, 2025 version, 46.22 KB total):
- README.md (2.53 KB)
- Systematic_Review_Supplementary.xlsx (43.69 KB)
Abstract
Supervised machine learning has been used to detect fine-scale animal behaviour from accelerometer data, but a standardised protocol for implementing this workflow is currently lacking. As the application of machine learning to ecological problems expands, it is essential to establish technical protocols and validation standards that align with those in other "big data" fields. Overfitting is a prevalent and often misunderstood challenge in machine learning. Overfit models overly adapt to the training data to memorise specific instances rather than to discern the underlying signal. Associated results can indicate high performance on the training set, yet these models are unlikely to generalise to new data. Overfitting can be detected through rigorous validation using independent test sets. Our systematic review of 119 studies using accelerometer-based supervised machine learning to classify animal behaviour reveals that 79% (94 papers) did not validate their models sufficiently well to robustly identify potential overfitting. Although this does not inherently imply that these models are overfit, the absence of independent test sets limits the interpretability of their results. To address these challenges, we provide a theoretical overview of overfitting in the context of animal accelerometry and propose guidelines for optimal validation techniques. We aim to equip ecologists with the tools necessary to adapt general machine learning validation theory to the specific requirements of biologging, facilitating reliable overfitting detection and advancing the field.
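The failure mode described in the abstract can be illustrated with a toy sketch (not drawn from any of the reviewed studies): a 1-nearest-neighbour classifier fit to pure label noise memorises its training set perfectly, yet performs near chance on held-out data. Only the independent test set reveals the problem.

```python
import random

def nn_predict(train, x):
    # 1-nearest-neighbour: memorises the training set exactly.
    return min(train, key=lambda t: abs(t[0] - x))[1]

rng = random.Random(0)
# Noisy toy data: the feature carries no signal about the class label.
data = [(rng.random(), rng.choice([0, 1])) for _ in range(200)]
train, test = data[:100], data[100:]

train_acc = sum(nn_predict(train, x) == y for x, y in train) / len(train)
test_acc = sum(nn_predict(train, x) == y for x, y in test) / len(test)

# Training accuracy is perfect (each point is its own nearest neighbour),
# while test accuracy sits near chance: overfitting that is only
# detectable because an independent test set was held out.
assert train_acc == 1.0
assert test_acc < train_acc
```

The same train/test gap is the diagnostic the review argues for: high training-set metrics alone cannot distinguish a generalisable model from a memorised one.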
https://doi.org/10.5061/dryad.fxpnvx14d
Description of the data and file structure
Files and variables
File: Systematic_Review_Supplementary.xlsx
Description: Methods information from animal accelerometer-based behaviour classification literature utilising supervised machine learning techniques.
Variables
- Citation: Citation information for the paper
- Title: Extracted title from citation information
- Year: Year of publication
- ModelCategory: General category of the supervised machine learning model used (e.g., all Support Vector Machines are listed as SVM)
- DT — Decision Tree
- EM — Expectation Maximisation
- Ensemble — Ensemble methods (e.g., boosting, bagging)
- HMM — Hidden Markov Model
- Isolation Forest — Anomaly detection using Isolation Forest
- kNN — k-Nearest Neighbours
- Multiple — Multiple models trialled and compared
- NB — Naive Bayes
- NN — Neural Network (any architecture)
- QDA — Quadratic Discriminant Analysis
- RF — Random Forest
- SVM — Support Vector Machine
- Tree — Other tree-based models (e.g., CART)
- Species: Main research species (common name)
- Free/Captive: Whether the species was free-roaming or captive for the duration of the study (free-roaming/captive/split)
- SampleSize: Number of individuals' data included in the study (numeric)
- Overlap: % overlap between windows during feature generation (numeric, 'vague' if unclear from publication, or publication descriptor, e.g., "rolling")
- FeatureSelection: Whether feature selection was performed prior to model construction (yes/no/blank for not reported)
- HyperparameterTuning: Whether model hyperparameters were tuned prior to the selection of the final model (yes/no/blank for not reported)
- ValidationSplit: How data were stratified between training, validation, and test sets (random/chronological/individual)
- ValidationSet: Inclusion of a dataset specifically for model tuning (yes/no/blank for not reported)
- ValidationMethod: Use of single or cross-validated validation (single/cross-validation/blank for not reported)
- Fscore - AUC: Performance metrics reported in the publication (numeric)
- Other_performance_metrics: Any other metrics reported in the publication (name and numeric value)
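The ValidationSplit distinction recorded above (random versus individual stratification) can be sketched with a minimal, hypothetical helper, not taken from the dataset or any reviewed study: an individual-level split holds out whole animals, so no individual contributes windows to both the training and test sets.

```python
import random

def split_by_individual(windows, test_fraction=0.3, seed=42):
    """Split labelled windows into train/test sets by individual,
    so no animal appears in both sets (an 'individual' split, as
    opposed to a random window-level split)."""
    ids = sorted({w["id"] for w in windows})
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [w for w in windows if w["id"] not in test_ids]
    test = [w for w in windows if w["id"] in test_ids]
    return train, test

# Toy data: 10 feature windows from each of 5 animals.
windows = [{"id": i, "behaviour": "rest", "features": [0.1, 0.2]}
           for i in range(5) for _ in range(10)]
train, test = split_by_individual(windows)

# No individual is shared between the two sets.
assert not ({w["id"] for w in train} & {w["id"] for w in test})
assert len(train) + len(test) == len(windows)
```

A random window-level split would instead let highly autocorrelated windows from the same animal land on both sides of the split, inflating test-set performance, which is why individual-level splits are the stricter test of generalisation to new animals.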
We defined eligibility criteria as 'peer-reviewed primary research papers published 2013-present that use supervised machine learning to identify specific behaviours from raw, non-livestock animal accelerometer data'. We excluded analyses of livestock behaviour because agricultural methods often operate under different constraints from those applied to wild animals, and that body of literature has developed largely in isolation from wild-animal research. Our search was conducted on 27/09/2024. An initial keyword search across three databases (Google Scholar, PubMed, and Scopus) yielded 249 unique papers. Papers outside the search criteria were then excluded, including hardware and software advances, non-ML analyses, studies with insufficient accelerometry application (e.g., research focused on other sensors with accelerometry providing minimal support), unsupervised methods, and research limited to activity intensity or active/inactive states, resulting in 119 papers.
