Automated analysis of bird head motion in unconstrained settings: A foundational study on semicircular canal evolution in archosaurs
Data files
Apr 24, 2025 version (691.68 MB):
- Birdgaze.zip (691.67 MB)
- README.md (8.39 KB)
Abstract
This study presents a framework to automatically analyze head motion in birds from videos of natural behaviors. The process involves detecting birds, identifying key points on their heads, and tracking changes in their positions over time. Bird detection and key point extraction were trained on publicly available datasets, featuring videos and images of diverse bird species in uncontrolled settings. Initial challenges with complex video backgrounds causing misidentifications and inaccurate key points were addressed through validation, refinement, filtering, and smoothing. Head angular velocities and rotation frequencies were computed from the refined key points. The algorithm performed well at moderate speeds but was limited by the 30 Hz frame rate of most videos, which constrained measurable angular velocities and frequencies and caused motion blur, affecting key point detection. Our findings suggest that the framework may provide plausible estimates of head motion but also emphasize the importance of high frame rate videos in future research, including extensive comparisons against ground truth data, to fully characterize bird head movements. Importantly, this work is a foundational effort to understand the evolutionary drivers of the semicircular canals, the biosensor that monitors head rotations, for both extinct and extant tetrapods.
GENERAL INFORMATION
Dataset overview
This dataset includes:
- BirdGaze image annotations (birdgaze.zip) - Annotations are provided for selected content derived from the following sources: Animal Kingdom [1], NABirds [2], Birdsnap [3], eBird [4], CUB-200-2011 [5] and 3D-POP [6]
- Code for bird detection and keypoint estimation (PoseEstimation.zip) - Includes the code for bird detection and estimation of the four selected 2D key points: top of head, tip of beak, left eye, right eye.
Related publication
Marco Santos-Lopes, Ricardo Araújo, Romain David, Paulo L. Correia. "Automated Analysis of Bird Head Motion in Unconstrained Settings: A Foundational Study on Semicircular Canal Evolution in Archosaurs." Journal of the Royal Society Interface (accepted, April 2025).
A preliminary version is available at: https://www.biorxiv.org/content/10.1101/2024.12.20.629664v1
NOTES FOR FILES
BirdGaze image annotations (birdgaze.zip)
The annotation files are organized using the same names as in the original datasets.
The directory structure is:
Birdgaze
├── 3D-POP
│ ├── Pigeon01 (annotations for pigeon number 1 in the multi-pigeon setup)
│ │ └── Sequence*N*_n01_*nnnnnnnn*
│ │ └── Annotation
│ └── SinglePigeon (annotations for single pigeon sequences)
│ └── Annotation
├── ak_P3_bird (annotations for the pose estimation subset of the Animal Kingdom dataset, labelled ak_P3_bird)
│ └── annot (annotations for the selected subset of the Animal Kingdom dataset)
├── birdsnap (annotations for the Birdsnap dataset)
│ └── annot (annotations for the selected subset of the Birdsnap dataset)
├── cub_200_2011_xml (annotations for the CUB-200-2011 dataset)
│ ├── bird_species (sequences of the available bird species, organized by their use for training, validation or testing in the performed tests)
│ │ ├── test
│ │ │ └── labels
│ │ │ └── *bird_name*_*sequence_number*.txt (annotation of the 4 selected keypoint coordinates)
│ │ ├── train
│ │ │ └── labels
│ │ │ └── *bird_name*_*sequence_number*.txt (annotation of the 4 selected keypoint coordinates)
│ │ └── val
│ │ └── labels
│ │ └── *bird_name*_*sequence_number*.txt (annotation of the 4 selected keypoint coordinates)
│ ├── train_labels
│ │ └── *bird_name*_*sequence_number*.xml (annotation information in xml format)
│ └── valid_labels
│ └── *bird_name*_*sequence_number*.xml (annotation information in xml format)
├── finetune (annotations for fine-tuning purposes, in json format)
│ └── annot
│ ├── test.json (annotations for the test set)
│ └── train.json (annotations for the training set)
├── nabirds_ak_birdsnap (annotations in json format for training and testing, including sequences from the NABirds, Animal Kingdom and Birdsnap datasets)
│ └── annot
│ ├── test.json (annotations for the test set)
│ └── train.json (annotations for the training set)
├── nabirds_ak_birdsnap_ebird (annotations in json format for training and testing, including sequences from the NABirds, Animal Kingdom, Birdsnap and eBird datasets)
│ └── annot
│ ├── test.json (annotations for the test set)
│ └── train.json (annotations for the training set)
└── birdgaze (annotations in json format for training and testing, including sequences from the NABirds, Animal Kingdom, Birdsnap, CUB-200-2011 and eBird datasets)
└── annot
├── test.json (annotations for the test set)
└── train.json (annotations for the training set)
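The internal layout of the per-image .txt label files listed above is not documented in this archive. Below is a minimal parsing sketch, assuming a YOLO-pose-style line format (class index, normalized bounding box, then normalized x, y and a visibility flag for each of the four key points); the field order and the visibility flag are assumptions to verify against the actual files.

```python
# Hypothetical parser for a *bird_name*_*sequence_number*.txt label file,
# ASSUMING a YOLO-pose-style layout: "cls cx cy w h  x1 y1 v1 ... x4 y4 v4"
# with all coordinates normalized to [0, 1]. Verify against the real files.
from pathlib import Path

KEYPOINTS = ["top_of_head", "tip_of_beak", "left_eye", "right_eye"]

def parse_label_file(path):
    birds = []
    for line in Path(path).read_text().splitlines():
        f = [float(v) for v in line.split()]
        entry = {"class": int(f[0]), "bbox": tuple(f[1:5]), "keypoints": {}}
        for i, name in enumerate(KEYPOINTS):
            x, y, vis = f[5 + 3 * i : 8 + 3 * i]
            entry["keypoints"][name] = (x, y, int(vis))
        birds.append(entry)
    return birds
```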
Image annotations are provided for selected content derived from the following sources: Animal Kingdom [1], NABirds [2], Birdsnap [3], eBird [4], CUB-200-2011 [5] and 3D-POP [6].
[1] Ng, X. L., Ong, K. E., Zheng, Q., Ni, Y., Yeo, S. Y., & Liu, J. “Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding.” arXiv 2204.08129, 2022. https://doi.org/10.48550/arXiv.2204.08129.
[2] Van Horn, G., Branson, S., Farrell, R., et al. “Building a Bird Recognition App and Large-Scale Dataset with Citizen Scientists: The Fine Print in Fine-Grained Dataset Collection.” Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015, pp. 595–604. doi: 10.1109/CVPR.2015.7298658.
[3] Berg, T., Liu, J., Lee, S. W., Alexander, M. L., Jacobs, D. W., & Belhumeur, P. N. “Birdsnap: Large-Scale Fine-Grained Visual Categorization of Birds.” Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014, pp. 2019–2026. doi: 10.1109/CVPR.2014.259.
[4] Sullivan, B. L., Wood, C. L., Iliff, M. J., Bonney, R. E., Fink, D., & Kelling, S. “eBird: A Citizen-Based Bird Observation Network in the Biological Sciences.” Biological Conservation, vol. 142, 2009, pp. 2282–2292.
[5] Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. “The Caltech-UCSD Birds-200-2011 Dataset.” Tech. Rep. CNS-TR-2011-001, California Institute of Technology, 2011. Available online: http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
[6] Naik, H., Chan, A. H. H., Yang, J., Delacoux, M., Couzin, I. D., Kano, F., & Nagy, M. “3D-POP - An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds with Marker-Based Motion Capture.” Edmond, V6, 2023. https://doi.org/10.17617/3.HPBBC7.
Code for bird detection and keypoint estimation (PoseEstimation.zip)
This repository contains code for performing pose estimation on birds using a combination of YOLOv8 for object detection and HRNet for pose estimation. The code is structured to facilitate easy usage and modification.
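To make the two-stage design concrete, here is a minimal sketch of a detect-then-pose loop, not the exact logic of main.py: YOLOv8 (via the ultralytics package) localizes the bird, the detection is cropped and resized, and a pose network produces key point heatmaps. The pose checkpoint name, the 256×256 input size, and the heatmap decoding are assumptions; the real wiring lives in main.py and lib/.

```python
# Minimal detect-then-pose sketch (illustrative; see main.py for the real pipeline).
import cv2
import torch
from ultralytics import YOLO

detector = YOLO("models/trained/YOLOv8Bird.pt")        # bird detector (file named in this README)
pose_net = torch.load("models/trained/hrnet.pth",      # hypothetical HRNet checkpoint name,
                      map_location="cpu")              # assumed to deserialize to a full module
pose_net.eval()

frame = cv2.imread("frame.png")                        # one video frame
for box in detector(frame)[0].boxes:                   # ultralytics returns per-image Results
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())    # detection in pixel coordinates
    crop = cv2.resize(frame[y1:y2, x1:x2], (256, 256)) # assumed fixed pose-input size
    inp = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        heatmaps = pose_net(inp)                       # (1, 4, H, W): one map per key point
    flat = heatmaps[0].flatten(1).argmax(dim=1)        # peak of each heatmap channel
    ys, xs = flat // heatmaps.shape[-1], flat % heatmaps.shape[-1]
```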
Directory Structure
- models/: Contains the model implementations for pose estimation.
- lib/: Contains utility functions, configuration management, and core functions for pose estimation.
- data/: Contains trained models and input videos.
- experiments/: Contains configuration files for inference.
- videos_experiments/: Directory for saving output results, including videos and scores.
- main.py: The main script to run the pose estimation.
Requirements
Make sure to install the following dependencies:
```bash
pip install torch torchvision torchaudio
pip install opencv-python
pip install matplotlib
pip install pandas
pip install scipy
pip install ultralytics
```
Usage
- Prepare Your Data:
  - Place your input video files in the `data/videos/` directory.
  - Ensure the YOLOv8 model file (`YOLOv8Bird.pt`) is located in the `models/trained/` directory.
- Configure the Model:
  - Modify the `experiments/inference.yaml` file to set the parameters for the model and inference.
- Run the Pose Estimation:
  - Execute the main script with the video and configuration file as arguments (a sketch of the invocation is given after this list).
  - Replace `your_video.mp4` with the name of your video file.
  - Replace `inference.yaml` with the name of your configuration file.
- Output:
  - The results will be saved in the `videos_experiments/your_video/` directory, including annotated videos and pose estimation scores.
- Visualize Results:
  - You can also find the generated plots and scores in the `videos_experiments/your_video/` directory.
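The exact command-line interface of main.py is not reproduced in this archive. A plausible invocation, assuming the script takes the video name and configuration file as named arguments (the argument names below are guesses, not the script's documented interface):

```bash
# Hypothetical invocation; check main.py for the actual argument names.
python main.py --video your_video.mp4 --cfg experiments/inference.yaml
```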
Notes
- Ensure that your environment has access to a GPU for optimal performance during training.
- The model-testing code in main.py is designed to run on a CPU.
- The code is designed to be modular, allowing for easy updates and modifications to the model and functions.
For the development of the bird head pose estimation (BHPE) module, a new annotated 2D BHPE dataset, here entitled BirdGaze, is proposed. It includes images from four prominent sources: Animal Kingdom, NABirds, Birdsnap and eBird. These are among the largest publicly available bird image collections, widely recognized in the literature for their role in avian research, and their extensive morphological diversity is crucial for this study. Besides the bird images, the proposed BirdGaze dataset includes a set of annotations, notably:
- Center of the bounding box containing the bird body, described by its 2D coordinates;
- Scale factor, a multiplier applied to the bird bounding box to resize it to the fixed rectangle size used as input to the adopted key point extraction model;
- Coordinates of the four selected 2D key points: top of head, tip of beak, left eye, right eye.
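A minimal sketch of reading these annotations from one of the JSON files (e.g., birdgaze/annot/train.json) is shown below; the record structure and field names (`image`, `center`, `scale`, `joints`) are assumptions based on the description above, not the confirmed schema.

```python
# Hypothetical reader for a BirdGaze JSON annotation file. Field names are
# ASSUMED from the description above (center, scale factor, four key points);
# inspect the actual train.json/test.json to confirm the schema.
import json

KEYPOINTS = ["top_of_head", "tip_of_beak", "left_eye", "right_eye"]

with open("Birdgaze/birdgaze/annot/train.json") as f:
    records = json.load(f)

for rec in records:
    cx, cy = rec["center"]          # 2D center of the bird bounding box
    scale = rec["scale"]            # resize factor to the fixed input rectangle
    for name, (x, y) in zip(KEYPOINTS, rec["joints"]):
        print(f"{rec.get('image', '?')}: {name} -> ({x:.1f}, {y:.1f})")
```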
