Automated analysis of bird head motion in unconstrained settings: A foundational study on semicircular canal evolution in archosaurs
Data files
Apr 24, 2025 version (691.68 MB):
- Birdgaze.zip (691.67 MB)
- README.md (8.39 KB)
Abstract
This study presents a framework to automatically analyze head motion in birds from videos of natural behaviors. The process involves detecting birds, identifying key points on their heads, and tracking changes in their positions over time. Bird detection and key point extraction were trained on publicly available datasets, featuring videos and images of diverse bird species in uncontrolled settings. Initial challenges with complex video backgrounds causing misidentifications and inaccurate key points were addressed through validation, refinement, filtering, and smoothing. Head angular velocities and rotation frequencies were computed from the refined key points. The algorithm performed well at moderate speeds but was limited by the 30 Hz frame rate of most videos, which constrained measurable angular velocities and frequencies and caused motion blur, affecting key point detection. Our findings suggest that the framework may provide plausible estimates of head motion but also emphasize the importance of high frame rate videos in future research, including extensive comparisons against ground truth data, to fully characterize bird head movements. Importantly, this work is a foundational effort to understand the evolutionary drivers of the semicircular canals, the biosensor that monitors head rotations, for both extinct and extant tetrapods.
GENERAL INFORMATION
Dataset overview
This dataset includes:
- BirdGaze image annotations (birdgaze.zip) - Annotations are provided for selected content derived from the following sources: Animal Kingdom [1], NABirds [2], Birdsnap [3], eBird [4], CUB-200-2011 [5] and 3D-POP [6]
- Code for bird detection and keypoint estimation (PoseEstimation.zip) - Includes the code for bird detection and estimation of the four selected 2D key points: top of head, tip of beak, left eye, right eye.
Related publication
Marco Santos-Lopes, Ricardo Araújo, Romain David, Paulo L. Correia. "Automated Analysis of Bird Head Motion in Unconstrained Settings: A Foundational Study on Semicircular Canal Evolution in Archosaurs." Journal of the Royal Society Interface (accepted, April 2025).
A preliminary version is available at: https://www.biorxiv.org/content/10.1101/2024.12.20.629664v1
NOTES FOR FILES
BirdGaze image annotations (birdgaze.zip)
The annotation files are organized using the same names as in the original datasets.
The directory structure is:
Birdgaze
├── 3D-POP
│ ├── Pigeon01 (annotations for pigeon number 1 in the multi-pigeon setup)
│ │ └── Sequence*N*_n01_*nnnnnnnn*
│ │ └── Annotation
│ └── SinglePigeon (annotations for single pigeon sequences)
│ └── Annotation
├── ak_P3_bird (annotations for the pose estimation subset of the Animal Kingdom dataset, labelled ak_P3_bird)
│ └── annot (annotations for the selected subset of the Animal Kingdom dataset)
├── birdsnap (annotations for the Birdsnap dataset)
│ └── annot (annotations for the selected subset of the Birdsnap dataset)
├── cub_200_2011_xml (annotations for the CUB-200-2011 dataset)
│ ├── bird_species (sequences of the available bird species, organized by their use for training, validation or testing in the performed tests)
│ │ ├── test
│ │ │ └── labels
│ │ │ └── *bird_name*_*sequence_number*.txt (annotation of the 4 selected keypoint coordinates)
│ │ ├── train
│ │ │ └── labels
│ │ │ └── *bird_name*_*sequence_number*.txt (annotation of the 4 selected keypoint coordinates)
│ │ └── val
│ │ └── labels
│ │ └── *bird_name*_*sequence_number*.txt (annotation of the 4 selected keypoint coordinates)
│ ├── train_labels
│ │ └── *bird_name*_*sequence_number*.xml (annotation information in xml format)
│ └── valid_labels
│ └── *bird_name*_*sequence_number*.xml (annotation information in xml format)
├── finetune (annotations for fine-tuning purposes, in json format)
│ └── annot
│ ├── test.json (annotations for the test set)
│ └── train.json (annotations for the training set)
├── nabirds_ak_birdsnap (annotations in json format for training and testing, including sequences from the NABirds, Animal Kingdom and Birdsnap datasets)
│ └── annot
│ ├── test.json (annotations for the test set)
│ └── train.json (annotations for the training set)
├── nabirds_ak_birdsnap_ebird (annotations in json format for training and testing, including sequences from the NABirds, Animal Kingdom, Birdsnap and eBird datasets)
│ └── annot
│ ├── test.json (annotations for the test set)
│ └── train.json (annotations for the training set)
└── birdgaze (annotations in json format for training and testing, including sequences from the NABirds, Animal Kingdom, Birdsnap, CUB-200-2011 and eBird datasets)
└── annot
├── test.json (annotations for the test set)
└── train.json (annotations for the training set)
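The internal layout of the per-image .txt label files listed above is not documented in this archive. Below is a minimal parsing sketch, assuming a YOLO-pose-style line format (class index, normalized bounding box, then normalized x, y and a visibility flag for each of the four key points); the field order and the visibility flag are assumptions to verify against the actual files.

```python
# Hypothetical parser for a *bird_name*_*sequence_number*.txt label file,
# ASSUMING a YOLO-pose-style layout: "cls cx cy w h  x1 y1 v1 ... x4 y4 v4"
# with all coordinates normalized to [0, 1]. Verify against the real files.
from pathlib import Path

KEYPOINTS = ["top_of_head", "tip_of_beak", "left_eye", "right_eye"]

def parse_label_file(path):
    birds = []
    for line in Path(path).read_text().splitlines():
        f = [float(v) for v in line.split()]
        entry = {"class": int(f[0]), "bbox": tuple(f[1:5]), "keypoints": {}}
        for i, name in enumerate(KEYPOINTS):
            x, y, vis = f[5 + 3 * i : 8 + 3 * i]
            entry["keypoints"][name] = (x, y, int(vis))
        birds.append(entry)
    return birds
```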
Image annotations are provided for selected content derived from the following sources: Animal Kingdom [1], NABirds [2], Birdsnap [3], eBird [4], CUB-200-2011 [5] and 3D-POP [6].
[1] Ng, X. L., Ong, K. E., Zheng, Q., Ni, Y., Yeo, S. Y., & Liu, J. “Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding.” arXiv 2204.08129, 2022. https://doi.org/10.48550/arXiv.2204.08129.
[2] Van Horn, G., Branson, S., Farrell, R., et al. “Building a Bird Recognition App and Large-Scale Dataset with Citizen Scientists: The Fine Print in Fine-Grained Dataset Collection.” Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015, pp. 595–604. doi: 10.1109/CVPR.2015.7298658.
[3] Berg, T., Liu, J., Lee, S. W., Alexander, M. L., Jacobs, D. W., & Belhumeur, P. N. “Birdsnap: Large-Scale Fine-Grained Visual Categorization of Birds.” Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014, pp. 2019–2026. doi: 10.1109/CVPR.2014.259.
[4] Sullivan, B. L., Wood, C. L., Iliff, M. J., Bonney, R. E., Fink, D., & Kelling, S. “eBird: A Citizen-Based Bird Observation Network in the Biological Sciences.” Biological Conservation, vol. 142, 2009, pp. 2282–2292.
[5] Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. “The Caltech-UCSD Birds-200-2011 Dataset.” Tech. Rep. CNS-TR-2011-001, California Institute of Technology, 2011. Available online: http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
[6] Naik, H., Chan, A. H. H., Yang, J., Delacoux, M., Couzin, I. D., Kano, F., & Nagy, M. “3D-POP - An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds with Marker-Based Motion Capture.” Edmond, V6, 2023. https://doi.org/10.17617/3.HPBBC7.
Code for bird detection and keypoint estimation (PoseEstimation.zip)
This repository contains code for performing pose estimation on birds using a combination of YOLOv8 for object detection and HRNet for pose estimation. The code is structured to facilitate easy usage and modification.
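To make the two-stage design concrete, here is a minimal sketch of a detect-then-pose loop, not the exact logic of main.py: YOLOv8 (via the ultralytics package) localizes the bird, the detection is cropped and resized, and a pose network produces key point heatmaps. The pose checkpoint name, the 256×256 input size, and the heatmap decoding are assumptions; the real wiring lives in main.py and lib/.

```python
# Minimal detect-then-pose sketch (illustrative; see main.py for the real pipeline).
import cv2
import torch
from ultralytics import YOLO

detector = YOLO("models/trained/YOLOv8Bird.pt")        # bird detector (file named in this README)
pose_net = torch.load("models/trained/hrnet.pth",      # hypothetical HRNet checkpoint name,
                      map_location="cpu")              # assumed to deserialize to a full module
pose_net.eval()

frame = cv2.imread("frame.png")                        # one video frame
for box in detector(frame)[0].boxes:                   # ultralytics returns per-image Results
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())    # detection in pixel coordinates
    crop = cv2.resize(frame[y1:y2, x1:x2], (256, 256)) # assumed fixed pose-input size
    inp = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        heatmaps = pose_net(inp)                       # (1, 4, H, W): one map per key point
    flat = heatmaps[0].flatten(1).argmax(dim=1)        # peak of each heatmap channel
    ys, xs = flat // heatmaps.shape[-1], flat % heatmaps.shape[-1]
```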
Directory Structure
- models/: Contains the model implementations for pose estimation.
- lib/: Contains utility functions, configuration management, and core functions for pose estimation.
- data/: Contains trained models and input videos.
- experiments/: Contains configuration files for inference.
- videos_experiments/: Directory for saving output results, including videos and scores.
- main.py: The main script to run the pose estimation.
Requirements
Make sure to install the following dependencies:
```bash
pip install torch torchvision torchaudio
pip install opencv-python
pip install matplotlib
pip install pandas
pip install scipy
pip install ultralytics
```
Usage
- Prepare Your Data:
  - Place your input video files in the `data/videos/` directory.
  - Ensure the YOLOv8 model file (`YOLOv8Bird.pt`) is located in the `models/trained/` directory.
- Configure the Model:
  - Modify the `experiments/inference.yaml` file to set the parameters for the model and inference.
- Run the Pose Estimation:
  - Execute the main script with the video and configuration file as arguments (a sketch of the invocation is given after this list).
  - Replace `your_video.mp4` with the name of your video file.
  - Replace `inference.yaml` with the name of your configuration file.
- Output:
  - The results will be saved in the `videos_experiments/your_video/` directory, including annotated videos and pose estimation scores.
- Visualize Results:
  - You can also find the generated plots and scores in the `videos_experiments/your_video/` directory.
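The exact command-line interface of main.py is not reproduced in this archive. A plausible invocation, assuming the script takes the video name and configuration file as named arguments (the argument names below are guesses, not the script's documented interface):

```bash
# Hypothetical invocation; check main.py for the actual argument names.
python main.py --video your_video.mp4 --cfg experiments/inference.yaml
```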
Notes
- Ensure that your environment has access to a GPU for optimal performance during training.
- The model-testing code in main.py is designed to run on a CPU.
- The code is designed to be modular, allowing for easy updates and modifications to the model and functions.
For the development of the bird head pose estimation (BHPE) module, a new annotated 2D BHPE dataset, here entitled BirdGaze, is proposed. It includes images from four prominent sources: Animal Kingdom, NABirds, Birdsnap and eBird. These are among the largest publicly available bird image collections, widely recognized in the literature for their role in avian research, and their extensive morphological diversity is crucial for this study. Besides the bird images, the proposed BirdGaze dataset includes a set of annotations, notably:
- Center of the bounding box containing the bird body, described by its 2D coordinates;
- Scale factor, a multiplier applied to the bird bounding box to resize it to the fixed rectangle size used as input to the adopted key point extraction model;
- Coordinates of the four selected 2D key points: top of head, tip of beak, left eye, right eye.
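A minimal sketch of reading these annotations from one of the JSON files (e.g., birdgaze/annot/train.json) is shown below; the record structure and field names (`image`, `center`, `scale`, `joints`) are assumptions based on the description above, not the confirmed schema.

```python
# Hypothetical reader for a BirdGaze JSON annotation file. Field names are
# ASSUMED from the description above (center, scale factor, four key points);
# inspect the actual train.json/test.json to confirm the schema.
import json

KEYPOINTS = ["top_of_head", "tip_of_beak", "left_eye", "right_eye"]

with open("Birdgaze/birdgaze/annot/train.json") as f:
    records = json.load(f)

for rec in records:
    cx, cy = rec["center"]          # 2D center of the bird bounding box
    scale = rec["scale"]            # resize factor to the fixed input rectangle
    for name, (x, y) in zip(KEYPOINTS, rec["joints"]):
        print(f"{rec.get('image', '?')}: {name} -> ({x:.1f}, {y:.1f})")
```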
