Using machine learning to process large quantities of remotely collected audio data is a powerful emerging research tool in ecology and conservation.

We applied these methods to a field study of tinamou (Tinamidae) biology in Madre de Dios, Peru, a region expected to have high levels of interspecies competition and niche partitioning as a result of high tinamou alpha diversity. We used autonomous recording units to gather environmental audio over several months at lowland rainforest sites in the Los Amigos Conservation Concession, and developed a data processing pipeline based on a convolutional neural network to detect tinamou vocalizations in the dataset.

The classified acoustic event data are comparable to similar metrics derived from an ongoing camera trapping survey at the same site, and it should be possible to combine the two datasets for future explorations of the target species’ niche space parameters.

Here we provide an overview of the methodology used in the data collection and processing pipeline, offer general suggestions for processing large amounts of environmental audio data, and demonstrate how data collected in this manner can be used to answer questions about bird biology.

This dataset has two components: the training and testing data used to create an acoustic detection model, and a CSV file containing the detections in the survey data.

The acoustic dataset was derived from audio downloaded from the Macaulay Library of Natural Sounds (https://macaulaylibrary.org) and Xeno-Canto (http://www.xeno-canto.org) databases (S2), as well as from exemplar cuts of the audio we collected in the field. Care was taken to ensure that the training examples covered the full breadth of the acoustic parameter space of tinamous, including for C. soui and C. variegatus, the two study species that were observed to use distinct call and song types in the survey audio. All audio was checked to ensure correct assignment to species before use. The training and testing datasets were subsets of this larger dataset.

Survey audio was collected using Swift recorders at sites in lowland Peruvian rainforest from July to October 2019 and processed using a convolutional neural network to obtain tinamou detections. The CSV file includes all detections, including non-independent ones; independent detections, as defined in the manuscript, are detections of a particular species separated from one another by more than an hour.

Deployments are numbered 1 to 4; each lasted about two weeks, though not every recorder ran for the entire period because of battery issues. Swift recorders are numbered 1 to 10, but recorders were moved to different locations ("ct_code") in each deployment.
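
The one-hour independence rule can be applied to the detection table with a short filter. The following is a minimal sketch, not the pipeline used in the manuscript. It assumes each detection has a species label, a location, and a timestamp; the field names "species" and "time", the location value "A1", and the example rows are hypothetical, and only "ct_code" comes from the dataset description. It also assumes that detections are grouped per species and location, and that each detection is compared with the most recent retained detection in its group.

```python
from datetime import datetime, timedelta


def independent_detections(rows, min_gap=timedelta(hours=1)):
    """Return detections separated by more than `min_gap` within each group.

    Assumption: a group is a (species, ct_code) pair, and a detection is kept
    only if it falls more than `min_gap` after the last kept one in its group.
    """
    last_kept = {}  # (species, ct_code) -> time of the last retained detection
    kept = []
    for row in sorted(rows, key=lambda r: r["time"]):
        key = (row["species"], row["ct_code"])
        previous = last_kept.get(key)
        if previous is None or row["time"] - previous > min_gap:
            kept.append(row)
            last_kept[key] = row["time"]
    return kept


# Hypothetical example rows; field names and values are illustrative only.
rows = [
    {"species": "C. soui", "ct_code": "A1", "time": datetime(2019, 7, 10, 5, 0)},
    {"species": "C. soui", "ct_code": "A1", "time": datetime(2019, 7, 10, 5, 30)},
    {"species": "C. soui", "ct_code": "A1", "time": datetime(2019, 7, 10, 6, 1)},
    {"species": "C. variegatus", "ct_code": "A1", "time": datetime(2019, 7, 10, 5, 10)},
]

kept = independent_detections(rows)
for row in kept:
    print(row["species"], row["ct_code"], row["time"].isoformat())
```

In this example the 05:30 C. soui detection is dropped because it falls within an hour of the 05:00 detection, while the 06:01 detection is retained. If the manuscript's definition pools locations or compares each detection with the immediately preceding one instead, the grouping key or the comparison would need to change accordingly.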

Automated audio recording as a means of surveying Tinamous (Tinamidae) in the Peruvian Amazon

Data files

Abstract
