The impacts of fine-tuning, phylogenetic distance, and sample size on big-data bioacoustics
Cite this dataset
Provost, Kaiya; Yang, Jiaying; Carstens, Bryan (2022). The impacts of fine-tuning, phylogenetic distance, and sample size on big-data bioacoustics [Dataset]. Dryad. https://doi.org/10.5061/dryad.8pk0p2nrb
Vocalizations in animals, particularly birds, are critically important behaviors that influence reproductive fitness. While bioacoustic recordings have been captured and stored in collections for decades, the automated extraction of data from these recordings has only recently been facilitated by artificial intelligence methods. These methods have yet to be evaluated with respect to the accuracy of different automation strategies and features. Here, we use a recently published machine learning framework to extract syllables from ten bird species, ranging in phylogenetic relatedness from 1 to 85 million years, to assess how phylogenetic relatedness influences accuracy. We also evaluate the utility of applying trained models to novel species. Our results indicate that model performance is best on conspecifics, with accuracy progressively decreasing as phylogenetic distance between taxa increases. However, we also find that applying models trained on multiple distantly related species can improve overall accuracy to levels near those achieved by training and applying a model on the same species. When planning big-data bioacoustics studies, care must be taken in sample design to maximize sample size and minimize human labor without sacrificing accuracy.
National Science Foundation, Award: DEB 2016189