Data and code from: Disentangling the evolutionary cause-effect relationships of environment, sexual selection and body size with birdsong frequency
Data files
Apr 01, 2026 version files (2.16 MB)
birdsong_data.csv (77.07 KB)
birdstrees_100_McTavish.nex (2.07 MB)
Code_PPA_birdsong-evolution.R (8.72 KB)
coefficients-1.csv (1.45 KB)
README.md (3.94 KB)
Abstract
This project integrates a large comparative dataset and phylogenetic information to study the evolution of birdsong across 472 Neotropical passerine species. The dataset includes acoustic, morphological, and ecological variables (birdsong_data.csv), additional model coefficients (coefficients-1.csv), and 100 phylogenetic trees (birdstrees_100_McTavish.nex). An accompanying R script (Code_PPA_birdsong-evolution.R) performs all analyses, model selection, and figure generation.
Using these data, the study employs Phylogenetic Path Analysis to test causal relationships among habitat structure, sexual dimorphism, morphology, and song frequency parameters. Across all phylogenies, a single causal structure was consistently supported. The analyses show that greater tree cover increases minimum, peak, and maximum song frequencies, while bandwidth remains unaffected. Sexual dimorphism decreases bandwidth and influences frequency values, whereas morphological traits impose biomechanical constraints on song frequencies and shape bandwidth differently. Habitat structure and sexual dimorphism also affect morphological traits, producing additional indirect pathways that influence birdsong. Furthermore, tree cover itself impacts sexual dimorphism, embedding it within a broader causal network.
Together, the dataset and analyses reveal that the evolution of birdsong emerges from interacting environmental, sexual, and morphological forces. The results support key hypotheses—including acoustic adaptation, sexual selection, and morphological constraints—and demonstrate that trait evolution is best understood through multicausal and phylogenetically informed models, rather than simple linear associations.
Dataset DOI: 10.5061/dryad.g1jwstqtc
Description of the data and file structure
Description of the code and data to generate the results, from: “Disentangling the evolutionary cause-effect relationships of environment, sexual selection, and body size with birdsong frequency”
By
Rivera-Gutierrez HF, Gomez-Gomez O, Montoya-Jaramillo V, Toro-Cardona F, Pinzon-Cardenas P.
If you use any of the supporting information, we suggest citing both the original paper and this supporting information.
Cite these data as: Rivera-Gutierrez HF, Gomez-Gomez O, Montoya-Jaramillo V, Toro-Cardona F, Pinzon-Cardenas P. Data and code from: Disentangling the evolutionary cause-effect relationships of environment, sexual selection, and body size with birdsong frequency.
To perform this analysis, you will need a dataset with acoustic and other traits for 472 bird species and a file containing 100 phylogenetic trees, each with 472 species. You also need the code to perform the analysis and generate the results and figures. The code was developed for R.
In this dataset you will find the following files:
Dataset: birdsong_data.csv
Dataset: coefficients-1.csv
Phylogenetic trees: birdstrees_100_McTavish.nex
Code: Code_PPA_birdsong-evolution.R
Open R and run the script. The script is documented, and you will find in it all the instructions for running the analyses.
Data description:
birdsong_data.csv
Comma separated values table. Contains:
472 obs. of 17 variables:
species: character, species name for 472 species included in the analysis
Peak_F: peak frequency; Type: numeric; units: (Hz)
Min_F: minimum frequency; Type: numeric; units: (Hz)
Max_F: maximum frequency; Type: numeric; units: (Hz)
BW: bandwidth; Type: numeric; units: (Hz)
Peso: bodymass; Type: numeric; units: (g)
Culmen: exposedCulmen; Type: numeric; units: (mm)
Alto_pico: billDepth; Type: numeric; units: (mm)
Ancho_pico: billWidth; Type: numeric; units: (mm)
Longitud_ala: wing; Type: numeric; units: (mm)
Longitud_cola: tail; Type: numeric; units: (mm)
Tarso: tarsus; Type: numeric; units: (mm)
SSD: body size differences between male and female within species; Type: numeric
TREECOVER: mean tree cover percentage; Type: numeric
PC1: principal component 1; Type: numeric
PC2: principal component 2; Type: numeric
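As a quick check after download, the table can be loaded and its columns compared against the names listed above. This is a minimal sketch, not part of the authors' script: `documented_cols` and `missing_columns` are hypothetical names introduced here, and the file is assumed to sit in the working directory. Note that the description states 17 variables but lists 16 names, so the helper only checks for the listed names and does not assert a column count.

```r
# Column names exactly as listed in the description above.
documented_cols <- c(
  "species", "Peak_F", "Min_F", "Max_F", "BW",
  "Peso", "Culmen", "Alto_pico", "Ancho_pico",
  "Longitud_ala", "Longitud_cola", "Tarso",
  "SSD", "TREECOVER", "PC1", "PC2"
)

# Hypothetical helper: documented columns that are absent from a data frame.
missing_columns <- function(df, expected = documented_cols) {
  setdiff(expected, names(df))
}

# Assumes the file sits in the current working directory.
path <- "birdsong_data.csv"
if (file.exists(path)) {
  birdsong <- read.csv(path, stringsAsFactors = FALSE)
  str(birdsong)                     # dimensions and column types
  print(missing_columns(birdsong))  # character(0) if every listed column is present
}
```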
coefficients-1.csv
Comma separated values table. Contains:
32 obs. of 6 variables:
Variables paired : character: “SSD _ PC2”, “SSD _ PC1”, “SSD _ Min_F”, “PC2 _ Min_F”, “PC1 _ Min_F”, “TREECOVER _ SSD”, “TREECOVER _ PC2”, “TREECOVER _ PC1”, “TREECOVER _ Peak_F”, “SSD _ PC2”, “SSD _ PC1”, “SSD _ Peak_F”, “PC2 _ Peak_F”, “PC1 _ Peak_F”, “TREECOVER _ SSD”, “TREECOVER _ PC2”, “TREECOVER _ PC1”, “TREECOVER _ BW”, “SSD _ PC2”, “SSD _ PC1”, “SSD _ BW”, “PC2 _ BW”, “PC1 _ BW”, “TREECOVER _ SSD”, “TREECOVER _ PC2”, “TREECOVER _ PC1”, “TREECOVER _ Max_F”, “SSD _ PC2”, “SSD _ PC1”, “SSD _ Max_F”, “PC2 _ Max_F”, “PC1 _ Max_F”
coefficient: numeric
SE : numeric, standard error of the coefficient
Lower : numeric, lower bound of the 95% confidence interval
Upper : numeric, upper bound of the 95% confidence interval
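The paired-variable labels follow an "A _ B" pattern, so they can be split into two columns for filtering or plotting. The sketch below is illustrative only: `split_pairs` is a hypothetical helper, and reading A as the predictor and B as the response is an assumption that should be checked against the paper and the script.

```r
# Hypothetical helper: split "A _ B" labels into two columns.
# Treating A as the predictor and B as the response is an assumption.
split_pairs <- function(pairs) {
  # Split on the spaced separator only, so names such as Min_F stay intact.
  parts <- strsplit(trimws(pairs), " _ ", fixed = TRUE)
  stopifnot(all(lengths(parts) == 2))
  data.frame(
    predictor = vapply(parts, `[`, character(1), 1),
    response  = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

split_pairs(c("TREECOVER _ SSD", "PC1 _ Min_F", "SSD _ BW"))
```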
birdstrees_100_McTavish.nex
100 phylogenetic trees, each tree with 472 tips and 409 internal nodes.
Rooted; includes branch lengths.
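The properties stated above (number of trees, tips, internal nodes, rooting, branch lengths) can be verified with the ape package. This is a minimal sketch assuming ape is installed and the file is in the working directory; `summarise_trees` is a hypothetical helper and not part of the authors' script.

```r
library(ape)

# Hypothetical helper: one row of basic properties per tree.
# Trees are accessed by index so that tip labels are restored even when
# read.nexus() returns a compressed multiPhylo object.
summarise_trees <- function(trees) {
  idx <- seq_along(trees)
  data.frame(
    n_tips  = vapply(idx, function(i) as.integer(Ntip(trees[[i]])), integer(1)),
    n_nodes = vapply(idx, function(i) as.integer(Nnode(trees[[i]])), integer(1)),
    rooted  = vapply(idx, function(i) is.rooted(trees[[i]]), logical(1)),
    has_branch_lengths = vapply(
      idx, function(i) !is.null(trees[[i]]$edge.length), logical(1)
    )
  )
}

# Assumes the file sits in the current working directory.
path <- "birdstrees_100_McTavish.nex"
if (file.exists(path)) {
  trees <- read.nexus(path)
  print(length(trees))                 # number of trees in the file
  print(head(summarise_trees(trees)))  # tips, internal nodes, rooting per tree
}
```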
Code/software:
Code_PPA_birdsong-evolution.R
Script for R. The script is documented, explaining the analysis process. It contains code for performing analysis and obtaining figures used in the paper.
Datasets
We collected data for 472 bird species belonging to the order Passeriformes (228 Oscines, 244 Suboscines) distributed in Colombia. For each species we obtained song recordings, morphological data, and tree cover percentage as a proxy of the environment in which they live. In addition, we used imputed sexual size dimorphism from Bulla et al. (2020) as a proxy of sexual selection. Details of our dataset are presented below.
Acoustic data
Song recordings were obtained from the Macaulay Library and Xeno-canto. Recordings from Macaulay were in WAV format (sampling rate: 44 kHz, 16 bit). Recordings from Xeno-canto were in MP3 format and were converted with Ocenaudio V. 3.6.3 to match the characteristics of Macaulay's recordings. All acoustic data were analyzed using Avisoft (Avisoft SAS-LAB Pro V. 5.2, Berlin, Germany). First, sonograms of all recordings were visually inspected to determine their quality (signal-to-noise ratio). Sonogram parameters were: Hamming window, FFT length 512, frame size 75%, overlap 50%. Only recordings of sufficient quality were retained for analysis. At least three different recordings per species, and a minimum of five strophes per recording, were analyzed. Strophes were selected in Avisoft using an automatic selection method with a -30 dB threshold relative to the peak amplitude. This threshold excluded background noise while capturing variation in the frequency characteristics of the song and avoiding the bias of manual selection.
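To give a sense of scale for the -30 dB threshold, the sketch below converts a relative level in dB to a linear amplitude ratio. It assumes the usual amplitude convention (20 * log10 of the ratio) and does not reproduce Avisoft's internal selection procedure; `db_to_ratio` and `above_threshold` are hypothetical helpers.

```r
# Convert a relative level in dB to a linear amplitude ratio,
# assuming the amplitude convention dB = 20 * log10(ratio).
db_to_ratio <- function(db) 10^(db / 20)

# Hypothetical helper: flag values at or above a threshold set relative
# to the peak (maximum) amplitude.
above_threshold <- function(amplitude, threshold_db = -30) {
  amplitude >= max(amplitude) * db_to_ratio(threshold_db)
}

db_to_ratio(-30)                            # about 0.032 of the peak amplitude
above_threshold(c(1.00, 0.20, 0.05, 0.01))  # only the last value falls below
```

Under this convention, anything weaker than roughly 3% of the peak amplitude is excluded from the selection.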
Morphological data
A total of nine morphological measurements from males (culmen, bill depth, bill width, gape, wing length, tail length, tarsus length, hallux, body mass) were used to evaluate morphological variation. Measurements were obtained from a published dataset for Colombian species (Montoya et al. 2018), by measuring museum specimens following a standardized protocol (Lopez-Ordoñez et al. 2016), or from a database collected by the Ecology and Evolution of Vertebrates Research Group. Since both the published dataset and the database included several individuals per species, average values were calculated for each species. We visited the Museo Universitario de la Universidad de Antioquia and the Museo de Ciencias Naturales de La Salle in Medellín to collect measurements of museum specimens. OG collected all measurements, and several individuals per species were measured and averaged.
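The per-species averaging described above can be sketched in base R as follows. The column names and values in the toy table are hypothetical, and `species_means` is an illustrative helper rather than the authors' code.

```r
# Hypothetical helper: average every trait column within each species.
species_means <- function(measurements, id_col = "species") {
  trait_cols <- setdiff(names(measurements), id_col)
  aggregate(measurements[trait_cols], by = measurements[id_col],
            FUN = mean, na.rm = TRUE)
}

# Toy data: two individuals of one species, one of another (made-up values).
toy <- data.frame(
  species   = c("sp_A", "sp_A", "sp_B"),
  wing_mm   = c(60, 64, 80),
  tarsus_mm = c(18, 20, 25)
)

species_means(toy)  # one row per species with the mean of each trait
```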
Environment
We used the species distribution polygons from BirdLife (BirdLife International & World 2023) and the tree canopy cover raster data from Hansen et al. (2013). This raster layer is defined as canopy closure for all vegetation higher than five meters, expressed as a percentage within each 30-meter grid cell. To improve computational efficiency, we processed the tree cover data for the Americas using Google Earth Engine and resampled it from its original 30-meter resolution to a 1-kilometer resolution, employing the mean value for each resampled cell. Finally, we applied the zonal statistics tool in ArcGIS Pro (ESRI 2024) to estimate the mean tree cover percentage within the distribution range of each species.
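The two steps above (mean resampling to a coarser grid, then a zonal mean within a range) can be illustrated on a toy matrix in base R. This is only a conceptual sketch: the actual processing was done in Google Earth Engine and ArcGIS Pro, the toy grid uses an integer aggregation factor of 2 rather than the 30 m to 1 km change, and `block_mean` and `zonal_mean` are hypothetical helpers.

```r
# Hypothetical helper: aggregate a matrix into k x k blocks using the mean.
block_mean <- function(m, k) {
  stopifnot(nrow(m) %% k == 0, ncol(m) %% k == 0)
  row_block <- (row(m) - 1) %/% k + 1  # block index of each cell's row
  col_block <- (col(m) - 1) %/% k + 1  # block index of each cell's column
  unname(tapply(m, list(row_block, col_block), mean, na.rm = TRUE))
}

# Hypothetical helper: mean of the cells that fall inside a logical mask.
zonal_mean <- function(values, in_range) mean(values[in_range], na.rm = TRUE)

fine   <- matrix(1:16, nrow = 4, ncol = 4)  # toy "tree cover" grid
coarse <- block_mean(fine, 2)               # 2 x 2 grid of block means

# Toy range mask covering the first column of the coarse grid.
range_mask <- matrix(c(TRUE, TRUE, FALSE, FALSE), nrow = 2, ncol = 2)
zonal_mean(coarse, range_mask)
```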
Phylogenetic tree
We selected 472 Passeriform species belonging to different families (228 Oscines, 244 Suboscines). To build a reliable phylogeny, we used a recently published complete bird phylogeny (McTavish et al., 2025), which provides a standardized phylogeny with a robust, validated background. This is suitable in the absence of a complete phylogenetic analysis of all the species in our study. Using a random imputation procedure for the 10 species that were not included in the McTavish phylogeny, we generated a total of 100 different trees with the rtrees package in R (Li, 2023). All trees were used in the analysis to account for phylogenetic uncertainty.
All data were analysed in R.
