Data from: Genotype predicts quantitative song variety in a chickadee hybrid zone despite limited sampling
Data files
Sep 09, 2025 version files (6.96 GB total)
- chickadee-ms-analyses.Rmd (23.43 KB)
- Chickadee-MS-directory.zip (286.28 MB)
- chickadee-ms-figs-rmd.Rmd (32.35 KB)
- Chickadee-MS-output.zip (87.57 MB)
- chickadee-ms-rangewide.Rmd (34.18 KB)
- chickadee-ms-song-measurements.R (23.54 KB)
- chickadee-ms-song-quality-control.R (9.27 KB)
- README.md (21.91 KB)
- Sparrowfoot_Park_original_recordings.zip (6.59 GB)
Abstract
In the avian sub-order Passeri (the songbirds), song develops according to both a flexible neural template and auditory input from conspecifics, making innately-constrained characters of song difficult to isolate. In a hybridizing population of Black-capped Chickadees (Poecile atricapillus) and Carolina Chickadees (Poecile carolinensis), we found that genetic ancestry was weakly predictive of a multidimensional measure of song variety (a continuously distributed quantitative alternative to categorical song repertoire size) but did not successfully predict one-dimensional song variety. We used species-diagnostic autosomal markers to genotype 55 individuals inside and outside of the Black-capped Chickadee/Carolina Chickadee hybrid zone in Missouri and Kansas. Using active recording methods, we then obtained high-volume, high-quality song recordings of 10 genotyped chickadees from a single hybrid zone population on a small, lake-bounded peninsula in west-central Missouri. We extracted acoustic data from these recordings to generate measurements of song variety across one, two and three dimensions of multivariate acoustic space for each individual. We tested how well, and in what direction, genetic ancestry predicted song variety for each of these dimensionalities, after predicting that song variety would increase with Carolina Chickadee ancestry. Linear models predicting song variety in two and three dimensions from genetic ancestry ranging from carolinensis-like backcrosses to pure carolinensis explained 41% and 43% of the variation respectively, with slope values in the predicted direction. A linear model predicting song variety in one dimension from genetic ancestry explained 12% of the variation. Our results are suggestive but not conclusive of genetic predispositions for song variety. 
Our findings provide support for the continued use of multidimensional song variety measurements and offer future directions for tackling the question of the genotype-song relationship in hybrid zones between species with vocal learning.
Dataset DOI: 10.5061/dryad.v6wwpzh6g
Description of the data and file structure
Files and variables
Folders
Sparrowfoot_Park_original_recordings.zip
Description: This folder contains the recordings made at Sparrowfoot Park from which data were extracted in their original state. Recordings are split into folders by individual bird, and each folder is named with the bird's individual ID code used in publication and, in parentheses, an informal name with which it was referred to in the field. Metadata is included verbally at the end of each recording and embedded within the file names in the following format: Poecile.sp_LeftColorBands.RightColorBands_Date_Locality.County.State_RecordistInitials_RecordingSequence. All files are in .wav format.
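The file-name convention described above can be unpacked programmatically. The following is a minimal Python sketch, independent of the authors' R scripts, that splits a name into its six underscore-separated fields. It assumes that no field contains an underscore, that the band and place fields use exactly the dot separators shown in the template, and that the date is best kept as text because its format is not documented here. The example file name is hypothetical and invented purely for illustration.

```python
from typing import NamedTuple


class RecordingName(NamedTuple):
    taxon: str        # e.g. "Poecile.sp"
    left_bands: str
    right_bands: str
    date: str         # kept as text; the date format is not documented here
    locality: str
    county: str
    state: str
    recordist: str
    sequence: str


def parse_recording_name(file_name: str) -> RecordingName:
    """Split a recording file name into its underscore-separated metadata fields."""
    # Drop the .wav extension explicitly, since other fields also contain dots.
    stem = file_name[:-4] if file_name.lower().endswith(".wav") else file_name
    fields = stem.split("_")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {file_name!r}")
    taxon, bands, date, place, recordist, sequence = fields
    band_parts = bands.split(".")   # LeftColorBands.RightColorBands
    place_parts = place.split(".")  # Locality.County.State
    if len(band_parts) != 2 or len(place_parts) != 3:
        raise ValueError(f"unexpected band or place field in {file_name!r}")
    return RecordingName(taxon, *band_parts, date, *place_parts, recordist, sequence)


# Hypothetical file name, invented for illustration only.
example = "Poecile.sp_RS.BW_20230401_SparrowfootPark.Henry.MO_XX_01.wav"
print(parse_recording_name(example))
```

Names that do not match the assumed layout raise a ValueError rather than being silently mis-split, which makes deviations from the template easy to spot.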
Chickadee-MS-directory.zip
Description: This folder is the working directory for all scripts associated with this manuscript. All analyses for this study can be reproduced using the files within. File descriptions are listed below.
- fig1songs: Folder containing .wav files used to generate the spectrograms in Figure 1
- LateAdditionSongs: Folder containing recordings added to the dataset after the initial song measurement had been run
- ml-recordings: Empty folder intended to house the recordings sourced from the Macaulay Library of Natural Sounds (https://www.macaulaylibrary.org/) used in the range-wide song analysis in Supplemental Materials.
- To obtain the recordings for this subdirectory, submit a data acquisition request with the Macaulay Library (Submit a ticket : Help Center) with the file ml-metadata.csv attached, clarifying that you would like to download the recordings in the ML Catalog Number column to reproduce analyses from this study. In a few business days, you should receive an email with a link allowing you to download a compressed .tar file containing all recordings and a metadata sheet that should look identical to ml-metadata.csv. Extract the files and move them to the ml-recordings folder.
- xc_recordings: Empty folder intended to house the recordings sourced from xeno-canto (xeno-canto.org) used in the range-wide song analysis in Supplemental Materials.
- To obtain the recordings for this subdirectory, run the code in the first chunk of the chickadee-ms-rangewide.Rmd R Markdown script. This should download the necessary recordings into the xc_recordings folder.
- xc_ml_recordings: Empty folder that should contain all publicly sourced recordings (from both Macaulay Library and xeno-canto). Once all of these recordings are acquired, paste them into this subdirectory before
- Quality_check_spectros: Empty folder into which spectrograms will be written for the visual quality check step of the hybrid zone song analysis in the main manuscript.
- rangewide_spectros: Empty folder into which spectrograms will be written for the visual quality check step of the range-wide song analysis in the Supplemental Materials.
- redlist_species_data_BCCH: Empty folder into which shapefiles from the IUCN Red List website may be downloaded to create the atricapillus portion of the range map in Figure 2.
- To obtain the contents for this folder, navigate to https://www.iucnredlist.org/species/22711716/118687681 and select 'Download' > 'Range data - Polygons (SHP)'. You will be asked to make an account with IUCN Red List and then to provide a description of intended data use. IUCN Red List should approve your data request shortly and you should get an email indicating as such and directing you to the 'Available downloads' page. From there, you should be able to download the necessary files for this folder.
- redlist_species_data_CACH: Empty folder into which shapefiles from the IUCN Red List website may be downloaded to create the carolinensis portion of the range map in Figure 2.
- To obtain the contents for this folder, navigate to https://www.iucnredlist.org/species/22711708/94306849 and select 'Download' > 'Range data - Polygons (SHP)'. You will be asked to make an account with IUCN Red List and then to provide a description of intended data use. IUCN Red List should approve your data request shortly and you should get an email indicating as such and directing you to the 'Available downloads' page. From there, you should be able to download the necessary files for this folder.
- sels_ml: Folder containing selection tables (.txt files) created in and exported from Raven Pro that delineate the portions of a recording that contain song (Macaulay Library recordings)
- sels_xc: Folder containing selection tables (.txt files) created in and exported from Raven Pro that delineate the portions of a recording that contain song (xeno-canto recordings)
- Sparrowfoot_Park_chopped_recordings: Folder containing .wav files segmented from the raw recordings in Sparrowfoot_Park_original_recordings.zip that each contain a single rendition of song. Each file name is identical to that of the raw recording from which the file was chopped, with the addition of a timestamp indicating the start time of that song within the raw recording (e.g., 1.23 for 1 minute, 23 seconds)
- structure_11march25: Folder containing STRUCTURE output files
- HZCH_StructureInput_100325.txt: Text file containing the PCR-RFLP molecular marker data and relevant metadata used in the program STRUCTURE. This file has the following structure:
- Row 1: Individual identifying codes for each marker, maintaining the naming system from McQuillan et al. (2017)
- Row 2: Map distances (in base pairs) between the marker in the corresponding column in Row 1 and the sequentially following marker. Physically unlinked markers (those mapping to different chromosome assemblies in the P. atricapillus reference genome) have a value of -1.
- Rows 3-112, Column 1: Individual identifying codes for each bird in the final dataset. Each ID occupies 2 cells for the 2 alleles at each locus.
- Rows 3-112, Column 2: Numeric identifiers for the broad sampling locality of the individual in the corresponding row.
  - 1: Manhattan, Riley County, Kansas, USA
  - 2: Lawrence, Douglas County, Kansas, USA
  - 3: Lake Quivira, Johnson County, Kansas, USA
  - 4: Post Oak Township, Johnson County, Missouri, USA
  - 5: Butler Lake, Bates County, Missouri, USA
  - 6: Clinton City Park, Henry County, Missouri, USA
  - 7: Sparrowfoot Park, Henry County, Missouri, USA
  - 8: Bird Song Conservation Area, St. Clair County, Missouri, USA
  - 9: Valley Water Mill, Greene County, Missouri, USA
  - 10: Table Rock Lake, Stone County, Missouri, USA
- Rows 3-112, Columns 3-11: Numeric identifiers for alleles scored from the PCR-RFLP protocol outlined in the main manuscript, for the individual in the corresponding row and the allele in the corresponding cell of Row 1:
  - 1: atricapillus-specific allele
  - 2: carolinensis-specific allele
  - -9: missing data
- ml-metadata.csv: Metadata for recordings downloaded from the Macaulay Library of Natural Sounds (https://www.macaulaylibrary.org/) for the range-wide song analysis in Supplemental Materials. This file is provided by email along with the requested recordings.
- xc-metadata.csv: Metadata for recordings sourced from xeno-canto.org for the range-wide song analysis in Supplemental Materials. This file is automatically downloaded
- structure_data_localities.csv: STRUCTURE output data for each individual combined with sampling locality data. Column header definitions are as follows:
  - ind_num: Individual identifying codes for each bird in the final dataset.
  - pop_ID: Numeric identifiers for the broad sampling locality of the individual in the corresponding row, following the same numbering scheme as HZCH_StructureInput_100325.txt
  - prob_BC: Probability of assignment to the atricapillus-like population cluster generated by STRUCTURE
  - prob_CA: Probability of assignment to the carolinensis-like population cluster generated by STRUCTURE
  - species_range: Identifies the range assignment of the individual in the corresponding row.
    - BC: atricapillus
    - CA: carolinensis
    - HZ: hybrid zone
- structure_data_localities_latlong.csv: structure_data_localities.csv with the addition of latitude and longitude for each sampling locality. File structure is identical to structure_data_localities.csv except for the addition of latitude and longitude columns (in decimal degrees).
- PCR-RFLP_data_11april2025.csv: PCR-RFLP and sampling locality data for each individual. Column header definitions are as follows:
  - Ind: Individual identifying codes for each bird in the final dataset.
  - Pop: Identifies the range assignment of the individual in the corresponding row.
    - BC: atricapillus
    - CA: carolinensis
    - HZ_HENMO: hybrid zone
  - Headers 3-20: Individual identifying codes for each marker, maintaining the naming system from McQuillan et al. (2017). Columns for each of the two alleles at each locus are named as such:
    - Allele 1: c0p###
    - Allele 2: c0p###_2
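The row and column layout described above for HZCH_StructureInput_100325.txt can be read with a few lines of code. The following Python sketch is independent of the authors' R scripts and rests on assumptions not confirmed here: that the file is whitespace-delimited, and that Rows 1 and 2 hold only the marker entries (so they line up with the allele columns of the data rows). The marker names, bird IDs, and values in the toy input are invented for illustration.

```python
MISSING = -9  # missing-data code given in the file description


def parse_structure_input(text):
    """Parse a STRUCTURE-style table with two allele rows per individual."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    markers = rows[0]                               # Row 1: marker codes
    distances = [int(value) for value in rows[1]]   # Row 2: map distances (-1 = unlinked)
    populations = {}
    allele_rows = {}
    for tokens in rows[2:]:
        bird, pop = tokens[0], int(tokens[1])
        # Columns 3 onward: allele codes, with -9 translated to None.
        alleles = [None if int(t) == MISSING else int(t) for t in tokens[2:]]
        if len(alleles) != len(markers):
            raise ValueError(f"row for {bird} has {len(alleles)} alleles, expected {len(markers)}")
        populations[bird] = pop
        allele_rows.setdefault(bird, []).append(alleles)
    genotypes = {}
    for bird, pair in allele_rows.items():
        if len(pair) != 2:
            raise ValueError(f"{bird} has {len(pair)} rows, expected 2")
        # Pair the two rows so that each marker maps to (allele 1, allele 2).
        genotypes[bird] = dict(zip(markers, zip(pair[0], pair[1])))
    return markers, distances, populations, genotypes


# Toy input with hypothetical marker names and bird IDs.
TOY = """\
m1 m2 m3
-1 1500 -1
A 7 1 1 2
A 7 1 2 -9
B 4 2 2 2
B 4 2 2 1
"""
markers, distances, populations, genotypes = parse_structure_input(TOY)
print(genotypes["A"])
```

Keeping missing alleles as None rather than -9 prevents them from being counted as a third allele state in any downstream summary.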
Chickadee-MS-output.zip
Description: This folder contains all of the output files of the scripts in this repository. These can be cross-checked with files written out from the R scripts in the main directory. File descriptions are below.
- .png files with the naming structure figure#_[figure-description].png are the figures in the main manuscript.
- .png files with the naming structure figureS#_[figure-description].png are the figures in the supplementary materials file.
- .txt files with the naming structure table#_[table-description].txt are the tables in the main manuscript.
- .txt files with the naming structure tableS#_[table-description].txt are the tables in the supplementary materials file.
- .txt files with the naming structure [one,two,three]D_[lm,qm]_coef.txt are tables of model coefficients concatenated to make the regression table in Table S7.
  - The first part of the file name defines the dimensionality of the acoustic space used to generate the 'song variety' response variable in this model (oneD = one-dimensional; twoD = two-dimensional; threeD = three-dimensional).
  - The second part of the file name defines the type of model (lm = linear model; qm = quadratic model).
- .csv files with the naming structure [lm,qm]_[one,two,three]D_summary.csv are files containing summary statistics used to label figures displaying linear (Figure 7) and quadratic (Figure S5) regression models.
  - The first part of the file name defines the type of model (lm = linear model; qm = quadratic model).
  - The second part of the file name defines the dimensionality of the acoustic space used to generate the 'song variety' response variable in this model (oneD = one-dimensional; twoD = two-dimensional; threeD = three-dimensional).
- .csv files with the naming structure HZCH_note-level-measurements_#.csv are the acoustic measurements taken of individual notes of songs recorded from chickadees at Sparrowfoot Park. The number at the end of the file name denotes sequential runs of measurements after quality checking steps that are documented in the measurement script chickadee-ms-song-measurements.R. Column header names indicate the following:
  - note_num: Sequential position of the measured note within the whole song
  - file_name: Name of the .wav file associated with the measurements for that row
  - max_freq: Maximum dominant frequency (kHz)
  - min_freq: Minimum dominant frequency (kHz)
  - mean_freq: Mean dominant frequency (kHz)
  - median_freq: Median dominant frequency (kHz)
  - sd_freq: Standard deviation of the dominant frequency trace
  - abs_max_slope: Maximum absolute value of the slope of the dominant frequency trace
  - duration: Note duration (s)
- .csv files with the naming structure HZCH_song-level-measurements_#.csv are the song-level acoustic measurements made using the data from the note-level dataset. The number at the end of the file name denotes sequential runs of measurements after quality checking steps and addition of new recordings that are documented in the measurement script chickadee-ms-song-measurements.R. Column header names indicate the following:
  - file_name: Name of the .wav file associated with the measurements for that row
  - ind_ID: Individual identification code unique to the bird whose songs are being measured (Abbreviated color band combination)
  - number_notes: Number of notes in the song
  - duration: Song duration (s)
  - max_note_dur: Duration of the longest note in the song (s)
  - min_note_dur: Duration of the shortest note in the song (s)
  - mean_note_dur: Mean of all note durations (s)
  - stdev_note_dur: Standard deviation of all note durations
  - signal_pause_ratio: Ratio of signal to silence across the temporal domain of the song
  - max_freq: Highest value of maximum dominant frequencies across notes (kHz)
  - min_freq: Lowest value of minimum dominant frequencies across notes (kHz)
  - stdev_note_max_freq: Standard deviation of note maximum frequencies (kHz)
  - abs_max_slope: Highest of maximum absolute values of the slope of the dominant frequency trace across notes
  - stdev_slope: Standard deviation of maximum absolute values of the slope of the dominant frequency trace across notes
- SPPUA_song_level_measurements_final.csv: Final dataset of acoustic measurements after removal of a few low-quality recordings that were repeatedly measured incorrectly by our custom measurement functions. File structure follows HZCH_song-level-measurements_#.csv.
- Quality_check_spectros: folder containing spectrograms overlain by amplitude profiles and timer() signal delineation periods of Sparrowfoot Park recordings to check the effectiveness of signal detection thresholds
- rangewide_spectros: folder containing spectrograms overlain by amplitude profiles and timer() signal delineation periods of range-wide atricapillus and carolinensis recordings to check the effectiveness of signal detection thresholds
- pc_scores_all_songs.csv: PC scores for the PCA run on the Sparrowfoot Park dataset.
  - Column headers with the naming structure PC# indicate which principal component's scores are contained in that column
  - The column named ind_ID contains individual identification codes unique to the birds in the final dataset (Abbreviated color band combination)
- PCR-RFLP_triangle-plot_data.csv: file with data from PCR-RFLP_data_11april2025.csv with the following differences:
  - atricapillus-like allele is coded as 0 and carolinensis-like allele is coded as 1
  - hybrid_index: Added column containing hybrid index values such that pure atricapillus = 0 and pure carolinensis = 1
  - interspecies_heterozygosity: Added column containing interspecies heterozygosity values such that fully homozygous individuals have values of 0 and fully heterozygous individuals have values of 1
- cleaned_metadata_rangewide.csv: Combined metadata from ml-metadata.csv and xc-metadata.csv
- rangewide_song_data.csv: Song-level acoustic measurements of recordings in the range-wide atricapillus and carolinensis dataset. Column headers follow the naming system of HZCH_song-level-measurements_#.csv, other than the replacement of the ind_ID column with the species column.
- sels_ml_conc.csv: Concatenated selection tables from the folder sels_ml with added columns that define alterations to be made to the signal filtering and delineation processes determined on a case-by-case basis after checking images in rangewide_spectros. Columns 1-8 follow naming conventions of Raven Pro selection tables (Charif et al. 2010). The remaining column headers indicate the following:
  - file.type: mp3 or wav, taken from the file extension of each recording
  - use.y.n: Indicator of whether the recording is of high enough quality for use in analyses. Coded Y for yes and N for no.
  - threshold: Designated amplitude threshold argument in the signal delineation function timer().
  - bpf.from: Lower limit for bandpass filter.
  - bpf.to: Upper limit for bandpass filter.
- sels_xc_conc.csv: Concatenated selection tables from the folder sels_xc with added columns that define alterations to be made to the signal filtering and delineation processes determined on a case-by-case basis after checking images in rangewide_spectros. File structure is identical to sels_ml_conc.csv.
- singers_all_data.csv: All necessary data to run linear and quadratic models predicting song variety from genotype and to generate Table 2, which summarizes the generation of song variety scores in multiple dimensions of acoustic space. Column headers indicate the following:
  - ind_num: Individual identification code for each bird
  - ind_ID: Abbreviated color band combination for each bird
  - Columns 3-8 are means and standard errors of song variety scores generated repeatedly by randomly sampling 100 songs per individual bird. These columns have the following naming system: [one,two,three]D_[mean,se]
    - The first part of the name defines the dimensionality of the acoustic space used to generate the song variety score (oneD = one-dimensional; twoD = two-dimensional; threeD = three-dimensional)
    - The second part of the name designates the mean (mean) or standard error (se) of the repeatedly generated song variety scores
  - num_songs: Number of individual songs in the final dataset for that individual bird
  - pop_ID: Population assignment; follows the naming system from HZCH_StructureInput_100325.txt
  - prob_BC: Probability of assignment to the atricapillus-like genetic cluster
  - prob_CA: Probability of assignment to the carolinensis-like genetic cluster
  - species_range: Range assignment; follows the naming system of structure_data_localities_latlong.csv
  - Columns 14-16 are means of song variety scores generated repeatedly by randomly sampling 100 songs per individual bird, scaled across the different dimensional spaces. These columns have the following naming system: [one,two,three]D_mean_scaled
    - The first part of the name defines the dimensionality of the acoustic space used to generate the song variety score (oneD = one-dimensional; twoD = two-dimensional; threeD = three-dimensional)
- SPPUA_Hz_Quality-and-exceptions.csv and SPPUA_Hz_Quality-and-exceptions_2.csv define alterations to be made to the signal filtering and delineation processes determined on a case-by-case basis after two rounds of checking images in the folder Quality_check_spectros. Column headings define the following:
  - file_names: Name of .wav file segmented to the song level
  - include: Indicator of whether the recording is of high enough quality for use in analyses. Coded Y for yes and N for no.
  - threshold: Designated amplitude threshold argument in the signal delineation function timer().
  - trim.s: Time (s) into the recording indicating where to start trim using cutw() function (to exclude extraneous noise)
  - trim.f: Time (s) into the recording indicating where to end trim using cutw() function (to exclude extraneous noise)
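The hybrid_index and interspecies_heterozygosity columns of PCR-RFLP_triangle-plot_data.csv described above can be illustrated with a short calculation. The Python sketch below uses the conventional definitions (proportion of carolinensis alleles, and proportion of loci carrying one allele from each species), which match the stated endpoints of 0 and 1. The authors' exact computation, in particular how missing data are handled, is not given here; skipping any locus with a missing allele is an assumption of this sketch.

```python
def triangle_plot_values(genotype):
    """Return (hybrid_index, interspecies_heterozygosity) for one bird.

    genotype: list of (allele_1, allele_2) pairs, one per locus, coded
    0 = atricapillus-like, 1 = carolinensis-like, None = missing.
    """
    # Assumption: loci with any missing allele are dropped from both measures.
    scored = [(a, b) for a, b in genotype if a is not None and b is not None]
    if not scored:
        raise ValueError("no fully scored loci")
    n_loci = len(scored)
    # Proportion of carolinensis-like alleles among all scored alleles.
    hybrid_index = sum(a + b for a, b in scored) / (2 * n_loci)
    # Proportion of scored loci with one allele from each species.
    heterozygosity = sum(a != b for a, b in scored) / n_loci
    return hybrid_index, heterozygosity


# Illustrative genotypes at nine loci (not taken from the dataset).
pure_atricapillus = [(0, 0)] * 9
first_generation_hybrid = [(0, 1)] * 9
pure_carolinensis = [(1, 1)] * 9
for bird in (pure_atricapillus, first_generation_hybrid, pure_carolinensis):
    print(triangle_plot_values(bird))
```

Under these definitions the three illustrative genotypes sit at the corners of a triangle plot: both parental types have zero heterozygosity at opposite ends of the hybrid index axis, and a first-generation hybrid sits at the apex.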
R Scripts
To reproduce the analyses done for this manuscript, run the following R scripts in the order listed, using the extracted contents of 'Chickadee-MS-directory.zip' as the working directory.
File: chickadee-ms-song-quality-control.R
Description: This script does the following: 1) generates spectrograms of all .wav files to be used in analyses to allow for visual evaluation of signal quality; 2) generates a .csv file so that necessary alterations can be documented during spectrogram checks.
File: chickadee-ms-song-measurements.R
Description: This script does the following: 1) extracts acoustic measurements from all .wav files that passed the visual quality check; 2) checks measurements for erroneous values.
File: chickadee-ms-analyses.Rmd
Description: This script runs all analyses detailed in the main body of the manuscript and generates tables for the Supplementary Materials.
File: chickadee-ms-figs-rmd.Rmd
Description: This script generates all figures in the main body of the manuscript and Fig. S5 in the Supplementary Materials.
File: chickadee-ms-rangewide.Rmd
Description: This script runs all analyses and generates Figs. S3 and S4 for the range-wide song analysis in the Supplementary Materials.
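Several folders in the working directory ship empty and must be filled by hand before the scripts that depend on them will work. As a convenience, the following Python sketch (not part of the authors' workflow) reports which of the manually downloaded folders are still missing or empty. The folder names come from the descriptions above; the function name and the assumption that these folders sit directly inside the working directory are illustrative.

```python
from pathlib import Path

# Folders described above as empty until filled by a manual download.
MANUAL_DOWNLOAD_FOLDERS = (
    "ml-recordings",
    "redlist_species_data_BCCH",
    "redlist_species_data_CACH",
)


def unfilled_folders(workdir):
    """Return the manual-download folders that are missing or contain no files."""
    root = Path(workdir)
    unfilled = []
    for name in MANUAL_DOWNLOAD_FOLDERS:
        folder = root / name
        # A folder counts as unfilled if it does not exist or has no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            unfilled.append(name)
    return unfilled


if __name__ == "__main__":
    # Assumes the script is launched from the extracted working directory.
    for name in unfilled_folders("."):
        print(f"Still empty: {name}")
```

The xc_recordings folder is left out of the check because, per the description above, it is populated by the first chunk of chickadee-ms-rangewide.Rmd rather than by a manual download.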
References
McQuillan, M. A., A. V. Huynh, S. A. Taylor, and A. M. Rice (2017). Development of 10 novel SNP-RFLP markers for quick genotyping within the black-capped (Poecile atricapillus) and Carolina (P. carolinensis) chickadee hybrid zone. Conservation Genetics Resources 9:261–264.
Charif, R. A., L. M. Strickman, and A. M. Waack (2010). Raven Pro 1.4 User’s Manual. The Cornell Lab of Ornithology, Ithaca, NY. Available from https://www.ravensoundsoftware.com/knowledge-base/
