Song and genetic divergence within a subspecies of white-crowned sparrow (Zonotrichia leucophrys nuttalli)
Data files
May 23, 2024 version files 229.24 MB
- dropped_samples_maf05_missloc80.recode.vcf (207.93 MB)
- Fst_matrix.csv (17.99 KB)
- genetic_sample_coordinates.csv (8.81 KB)
- playback_experiment_raw_data.xlsx (48.14 KB)
- README.md (10.92 KB)
- song_dissimilarity_dataset_recordings.zip (21.19 MB)
- song_dissimilarity_sampling_coordinates.csv (13.69 KB)
- song_trait_dataset.csv (18.92 KB)
Abstract
Animal culture evolves alongside genomes, and the two modes of inheritance, culture and genes, interact in myriad ways. For example, stable geographic variation in culture can act as a reproductive barrier, thereby producing genetic divergence between “cultural populations.” White-crowned sparrows (Zonotrichia leucophrys) are a well-established model species for bird song learning and cultural evolution, as they have distinct, geographically discrete, and culturally transmitted song types (i.e., song dialects). In this study, we tested the hypothesis that divergence between culturally transmitted songs drives genetic divergence within Nuttall’s white-crowned sparrows (Z. l. nuttalli). In accordance with sexual selection theory, we hypothesized that cultural divergence between mating signals both preceded and generated genetic divergence. We characterized the population structure and song variation in the subspecies and found two genetically differentiated populations whose boundary coincides with a major song boundary at Monterey Bay, California. We then conducted a song playback experiment that demonstrated males discriminate between songs based on their degree of divergence from their local dialect. These results support the idea that discrimination against non-local songs is driving genetic divergence between the northern and southern populations. Altogether, this study provides evidence that culturally transmitted bird songs can act as the foundation for speciation by sexual selection.
README: Song and genetic divergence within a subspecies of white-crowned sparrow (Zonotrichia leucophrys nuttalli)
These are white-crowned sparrow (Zonotrichia leucophrys) data from Luo et al. 2024. The data are in three parts: genetic data, song recordings and trait data, and playback experiment data. The genetic data include two peripatric white-crowned sparrow subspecies (Z. l. nuttalli and Z. l. pugetensis), and the song and playback experiment data are just from Z. l. nuttalli.
Corresponding author information:
Amy Luo
aluo@vols.utk.edu
Department of Ecology and Evolutionary Biology
University of Tennessee, Knoxville
This dataset includes three components: genetic data, song data, and playback experiment results. For the genetic and song data, genetic or acoustic dissimilarity was calculated between all samples. These dissimilarities were used to map geographic boundaries between groups, so there are files containing coordinates for each sample. There is also a second song dataset used to run a PCA. The last file contains results of a playback experiment that was run to determine whether males discriminated between songs with varying degrees of acoustic dissimilarity.
List of files
This is a brief list of all the files in this dataset. For more details on each file, see the last section of this document, "Descriptions of data."
- dropped_samples_maf05_missloc80.recode.vcf. Genetic data. Filtered single nucleotide polymorphism (SNP) data with 26977 loci from 263 individuals, derived from genotyping-by-sequencing (GBS) reads.
- Fst_matrix.csv. Genetic data. Pairwise Fst values between all sampled localities.
- genetic_sample_coordinates.csv. Geographic data for genetic samples. GPS points where each genetic sample was collected. This sheet includes all samples in dropped_samples_maf05_missloc80.recode.vcf and the samples that were dropped in the filtering process.
- song_dissimilarity_dataset_recordings.zip. Song data. Zipped folder with the 175 songs (each wav file contains one song) comprising the song dissimilarity dataset.
- song_dissimilarity_sampling_coordinates.csv. Geographic data for song data. GPS points where each song in the song dissimilarity dataset was recorded.
- song_trait_dataset.csv. Song data. Second song dataset. Trait measurements of the 113 songs used in the PCA.
- playback_experiment_raw_data.xlsx. The raw data from the playback experiment.
Usage notes
- The csv and xlsx files are spreadsheets that can be accessed with, for example, LibreOffice.
- wav files in the song_dissimilarity_dataset_recordings folder can be opened with any software created to play music or bioacoustics recordings, though we viewed them in Audacity.
- vcf files can be viewed in text editors like Sublime Text. vcf files, including the one in this dataset, are often large. They may take a moment to load.
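Because the vcf file is large, it can be convenient to summarize it from a script rather than open it in an editor. The following is a minimal sketch, not part of the original analysis: it streams VCF text line by line and counts samples and variant records. The function name `summarize_vcf` and the two-bird demo data are hypothetical; the only assumption about the file is that it follows the standard VCF layout of nine fixed columns followed by one column per sample.

```python
import io


def summarize_vcf(lines):
    """Count samples and variant records in an iterable of VCF text lines."""
    n_samples = None
    n_records = 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Header row: 9 fixed columns (CHROM..FORMAT), then one per sample.
            n_samples = len(line.rstrip("\n").split("\t")) - 9
        elif line.strip():
            n_records += 1  # every other non-empty line is one variant record
    return n_samples, n_records


# Tiny in-memory example with made-up data. For the real file, pass an open
# handle instead, e.g.:
#   with open("dropped_samples_maf05_missloc80.recode.vcf") as fh:
#       print(summarize_vcf(fh))
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tbird1\tbird2\n"
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\n"
    "1\t250\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t./.\n"
)
print(summarize_vcf(demo))
```

Run on the real file, the two counts can be compared against the 263 individuals and 26977 loci stated in the list of files.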
Sharing/Access Information
Some of the songs in the song dissimilarity dataset were extracted from recordings on Xeno-Canto.
Descriptions of data
All the data are from the West Coast of the United States, in the coastal scrub habitat of these two subspecies of white-crowned sparrow. The genetic data were collected in 2004, 2005, and 2014. Songs from the song dissimilarity dataset were recorded 2010-2022, and songs from the song trait dataset were recorded 2004-2014. The playback experiment was run in the spring of 2022.
Details on genetic data
dropped_samples_maf05_missloc80.recode.vcf
Includes 251 individuals from Z. l. nuttalli, Z. l. pugetensis, and their hybrids. It also includes 12 individuals from outgroups, which are two other white-crowned sparrow subspecies (Z. l. gambelii and Z. l. oriantha) and the closely related golden-crowned sparrow (Z. atricapilla). Biallelic SNPs were called using the STACKS pipeline. To produce dropped_samples_maf05_missloc80.recode.vcf, we filtered out SNPs with a minor allele frequency of less than five percent and individuals missing over 80 percent of loci.
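The two filters described above can be illustrated with a small sketch. This is not the pipeline used to produce the vcf file; it only shows the logic of the thresholds on a toy genotype table. Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with `None` for a missing call, the function names are hypothetical, and applying the locus filter before the individual filter is an assumption about ordering.

```python
def minor_allele_frequency(calls):
    """MAF at one locus from diploid alt-allele counts (0, 1, 2); None = missing."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq)


def filter_genotypes(genotypes, min_maf=0.05, max_missing=0.80):
    """genotypes: dict mapping individual ID -> list of calls, one per locus."""
    n_loci = len(next(iter(genotypes.values())))
    # Keep loci whose minor allele frequency is at least min_maf.
    kept_loci = [
        j
        for j in range(n_loci)
        if minor_allele_frequency([calls[j] for calls in genotypes.values()]) >= min_maf
    ]
    # Drop individuals missing more than max_missing of the retained loci
    # (ordering of the two filters is an assumption of this sketch).
    kept_individuals = {}
    for ind, calls in genotypes.items():
        subset = [calls[j] for j in kept_loci]
        missing = sum(g is None for g in subset) / len(subset) if subset else 1.0
        if missing <= max_missing:
            kept_individuals[ind] = subset
    return kept_loci, kept_individuals


# Made-up example: three individuals, four loci.
toy = {
    "a": [0, 1, 2, 0],
    "b": [0, 1, None, 1],
    "c": [0, None, None, None],
}
loci, individuals = filter_genotypes(toy)
print(loci)         # indices of retained loci
print(individuals)  # retained individuals with calls at those loci
```

In this toy table, loci 0 and 2 are monomorphic among observed calls and are removed, and individual "c" is dropped because all of its calls at the retained loci are missing.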
Fst_matrix.csv
Using the filtered data, we calculated the pairwise Fst between all the localities sampled, including Z. l. pugetensis, Z. l. nuttalli, and their hybrid zone. We used the hierfstat package in R to calculate these values, which compare genetic variation within and between populations. Each cell in the matrix contains the pairwise dissimilarity between the localities in that row and column; higher values indicate greater between-population variation relative to within-population variation.
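As a sketch of how a pairwise matrix like this might be read and queried, the snippet below parses a square CSV into a dictionary keyed by locality pair. It assumes, without having been checked against Fst_matrix.csv, that the first row and first column hold locality labels and that empty or "NA" cells may occur (for example, if only one triangle is filled). The locality labels "site_A" to "site_C" and the values are made up.

```python
import csv
import io


def load_square_matrix(handle):
    """Read a labelled square CSV into {(row_label, col_label): value}."""
    rows = list(csv.reader(handle))
    col_names = rows[0][1:]
    matrix = {}
    for row in rows[1:]:
        row_name = row[0]
        for col_name, cell in zip(col_names, row[1:]):
            if cell not in ("", "NA"):  # skip unfilled cells (assumed encoding)
                matrix[(row_name, col_name)] = float(cell)
    return matrix


def pairwise(matrix, a, b):
    """Look up a pair in either orientation, since one triangle may be empty."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


# Made-up lower-triangular example; for the real file use open("Fst_matrix.csv").
demo = io.StringIO(
    ",site_A,site_B,site_C\n"
    "site_A,0,NA,NA\n"
    "site_B,0.02,0,NA\n"
    "site_C,0.10,0.08,0\n"
)
fst = load_square_matrix(demo)
print(pairwise(fst, "site_A", "site_C"))
```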
genetic_sample_coordinates.csv
Contains three columns:
- key: The ID of the genetic sample.
- lat: Latitude.
- lon: Longitude.
Details on song data
song_dissimilarity_dataset_recordings
Contains wav files that were used to calculate acoustic dissimilarity across song dialects using a dynamic time warping algorithm implemented in the warbleR package in R. Each wav file contains an individual song. The acoustic dissimilarity was used to estimate the presence and strength of cultural boundaries across the Z. l. nuttalli range.
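For readers unfamiliar with dynamic time warping, here is a minimal sketch of the textbook algorithm on two numeric sequences, such as frequency contours sampled over time. It is not the warbleR implementation, whose inputs, step pattern, and normalization may differ; the contour values below are made up.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical frequency contours in Hz. The second is a time-stretched copy
# of the first, so warping aligns them with no residual cost.
contour_1 = [3000, 3500, 4000]
contour_2 = [3000, 3500, 3500, 4000]
print(dtw_distance(contour_1, contour_2))
```

The design point is that DTW tolerates differences in timing: two songs with the same frequency trajectory sung at different tempos score as similar, whereas a point-by-point comparison would not.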
song_dissimilarity_sampling_coordinates.csv
Contains coordinates for the songs in song_dissimilarity_dataset_recordings, with these variables:
- site: The locality and dialect of the recording.
- genetic_group: The genetic population of the male that sang the song, as determined by the genetic analyses in this study.
- male: The ID of the singer.
- song: The ID of the song.
- lat: Latitude.
- lon: Longitude.
- year: The year that the song was recorded.
song_trait_dataset.csv
Includes all the songs and measurements used to run the song trait PCA. Each row contains the traits for one song, and the variables are as listed below:
- bird: The ID of the male that sang the song.
- band: The USGS Bird Banding Lab band number. Some of the birds were banded before recording. If they were, this column contains their band number.
- group: The genetic group of the singer, as determined by the genetic analyses in this study. There is a northern population, southern population, and an area in which the two populations meet and admix.
- song_length: Length of the entire song, in seconds.
- whistle_length: Length of the introductory whistle, in seconds. All white-crowned sparrow songs start with a whistle note.
- avg_com_length: Average length of each complex note, in seconds. Not all song dialects have complex notes, but many have one or multiple complex notes. This measure is the average length of each complex note, when there are multiple.
- trill_length: Length of the trill, in seconds. Most, but not all, songs have trills, or rapid simple notes. This is the length of the entire trill.
- avg_trill_dur: Average length of each trill note, in seconds. This is the average length of each trill note in the trill.
- trill_rate: Notes per second in the trill.
- song_bandwd: Frequency bandwidth of the entire song (difference between maximum and minimum frequencies), in Hz.
- com_bandwd: Frequency bandwidth of the complex notes (difference between maximum and minimum frequencies), in Hz.
- trill_bandwd: Frequency bandwidth of the trill (difference between maximum and minimum frequencies), in Hz.
- song_max_f: Maximum frequency of the entire song, in Hz.
- song_min_f: Minimum frequency of the entire song, in Hz.
- com_max_f: Maximum frequency of the complex notes, in Hz.
- com_min_f: Minimum frequency of the complex notes, in Hz.
- trill_max_f: Maximum frequency of the trill, in Hz.
- trill_min_f: Minimum frequency of the trill, in Hz.
- whistle_dom_f: Dominant frequency of the introductory whistle, in Hz.
- term_buzz_length: Length of the terminal buzz, in seconds. Not all song dialects have a terminal buzz.
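Since each bandwidth variable is defined as the difference between the corresponding maximum and minimum frequencies, a quick consistency check on a row of song_trait_dataset.csv could look like the sketch below. The column names come from the list above; the function name, the 1 Hz tolerance, the treatment of empty or "NA" cells as missing, and the example values are all assumptions.

```python
def bandwidth_mismatches(row, tolerance_hz=1.0):
    """Return bandwidth columns that disagree with max minus min frequency."""
    pairs = {
        "song_bandwd": ("song_max_f", "song_min_f"),
        "com_bandwd": ("com_max_f", "com_min_f"),
        "trill_bandwd": ("trill_max_f", "trill_min_f"),
    }
    problems = []
    for bandwidth, (high, low) in pairs.items():
        values = [row.get(bandwidth), row.get(high), row.get(low)]
        if any(v in (None, "", "NA") for v in values):
            continue  # e.g. dialects without complex notes or trills
        bw, hi, lo = (float(v) for v in values)
        if abs(bw - (hi - lo)) > tolerance_hz:
            problems.append(bandwidth)
    return problems


# Made-up row, shaped like one record from csv.DictReader.
example = {
    "song_bandwd": "4000", "song_max_f": "6500", "song_min_f": "2500",
    "com_bandwd": "", "com_max_f": "", "com_min_f": "",
    "trill_bandwd": "1500", "trill_max_f": "5000", "trill_min_f": "3000",
}
print(bandwidth_mismatches(example))
```

Here the song bandwidth is consistent, the complex-note columns are skipped as missing, and the trill bandwidth is flagged because 5000 minus 3000 is not 1500.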
Details on the playback experiment data
Each male received three song stimulus treatments, each with a three-minute playback period. Up to 15 males were tested at each site (two sites each in the northern population, southern population, and admixture zone).
All the data from this experiment are in playback_experiment_raw_data.xlsx. Below is a description of the sheets in this file.
The "raw_data" sheet contains the dataset used for analysis. Two localities ("site") were tested in each genetic population/admixture zone ("genetic_group"). Each male is represented in three rows, showing the results from each treatment. "order" indicates the position of the treatment in the sequence of three stimuli. In the treatment column, "same_group" is a song from the same genetic population but not the local dialect, and "different_group" is a song from the other population. For males at the admixed sites, songs from a different group were drawn from either the northern or southern population, as indicated in "admix_treatment". The variables are listed below:
- genetic_group: The genetic group of the focal male.
- site: The location of the experimental trial, and also the dialect of the focal male.
- date: Date that the male was tested.
- order: Whether the treatment was played first, second, or third.
- treatment: The name of the treatment, which is the relationship between the focal male and the stimulus dialect.
- fst: Fst, a population-level measure of genetic distance between the focal male's locality and the stimulus' origin locality.
- songs: The number of times the male sang in response to the stimulus during the playback period.
- flights: The number of times the male flew over or at the speaker.
- distance: The average distance between the male and the speaker, in meters, during the playback period.
- post_distance: The average distance between the male and the speaker, in meters, for three minutes after the playback period.
The "distance" sheet contains the individual distance estimates at every 10 or 20 second interval. Distances were estimated at 10 second intervals during the playback period and 20 second intervals in the post-playback period, then averaged across the measurement period. These averages are the values found in "raw_data". Columns C-T are the distance estimates during the playback period, and columns U-AC are the distance estimates during the post-playback period.
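The averaging described above can be sketched as follows. The counts of 18 playback estimates and 9 post-playback estimates are inferred from the column ranges C-T and U-AC (and match three minutes at 10 and 20 second intervals, respectively); the function names, the handling of missing estimates as `None`, and the example distances are assumptions.

```python
def mean_or_none(values):
    """Average the non-missing values, or return None if there are none."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed) if observed else None


def summarize_distances(estimates, n_playback=18, n_post=9):
    """Split one male's interval estimates into playback and post-playback means."""
    playback = estimates[:n_playback]                   # columns C-T
    post = estimates[n_playback:n_playback + n_post]    # columns U-AC
    return mean_or_none(playback), mean_or_none(post)


# Made-up trial: the male stays about 10 m away during playback,
# then approaches to 4 m afterwards.
trial = [10] * 18 + [4] * 9
print(summarize_distances(trial))
```

The two returned values correspond to the "distance" and "post_distance" variables in the "raw_data" sheet.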
The "toss" and "toss_distances" sheets have the same structure as "raw_data" and "distances", respectively. These trials were not used in the analysis because of errors in the field (e.g., setting up on a territorial border between two males or possibly testing the same male twice on different days).