Machine learning reveals that climate, geography, and cultural drift all predict bird song variation in coastal Zonotrichia leucophrys

Yang, Jiaying; Provost, Kaiya; Carstens, Bryan

Published Dec 15, 2023 on Dryad. https://doi.org/10.5061/dryad.tx95x6b4j

Data files

Dec 15, 2023 version files (58.12 MB)

Abstract

Previous work has demonstrated that there is extensive variation in the songs of the White-crowned Sparrow (Zonotrichia leucophrys) throughout the species' range, including between the neighboring (and genetically distinct) subspecies Z. l. nuttalli and Z. l. pugetensis. Using a machine learning approach to bioacoustic analysis, we demonstrate that variation in song is correlated with year of recording (representing cultural drift), geographic distance, and climatic differences, but the response is subspecies- and season-specific. Automated machine learning methods of bird song annotation can process large datasets more efficiently than manual annotation, allowing us to examine 1,913 recordings spanning ~60 years. We use a recently published artificial neural network to automatically annotate White-crowned Sparrow vocalizations. By analyzing differences in syllable usage and composition, we recapitulate the known pattern in which Z. l. nuttalli and Z. l. pugetensis have significantly different songs. Our results are consistent with the interpretation that these differences are caused by changes in the characteristics of syllables in the White-crowned Sparrow repertoire. This supports the hypothesis that the evolution of vocalization behavior is affected by the environment, in addition to population structure.

Authors:

Jiaying Yang, Kaiya Provost, Bryan Carstens
Last updated: 15 December 2023

This README file describes the files associated with the above publication. For any
questions or comments, please contact Kaiya L. Provost at kprovost@adelphi.edu.

This dataset can be used to replicate our analyses of the relative roles of climate,
geography, and cultural drift in shaping song in White-crowned Sparrows. We used a
previously published machine learning framework to automatically annotate syllables in
~1,900 recordings from the 1960s to the 2010s. We find that in both subspecies we
examined, climate, geographic distance, and cultural drift significantly predict song
traits.

Below we describe each dataset used, its contents, and its file types.

##########################################################################################

SCRIPTS

##########################################################################################

Scripts to perform these calculations are found here:
https://github.com/kaiyaprovost/bioacoustics

##########################################################################################

DATA

##########################################################################################

There are three main sections of the “DATA” portion of this README:
“TrainedTweetyNetModel.zip”, “Annotations.zip”, and “Textfiles.zip”.

************ TrainedTweetyNetModel.zip ***************

This zipped folder contains the finished machine learning models for the study. It
includes two checkpoint files, “checkpoint.pt” and “max-val-acc-checkpoint.pt”; a
“labelmap.json” describing the labels to be predicted; and a “StandardizeSpect” file, which
gives the standardization protocol for the spectrograms used as inputs. The two checkpoint
files differ in that “checkpoint.pt” is the last checkpoint saved during training, while
“max-val-acc-checkpoint.pt” is the checkpoint with the highest validation accuracy. The
latter is the one that should be used for future analyses.

The zipped folder also contains “ZL2023_prep_230531_163934.csv”. This file gives the
specific parts of each recording used to train, test, and validate the TweetyNet model.

This file has the following labels:

“audio_path” = the file path for the wavfile used
“spect_path” = the file path for the spectrogram generated
“annot_path” = the file path for the Annotations file
“annot_format” = the format of the annotations, always birdsong-recognition-dataset
“duration” = the duration of the recording in seconds
“timebin_dur” = the length of a single time bin of the spectrogram in seconds
“split” = whether the recording was used in the training (train), test, or validation
(val) dataset during model training
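As an illustration of how these columns relate, the Python sketch below tallies files, seconds of audio, and approximate spectrogram time bins per split. It relies only on the column names listed above; the function name `summarize_splits` is hypothetical, and the time-bin count is an estimate (duration / timebin_dur), not a value stored in the file.

```python
import csv
from collections import defaultdict


def summarize_splits(lines):
    """Tally files, seconds of audio, and estimated time bins per split.

    lines: any iterable of CSV text lines (an open file works), with the
    "split", "duration", and "timebin_dur" columns described above.
    """
    totals = defaultdict(lambda: {"files": 0, "seconds": 0.0, "timebins": 0})
    for row in csv.DictReader(lines):
        duration = float(row["duration"])
        entry = totals[row["split"]]
        entry["files"] += 1
        entry["seconds"] += duration
        # Estimated number of spectrogram columns: duration / length of one time bin.
        entry["timebins"] += round(duration / float(row["timebin_dur"]))
    return dict(totals)
```

An open file handle can be passed directly, e.g. `summarize_splits(open("ZL2023_prep_230531_163934.csv", newline=""))`.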

************** Annotations.zip **************

This folder contains two subfolders, which hold annotations per recording file. Files have
one of the following formats:
“[taxon].[BLB/XC].[#].resample.48000.wav_[start].[end].Table.1.selections.txt.gz”
or
“[taxon].[BLB/XC].[#].resample.48000.selections.MASTER.txt.gz”

“taxon” gives the subspecies examined, “BLB/XC” gives the source collection (Borror Lab of
Bioacoustics or Xeno-Canto), “#” gives the number assigned by that collection, “start” gives
the sample in the original WAV file on which the individual song begins, and “end” gives
the sample in the original WAV file on which the individual song ends.
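The filename fields can be recovered programmatically. The sketch below is a hypothetical helper, not part of the published scripts. It assumes that the collection number contains no periods and that the start and end sample indices refer to the 48,000 Hz resampled audio (as the “resample.48000” tag suggests), so dividing by 48,000 gives seconds. Following the subfolder descriptions below, the per-song pattern is labeled as a manual annotation and the “MASTER” pattern as a TweetyNet prediction.

```python
import re

# Hypothetical parser for the two annotation filename patterns described above.
ANNOT_RE = re.compile(
    r"^(?P<taxon>.+)\.(?P<collection>BLB|XC)\.(?P<id>[^.]+)\.resample\.48000\."
    r"(?:wav_(?P<start>\d+)\.(?P<end>\d+)\.Table\.1\.selections"
    r"|selections\.MASTER)\.txt\.gz$"
)

SAMPLE_RATE = 48000  # assumed from the "resample.48000" tag


def parse_annotation_name(name):
    """Split an annotation filename into its fields; sample indices also given in seconds."""
    m = ANNOT_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized annotation filename: {name}")
    info = {
        "taxon": m["taxon"],
        "collection": m["collection"],
        "id": m["id"],
        "kind": "manual" if m["start"] is not None else "predicted",
    }
    if m["start"] is not None:
        start, end = int(m["start"]), int(m["end"])
        info.update(start_sample=start, end_sample=end,
                    start_s=start / SAMPLE_RATE, end_s=end / SAMPLE_RATE)
    return info
```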

These files always have the following labels:

“Selection” = the selection (annotation) number
“View” = whether the view is Spectrogram or Waveform
“Channel” = the channel of the annotation
“Begin Time (s)” = when the annotation starts in seconds
“End Time (s)” = when the annotation ends in seconds
“Low Freq (Hz)” = the lowest frequency of the annotation in Hz
“High Freq (Hz)” = the highest frequency of the annotation in Hz
“TYPE” = what kind of syllable it is. Types include whistle (W), buzz (B), trill (T),
call (C), and complex syllable (S).
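The annotation files appear to be Raven Pro selection tables (the “Table.1.selections.txt” naming and the column set match Raven's tab-delimited format), compressed with gzip. A minimal sketch for reading one and tallying syllable types follows. It assumes the column names listed above and keeps only rows whose “View” begins with “Spectrogram”, on the assumption that a selection may be listed once per view; both helper names are hypothetical.

```python
import csv
import gzip
from collections import Counter


def read_selection_table(path):
    """Read a gzip-compressed, tab-delimited selection table into a list of dicts."""
    with gzip.open(path, "rt", newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def syllable_summary(rows):
    """Count TYPE codes and compute durations (s), using Spectrogram-view rows only."""
    spec_rows = [r for r in rows if r.get("View", "Spectrogram").startswith("Spectrogram")]
    counts = Counter(r["TYPE"] for r in spec_rows)
    durations = [float(r["End Time (s)"]) - float(r["Begin Time (s)"]) for r in spec_rows]
    return counts, durations
```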

The files may also include the following summary statistics, with units given where applicable:

“Agg Entropy (bits)”
“Avg Amp (U)”
“Avg Entropy (bits)”
“Avg Power Density (dB FS/Hz)”
“BW 50% (Hz)”
“BW 90% (Hz)”
“Begin Clock Time”
“Begin Date”
“Begin Date Time”
“Begin File”
“Beg File Samp (samples)”
“Begin Hour”
“Begin Path”
“Begin Sample (samples)”
“Center Freq (Hz)”
“Center Time (s)”
“Center Time Rel.”
“Delta Freq (Hz)”
“Delta Power (dB/Hz)”
“Delta Time (s)”
“Dur 50% (s)”
“Dur 90% (s)”
“End Clock Time”
“End Date”
“End File”
“End File Samp (samples)”
“End Path”
“End Sample (samples)”
“Energy (dB FS)”
“File Offset (s)”
“F-RMS Amp (U)”
“Freq 25% (Hz)”
“Freq 5% (Hz)”
“Freq 75% (Hz)”
“Freq 95% (Hz)”
“Freq Contour 5% (Hz)”
“Freq Contour 25% (Hz)”
“Freq Contour 50% (Hz)”
“Freq Contour 75% (Hz)”
“Freq Contour 95% (Hz)”
“Inband Power (dB FS)”
“Length (frames)”
“Leq (dB FS)”
“Max Amp (U)”
“Max Bearing (deg)”
“Max Entropy (bits)”
“Max Freq (Hz)”
“Max Time (s)”
“Min Amp (U)”
“Min Entropy (bits)”
“Min Time (s)”
“Peak Amp (U)”
“Peak Corr (U)”
“Peak Freq (Hz)”
“Peak Freq Contour (Hz)”
“PFC Avg Slope (Hz/ms)”
“PFC Max Freq (Hz)”
“PFC Max Slope (Hz/ms)”
“PFC Min Freq (Hz)”
“PFC Min Slope (Hz/ms)”
“PFC Num Inf Pts”
“PFC Slope (Hz/ms)”
“Peak Lag (s)”
“Peak Power Density (dB FS/Hz)”
“Peak Time (s)”
“Peak Time Relative”
“RMS Amp (U)”
“SEL (dB FS)”
“SNR NIST Quick (dB)”
“Sample Length (samples)”
“Time 25% (s)”
“Time 25% Rel.”
“Time 5% (s)”
“Time 5% Rel.”
“Time 75% (s)”
“Time 75% Rel.”
“Time 95% (s)”
“Time 95% Rel.”

ManualAnnotations.zip
This zipped file contains all of the annotations that were manually assigned per file.
These are always of the following format:
“[taxon].[BLB/XC].[#].resample.48000.wav_[start].[end].Table.1.selections.txt.gz”
PredictedAnnotations.zip
This zipped file contains all of the annotations that were assigned by TweetyNet per
file. These are always of the following format:
“[taxon].[BLB/XC].[#].resample.48000.selections.MASTER.txt.gz”

************** Textfiles.zip *****************

This section contains all of the data textfiles needed to replicate our analyses.

“climate_per_locality.csv”

Headers are as follows:

“LATITUDE” = latitude, in decimal degrees
“LONGITUDE” = longitude, in decimal degrees
“bio_1” = WorldClim bioclim variable 1, mean annual temperature (°C)
“bio_2” = WorldClim bioclim variable 2, mean diurnal temperature range (°C)
“bio_3” = WorldClim bioclim variable 3, isothermality (bio_2 / bio_7 × 100; unitless)
“bio_4” = WorldClim bioclim variable 4, temperature seasonality (standard deviation × 100) (°C)
“bio_5” = WorldClim bioclim variable 5, max temperature of warmest month (°C)
“bio_6” = WorldClim bioclim variable 6, min temperature of coldest month (°C)
“bio_7” = WorldClim bioclim variable 7, temperature annual range (°C)
“bio_8” = WorldClim bioclim variable 8, mean temperature of wettest quarter (°C)
“bio_9” = WorldClim bioclim variable 9, mean temperature of driest quarter (°C)
“bio_10” = WorldClim bioclim variable 10, mean temperature of warmest quarter (°C)
“bio_11” = WorldClim bioclim variable 11, mean temperature of coldest quarter (°C)
“bio_12” = WorldClim bioclim variable 12, annual precipitation (mm)
“bio_13” = WorldClim bioclim variable 13, precipitation of wettest month (mm)
“bio_14” = WorldClim bioclim variable 14, precipitation of driest month (mm)
“bio_15” = WorldClim bioclim variable 15, precipitation seasonality (coefficient of
variation)
“bio_16” = WorldClim bioclim variable 16, precipitation of wettest quarter (mm)
“bio_17” = WorldClim bioclim variable 17, precipitation of driest quarter (mm)
“bio_18” = WorldClim bioclim variable 18, precipitation of warmest quarter (mm)
“bio_19” = WorldClim bioclim variable 19, precipitation of coldest quarter (mm)
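The abstract refers to climatic differences between recordings. One common way to express these, shown here as a sketch and not necessarily the method used in the published scripts, is the Euclidean distance between localities after standardizing each bioclim variable to zero mean and unit variance. The function name and the choice of standardization are assumptions.

```python
import math
import statistics

BIO_COLUMNS = [f"bio_{i}" for i in range(1, 20)]


def climate_distance_matrix(rows, columns=BIO_COLUMNS):
    """Pairwise Euclidean distances between localities on z-scored climate variables.

    rows: list of dicts (e.g. from csv.DictReader on climate_per_locality.csv).
    Variables with zero variance contribute nothing to the distance.
    """
    values = [[float(r[c]) for c in columns] for r in rows]
    by_column = list(zip(*values))
    means = [statistics.mean(col) for col in by_column]
    sds = [statistics.stdev(col) for col in by_column]
    z = [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(vals, means, sds)]
         for vals in values]
    return [[math.dist(a, b) for b in z] for a in z]
```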

“song_metadata.txt”

Headers are as follows:

“COLLECTION” = either BLB for Borror Lab or XC for Xeno-Canto
“ID” = numerical ID assigned by collection
“COLLID” = previous two columns combined
“SEASON” = breeding (summer) or non-breeding (winter)
“GENUS” = the genus of the sample
“SPECIES” = the species of the sample
“SUBSPECIES” = the subspecies of the sample used in our analysis
“ORIGINALSUBSPECIES” = the original assigned subspecies in the database
“SUBSPPASSIGNED?” = whether the subspecies was changed by our study
“COUNTRY” = country of the recording
“STATE” = state/province of the recording
“COUNTY” = county of the recording
“LATITUDE” = latitude, decimal degrees
“LONGITUDE” = longitude, decimal degrees
“NPHYBRIDZONE” = whether the sample is from the nuttalli-pugetensis hybrid zone
“GLHYBRIDZONE” = whether the sample is from the gambelii-leucophrys hybrid zone
“ELEVATION” = recording elevation (m)
“DAY” = day recording made
“MONTH” = month recording made
“YEAR” = year recording made
“SEX” = sex of recorded individual, male (M), female (F), or unknown (U)
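Geographic distance and elapsed time between recordings can be derived from the LATITUDE, LONGITUDE, and YEAR columns. The sketch below uses the haversine great-circle formula on a spherical Earth (mean radius 6,371 km); this is an assumed approach for illustration, and the published scripts may compute distances differently. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pair_predictors(rec_a, rec_b):
    """Geographic distance (km) and years elapsed between two song_metadata.txt rows."""
    distance = haversine_km(float(rec_a["LATITUDE"]), float(rec_a["LONGITUDE"]),
                            float(rec_b["LATITUDE"]), float(rec_b["LONGITUDE"]))
    years = abs(int(rec_a["YEAR"]) - int(rec_b["YEAR"]))
    return distance, years
```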

“song_pca_importanceRotationMetadata.csv”

Headers for this file are PC1 through PC15 and describe the fifteen estimated principal components.

The following row labels represent importance or metadata variables:

“StDev” = standard deviation of the PC
“Prop Var” = proportion of variance explained by the PC
“CumulProp” = cumulative proportion of variance explained
“BrokenStick” = proportion of variance expected for the PC under the broken-stick model
“PCKept” = whether the PC was kept under the broken-stick criterion
“eigenvalues” = eigenvalue of the PC
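Under the broken-stick model, the expected proportion of variance for the kth of p components is b_k = (1/p) * (sum of 1/i for i = k, ..., p), and a PC is typically retained when its observed proportion of variance exceeds b_k. The sketch below computes these quantities, with each proportion of variance taken as the PC's eigenvalue divided by the sum of all eigenvalues. It applies the comparison to each PC independently, whereas some analyses stop at the first PC that fails, so the exact rule behind “PCKept” should be confirmed against the scripts. The function names are hypothetical.

```python
def broken_stick(p):
    """Expected proportion of variance for each of p components under the broken-stick model."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def proportion_of_variance(eigenvalues):
    """Proportion of total variance explained by each PC."""
    total = sum(eigenvalues)
    return [e / total for e in eigenvalues]


def pcs_kept(prop_var):
    """Flag PCs whose observed proportion of variance exceeds the broken-stick expectation."""
    expected = broken_stick(len(prop_var))
    return [observed > exp for observed, exp in zip(prop_var, expected)]
```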

The following row labels give the loading of each variable onto each PC:

“Bandwidth” = the bandwidth of the syllable (Hz)
“Time” = the time duration of the syllable (s)
“Center” = the center frequency of the syllable (Hz)
“Inflection” = the number of inflection points in the syllable
“Slope” = the average slope of the syllable (Hz/s)
“intro_duration” = the length of the introductory phrase (s)
“mean_syll_duration” = the mean syllable duration (s)
“song_dom_whistle_first” = the dominant frequency of the first whistle (Hz)
“song_dom_whistle_mean” = the mean dominant frequency of all whistles (Hz)
“song_max_dom” = the maximum dominant frequency (Hz)
“song_min_dom” = the minimum dominant frequency (Hz)
“trill_bw” = the trill bandwidth (Hz)
“trill_duration” = the trill duration (s)
“trill_rate” = the trill rate (syllables/s)
“whistle_duration” = the duration of the whistle (s)

“spectroAnalysis_withinIndividual_data.txt”

“selec” = the selection (annotation) number
“sound.files” = the WAV file used in the analysis
“View” = whether the view is Spectrogram or Waveform
“Channel” = the channel of the annotation
“start” = when the annotation starts (s)
“end” = when the annotation ends (s)
“bottom.freq” = the lowest frequency of the annotation (Hz)
“top.freq” = the highest frequency of the annotation (Hz)
“Begin.File” = the location of the file used
“type” = what kind of syllable it is
“selec.file” = the selection file used in the analysis
“difference” = the difference in start and end time (s)
“gapprev” = the gap between the end of the previous syllable and the start of this one (s)
“gapnext” = the gap between the end of this syllable and the start of the next one (s)
“paths” = folder paths used for the analysis
“raw_countour” = the raw frequency contour
“slopes” = the raw slopes (Hz/s)
“mean_slope” = the mean slope (Hz/s)
“inflections” = the number of inflection points
“ffreq.XXX” where XXX is 001 to 100 = the fundamental frequency at the sample representing the XXX’th percentile (Hz)
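The timing columns follow directly from “start” and “end”. The sketch below recomputes “difference”, “gapprev”, and “gapnext” for the syllables of one selection file, and shows one plausible way to derive slopes, their mean, and an inflection count from a frequency contour. Counting inflections as sign changes between successive nonzero slopes is an assumption, as are the function names; the published scripts may define these quantities differently.

```python
def add_timing_columns(syllables):
    """Return syllables sorted by start time with difference, gapprev, and gapnext (s).

    Each syllable is a dict with numeric "start" and "end" in seconds. The first
    syllable has no previous gap and the last has no next gap (None).
    """
    ordered = sorted(syllables, key=lambda s: s["start"])
    for i, syl in enumerate(ordered):
        syl["difference"] = syl["end"] - syl["start"]
        syl["gapprev"] = syl["start"] - ordered[i - 1]["end"] if i > 0 else None
        syl["gapnext"] = (ordered[i + 1]["start"] - syl["end"]
                          if i + 1 < len(ordered) else None)
    return ordered


def slopes_and_inflections(times, freqs):
    """Slopes (Hz/s) between successive contour points, their mean, and an inflection count.

    Inflections are counted here as sign changes between successive nonzero slopes
    (an assumed definition).
    """
    slopes = [(f2 - f1) / (t2 - t1)
              for t1, t2, f1, f2 in zip(times, times[1:], freqs, freqs[1:])]
    signs = [s > 0 for s in slopes if s != 0]
    inflections = sum(a != b for a, b in zip(signs, signs[1:]))
    mean_slope = sum(slopes) / len(slopes) if slopes else float("nan")
    return slopes, mean_slope, inflections
```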

The following headers represent summary statistics of sound properties from the
spectro_analysis function (warbleR R package):

“duration” = the length of the syllable (s)
“meanfreq” = the mean frequency (kHz)
“sd” = the standard deviation of frequency (kHz)
“freq.median” = the median frequency (kHz)
“freq.Q25” = the 25th percentile frequency (kHz)
“freq.Q75” = the 75th percentile frequency (kHz)
“freq.IQR” = the inter-quartile range of frequency (kHz)
“time.median” = the median time (s)
“time.Q25” = the 25th percentile time (s)
“time.Q75” = the 75th percentile time (s)
“time.IQR” = the inter-quartile range of time (s)
“skew” = the skewness of frequency
“kurt” = the kurtosis of frequency
“sp.ent” = spectral entropy
“time.ent” = time entropy
“entropy” = spectrographic entropy
“sfm” = spectral flatness
“meandom” = mean dominant frequency (kHz)
“mindom” = minimum dominant frequency (kHz)
“maxdom” = maximum dominant frequency (kHz)
“dfrange” = dominant frequency range (kHz)
“modindx” = modulation index
“startdom” = starting dominant frequency (kHz)
“enddom” = ending dominant frequency (kHz)
“dfslope” = dominant frequency slope (kHz/s)
“meanpeakf” = mean peak frequency (kHz)
“ffreq.Q05” = 5th percentile fundamental frequency (kHz)
“ffreq.Q25” = 25th percentile fundamental frequency (kHz)
“ffreq.median” = median fundamental frequency (kHz)
“ffreq.Q75” = 75th percentile fundamental frequency (kHz)
“ffreq.Q95” = 95th percentile fundamental frequency (kHz)
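Several of the dominant-frequency columns are simple functions of the dominant-frequency track. The sketch below follows the definitions in the warbleR documentation as understood here (dfrange = maxdom - mindom; modindx = cumulative absolute change between adjacent dominant-frequency measurements divided by dfrange; dfslope = (enddom - startdom) / duration). It is an illustration only, not a replacement for spectro_analysis, which computes the track itself from the spectrogram; the function name and returning 0 for modindx when dfrange is 0 are hypothetical choices.

```python
def dominant_frequency_summary(dom_khz, duration_s):
    """Summaries of a dominant-frequency track (kHz) sampled across one syllable."""
    if len(dom_khz) < 2 or duration_s <= 0:
        raise ValueError("need at least two dominant-frequency values and a positive duration")
    mindom, maxdom = min(dom_khz), max(dom_khz)
    dfrange = maxdom - mindom
    cumulative_change = sum(abs(b - a) for a, b in zip(dom_khz, dom_khz[1:]))
    return {
        "meandom": sum(dom_khz) / len(dom_khz),
        "mindom": mindom,
        "maxdom": maxdom,
        "dfrange": dfrange,
        "startdom": dom_khz[0],
        "enddom": dom_khz[-1],
        # Modulation index: total change relative to the range (1 = monotonic sweep).
        "modindx": cumulative_change / dfrange if dfrange > 0 else 0.0,
        "dfslope": (dom_khz[-1] - dom_khz[0]) / duration_s,  # kHz/s
    }
```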

The following headers represent summary statistics of sound properties from Raven Pro:

“Agg Entropy (bits)”
“Avg Amp (U)”
“Avg Entropy (bits)”
“Avg Power Density (dB FS/Hz)”
“BW 50% (Hz)”
“BW 90% (Hz)”
“Begin Clock Time”
“Begin Date”
“Begin Date Time”
“Begin File”
“Beg File Samp (samples)”
“Begin Hour”
“Begin Path”
“Begin Sample (samples)”
“Center Freq (Hz)”
“Center Time (s)”
“Center Time Rel.”
“Delta Freq (Hz)”
“Delta Power (dB/Hz)”
“Delta Time (s)”
“Dur 50% (s)”
“Dur 90% (s)”
“End Clock Time”
“End Date”
“End File”
“End File Samp (samples)”
“End Path”
“End Sample (samples)”
“Energy (dB FS)”
“File Offset (s)”
“F-RMS Amp (U)”
“Freq 25% (Hz)”
“Freq 5% (Hz)”
“Freq 75% (Hz)”
“Freq 95% (Hz)”
“Freq Contour 5% (Hz)”
“Freq Contour 25% (Hz)”
“Freq Contour 50% (Hz)”
“Freq Contour 75% (Hz)”
“Freq Contour 95% (Hz)”
“Inband Power (dB FS)”
“Length (frames)”
“Leq (dB FS)”
“Max Amp (U)”
“Max Bearing (deg)”
“Max Entropy (bits)”
“Max Freq (Hz)”
“Max Time (s)”
“Min Amp (U)”
“Min Entropy (bits)”
“Min Time (s)”
“Peak Amp (U)”
“Peak Corr (U)”
“Peak Freq (Hz)”
“Peak Freq Contour (Hz)”
“PFC Avg Slope (Hz/ms)”
“PFC Max Freq (Hz)”
“PFC Max Slope (Hz/ms)”
“PFC Min Freq (Hz)”
“PFC Min Slope (Hz/ms)”
“PFC Num Inf Pts”
“PFC Slope (Hz/ms)”
“Peak Lag (s)”
“Peak Power Density (dB FS/Hz)”
“Peak Time (s)”
“Peak Time Relative”
“RMS Amp (U)”
“SEL (dB FS)”
“SNR NIST Quick (dB)”
“Sample Length (samples)”
“Time 25% (s)”
“Time 25% Rel.”
“Time 5% (s)”
“Time 5% Rel.”
“Time 75% (s)”
“Time 75% Rel.”
“Time 95% (s)”
“Time 95% Rel.”

“syllableType_and_spectroAnalysis_centroids_PCA_data.txt”

“Ind” = the individual as subspecies.collection.number
“Bandwidth” = the bandwidth of the syllable (Hz)
“Time” = the time duration of the syllable (s)
“Center” = the center frequency of the syllable (Hz)
“Inflection” = the number of inflection points in the syllable
“Slope” = the average slope of the syllable (Hz/s)
“intro_duration” = the length of the introductory phrase (s)
“mean_syll_duration” = the mean syllable duration (s)
“song_dom_whistle_first” = the dominant frequency of the first whistle (Hz)
“song_dom_whistle_mean” = the mean dominant frequency of all whistles (Hz)
“song_max_dom” = the maximum dominant frequency (Hz)
“song_min_dom” = the minimum dominant frequency (Hz)
“trill_bw” = the trill bandwidth (Hz)
“trill_duration” = the trill duration (s)
“trill_rate” = the trill rate (syllables/s)
“whistle_duration” = the duration of the whistle (s)
“PCX” where X is 1-15 = the score on the Xth principal component of the above variables
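Assuming the PCA was run on variables centered and scaled to unit variance, each PC score is the sum of the standardized variables weighted by that PC's loadings from “song_pca_importanceRotationMetadata.csv”. The sketch below shows that projection. The scaling, and the use of the sample mean and standard deviation of the supplied rows, are assumptions; scores will match the file only if they hold, and the sign of each PC is arbitrary. The function name is hypothetical.

```python
import statistics


def pc_scores(rows, variables, rotation):
    """Project rows onto principal components using a loading (rotation) table.

    rows: list of dicts with a numeric value for every name in `variables`.
    rotation: dict mapping each variable name to its list of loadings (one per PC),
        e.g. the loading rows of song_pca_importanceRotationMetadata.csv.
    Assumes the PCA was run on centered, unit-variance variables and standardizes
    with the sample mean and standard deviation of `rows` (an assumption).
    """
    columns = {v: [float(r[v]) for r in rows] for v in variables}
    centers = {v: statistics.mean(c) for v, c in columns.items()}
    scales = {v: statistics.stdev(c) for v, c in columns.items()}
    n_pcs = len(rotation[variables[0]])
    scores = []
    for i in range(len(rows)):
        z = {v: (columns[v][i] - centers[v]) / scales[v] for v in variables}
        scores.append([sum(z[v] * rotation[v][k] for v in variables)
                       for k in range(n_pcs)])
    return scores
```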

“syllableType_individualAverage_data.csv”

“base.selec.file” = the recording identifier, as taxon.collection.number
“subspecies” = the identified subspecies
“collection” = the collection
“id” = the numerical ID in the collection
“intro_duration” = the length of the introductory phrase (s)
“mean_syll_duration” = the mean syllable duration (s)
“song_dom_whistle_first” = the dominant frequency of the first whistle (Hz)
“song_dom_whistle_mean” = the mean dominant frequency of all whistles (Hz)
“song_max_dom” = the maximum dominant frequency (Hz)
“song_min_dom” = the minimum dominant frequency (Hz)
“trill_bw” = the trill bandwidth (Hz)
“trill_duration” = the trill duration (s)
“trill_rate” = the trill rate (syllables/s)
“whistle_duration” = the duration of the whistle (s)
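The sketch below shows one way to reduce song-level measurements to per-individual averages and to split “base.selec.file” into its three fields. It assumes several rows per recording share the same “base.selec.file”, that the taxon field holds the subspecies, and that missing values are empty or “NA”; the function name is hypothetical, and this is not necessarily how the published file was produced.

```python
import statistics
from collections import defaultdict

# Song-level measurements listed above for syllableType_individualAverage_data.csv.
FEATURES = [
    "intro_duration", "mean_syll_duration", "song_dom_whistle_first",
    "song_dom_whistle_mean", "song_max_dom", "song_min_dom",
    "trill_bw", "trill_duration", "trill_rate", "whistle_duration",
]
MISSING = {None, "", "NA"}


def individual_averages(songs, features=FEATURES):
    """Average song-level measurements per recording ("base.selec.file").

    Assumes "base.selec.file" is taxon.collection.number and that the taxon
    field holds the subspecies; missing values are skipped.
    """
    groups = defaultdict(list)
    for song in songs:
        groups[song["base.selec.file"]].append(song)
    records = []
    for key, rows in groups.items():
        # rsplit keeps any periods inside the taxon name intact.
        subspecies, collection, ident = key.rsplit(".", 2)
        record = {"base.selec.file": key, "subspecies": subspecies,
                  "collection": collection, "id": ident}
        for f in features:
            values = [float(r[f]) for r in rows if r.get(f) not in MISSING]
            record[f] = statistics.mean(values) if values else None
        records.append(record)
    return records
```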
