Data from: Widespread cultural change in declining populations of Amazon parrots
Data files
Jul 15, 2024 version files (9.97 MB total):
- acoustic_measurements.csv (1.14 MB)
- AmazonVocalCulture_SelectionTableMetadata_31May2023.csv (477.11 KB)
- AmazonVocalCulture_SiteLevelMetadata_31May2023.csv (4.12 KB)
- MDS_RandomForests_coordinates.csv (243.13 KB)
- MDS_SPCC_coordinates.csv (261.54 KB)
- MFCC_descriptiveStats.csv (7.83 MB)
- README.md (17.47 KB)
Abstract
This dataset of parrot call measurements and metadata is associated with the article "Widespread cultural change in declining populations of Amazon parrots" in Proceedings of the Royal Society B. The data was used to address change and stability in regional vocal dialects of yellow-naped amazon (Amazona auropalliata) contact calls recorded in Costa Rica over three sampling periods that spanned 22 years.
This README.md file was generated on 01 June 2023 and updated on 15 July 2024 by Grace Smith-Vidaurre.
GENERAL INFORMATION
Title of Dataset: Vocal culture in Amazon parrots
Corresponding Author Information:
Christine Dahlin
Department of Biology
University of Pittsburgh at Johnstown
Johnstown, PA, USA
cdahlin@pitt.edu
Dates of data collection: June 2016, with data used from previous work in 1994 and 2005
Geographic location of data collection: Multiple sites along the Pacific coast of Costa Rica
Information about funding sources or sponsorship that supported the collection of the data:
The World Parrot Trust's Parrot Action Grants, New Mexico State's College of Arts and Sciences, the University of Pittsburgh's Central Research Development Grant Fund, and the University of Pittsburgh at Johnstown's Faculty College Research Council Grant
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data, or limitations of reuse: We place no restrictions on reuse of the data we have made available. Raw audio files and roost count data used in these analyses are available from the authors upon request.
Recommended citation for the data: Dahlin, Christine; Smith-Vidaurre, Grace; Genes, Molly; Wright, Timothy (Forthcoming 2024). Vocal culture in Amazon parrots [Dataset]. Dryad. https://doi.org/10.5061/dryad.m905qfv6j
Links to other publicly accessible locations of the data: The data is not available elsewhere.
Links to related files: The code we used to analyze this data is available in the GitHub repository gsvidaurre/amazon-vocal-culture (https://github.com/gsvidaurre/amazon-vocal-culture)
DATA & FILE OVERVIEW
We provide metadata for the pre-processed dataset of calls that we used for quantitative analyses, as well as metadata about the sites at which we recorded. We also provide files with measurements that we generated and used for subsequent analyses. Additional metadata on the larger dataset of calls prior to pre-processing is available in the accompanying manuscript.
In this work we rely on audio files from earlier work that are not publicly available:
- Wright, T.F. (1996). Regional dialects in the contact call of a parrot. Proceedings of the Royal Society of London, B, 263, 867–872. https://doi.org/10.1098/rspb.1996.0128
- Wright, T.F., Dahlin, C.R., & Salinas-Melgoza, A. (2008). Stability and change in vocal dialects of the yellow-naped amazon. Animal Behaviour, 76(3), 1017–1027. https://doi.org/10.1016/j.anbehav.2008.03.025
Finally, for this work we also rely on roost count data from the following publication that is not publicly available:
- Wright, T.F., Lewis, T.C., Lezama-Lopez, M., Smith-Vidaurre, G., & Dahlin, C.R. (2019). Yellow-naped Amazon Amazona auropalliata populations are markedly low and rapidly declining in Costa Rica and Nicaragua. Bird Conservation International, 29(2), 291–307. https://doi.org/10.1017/S0959270918000114
Here we make call measurement and metadata available for calls obtained in previously published work in 1996 (recordings from 1994) and 2008 (recordings from 2005), as well as more recent fieldwork in 2016.
DATA-SPECIFIC INFORMATION
- AmazonVocalCulture_SelectionTableMetadata_31May2023.csv: This is a selection table spreadsheet with call-level metadata in .csv format. The spreadsheet has 2461 rows and 21 columns. Each row corresponds to a different call that was manually selected from original recordings and saved as a separate audio file. The spreadsheet includes the following columns:
- sound.files: The sound file name of the audio file that contains the given call. The suffixes "resamp2019_ampnorm2019" were added to indicate how audio files were standardized prior to quantitative analysis
- selec: The selection ID of the call in the given audio file. Note that this is always 1, since each call was contained in a separate cut of original recordings
- start: The start coordinate of each call (seconds) within the audio file after tailoring temporal coordinates (this coordinate was used for analyses)
- end: The end coordinate of each call (seconds) within the audio file after tailoring temporal coordinates (this coordinate was used for analyses)
- original_start: The start coordinate of each call (seconds) within the audio file prior to tailoring temporal coordinates
- original_end: The end coordinate of each call (seconds) within the audio file prior to tailoring temporal coordinates
- Year: The year in which the call was recorded
- new_site_code: The site code for the given call, harmonized across years (this code may not match the site code in the sound file name)
- site_year: An alphanumeric code generated by combining the new_site_code and Year columns
- full_site_nm: The full name of the recording site, also harmonized across years
- orig_site_code: The original site code. This is not harmonized across years
- BirdID: The unique individual ID of each bird sampled within a unique site-year. Determined when manually selecting calls and generating cuts of original recordings
- CallTypesVariants: The call type or variant label from visual and aural classification. The call type is either a historic dialect call type/variant (North, South, Nica-A, Nica-B) or a new variant classified in 2016 (North-A, South-A, South-B, South-C)
- Visual_Scoring_Initials: The initials of two co-authors who performed quality scoring by visual inspection of visible patterns of background noise. Visual inspection yielded information for the next 2 columns
- Visual_Quality_Score: A quality score (H = high, M = medium, L = low) determined through visual inspection of background noise
- Overlapping_Signal: Does another acoustic signal from the same or another species overlap the given call? Y = yes, N = No
- sampling_rate: The sampling rate after normalization to 22.05 kHz
- bit: The bit depth after normalization to 16 bits
- bottom.freq: The lower limit of the bandpass filter in kHz used for call measurements
- top.freq: The upper limit of the bandpass filter in kHz used for call measurements
- tailored: Was this signal tailored using the warbleR package? y = Yes, blank = No
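The column descriptions above imply two simple consistency rules: selec is always 1, and each call's end coordinate falls after its start coordinate. A minimal Python sketch of how a reader might check them follows. It is not part of the authors' workflow, and the two rows in SAMPLE are made-up placeholders (file names and site codes included) that only mimic a few of the documented columns.

```python
import csv
import io

# Two made-up rows that mimic some of the documented columns; not real data.
SAMPLE = """sound.files,selec,start,end,Year,new_site_code
example_call_A.wav,1,0.10,0.45,2016,AAAA
example_call_B.wav,1,0.05,0.38,1994,BBBB
"""


def check_selection_rows(rows):
    """Return (sound file, problem) pairs for rows that break the documented rules."""
    problems = []
    for row in rows:
        if row["selec"] != "1":
            problems.append((row["sound.files"], "selec is not 1"))
        if float(row["end"]) <= float(row["start"]):
            problems.append((row["sound.files"], "end is not after start"))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = check_selection_rows(rows)

# Call duration in seconds, from the tailored start and end coordinates.
durations = [round(float(r["end"]) - float(r["start"]), 3) for r in rows]
print(problems)
print(durations)
```

To run the same checks on the real table, replace io.StringIO(SAMPLE) with a file handle opened on AmazonVocalCulture_SelectionTableMetadata_31May2023.csv.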
- AmazonVocalCulture_SiteLevelMetadata_31May2023.csv: This is a spreadsheet of site-level metadata in .csv format. The spreadsheet has 56 rows and 14 columns. Each row has metadata for a different site-year. This site-level metadata is a mix of metadata from call classification prior to pre-processing, and checking call labels / sample sizes after pre-processing. The column indicating whether more than 1 call type was used was generated following quality control processing. Note that the North dialect calls for the site PEBL (Penas Blancas) were removed during quality control processing prior to quantitative analyses, so in this spreadsheet PEBL is scored as having only a single call type (Supplementary Table 2 contains summary statistics about both call types used at this site prior to quality control processing). Also note that throughout this spreadsheet, the call variant “Nica-A” is the same as “Nicaragua” and the call variant “Nica-B” is the same as “New Nica” in the main text and supplement of the accompanying article. Finally, the specific geographic coordinates for sites have not been made available, but this information can be requested as needed by emailing the lead author (see email address above).
- Country: The country where the site was sampled
- Year: The sampling year
- Site_Code: The site code harmonized across years
- Site_Year: The unique site and year
- Full_Site_Name: The full name of the recording site, also harmonized across years
- Two_CallTypes_Used: Were two call types identified during visual classification? Yes or No. This column was generated after quality control processing
- Call_Type_1: The call type (from visual and aural classification) with the largest number of calls for the given site-year, taken from the pre-processed dataset
- Call_Type_2: The call type (from visual and aural classification) with the second-largest number of calls for the given site-year, taken from the pre-processed dataset. This column is set to NA when a second call type was not identified
- Call_Type_3: The call type (from visual and aural classification) with the third-largest number of calls for the given site-year, taken from the pre-processed dataset. This column is set to NA when a third call type was not identified
- Call_Type_4: The call type (from visual and aural classification) with the fewest calls for the given site-year, taken from the pre-processed dataset. This column is set to NA when a fourth call type was not identified
- Number_Calls: The number of calls for the given site-year. Note that this sample size refers to the pre-processed dataset used for quantitative analyses
- Number_Birds: The number of birds sampled for the given site-year. Note that this sample size refers to the pre-processed dataset used for quantitative analyses
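Because Call_Type_2 through Call_Type_4 are set to NA when no further call type was identified, the call types for a site-year can be recovered by dropping the NA entries. The Python sketch below illustrates this. The rows are made up, and both the site codes and the form of the Site_Year values (here "AAAA_2016") are assumptions rather than values taken from the file.

```python
import csv
import io

# Made-up rows that mimic the documented columns; codes and Site_Year format are placeholders.
SAMPLE = """Site_Year,Call_Type_1,Call_Type_2,Call_Type_3,Call_Type_4
AAAA_2016,South,South-A,NA,NA
BBBB_1994,North,NA,NA,NA
"""

CALL_TYPE_COLUMNS = ["Call_Type_1", "Call_Type_2", "Call_Type_3", "Call_Type_4"]


def call_types_per_site_year(rows):
    """Map each site-year to its list of call types, skipping NA placeholders."""
    return {
        row["Site_Year"]: [row[c] for c in CALL_TYPE_COLUMNS if row[c] != "NA"]
        for row in rows
    }


site_types = call_types_per_site_year(csv.DictReader(io.StringIO(SAMPLE)))
print(site_types)
```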
- acoustic_measurements.csv: This is a spreadsheet of standard acoustic measurements in .csv format for the pre-processed call dataset. The spreadsheet has 2461 rows and 29 columns. Each row is a different call. The first two columns are metadata columns, followed by 27 numeric acoustic measurements. More information about these call measurements can be found in the supplement of the accompanying manuscript, as well as documentation for the function spectro_analysis() in the warbleR package.
- sound.files: The sound file name of the audio file that contains the given call (see selection table description above)
- selec: The selection ID of the call in the given audio file. Note that this is always 1, since each call was contained in a separate cut of original recordings (see selection table description above)
- duration: Duration of the call (seconds)
- meanfreq: The mean frequency (kHz), calculated on the frequency spectrum
- sd: The standard deviation of frequency (kHz)
- freq.median: The median frequency (kHz) that splits the frequency spectrum into two segments of equal energy
- freq.Q25: The first quartile frequency (kHz)
- freq.Q75: The third quartile frequency (kHz)
- freq.IQR: The interquartile frequency range (kHz)
- time.median: The median time (seconds) that divides the time envelope into two time intervals that contain equal energy
- time.Q25: The first quartile time (seconds)
- time.Q75: The third quartile time (seconds)
- time.IQR: The interquartile time range (seconds)
- skew: Skewness of the frequency spectrum, a measure of asymmetry (unitless)
- kurt: Kurtosis of the frequency spectrum, a measure of peakedness (unitless)
- sp.ent: Spectral entropy, or the distribution of energy of the frequency spectrum (unitless)
- time.ent: Time entropy, or how energy is distributed over time in the call (unitless)
- entropy: The product of time and spectral entropy, or spectrographic entropy (unitless)
- sfm: A measurement of spectral flatness (unitless)
- meandom: The mean dominant frequency (kHz)
- mindom: The minimum dominant frequency (kHz)
- maxdom: The maximum dominant frequency (kHz)
- dfrange: The range of the dominant frequency (kHz)
- modindx: The modulation index of the dominant frequency (unitless)
- startdom: The dominant frequency at the start of the call (kHz)
- enddom: The dominant frequency at the end of the call (as measured on the spectrogram) (kHz)
- dfslope: The slope of the dominant frequency (kHz/s)
- meanpeakf: The mean peak frequency (kHz)
- peakf: The peak frequency (the frequency that displayed the highest energy, measured from the frequency spectrum) (kHz)
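Several of the measurements above are defined in terms of others: the interquartile ranges are differences of quartiles, and entropy is described as the product of spectral and time entropy. The Python sketch below checks these relationships for one call. The values are made up, and treating dfrange as maxdom minus mindom is our reading of "range" rather than something stated in this README. Values in the real file may differ slightly from these identities because of rounding.

```python
import math

# Made-up measurements for one hypothetical call (kHz, seconds, or unitless).
call = {
    "freq.Q25": 1.20, "freq.Q75": 2.00, "freq.IQR": 0.80,
    "time.Q25": 0.10, "time.Q75": 0.25, "time.IQR": 0.15,
    "mindom": 0.90, "maxdom": 2.40, "dfrange": 1.50,
    "sp.ent": 0.80, "time.ent": 0.90, "entropy": 0.72,
}


def derived_checks(m, tol=1e-6):
    """Compare each derived column with the value recomputed from its inputs."""
    expected = {
        "freq.IQR": m["freq.Q75"] - m["freq.Q25"],
        "time.IQR": m["time.Q75"] - m["time.Q25"],
        "dfrange": m["maxdom"] - m["mindom"],  # assumption: range = max - min
        "entropy": m["sp.ent"] * m["time.ent"],
    }
    return {
        name: math.isclose(m[name], value, abs_tol=tol)
        for name, value in expected.items()
    }


results = derived_checks(call)
print(results)
```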
- MFCC_descriptiveStats.csv: This is a spreadsheet of descriptive statistics of Mel-frequency cepstral coefficients (MFCC) in .csv format for the pre-processed call dataset. The spreadsheet has 2461 rows and 181 columns. Each row is a different call. The first two columns are metadata columns, followed by 179 numeric acoustic measurements. More information about these call measurements can be found in the supplement of the accompanying manuscript. The measurements below represent values obtained by summarizing Mel-scale values per coefficient over time.
- sound.files: The sound file name of the audio file that contains the given call (see selection table description above)
- selec: The selection ID of the call in the given audio file. Note that this is always 1, since each call was contained in a separate cut of original recordings (see selection table description above)
- min.cc* : The minimum value of each of the cepstral coefficients, in which * represents each of the coefficients from 1 to 25. This set of measurements represents 25 separate columns
- max.cc* : The maximum value of each of the cepstral coefficients, in which * represents each of the coefficients from 1 to 25. This set of measurements represents 25 separate columns
- median.cc* : The median value of each of the cepstral coefficients, in which * represents each of the coefficients from 1 to 25. This set of measurements represents 25 separate columns
- mean.cc* : The mean value of each of the cepstral coefficients, in which * represents each of the coefficients from 1 to 25. This set of measurements represents 25 separate columns
- var.cc* : The variance of each of the cepstral coefficients, in which * represents each of the coefficients from 1 to 25. This set of measurements represents 25 separate columns
- skew.cc* : The skewness of each of the cepstral coefficients, in which * represents each of the coefficients from 1 to 25. This set of measurements represents 25 separate columns
- kurt.cc* : The kurtosis of each of the cepstral coefficients, in which * represents each of the coefficients from 1 to 25. This set of measurements represents 25 separate columns
- mean.d1.cc: A single column. The mean value of the first derivative of the 25 MFCC coefficients
- var.d1.cc: A single column. The variance of the first derivative of the 25 MFCC coefficients
- mean.d2.cc: A single column. The mean value of the second derivative of the 25 MFCC coefficients
- var.d2.cc: A single column. The variance of the second derivative of the 25 MFCC coefficients
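The column count for this file follows from the list above: 7 descriptive statistics for each of 25 coefficients give 175 columns, the 4 derivative summaries bring the total to 179 measurements, and the 2 metadata columns bring it to 181. The Python sketch below builds the expected header from that description. The exact name pattern (for example "min.cc1") and the column order are assumptions based on the * notation above and should be checked against the file itself.

```python
STATS = ["min", "max", "median", "mean", "var", "skew", "kurt"]
N_COEFFICIENTS = 25


def expected_mfcc_columns():
    """Build the column names implied by the README description (assumed naming)."""
    columns = ["sound.files", "selec"]
    for stat in STATS:
        columns.extend(f"{stat}.cc{i}" for i in range(1, N_COEFFICIENTS + 1))
    columns.extend(["mean.d1.cc", "var.d1.cc", "mean.d2.cc", "var.d2.cc"])
    return columns


columns = expected_mfcc_columns()
print(len(columns))      # 2 metadata + 7 * 25 + 4 columns
print(len(columns) - 2)  # numeric measurement columns only
```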
- MDS_SPCC_coordinates.csv: This file in .csv format contains multidimensional scaling (MDS) coordinates obtained by reducing the spectrographic cross-correlation (SPCC) matrix for the pre-processed call dataset. The spreadsheet has 2461 rows and 6 columns. Each row is a different call. The first 4 columns are metadata columns from the call selection table, followed by the columns X and Y that contain the first and second dimensions of the “optimal” MDS solution for the similarity matrix, respectively. More information about how these coordinates were generated and used can be found in the supplement of the accompanying manuscript.
- sound.files: The sound file name of the audio file that contains the given call (see selection table description above)
- Year: The sampling year in which each call was recorded
- new_site_code: The unique 4-letter site code for the given site, standardized across sampling years
- Call_Type: The call type or variant label assigned to the given call through visual classification
- X: Numeric values for the first dimension of the multidimensional scaling solution (unitless)
- Y: Numeric values for the second dimension of the multidimensional scaling solution (unitless)
- MDS_RandomForests_coordinates.csv: This file in .csv format contains multidimensional scaling (MDS) coordinates obtained by reducing the unsupervised random forests matrix for the pre-processed call dataset. The spreadsheet has 2461 rows and 6 columns. Each row is a different call. The first 4 columns are metadata columns from the call selection table, followed by the columns X and Y that contain the first and second dimensions of the “optimal” MDS solution for the similarity matrix, respectively. More information about how these coordinates were generated and used can be found in the supplement of the accompanying manuscript.
- sound.files: The sound file name of the audio file that contains the given call (see selection table description above)
- Year: The sampling year in which each call was recorded
- new_site_code: The unique 4-letter site code for the given site, standardized across sampling years
- Call_Type: The call type or variant label assigned to the given call through visual classification
- X: Numeric values for the first dimension of the multidimensional scaling solution (unitless)
- Y: Numeric values for the second dimension of the multidimensional scaling solution (unitless)
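Both MDS coordinate files share the same six columns, so one routine can summarize either of them. As an illustration only, the Python sketch below computes the mean X and Y position (centroid) of each call type within each sampling year. This is not the analysis from the accompanying manuscript, and the rows, file names, and site codes in SAMPLE are made up.

```python
import csv
import io
from collections import defaultdict

# Made-up rows with the six documented columns; not real coordinates.
SAMPLE = """sound.files,Year,new_site_code,Call_Type,X,Y
a.wav,1994,AAAA,North,0.0,1.0
b.wav,1994,AAAA,North,2.0,3.0
c.wav,2016,BBBB,South,-1.0,0.5
"""


def centroids(rows):
    """Return the mean (X, Y) for every (Call_Type, Year) combination."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running sum of X, sum of Y, count
    for row in rows:
        acc = sums[(row["Call_Type"], row["Year"])]
        acc[0] += float(row["X"])
        acc[1] += float(row["Y"])
        acc[2] += 1
    return {key: (sx / n, sy / n) for key, (sx, sy, n) in sums.items()}


result = centroids(csv.DictReader(io.StringIO(SAMPLE)))
print(result)
```

To apply this to the real data, pass a csv.DictReader opened on MDS_SPCC_coordinates.csv or MDS_RandomForests_coordinates.csv instead of the sample text.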
METHODOLOGICAL INFORMATION
See the associated publication, detailed supplementary methods, and code made available on GitHub (see above) for more information on how data were collected, processed and analyzed. These materials contain information about the software and versions of software used, quality control processing, and those who contributed to this research.
This data was primarily collected by acoustic recording, and was pre-processed for quality control. We have made detailed information about data collection and processing available in the data README as well as the accompanying manuscript.
We primarily use the R software environment for analyses, and have made code to reproduce our analyses available on GitHub (see the link in the README).