README for Genomic divergence in allopatric Northern Cardinals of the North American warm deserts is linked to behavioral differentiation.
Provost, Kaiya L, Mauck, William M, III, and Smith, Brian Tilston.
12 September 2018

This README file describes the files associated with the above publication. For any questions or comments, please contact Kaiya L. Provost at kprovost@amnh.org.

##########################################################################################
#                                          DATA                                          #
##########################################################################################

There are five main sections of the "DATA" portion of this README: "RAW FASTQ FILES", "STRUCTURE FILES", "FASTSIMCOAL FILES", "PLAYBACK RESULTS", and "PLAYBACK RECORDINGS".

************************************* RAW FASTQ FILES ************************************

Raw fastq files are uploaded to the Sequence Read Archive (SRA) as submission SRP158705, samples SAMN09865710 through SAMN09865798.

************************************* STRUCTURE FILES ************************************

This section includes the input STRUCTURE files, the output files after processing through Structure Harvester, Clumpp, and Distruct, and the results from Structure Selector. It also includes one statistics file from Pyrad. Files that include "ingroup" in the filename exclude six individuals: the one Cardinalis sinuatus, the three Cardinalis cardinalis carneus, and two Cardinalis cardinalis individuals with high missing data.

1) cardcard16.str
This is the STRUCTURE input file. Individuals are given by their ID.

2) cardcard16_IDtodesert.txt
This is a table relating the ID numbers (as in the "cardcard16.str" file above) to their Desert (i.e., Pyrrhuloxia, Outgroup C. c. carneus, Sonoran, or Chihuahuan) and to their Catalog numbers. UWBM = University of Washington Burke Museum of Natural History. MSB = Museum of Southwestern Biology. AMCC = Ambrose Monell Cryo Collection.
AMNHDOT = American Museum of Natural History Department of Ornithology. TCWC = Texas A&M Biodiversity Research and Teaching Collections. Note that both AMCC and AMNHDOT numbers are given for relevant specimens. All individuals come from the Ornithology/Bird collections of each institution. See also Appendix III Table 5 in the manuscript.

3) cardinalis_bothdatasets_forplotting_outgroup.csv
This file describes the assignment probabilities of each individual to a cluster for K=2, K=3, K=4, and K=5. K=1 is omitted because all individuals are assigned to the same cluster. These assignments are taken after processing with Structure Harvester and Clumpp and therefore reflect average assignments across N=5 runs. The file also includes metadata helpful for plotting.

4) cardcard16_evannoTable.tab
This file shows the results of the Delta K method (Evanno et al. ####) for selecting among our K values, as well as the log likelihoods for each run.

5) cardcard16_ingroup.str
6) cardcard16_ingroup_IDtodesert.txt
7) cardinalis_bothdatasets_forplotting_ingroup.csv
8) cardcard16_ingroup_evannoTable.tab
These files correspond to Files 1-4 above with the six indicated individuals removed. Note that for the ingroup dataset we ran N=10 runs instead of N=5.

9) cardcard16_ingroup.MedK.0.5.tsv
10) cardcard16_ingroup.MedK.0.6.tsv
11) cardcard16_ingroup.MedK.0.7.tsv
12) cardcard16_ingroup.MedK.0.8.tsv
13) cardcard16_ingroup.MedK.0.9.tsv
These files are outputs from Structure Selector giving the results for threshold values 0.5 through 0.9. Note that Structure Selector was only run on the ingroup dataset.

14) cardcard16.stats
This file is not directly relevant to the STRUCTURE analysis. It is an output file from Pyrad describing relevant statistics of the data.
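
The Delta K statistic reported in the evannoTable files can be illustrated with a minimal Python sketch. It assumes the commonly used form Delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)), where L(K) is the log likelihood of each run at a given K. The function name delta_k and the log-likelihood values below are hypothetical and are not taken from the files in this package; the column layout of the evannoTable files is not assumed here.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Compute Delta K for each K that has both neighbours K-1 and K+1.

    log_likelihoods maps K -> list of per-run log likelihoods.
    Uses the mean-based form of the statistic (an assumption of this sketch).
    """
    means = {k: mean(runs) for k, runs in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the smallest and largest K
        sd = stdev(log_likelihoods[k])
        if sd == 0:
            continue  # avoid division by zero when all runs agree exactly
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_difference / sd
    return result


# Hypothetical log likelihoods for K=1..4, two runs each (illustration only).
example_runs = {
    1: [-100.0, -100.0],
    2: [-50.0, -52.0],
    3: [-49.0, -51.0],
    4: [-48.0, -52.0],
}
example_delta_k = delta_k(example_runs)
best_k = max(example_delta_k, key=example_delta_k.get)
print(example_delta_k, best_k)
```

Under this form, the K with the largest Delta K is the one usually reported. Delta K cannot be evaluated at K=1 or at the largest K tested.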

************************************ FASTSIMCOAL FILES ***********************************

This section includes the results of model selection, the results from the bootstraps of the best model, the site frequency spectrum file, and the VCF file. Note that these analyses were only run on the full dataset. Scripts used to generate these files are available on the GitHub page of Isaac Overcast: https://github.com/isaacovercast/easySFS.

1) cardinalis16_fastsimcoal_modelselection_bootstraps.xlsx
This file is a spreadsheet containing four tabs: "key", which gives descriptions of each header; "model_selection", which gives the parameter estimates for all six demographic models; "bootstraps", which gives the bootstrapped estimates for the best model with asymmetric gene flow; and "summary_of_bootstraps", which summarizes the median, mean, and 95% confidence interval of the bootstraps using a custom script (see "SCRIPTS").

2) cardinalis16_MSFS.obs
This file is a site frequency spectrum of the three populations modeled using fastsimcoal: Sonoran, Cardinalis cardinalis carneus, and Chihuahuan. It is projected down to 10 x 2 x 10 individuals. This file is used in all fastsimcoal models, though it must be renamed to work with the scripts.

3) cardcard16.vcf
This file is the VCF precursor to "cardinalis16_MSFS.obs".

************************************ PLAYBACK RESULTS ************************************

These are files describing the results of the playback (call broadcast) experiments. In addition, raw recording files are uploaded to Xeno-Canto as numbers XC434568‐XC434574 and XC434576‐XC434592.

1) combined_both_years_data_all_columns_for_pub.xlsx
This file lists all of the raw and computed data resulting from the playback experiments. It includes both a tab of data and a key. A CSV version of this file is also included.

2) playback_latitude_longitudes.xlsx
This file lists all of the latitudes and longitudes associated with the points in "combined_both_years_data_all_columns_for_pub.xlsx". It includes both a tab of data and a key.

*********************************** PLAYBACK RECORDINGS **********************************

These are audio files used for the playback experiments. They are sorted by which Treatment they represent.

##########################################################################################
#                                        SCRIPTS                                         #
##########################################################################################

The "SCRIPTS" section of this README contains three subsections: "FASTSIMCOAL SCRIPTS", "GLMM SCRIPTS", and "FIGURE MAKING SCRIPTS".

*********************************** FASTSIMCOAL SCRIPTS **********************************

These files set up the models that fastsimcoal simulates from and estimates parameters from. See the fastsimcoal2 manual for more details. Scripts used to run these files are available on the GitHub page of Isaac Overcast: https://github.com/isaacovercast/fsc2_scripts.

1) asymmetric.tpl
2) pure-isolation.tpl
3) secondary-contact.tpl
4) symmetrical-gf.tpl
5) unidirection-chi-to-son.tpl
6) unidirection-son-to-chi.tpl
These six files are the template files required to set up the demographic scenarios.

7) asymmetric.est
8) pure-isolation.est
9) secondary-contact.est
10) symmetrical-gf.est
11) unidirection-chi-to-son.est
12) unidirection-son-to-chi.est
These six files are the distribution files required to set up the parameter estimation windows.

************************************** GLMM SCRIPTS **************************************

These files are R scripts used to generate the generalized linear mixed models (GLMMs) for the playback experiment data. They also include scripts to run the principal components analysis, the results of which are found in "combined_both_years_data_all_columns_for_pub.xlsx".

1) aggressionPrincipalComponentsAnalysis.R
This script takes the raw aggression measures for the response period, performs a principal components analysis on them, and back-calculates pre-playback aggression using the resulting loadings. The script also includes the loadings and the relative contribution of each principal component. Note: this analysis calculates the PCA for the entire dataset (i.e., it is not detections-only).

2) performGLMM.R
This script performs GLMMs on the PC1.SON and PC1.CHI data to test the effect of Treatment (Type, i.e., Local, Distant, Across-Barrier, Control) on aggression. It runs these models for the entire dataset as well as for the detections-only (PointWork=1) dataset.

********************************** FIGURE MAKING SCRIPTS *********************************

This part of the package contains R files that make the figures in the main manuscript and the appendices (specifically Appendix III). Figures not listed here were not scripted.

1) Figure4_AppFigure1_structurePlots.R
This R script generates the STRUCTURE plots, Figure 4 and Appendix III Figure 1.

2) Figure5_boxplots.R
This R script generates the boxplots seen in Figure 5.

3) Figure6_distanceByTime.R
This R script generates the distance-by-time plot seen in Figure 6.
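
The back-calculation described for aggressionPrincipalComponentsAnalysis.R (fit the PCA on response-period measures, then apply the resulting loadings to pre-playback measures) can be sketched in Python as follows. This is an illustration only and is not the R script's method: it extracts just the first component by power iteration on an unscaled covariance matrix, and the helper names, the three measures, and all numbers are hypothetical.

```python
import math


def column_means(rows):
    """Mean of each column (one column per aggression measure)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, means):
    """Sample covariance matrix of the rows after centering on means."""
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    n, p = len(rows), len(means)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_loadings(cov, iterations=1000):
    """Leading eigenvector of cov by power iteration (unit length)."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pc1_scores(rows, means, loadings):
    """Project rows onto the loadings after centering on the given means."""
    return [sum((x - m) * w for x, m, w in zip(row, means, loadings))
            for row in rows]


# Hypothetical data: each row is one trial, each column one aggression measure.
response = [[12.0, 3.0, 40.0], [5.0, 1.0, 15.0], [20.0, 6.0, 70.0], [8.0, 2.0, 30.0]]
pre_playback = [[2.0, 0.0, 5.0], [1.0, 0.0, 3.0], [4.0, 1.0, 12.0], [0.0, 0.0, 2.0]]

# Fit on the response period only.
means = column_means(response)
loadings = first_loadings(covariance(response, means))

# Reuse the same centering and loadings for the pre-playback period.
response_pc1 = pc1_scores(response, means, loadings)
pre_pc1 = pc1_scores(pre_playback, means, loadings)
print(loadings, response_pc1, pre_pc1)
```

Because the pre-playback scores reuse the centering and loadings estimated from the response period, both periods are expressed on the same PC1 axis and can be compared directly. Whether the original analysis scaled the variables before the PCA is not stated in this README.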