Data from: Does genome-wide variation and putatively adaptive variation identify the same set of distinct populations?
Data files
Sep 17, 2024 version files 271.77 MB
- data_and_scripts_for_publication.zip (271.74 MB)
- README.md (26.41 KB)
Abstract
Identifying which populations within species to prioritize for conservation is a major challenge: one question is whether to prioritize populations based on adaptive variation versus considering genome-wide genetic variation. Many authors have advocated focusing solely on adaptive variation due to its direct connection to selection, function, and adaptive capacity. However, there are many limitations in identifying and using adaptive genetic variation for conservation. Patterns of genome-wide genetic variation may be congruent with patterns of adaptive genetic variation, and genome-wide variation is much easier to measure. However, evidence for congruence is mixed. We gather genome-wide and putatively adaptive SNP data across 34 species of plants and animals from published outlier and association studies to test congruence. We ask whether putatively adaptive subsets of genome-wide SNPs identify the same distinctive populations (measured using the Shapley Value of distinctiveness) as genome-wide SNPs. We find that genome-wide and putatively adaptive SNPs generally but variably agree on population prioritizations. As expected, the level of agreement is predicted by the proportion of putatively adaptive SNPs, and the agreement is lower when there is more overall population genetic structure. Interestingly, across our datasets, putatively adaptive SNPs do as well or better at predicting genome-wide population prioritization than size-matched random subsets of SNPs. Taken together, using genome-wide genetic variation for population prioritization may be a generally sound and cost-effective strategy for prioritizing populations in order to safeguard species-level genetic variation.
README
This README.txt file was generated on 2024-08-27 by Avneet Kaur Chhina
GENERAL INFORMATION
- Title of Dataset: Dataset for "Does genome-wide variation and putatively adaptive variation identify the same set of distinct populations?"
- Authors: Avneet Kaur Chhina, Philippe Fernandez-Fournier, Jayme Lewthwaite, Tom R. Booker, and Arne Mooers.
- Data collection details: We collated genome-wide and putatively adaptive SNP data from already published studies. We found outlier and/or association type studies from three meta-analyses and recommendations from colleagues and evaluated those studies based on a set of criteria and thresholds. Data were obtained from public data repositories or requested via email from the authors of studies. If data were emailed, permission was granted by the original study authors (via email communication) to provide our converted "012NA" version of their datasets. The three meta-analyses that included a list of studies are: a) Collin W Ahrens, Paul D Rymer, Adam Stow, Jason Bragg, Shannon Dillon, Kate DL Umbers, and Rachael Y Dudaniec. The search for loci under selection: trends, biases and progress. Molecular Ecology, 27(6):1342–1356, 2018. b) Brandon M Lind, Mitra Menon, Constance E Bolte, Trevor M Faske, and Andrew J Eckert. The genomics of local adaptation in trees: are we out of the woods yet? Tree Genetics & Genomes, 14:1–30, 2018. c) Moises Exposito-Alonso, Tom R Booker, Lucas Czech, Lauren Gillespie, Shannon Hateley, Christopher C Kyriazis, Patricia LM Lang, Laura Leventhal, David Nogues-Bravo, Veronica Pagowski, et al. Genetic diversity loss in the Anthropocene. Science, 377(6613):1431–1435, 2022.
SHARING/ACCESS INFORMATION
Here we provide references to the publications that provided data:
Study number 1. Ruegg et al., 2018; 2021
Kristen Ruegg, Rachael A Bay, Eric C Anderson, James F Saracco, Ryan J Harrigan, Mary Whitfield, Eben H Paxton, and Thomas B Smith. Ecological genomics predicts climate vulnerability in an endangered southwestern songbird. Ecology letters, 21(7):1085–1096, 2018.
Kristen Ruegg, Eric C Anderson, Marius Somveille, Rachael A Bay, Mary Whitfield, Eben H Paxton, and Thomas B Smith. Linking climate niches across seasons to assess population vulnerability in a migratory bird. Global Change Biology, 27(15):3519–3531, 2021.
We utilized "rad_wifl_clean_175_105000.rds" file for raw genome-wide SNPs, and Tables S3, S4, and S5 to obtain information on raw putatively adaptive SNPs.
Dataset publicly available: https://github.com/eriqande/ruegg-et-al-wifl-genoscapemake-the-genoscape.rmdv
Study number 2. Eimanifar et al., 2018
Amin Eimanifar, Samantha A Brooks, Tomas Bustamante, and James D Ellis. Population genomics and morphometric assignment of western honey bees (Apis mellifera l.) in the Republic of South Africa. BMC Genomics, 19:1–26, 2018.
We utilized "Honey bee GBS.vcf" for raw genome-wide SNPs and Tables 4, 5, and 6 to obtain raw putatively adaptive SNPs.
Dataset publicly available: https://doi.org/10.5061/dryad.98jh446
Study number 3. White et al., 2013
Thomas A White, Sarah E Perkins, Gerald Heckel, and Jeremy B Searle. Adaptive evolution during an ongoing range expansion: the invasive bank vole (Myodes glareolus) in Ireland. Molecular Ecology, 22(11):2971–2985, 2013.
We utilized "FOR_DRYAD.xlsx" for the raw genome-wide SNP file and Table 2 to obtain putatively adaptive SNPs.
Dataset publicly available: https://doi.org/10.5061/dryad.fb782
Study number 4. Schweizer et al., 2016
Rena M Schweizer, Jacqueline Robinson, Ryan Harrigan, Pedro Silva, Marco Galaverni, Marco Musiani, Richard E Green, John Novembre, and Robert K Wayne. Targeted capture and resequencing of 1040 genes reveal environmentally driven functional variation in grey wolves. Molecular Ecology, 25(1):357–379, 2016.
We used "Variant file in VCF format" as the raw genome-wide SNP data file; raw putatively adaptive SNPs were obtained from Tables 1 and 2 and Figure 4. We used "AllSamples_n107_EnvData_wLatLong" for population information.
Dataset publicly available: https://doi.org/10.5061/dryad.8g0s3
Study number 5. Mosca et al., 2016
Elena Mosca, Felix Gugerli, Andrew J Eckert, and David B Neale. Signatures of natural selection on Pinus cembra and P. mugo along elevational gradients in the Alps. Tree Genetics & Genomes, 12:1–15, 2016.
Raw data is also from this study:
Elena Mosca, AJ Eckert, EA Di Pierro, D Rocchini, Nicola La Porta, Piero Belletti, and DB Neale. The geographical and environmental determinants of genetic diversity for four alpine conifers of the European Alps. Molecular Ecology, 21(22):5530–5545, 2012.
We used "Pice_matric_459" for raw genome-wide P. cembra data and "Pimg_matrix_694" for raw genome-wide P. mugo data. Information on populations and putatively adaptive SNPs was obtained from the main paper and the supplementary files.
This folder examines only P. mugo; folder number 24 examines P. cembra.
Dataset publicly available: https://doi.org/10.5061/dryad.tm33d
Study number 6. McKown et al., 2014
Athena D McKown, Jaroslav Klápště, Robert D Guy, Armando Geraldes, Ilga Porth, Jan Hannemann, Michael Friedmann, Wellington Muchero, Gerald A Tuskan, Jürgen Ehlting, et al. Genome-wide association implicates numerous genes underlying ecological trait variation in natural populations of Populus trichocarpa. New Phytologist, 203(2):535–553, 2014.
We obtained population information from Geraldes et al., 2014:
Armando Geraldes, Nima Farzaneh, Christopher J Grassa, Athena D McKown, Robert D Guy, Shawn D Mansfield, Carl J Douglas, and Quentin CB Cronk. Landscape genomics of Populus trichocarpa: the role of hybridization, limited gene flow, and natural selection in shaping patterns of population structure. Evolution, 68(11):3260–3280, 2014.
Data are in the supplementary files of these two papers: the raw genome-wide dataset and putatively adaptive SNPs are in the supplementary files of McKown et al., 2014, and population data came from the supplementary files of Geraldes et al., 2014.
Study number 7. Royer et al., 2016
Anne M Royer, Matthew A Streisfeld, and Christopher Irwin Smith. Population genomics of divergence within an obligate pollination mutualism: Selection maintains differences between Joshua tree species. American Journal of Botany, 103(10):1730–1741, 2016.
Dataset publicly available: https://doi.org/10.5061/dryad.7pj4t
We used "TasselAllGenotypes" for raw genome-wide data file and Table 3 and Appendices S7, S8, S10 for raw putatively adaptive SNPs.
Study number 8. Christmas et al., 2016
Matthew J Christmas, Ed Biffin, Martin F Breed, and Andrew J Lowe. Finding needles in a genomic haystack: targeted capture identifies clear signatures of selection in a nonmodel plant species. Molecular Ecology, 25(17):4216–4233, 2016.
We used the "mec13750-sup-0007-AppendixS1.vcf" file (found in the supplementary material of the paper) as the raw genome-wide data file and Tables S3 and S4 for putatively adaptive SNPs. Dr. Christmas and Dr. Lowe emailed the population data and granted permission to provide our converted version of their datasets.
Study number 9. Benestan et al., 2016
Laura Benestan, Brady K Quinn, Halim Maaroufi, Martin Laporte, Fraser K Clark, Spencer J Greenwood, Rémy Rochette, and Louis Bernatchez. Seascape genomics provides evidence for thermal adaptation and current-mediated population structure in American lobster (Homarus americanus). Molecular Ecology, 25(20):5073–5092, 2016.
Data publicly available: https://doi.org/10.5061/dryad.5vb8v
We used "13688snps-562individus.recode.vcf" for raw genome-wide SNP data and "28snps-562ind-freq.frq.csv" for raw putative outlier SNPs.
Study number 10. Roffler et al., 2016
Gretchen H Roffler, Stephen J Amish, Seth Smith, Ted Cosart, Marty Kardos, Michael K Schwartz, and Gordon Luikart. SNP discovery in candidate adaptive genes using exon capture in a free-ranging Alpine ungulate. Molecular Ecology Resources, 16(5):1147–1164, 2016.
We used "SNPdata.txt" for raw genome-wide data and Table 3 for putatively adaptive SNPs data.
Data publicly available: https://doi.org/10.5061/dryad.kk466
Study number 11. Babin et al., 2017
Charles Babin, Pierre-Alexandre Gagnaire, Scott A Pavey, and Louis Bernatchez. RAD-seq reveals patterns of additive polygenic variation caused by spatially-varying selection in the American eel (Anguilla rostrata). Genome Biology and Evolution, 9(11):2974–2986, 2017.
The authors emailed us their data, and with their permission we provide our converted version of it. We obtained putative outlier SNPs from the supplementary files.
Study number 12. Guo et al., 2016
Baocheng Guo, Di Lu, Wen Bo Liao, and Juha Merilä. Genomewide scan for adaptive differentiation along altitudinal gradient in the Andrew’s toad Bufo andrewsi. Molecular Ecology, 25(16):3884–3900, 2016.
We used "SNP_identified_by_PoPoolation2_in_GenePop_format" as the raw genome-wide data file, and the authors emailed us the outlier SNP data. Dr. Guo granted us permission to provide the outlier SNP data.
Data is publicly available on: https://doi.org/10.5061/dryad.n70c7
Study number 13. De Kort et al., 2014
Hanne De Kort, Katrien Vandepitte, Hans Henrik Bruun, Déborah Closset-Kopp, Olivier Honnay, and Joachim Mergeay. Landscape genomics and a common garden trial reveal adaptive differentiation to temperature across Europe in the tree species Alnus glutinosa. Molecular Ecology, 23(19):4709–4721, 2014.
For raw genome-wide data, we used the "Genalex Alnus" file, and for putative outlier SNPs, we used information from Table 2.
Data is available on: https://doi.org/10.5061/dryad.rg82f
Study number 14. Hurel et al., 2021
Agathe Hurel, Marina de Miguel, Cyril Dutech, Marie-Laure Desprez-Loustau, Christophe Plomion, Isabel Rodríguez-Quilón, Agathe Cyrille, Thomas Guzman, Ricardo Alía, Santiago C González-Martínez, et al. Genetic basis of growth, spring phenology, and susceptibility to biotic stressors in maritime pine. Evolutionary Applications, 14(12):2750–2772, 2021.
We used "6100SNPs_520Pinus_pinaster_33populations.csv" file for raw genome-wide data and main paper tables and supplementary material files for information on putatively adaptive SNPs.
Data is available on: https://doi.org/10.5061/dryad.r4xgxd2df
Study number 15. Depardieu et al., 2021
Claire Depardieu, Sébastien Gérardi, Simon Nadeau, Geneviève J Parent, John Mackay, Patrick Lenz, Manuel Lamothe, Martin P Girardin, Jean Bousquet, and Nathalie Isabel. Connecting tree-ring phenotypes, genetic associations and transcriptomics to decipher the genomic architecture of drought adaptation in a widespread conifer. Molecular Ecology, 30(16):3898–3917, 2021.
We used "genotypes_Depardieu_2020.txt" for raw genome-wide data and main paper tables and supplementary files for putatively adaptive SNPs.
Data available on: Dryad Digital Repository [https://doi.org/10.5061/dryad.6rd6f] and on Github [https://github.com/ClaireDepardieu/Genetic_basis_drought]
Study number 16. Chen et al., 2012
Jun Chen, Thomas Källman, Xiaofei Ma, Niclas Gyllenstrand, Giusi Zaina, Michele Morgante, Jean Bousquet, Andrew Eckert, Jill Wegrzyn, David Neale, et al. Disentangling the roles of history and local selection in shaping clinal variation of allele frequencies and gene expression in Norway spruce (Picea abies). Genetics, 191(3):865–881, 2012.
We used "genotype_pabies445.txt" for raw genome-wide data, and Table 2 for information on putatively adaptive SNPs.
Data available on: https://doi.org/10.5061/dryad.82201
Study number 17. Xuereb et al., 2022
Amanda Xuereb, Quentin Rougemont, Xavier Dallaire, Jean-Sébastien Moore, Eric Normandeau, Bérénice Bougas, Alysse Perreault-Payette, Ben F Koop, Ruth Withler, Terry Beacham, et al. Re-evaluating coho salmon (Oncorhynchus kisutch) conservation units in Canada using genomic data. Evolutionary Applications, 15(11):1925–1944, 2022.
We only analyzed Thompson River data for this study. We combined the "Thompson_coho_inds_filtered_neutral.recode." and "Thompson_coho_inds_filtered_GEAcombined_outliers.recode.vcf" files to create the genome-wide SNP data file, and we used the "Thompson_coho_inds_filtered_GEAcombined_outliers.recode.vcf" file as the putatively adaptive SNP file.
Data available on: https://doi.org/10.5061/dryad.r4xgxd2gx
Study number 18. Holliday et al., 2010
Jason A Holliday, Kermit Ritland, and Sally N Aitken. Widespread, ecologically relevant genetic markers developed from association mapping of climate-related traits in Sitka spruce (Picea sitchensis). New Phytologist, 188(2):501–514, 2010.
The authors emailed us their data and granted permission to provide 0,1,2,NA (our version) of their data.
Study number 19. Flanagan et al., 2021
Sarah P Flanagan, Emily Rose, and Adam G Jones. The population genomics of repeated freshwater colonizations by gulf pipefish. Molecular Ecology, 30(7):1672–1687, 2021.
We used "converted_subset.vcf" for raw genome-wide data and "fw_SNPinfo_noFL" for putatively adaptive SNPs.
Data available on: https://doi.org/10.5061/dryad.12jm63xvh
Study number 20. Bay et al., 2018
Rachael A Bay, Ryan J Harrigan, Vinh Le Underwood, H Lisle Gibbs, Thomas B Smith, and Kristen Ruegg. Genomic signals of selection predict climate-driven population declines in a migratory bird. Science, 359(6371):83–86, 2018.
The authors emailed us their data and granted permission to provide the 0,1,2,NA converted version of their data.
Study number 21. Chavez-Galarza et al., 2013
Julio Chávez-Galarza, Dora Henriques, J Spencer Johnston, João C Azevedo, John C Patton, Irene Muñoz, Pilar De la Rúa, and M Alice Pinto. Signatures of selection in the Iberian honey bee (Apis mellifera iberiensis) revealed by a genome scan analysis of single nucleotide polymorphisms. Molecular Ecology, 22(23):5890–5907, 2013.
Dr. Pinto also emailed us data files and granted permission to post our converted version of their data files. We used the emailed dataset "Dataset SNPs iberia_coord.csv" for genome-wide SNPs and Tables 1 and S1 for information on putatively adaptive SNPs.
Data available on: https://doi.org/10.5061/dryad.1kk2k
Study number 22. Keller et al., 2018
Stephen R Keller, Vikram E Chhatre, and Matthew C Fitzpatrick. Influence of range position on locally adaptive gene–environment associations in Populus flowering time genes. Journal of Heredity, 109(1):47–58, 2018.
We used "Keller_etal_2017_JoH_SNPdata_DRYAD" as the raw genome-wide data file, and combined information on putatively adaptive SNPs from Table 2, Figure 6, and the supplementary files.
Data available on: https://doi.org/10.5061/dryad.gp78p
Study number 23. Funk et al., 2016
W Chris Funk, Robert E Lovich, Paul A Hohenlohe, Courtney A Hofman, Scott A Morrison, T Scott Sillett, Cameron K Ghalambor, Jesus E Maldonado, Torben C Rick, Mitch D Day, et al. Adaptive divergence despite strong genetic drift: genomic analysis of the evolutionary mechanisms causing genetic differentiation in the island fox (Urocyon littoralis). Molecular Ecology, 25(10):2176–2194, 2016.
We used "GP_NO_grays.txt" for raw genome-wide data and Table S1 for information on putatively adaptive SNPs.
Dryad repository: https://doi.org/10.5061/dryad.2kn1v
Study number 24. Mosca et al., 2016 P. cembra
Elena Mosca, Felix Gugerli, Andrew J Eckert, and David B Neale. Signatures of natural selection on Pinus cembra and P. mugo along elevational gradients in the Alps. Tree Genetics & Genomes, 12:1–15, 2016.
Raw data is also from this study:
Elena Mosca, AJ Eckert, EA Di Pierro, D Rocchini, Nicola La Porta, Piero Belletti, and DB Neale. The geographical and environmental determinants of genetic diversity for four alpine conifers of the European Alps. Molecular Ecology, 21(22):5530–5545, 2012.
We used "Pice_matric_459" for raw genome-wide P. cembra data and "Pimg_matrix_694" for raw genome-wide P. mugo data. Information on populations and putatively adaptive SNPs was obtained from the main paper and the supplementary files.
Dataset publicly available: https://doi.org/10.5061/dryad.tm33d
Study number 25. Candy et al., 2015
John R Candy, Nathan R Campbell, Matthew H Grinnell, Terry D Beacham, Wesley A Larson, and Shawn R Narum. Population differentiation determined from putative neutral and divergent adaptive genetic markers in Eulachon (Thaleichthys pacificus, Osmeridae), an anadromous Pacific smelt. Molecular Ecology Resources, 15(6):1421–1434, 2015.
We combined "EulachonSig.gen" and "EulachonNonSig.gen" files to create genome-wide SNP data file and used "EulachonSig.gen" file for putatively adaptive SNPs data.
Data available on: https://doi.org/10.5061/dryad.1797v
Study number 26. Dallaire et al., 2021
Xavier Dallaire, Éric Normandeau, Julien Mainguy, Jean-Éric Tremblay, Louis Bernatchez, and Jean-Sébastien Moore. Genomic data support management of anadromous Arctic Char fisheries in Nunavik by highlighting neutral and putatively adaptive genetic variation. Evolutionary Applications, 14(7):1880–1897, 2021.
Dr. Dallaire emailed us genome-wide SNPs, putatively adaptive SNPs, and population data and granted permission to provide our converted version of their dataset.
Study number 27. Hess et al., 2013
Jon E Hess, Nathan R Campbell, David A Close, Margaret F Docker, and Shawn R Narum. Population genomics of Pacific lamprey: adaptive variation in a highly dispersive species. Molecular Ecology, 22(11):2898–2916, 2013.
We used "Lamprey_4439Loci518IndGP_DRYAD" for genome-wide SNP data and Table 3 and supplementary files for putatively adaptive SNP data.
Data available on: https://doi.org/10.5061/dryad.nd853
Study number 28. Milano et al., 2014
Ilaria Milano, Massimiliano Babbucci, Alessia Cariani, Miroslava Atanassova, Dorte Bekkevold, Gary R Carvalho, Montserrat Espiñeira, Fabio Fiorentino, Germana Garofalo, Audrey J Geffen, et al. Outlier SNP markers reveal fine-scale genetic structuring across European hake populations (Merluccius merluccius). Molecular Ecology, 23(1):118–135, 2014.
We used "Milano et al_Merluccius merluccius SNP data" for genome-wide SNP data and Table 2 for putatively adaptive SNPs.
Data available on: https://doi.org/10.5061/dryad.7bn22.
Study number 29. Swaegers et al., 2015
J Swaegers, J Mergeay, A Van Geystelen, L Therry, MHD Larmuseau, and R Stoks. Neutral and adaptive genomic signatures of rapid poleward range expansion. Molecular Ecology, 24(24):6163–6176, 2015.
Dr. Swaegers emailed us the data. We used "filtered_SNPs_Coenagrion_scitulum.vcf" file for genome-wide data (emailed by Dr. Swaegers) and information from main paper tables, supplementary and Dryad files for putatively adaptive SNPs.
Data available: https://doi.org/10.5061/dryad.n0hk7
Study number 30. Moore et al., 2014
Jean-Sébastien Moore, Vincent Bourret, Mélanie Dionne, Ian Bradbury, Patrick O’Reilly, Matthew Kent, Gérald Chaput, and Louis Bernatchez. Conservation genomics of anadromous Atlantic salmon across its North American range: outlier loci identify the same patterns of population structure as neutral loci. Molecular Ecology, 23(23):5680–5697, 2014.
We used "Ssalar_SNPall_genepop_28Apr2014" file for genome-wide SNPs data and "Ssalar_SNP_Bayescan_Regions_positive_genepop_15Apr2014.txt", "Ssalar_SNP_ArlequinHier_Fct0.01_genepop_28Apr2014.txt", and "Ssalar_SNP_nested_hierFdist_29Apr2014.txt" files for putatively adaptive SNPs.
Data available: https://doi.org/10.5061/dryad.sb601
Study number 31. Mahony et al., 2020
Colin R Mahony, Ian R MacLachlan, Brandon M Lind, Jeremy B Yoder, Tongli Wang, and Sally N Aitken. Evaluating genomic data for management of local adaptation in a changing climate: A lodgepole pine case study. Evolutionary Applications, 13(1):116–131, 2020.
We used "Pine_AllNatural_GCandTotemIndivs_GWAS_SNPs_June8th2019" for genome-wide SNPs and we used putatively adaptive SNPs from Fernandez-Fournier et al., 2021 analysis.
Data available: https://doi.org/10.5061/dryad.56j8vq8 and https://github.com/philippeff/PopulationPrioritization.
Study number 32. He et al., 2016
Tianhua He, Haylee D’Agui, Sim Lin Lim, Neal J Enright, and Yiqi Luo. Evolutionary potential and adaptation of Banksia attenuata (Proteaceae) to climate and fire regime in southwestern Australia, a global biodiversity hotspot. Scientific Reports, 6(1):26315, 2016.
Dr. He kindly emailed us the datasets and granted permission to provide our converted version of their dataset.
We combined "All samples balance SNPs genepop", "All samples neutral SNP genepop", and "All samples selection SNP genepop" files for genome-wide SNPs.
Study number 33. Li et al., 2021
Yanping Li, Christopher P Burridge, Yunyun Lv, and Zuogang Peng. Morphometric and population genomic evidence for species divergence in the Chimarrichthys fish complex of the Tibetan Plateau. Molecular Phylogenetics and Evolution, 159:107117, 2021.
Dr. Peng emailed us the data and kindly granted permission to provide our converted version of their datasets. We used "snp.csv" for genome-wide data and the files under "arlequin-lositan-selected-id" for putatively adaptive SNPs.
Study number 34. Cullingham et al., 2014
Catherine I Cullingham, Janice EK Cooke, and David W Coltman. Cross-species outlier detection reveals different evolutionary pressures between sister species. New Phytologist, 204(1):215–229, 2014.
Catherine I Cullingham, Janice EK Cooke, and David W Coltman. Effects of introgression on the genetic population structure of two ecologically and economically important conifer species: lodgepole pine (Pinus contorta var. latifolia) and jack pine (Pinus banksiana). Genome, 56(10):577–585, 2013.
Dr. Cullingham kindly emailed us the data and granted permission to provide our converted version of their datasets on Dryad. We only used the jack pine data.
We used "JackSNPDataCullinghametal2014.csv" as the genome-wide SNP file and Tables 3, 4, and S1 for putatively adaptive SNPs.
DATA & FILE OVERVIEW
In the folder "data_and_scripts_for_publication", you will find 34 folders. Each folder is labelled with the study citation in this format: "Study number.authors_study publication year"
In each of the 34 study folders, you will find four folders: "raw_data", "processed_data", "results", and "r_files".
For cases where files are publicly available in a data repository provided by the authors, you will find the data files (only the ones we used for our analysis) in the "raw_data" folder. We also provide a link to the file in the relevant R script. The genome-wide and putatively adaptive SNP files we used (after applying the 10% missing rate cut-off) are in the "processed_data" folder.
For cases where authors emailed us data files and provided permission (via email) to share our "012NA" converted version of these files, you will find the genome-wide and putatively adaptive SNP files in the "processed_data" folder. In these cases, we provide files both before and after applying our 10% missing rate cut-off.
In the "results" folder you will find three folders: "all_snps", "adaptive_snps", and "all_adaptive_snps". The "all_snps" folder contains a single file with populations, their Shapley values, and their rankings based on those Shapley values using genome-wide SNPs. The "adaptive_snps" folder contains a similar file based on putatively adaptive SNPs, and the "all_adaptive_snps" folder contains a single file based on combining genome-wide and putatively adaptive SNPs.
In the "r_files" folder you will find five or (in some cases where genome-wide files were too big to run PCA) six R scripts:
"01_data_organization_conversion_and_missing_rate_cutoff" organizes the genome-wide and putatively adaptive SNP datasets, converts them into a standard 012NA format, and applies the 10% missing rate cut-off.
"02a_pca_for_cedar_script.R" contains code to perform PCA on genome-wide data. This script is for a computer cluster. You will only find this file for studies with big datasets.
"02b_all_adaptive_snps_analysis_correlation_analysis.R" contains code for comparing correlations between genome-wide and putatively adaptive SNPs.
"03_comparing_expected_and_observed_correlation.R" samples random subsets of genome-wide SNPs (each the same size as the set of putatively adaptive SNPs) via bootstrap and compares the expected correlation (correlation between random and genome-wide SNPs) with the observed correlation (correlation between adaptive and genome-wide SNPs).
"04_study_wide_FST_calculation.R" calculates study-wide FST.
"05_allele_frequency_calculation.R" calculates and compares allele frequencies of neutral, random, and putatively adaptive sets of genome-wide SNPs.
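As an illustration only, the conversion and filtering step described for script 01 can be sketched as follows. The dataset's own scripts are written in R; this Python sketch uses hypothetical function names, assumes biallelic SNPs coded as counts of one chosen allele, and assumes the 10% cut-off is applied per SNP with SNPs at exactly 10% retained. None of these details are confirmed by this README.

```python
# Illustrative sketch only; names and cut-off details are assumptions.
MISSING = None  # stands in for NA in the 012NA format


def encode_genotype(genotype, counted_allele):
    """Count copies of counted_allele in a two-allele genotype such as 'A/G'.

    Returns 0, 1, or 2, or MISSING when either allele is uncalled.
    """
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a in ("", ".", "N", "NA") for a in alleles):
        return MISSING
    return sum(1 for a in alleles if a == counted_allele)


def missing_rate(calls):
    """Fraction of MISSING calls for one SNP across individuals."""
    return sum(1 for c in calls if c is MISSING) / len(calls)


def filter_snps(matrix, snp_ids, max_missing=0.10):
    """Keep SNP columns whose missing rate is at most max_missing.

    matrix is a list of rows (one per individual), one call per SNP.
    Returns the filtered matrix and the IDs of the SNPs that were kept.
    """
    keep = [
        j for j in range(len(snp_ids))
        if missing_rate([row[j] for row in matrix]) <= max_missing
    ]
    kept_matrix = [[row[j] for j in keep] for row in matrix]
    return kept_matrix, [snp_ids[j] for j in keep]
```

Calling `filter_snps(matrix, snp_ids)` on a converted matrix returns only the SNPs that pass the cut-off, which corresponds to the "after cut-off" files in the "processed_data" folders.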
We provide detailed annotation of the R code within each R file.
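The expected-versus-observed comparison described for script 03 can be illustrated with a short Python sketch (the dataset's scripts themselves are in R). Everything named here is hypothetical. In particular, `population_scores` is a simple stand-in and is not the Shapley Value used in the study, the correlation is Pearson by assumption, the random subsets are drawn without replacement by assumption, and the allele frequencies are made-up toy data.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def population_scores(freqs, snp_idx):
    """Stand-in distinctiveness score (NOT the Shapley Value): mean absolute
    deviation of each population's allele frequency from the across-population
    mean, over the SNPs listed in snp_idx."""
    n_pops = len(freqs)
    scores = [0.0] * n_pops
    for j in snp_idx:
        mean_j = sum(freqs[p][j] for p in range(n_pops)) / n_pops
        for p in range(n_pops):
            scores[p] += abs(freqs[p][j] - mean_j) / len(snp_idx)
    return scores


def expected_correlations(freqs, n_adaptive, n_draws=200, seed=1):
    """Correlate scores from size-matched random SNP subsets with scores
    from all SNPs, once per random draw."""
    rng = random.Random(seed)
    all_idx = list(range(len(freqs[0])))
    genome_wide = population_scores(freqs, all_idx)
    return [
        pearson(population_scores(freqs, rng.sample(all_idx, n_adaptive)),
                genome_wide)
        for _ in range(n_draws)
    ]


# Toy data (made up): allele frequencies for 5 populations x 60 SNPs.
toy_rng = random.Random(42)
toy_freqs = [[toy_rng.random() for _ in range(60)] for _ in range(5)]
adaptive_idx = list(range(6))  # pretend the first 6 SNPs are putatively adaptive

observed = pearson(population_scores(toy_freqs, adaptive_idx),
                   population_scores(toy_freqs, list(range(60))))
expected = expected_correlations(toy_freqs, n_adaptive=len(adaptive_idx))
# Share of random subsets that agree with genome-wide scores no better
# than the putatively adaptive subset does.
share_not_better = sum(r <= observed for r in expected) / len(expected)
```

Matching the random subsets to the number of putatively adaptive SNPs keeps subset size from driving the difference between the two correlations.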
Please note: Due to Dryad submission guidelines, we removed latitude, longitude, and elevation information from the raw files.
Methods
We collated genome-wide and putatively adaptive SNP data from already published studies. We found outlier and/or association type studies from three meta-analyses (Ahrens et al., 2018; Lind et al., 2018; Exposito-Alonso et al., 2022) and recommendations from colleagues and evaluated those studies based on a set of criteria and thresholds. Data were obtained from public data repositories or requested via email from the authors of studies. If data were emailed, permission was granted by the original study authors (via email communication) to provide the 012NA converted version of their datasets.