Anolis carolinensis character displacement SNP
Data files
Jun 06, 2022 version files 19.36 MB
-
94_44K_ALL_V.vcf
19.25 MB
-
README.txt
993 B
-
Supplement_215_outliers_details.txt
51.10 KB
-
Supplemental_44K_SNPs_details.txt
50.67 KB
-
Supplemental_islands_descripiton.txt
644 B
-
Supplemental_Sample_description.txt
4.51 KB
Jan 27, 2023 version files 32.55 MB
-
94_44K_ALL_V.vcf
19.25 MB
-
README_2023.txt
2.05 KB
-
Sup_excel_Table_4_44K_SNP_Summary.xlsx
13.19 MB
-
Supplement_215_outliers_details.txt
51.10 KB
-
Supplemental_44K_SNPs_details.txt
50.67 KB
-
Supplemental_islands_descripiton.txt
644 B
-
Supplemental_Sample_description.txt
4.51 KB
Abstract
Here are six files that provide details for all 44,120 identified single nucleotide polymorphisms (SNPs) or the 215 outlier SNPs associated with the evolution of rapid character displacement among replicate islands with (2Spp) and without competition (1Spp) between two Anolis species. On 2Spp islands, A. carolinensis occurs higher in trees and have evolved larger toe pads. Among 1Spp and 2Spp island populations, we identify 44,120 SNPs, with 215-outlier SNPs with improbably large FST values, low nucleotide variation, greater linkage than expected, and these SNPs are enriched for animal walking behavior. Thus, we conclude that these 215-outliers are evolving by natural selection in response to the phenotypic convergent evolution of character displacement. There are two, non-mutually exclusive perspective of these nucleotide variants. One is character displacement is convergent: all 215 outlier SNPs are shared among 3 out of 5 2Spp island and 24% of outlier SNPS are shared among all five out of five 2Spp island. Second, character displacement is genetically redundant because the allele frequencies in one or more 2Spp are similar to 1Spp islands: among one or more 2Spp islands 33% of outlier SNPS are within the range of 1Spp MiAF and 76% of outliers are more similar to 1Spp island than mean MiAF of 2Spp islands. Focusing on convergence SNP is scientifically more robust, yet it distracts from the perspective of multiple genetic solutions that enhances the rate and stability of adaptive change.
The six files include: a description of eight islands, details of 94 individuals, and four files on SNPs. The four SNP files include the VCF files for 94 individuals with 44KSNPs and two files (Excel sheet/tab-delimited file) with FST, p-values and outlier status for all 44,120 identified single nucleotide polymorphisms (SNPs) associated with the evolution of rapid character displacement. The sixth file is a detailed file on the 215 outlier SNPs.
Complete sequence data is available at Bioproject PRJNA833453, which including samples not included in this study. The 94 individuals used in this study are described in “Supplemental_Sample_description.txt”
Anoles and genomic DNA:
Tissue or DNA for 160 Anolis carolinensis and 20 A. sagrei samples were provided by the Museum of Comparative Zoology at Harvard University (Table S2). Samples were previously used to examine evolution of character displacement in native A. carolinensis following invasion by A. sagrei onto man-made spoil islands in Mosquito Lagoon Florida (Stuart et al. 2014). One hundred samples were genomic DNAs, and 80 samples were tissues (terminal tail clip, Table S2).
Genomic DNA was isolated from 80 of 160 A. carolinensis individuals (MCZ, Table S2) using a custom SPRI magnetic bead protocol (Psifidi et al. 2015). Briefly, after removing ethanol, tissues were placed in 200 ul of GH buffer (25 mM Tris- HCl pH 7.5, 25 mM EDTA, , 2M GuHCl Guanidine hydrochloride, G3272 SIGMA, 5 mM CaCl2, 0.5% v/v Triton X-100, 1% N-Lauroyl-Sarcosine) with 5% per volume of 20 mg/ml proteinase K (10 ul/200 ul GH) and digested at 55º C for at least 2 hours. After proteinase K digestion, 100 ul of 0.1% carboxyl-modified Sera-Mag Magnetic beads (Fisher Scientific) resuspended in 2.5 M NaCl, 20% PEG were added and allowed to bind the DNA. Beads were subsequently magnetized and washed twice with 200 ul 70% EtOH, and then DNA was eluted in 100 ul 0.1x TE (10 mM Tris, 0.1 mM EDTA). All DNA samples were gel electrophoresed to ensure high molecular mass and quantified by spectrophotometry and fluorescence using Biotium AccuBlueTM High Sensitivity dsDNA Quantitative Solution according to manufacturer’s instructions.
Genotyping-by-sequencing (GBS) libraries were prepared using a modified protocol after Elshire et al. (Elshire et al. 2011). Briefly, high-molecular-weight genomic DNA was aliquoted and digested using ApeKI restriction enzyme. Digests from each individual sample were uniquely barcoded, pooled, and size selected to yield insert sizes between 300-700 bp (Borgstrom et al. 2011). Pooled libraries were PCR amplified (15 cycles) using custom primers that extend into the genomic DNA insert by 3 bases (CTG). Adding 3 extra base pairs systematically reduces the number of sequenced GBS tags, ensuring sufficient sequencing depth. The final library had a mean size of 424 bp ranging from 188 to 700 bp .
Anolis SNPs:
Pooled libraries were sequenced on one lane on the Illumina HiSeq 4000 in 2x150 bp paired-end configuration, yielding approximately 459 million paired-end reads ( ~138 Gb). The medium Q-Score was 42 with the lower 10% Q-Scores exceeding 32 for all 150 bp. The initial library contained 180 individuals with 8,561,493 polymorphic sites. Twenty individuals were Anolis sagrei, and two individuals (Yan 1610 & Yin 1411) clustered with A. sagrei and were not used to define A. carolinesis’ SNPs.
Anolis carolinesis reads were aligned to the Anolis carolinensis genome (NCBI RefSeq accession number:/GCF_000090745.1_AnoCar2.0). Single nucleotide polymorphisms (SNPs) for A. carolinensis were called using the GBeaSy analysis pipeline (Wickland et al. 2017) with the following filter settings: minimum read length of 100 bp after barcode and adapter trimming, minimum phred-scaled variant quality of 30 and minimum read depth of 5. SNPs were further filtered by requiring SNPs to occur in > 50% of individuals, and 66 individuals were removed because they had less than 70% of called SNPs. These filtering steps resulted in 51,155 SNPs among 94 individuals. Final filtering among 94 individuals required all sites to be polymorphic (with fewer individuals, some sites were no longer polymorphic) with a maximum of 2 alleles (all are bi-allelic), minimal allele frequency 0.05, and He that does not exceed HWE (FDR <0.01). SNPs with large He were removed (2,280 SNPs). These SNPs with large significant heterozygosity may result from aligning paralogues (different loci), and thus may not represent polymorphisms. No SNPs were removed with low He (due to possible demography or other exceptions to HWE). After filtering, 94 individual yielded 44,120 SNPs. Thus, the final filtered SNP data set was 44K SNPs from 94 indiviuals.
Statistical Analyses:
Eight A. carolinensis populations were analyzed: three populations from islands with native species only (1Spp islands) and 5 populations from islands where A. carolinesis co-exist with A. sagrei (2Spp islands, Table 1, Table S1). Most analyses pooled the three 1Spp islands and contrasted these with the pooled five 2Spp islands.
Two approaches were used to define SNPs with unusually large allele frequency differences between 1Spp and 2Spp islands: 1) comparison of FST values to random permutations and 2) a modified FDIST approach to identify outlier SNPs with large and statistically unlikely FST values.
Random Permutations:
FST values were calculated in VCFTools (version 4.2, (Danecek et al. 2011)) where the p-value per SNP were defined by comparing FST values to 1,000 random permutations using a custom script (below). Basically, individuals and all their SNPs were randomly assigned to one of eight islands or to 1Spp versus 2Spp groups. The sample sizes (55 for 2Spp and 39 for 1Spp islands) were maintained. FST values were re-calculated for each 1,000 randomizations using VCFTools.
Modified FDIST:
To identify outlier SNPs with statistically large FST values, a modified FDIST (Beaumont and Nichols 1996) was implemented in Arlequin (Excoffier et al. 2005). This modified approach applies 50,000 coalescent simulations using hierarchical population structure, in which demes are arranged into k groups of d demes and in which migration rates between demes are different within and between groups. Unlike the finite island models, which have led to large frequencies of false positive because populations share different histories (Lotterhos and Whitlock 2014), the hierarchical island model avoids these false positives by avoiding the assumption of similar ancestry (Excoffier et al. 2009).
References
Beaumont, M. A. and R. A. Nichols. 1996. Evaluating loci for use in the genetic analysis of population structure. P Roy Soc B-Biol Sci 263:1619-1626.
Borgstrom, E., S. Lundin, and J. Lundeberg. 2011. Large scale library generation for high throughput sequencing. PLoS One 6:e19119.
Bradbury, P. J., Z. Zhang, D. E. Kroon, T. M. Casstevens, Y. Ramdoss, and E. S. Buckler. 2007. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.
Cingolani, P., A. Platts, L. Wang le, M. Coon, T. Nguyen, L. Wang, S. J. Land, X. Lu, and D. M. Ruden. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6:80-92.
Danecek, P., A. Auton, G. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin, and G. Genomes Project Analysis. 2011. The variant call format and VCFtools. Bioinformatics 27:2156-2158.
Earl, D. A. and B. M. vonHoldt. 2011. Structure Harvester: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genet Resour 4:359-361.
Elshire, R. J., J. C. Glaubitz, Q. Sun, J. A. Poland, K. Kawamoto, E. S. Buckler, and S. E. Mitchell. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:e19379.
Evanno, G., S. Regnaut, and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14:2611-2620.
Excoffier, L., T. Hofer, and M. Foll. 2009. Detecting loci under selection in a hierarchically structured population. Heredity 103:285-298.
Excoffier, L., G. Laval, and S. Schneider. 2005. Arlequin (version 3.0): An integrated software package for population genetics data analysis. Evol Bioinform 1:47-50.
Glaubitz, J. C., T. M. Casstevens, F. Lu, J. Harriman, R. J. Elshire, Q. Sun, and E. S. Buckler. 2014. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One 9:e90346.
Jombart, T. 2008. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24:1403-1405.
Jombart, T. and I. Ahmed. 2011. adegenet 1.3-1: new tools for the analysis of genome-wide SNP data. Bioinformatics.
Jombart, T., S. Devillard, and F. Balloux. 2010. Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. Bmc Genetics 11.
Lotterhos, K. E. and M. C. Whitlock. 2014. Evaluation of demographic history and neutral parameterization on the performance of FST outlier tests. Mol Ecol 23:2178-2192.
Mi, H., A. Muruganujan, and P. D. Thomas. 2012. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Research 41:D377-D386.
Porras-Hurtado, L., Y. Ruiz, C. Santos, C. Phillips, A. Carracedo, and M. V. Lareu. 2013. An overview of STRUCTURE: applications, parameter settings, and supporting software. Front Genet 4:98.
Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945-959.
Psifidi, A., C. I. Dovas, G. Bramis, T. Lazou, C. L. Russel, G. Arsenos, and G. Banos. 2015. Comparison of eleven methods for genomic DNA extraction suitable for large-scale whole-genome genotyping and long-term DNA banking using blood samples. PLoS One 10:e0115960.
Ramasamy, R., S. Ramasamy, B. Bindroo, and V. Naik. 2014. STRUCTURE PLOT: a program for drawing elegant STRUCTURE bar plots in user friendly interface. SpringerPlus 3:431.
Stuart, Y. E., T. S. Campbell, P. A. Hohenlohe, R. G. Reynolds, L. J. Revell, and J. B. Losos. 2014. Rapid evolution of a native species following invasion by a congener. Science 346:463-466.
Thomas, P. D. 2003. PANTHER: A Library of Protein Families and Subfamilies Indexed by Function. Genome Research 13:2129-2141.
Wickland, D. P., G. Battu, K. A. Hudson, B. W. Diers, and M. E. Hudson. 2017. A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy. BMC Bioinformatics 18:586.
Original sample (tissue or DNA) are available at MCZ and specific individuals and their catalogue number are described in Table S2 on journal site or here in table “Supplemental_Sample_description.txt”.
Original sequence data are available at NCBI Bioproject PRJNA833453. Sequences are names as “Spp_Island_Status_MCZ#_sex “ where “Spp” is Ac or As (Ac = Anolis carolinensis, As = A. sagrei), islands are as defined in Figure 1 and Supplemental Table 1 & 2 (or here as “Supplemental_islands_descripiton.txt” and “Supplemental_Sample_description.txt”)