Data for: Genetic structuring and species boundaries in the Atlantic stony coral Favia (Scleractinia, Faviidae)
Data files
Dec 07, 2023 version files 6.54 GB
- favia_coral_20missing.vcf
- popmap.csv
- README.md
- TotalRawSNPs.vcf
Abstract
Scleractinian corals are the main modern builders of coral reefs, which are major hot spots of marine biodiversity. Southern Atlantic reef corals are understudied compared to their Caribbean and Indo-Pacific counterparts and many hypotheses about their population dynamics demand further testing. We employed thousands of single nucleotide polymorphisms (SNPs) recovered via ezRAD to characterize genetic population structuring and species boundaries in the amphi-Atlantic hard coral genus Favia. Coalescent-based species delimitation (BFD* - Bayes Factor Delimitation) recovered F. fragum and F. gravida as separate species. Although our results agree with depth-related genetic structuring in F. fragum, they did not support incipient speciation of the “tall” and “short” morphotypes. The preferred scenario also revealed a split between two main lineages of F. gravida, one from Ascension Island and the other from Brazil. The Brazilian lineage is further divided into a species that occurs throughout the Northeastern coast and another that ranges from the Abrolhos Archipelago to the state of Espírito Santo. BFD* scenarios were corroborated by analyses of SNP matrices with varying levels of missing data and by a speciation-based delimitation approach (DELINEATE). Our results challenge current notions about Atlantic reef corals because they uncovered surprising genetic diversity in Favia and rejected the long-standing hypothesis that Abrolhos Archipelago may have served as a Pleistocenic refuge during the last glaciations.
README: Title of dataset: Genetic structuring and species boundaries in the Atlantic stony coral Favia (Scleractinia, Faviidae)
Authors: Carolina de Lima Adam, Robert J. Toonen, David B. Carlon, Carla Zilberberg, Marcos Soares Barbeitos
https://doi.org/10.5061/dryad.0vt4b8h4t
Corresponding author information:
- Carolina de Lima Adam
- Current Institution address: University of Oregon
- Email: carolinaladam@gmail.com
This README file describes the data collection and the data package accompanying the above publication.
Samples collected: F. gravida (Ascension Island - 6 individuals; Brazil - 47 individuals); F. fragum (Caribbean - 16 individuals)
File list
- TotalRawSNPs.vcf
- favia_coral_20missing.vcf
- popmap.csv
- *.R files: sNMF.R, fst_heatmap.R, pca_dapc.R
Description of individual files
TotalRawSNPs.vcf
- Variant Call File (VCF) with raw single nucleotide polymorphisms (SNP) for each individual in the dataset
File obtained with dDocent with the following parameters:
- Clustering threshold c = 90%
- Minimum within individual coverage level k1 = 3
- Minimum number of individuals sharing a read k2 = 4
favia_coral_20missing.vcf
- Subset of TotalRawSNPs.vcf after filtering with vcftools
Filters applied:
- Minimum quality Phred 30 (vcftools --minQ 30), minimum mean depth 3 (vcftools --min-meanDP 3)
- Removed individuals with more than 55% missing data
- vcftools --missing-indv (generates out.imiss file)
- awk '$5 > 0.55' out.imiss | cut -f1 > lowDP.indv
- vcftools --remove lowDP.indv
- Loci with more than 20% missing data were removed (vcftools --max-missing 0.8)
- Indels (insertions and deletions) were removed (vcftools --remove-indels)
- Loci with minor allele frequency lower than 0.01 were removed (vcftools --maf 0.01)
- Only one SNP per locus was retained (vcftools --thin 500)
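The individual-removal step listed above can be tried on a toy table before running it on real data. The sketch below builds a small stand-in for the out.imiss file (sample names and counts are invented, and the temporary directory is only for illustration), then applies the same awk filter. It adds an `NR > 1` guard as an assumption on our part: without it, the header line can slip into lowDP.indv, because awk falls back to a string comparison when the fifth field is the non-numeric label F_MISS.

```shell
# Toy stand-in for the out.imiss table written by `vcftools --missing-indv`.
# Columns: INDV, N_DATA, N_GENOTYPES_FILTERED, N_MISS, F_MISS (tab-separated).
# Sample names and counts are invented for illustration only.
workdir=$(mktemp -d)
printf 'INDV\tN_DATA\tN_GENOTYPES_FILTERED\tN_MISS\tF_MISS\n' > "$workdir/out.imiss"
printf 'sample_A\t1000\t0\t100\t0.1\n'  >> "$workdir/out.imiss"
printf 'sample_B\t1000\t0\t550\t0.55\n' >> "$workdir/out.imiss"
printf 'sample_C\t1000\t0\t700\t0.7\n'  >> "$workdir/out.imiss"

# Same threshold as in the README; NR > 1 skips the header row explicitly.
awk 'NR > 1 && $5 > 0.55' "$workdir/out.imiss" | cut -f1 > "$workdir/lowDP.indv"

# Only sample_C exceeds 55% missing data; sample_B sits exactly at the threshold and is kept.
cat "$workdir/lowDP.indv"
```

The resulting lowDP.indv list is what `vcftools --remove` takes as input in the step above.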
popmap.csv:
- tab-delimited file designating the sampling site for each sample in the VCF file
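As a quick sanity check on a population map of this kind, samples can be tallied per sampling site. The sketch below assumes a two-column, tab-delimited layout with no header (sample ID first, site second); that layout, the file name popmap_toy.csv, and the sample and site labels are all assumptions for illustration and may not match popmap.csv exactly.

```shell
# Hypothetical two-column, tab-delimited popmap: sample ID, then sampling site.
# IDs and site labels are invented; check the real popmap.csv layout before reuse.
workdir=$(mktemp -d)
printf 'ind01\tAscension\n'  > "$workdir/popmap_toy.csv"
printf 'ind02\tAscension\n' >> "$workdir/popmap_toy.csv"
printf 'ind03\tAbrolhos\n'  >> "$workdir/popmap_toy.csv"

# Count samples per site (column 2) and sort so the output order is stable.
counts=$(awk -F '\t' '{ n[$2]++ } END { for (s in n) printf "%s\t%d\n", s, n[s] }' \
  "$workdir/popmap_toy.csv" | sort)
printf '%s\n' "$counts"
```

If the real file has a header row or a different column order, the field number and an `NR > 1` guard would need adjusting.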
sNMF.R:
- Script to run sNMF analysis (Fast and Efficient Estimation of Individual Ancestry Coefficients)
fst_heatmap.R:
- Script to estimate pairwise Fst (Fixation index) between Favia individuals using the VCF file as input and to plot a heatmap based on the obtained values
pca_dapc.R:
- Script to perform Principal Components Analysis (PCA) and Discriminant Analysis of Principal Components (DAPC) using the VCF file as input
Methods
DNA was extracted from tissue samples of F. fragum using the Omega E.Z.N.A Tissue DNA kit and Invitrogen PureLink Genomic DNA kit. Extractions were purified using 1.8X AMPure XP magnetic beads. DNA quality was assessed via electrophoresis in 1.5% agarose gel, ensuring that only high molecular weight DNA was carried over to the digestion step. Quantification was performed using a Qubit 2 Fluorometer and the dsDNA High Sensitivity Assay kit.
Libraries were prepared following the ezRAD protocol (Toonen et al. 2013). Briefly, samples were digested using the enzyme DpnII (New England Biolabs), in 50μL reactions containing 5μL DpnII NEB 10X Buffer, 2 units of DpnII, and 200-1000ng of DNA. Digestions were incubated at 37°C for 3 hours, then heat-inactivated for 20 minutes at 65°C, purified using 1.8X AMPure XP beads, run in 1.5% agarose gels, and considered successful if they produced a smeared band.
Library preparation began with the KAPA HyperPrep Library kit (Roche Sequencing Store) following Knapp et al. (2016) with minor modifications. DNA samples were end-repaired and A-tailed, and then adapter ligation was performed using IDT xGen Stubby Adapters and Unique Dual Index (UDI) primer pairs. These index-ligated products were size selected with Mag-Bind Magnetic Beads (Omega Bio-Tek) targeting fragments in a 350-700bp range in two steps, with DNA:bead ratios of 1:0.6 and 1:0.2, respectively. Samples were amplified using six to ten PCR cycles with KAPA HiFi HotStart ReadyMix (Omega Bio-Tek) and purified using 1:1 DNA:AMPure XP beads. Libraries were validated using the Qubit dsDNA HS kit, Agilent 2100 Bioanalyzer, and qPCR, and then pooled and sequenced as paired-end (2x150bp) reads on the Illumina HiSeq 4000 sequencer at the Research Technology Support Facility (RTSF) Genomics Core, at Michigan State University.
We assessed raw read quality with FASTQC (Andrews, 2010) and subsequent processing was performed with the bioinformatic pipeline dDocent (Puritz, Hollenbeck, & Gold, 2014).
First, universal adapters and reads with Phred <30 were excluded from further analyses. We randomly chose three individuals from each population to create the de novo assembly reference onto which all samples would be mapped. We used the dDocent scripts ReferenceOpt.sh and RefMapOpt.sh to select the optimal combination of parameters for the de novo assembly, aiming to maximize the number of paired mapping reads while minimizing the number of mismatched reads. We set the minimum level of similarity among sequences in the same cluster, or clustering threshold (c), to 90%, the minimum within-individual coverage level (k1) to 3, and the minimum number of individuals sharing a read (k2) to 4. Within dDocent, sequences from all individuals were mapped against the reduced representation reference and SNPs were called with FreeBayes (Garrison & Marth, 2012). The resulting raw VCF file was filtered using VCFtools (Danecek et al. 2011). We applied a minimum quality filter of 30 and a minimum mean depth of 3, then removed individuals with more than 55% missing loci. To assess the influence of missing data on the results, we created three subsets with 0, 10 and 20% missing loci. Finally, we removed indels, filtered out loci with minor allele frequencies lower than 0.01 and kept a single SNP per locus in downstream analyses.
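The three missing-data subsets map onto vcftools through `--max-missing`, which takes the minimum fraction of genotyped individuals per site (1 allows no missing data, 0 allows any amount). The sketch below only prints candidate command lines and does not run vcftools. The input name filtered.vcf and the output prefixes are placeholders rather than files from this package, and the exact commands the authors used for the 0% and 10% subsets are not documented here.

```shell
# Convert an allowed percentage of missing data into a --max-missing value
# (fraction of individuals that must be genotyped at a site).
cmds=$(for missing in 0 10 20; do
  keep=$(LC_ALL=C awk -v m="$missing" 'BEGIN { printf "%.1f", 1 - m / 100 }')
  # filtered.vcf and the subset_* prefixes are hypothetical names.
  echo "vcftools --vcf filtered.vcf --max-missing $keep --recode --recode-INFO-all --out subset_${missing}missing"
done)

# Print the three command lines for inspection; nothing is executed.
printf '%s\n' "$cmds"
```

The 20% case reproduces the `--max-missing 0.8` setting listed for favia_coral_20missing.vcf.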