
Illumina next generation ddRAD sequencing SNP data from: Contrasting genetic diversity and structure between endemic and widespread damselfishes are related to differing adaptive strategies

Cite this dataset

Robitzch, Vanessa (2022). Illumina next generation ddRAD sequencing SNP data from: Contrasting genetic diversity and structure between endemic and widespread damselfishes are related to differing adaptive strategies [Dataset]. Dryad.


Aim: Discerning when, where, and how processes of isolation lead to differing biogeography is especially complex for marine species with similar ecological niches and within the same geographic location. We assessed the population genetics of two congeneric and ecologically similar damselfishes within their overlapping distributions and across potential barriers to gene flow.


Taxon: Dascyllus marginatus (endemic) and Dascyllus abudafur (widespread).


Location: Coral reefs from the Red Sea, Djibouti, Yemen, Oman, and Madagascar. 


Methods: We used RADseq-derived SNPs to investigate key differences in population genetics between the two species and to discuss the barriers shaping genetic differentiation (neutral vs. selective) and biogeography.


Results: Dascyllus marginatus inhabited the Red Sea, the coasts of Yemen (including Socotra), and the Gulf of Oman. Dascyllus abudafur was present from the Red Sea to Madagascar but was absent from Yemen and Oman. Populations of D. marginatus showed genetic differentiation an order of magnitude higher than that of D. abudafur, as well as several outlier loci (suggesting selective pressure), which were absent in D. abudafur despite the same sampling locations. In both species, specimens from the Red Sea and Djibouti formed one genetic cluster separated from all other locations.


Main conclusions: The stronger genetic structure at a smaller geographic scale in the endemic species seems to be associated with faster adaptation to environmental differences, whereas the widespread species only experienced reduced gene flow and neutral differentiation at much larger geographic scales. Restrictive transitions (between the Gulf of Aqaba and the Red Sea or the Red Sea and the Gulf of Aden) did not affect the genetic architecture of either species, while the environmental shift within the Red Sea (at 22°N/20°N) affected the endemic but not the widespread species. Samples from continental Yemen revealed that a genetic break in the Gulf of Aden likely reflects historical colonization processes and not contemporary environmental regimes.


The extracted, high-quality genomic DNA (500 ng per sample) was double-digested with the restriction enzymes SphI and MluCI (NEB), following the protocol described by Peterson et al. (2012) with some modifications. The DNA was digested at 37 °C for three hours, followed by a ligation step in which each sample was assigned one of sixteen unique adaptors. Pools of sixteen individuals were combined and run on a 1.5 % agarose gel, from which fragments of ~400 base pairs (bp) were manually excised and purified using a Zymoclean Gel DNA recovery kit.

Each pool was then amplified with a unique indexing primer, according to the standard Illumina multiplexed sequencing protocol, in a 50 μl PCR reaction containing 25 μl Kapa Hifi Hotstart Ready Mix Taq, 20 μl of pooled library DNA, 2.5 μl of the universal Illumina PCR primer, and 2.5 μl of one of twelve unique indexing primers. Amplifications were carried out in an Eppendorf 94-well vapo.protect Mastercycler Pro System (Fisher Scientific) using the following protocol: an initial step at 95 °C for 3 min, followed by ten cycles of 98 °C for 20 sec, 60 °C for 30 sec, and 72 °C for 30 sec, and a final step at 72 °C for 5 min.

DNA libraries were quantified first using the high sensitivity DNA analysis kit in a 2100 Bioanalyzer (Agilent Technologies) and then by qPCR in an ABI7900HT fast real-time PCR system (Thermo Fisher Scientific) using the KAPA Library Quantification Kits (Kapa Biosystems). The length (in bp) and quality of the library fragments were measured once more in the 2200 TapeStation (Agilent Technologies) using the High Sensitivity D1000 ScreenTape kit. Pools were subsequently combined in equimolar concentration to form a single genomic library.
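The equimolar pooling step can be illustrated with a small calculation. This is a minimal sketch, not the authors' procedure: it assumes the commonly used average mass of 660 g/mol per base pair of double-stranded DNA, and the pool names, concentrations, and target amount are hypothetical values chosen only for illustration.

```python
# Sketch of equimolar pooling arithmetic (illustrative values only).
# Assumption: double-stranded DNA averages about 660 g/mol per base pair.
AVG_BP_MASS = 660.0


def molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a mass concentration (ng/ul) to a molar concentration (nM)."""
    if conc_ng_per_ul <= 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration and fragment length must be positive")
    return conc_ng_per_ul / (AVG_BP_MASS * mean_fragment_bp) * 1e6


def equimolar_volumes(pools, target_fmol):
    """Return the volume (ul) of each pool that contributes target_fmol.

    pools maps a pool name to (concentration in ng/ul, mean fragment bp).
    1 nM equals 1 fmol/ul, so volume = target_fmol / molarity.
    """
    return {
        name: target_fmol / molarity_nM(conc, bp)
        for name, (conc, bp) in pools.items()
    }


if __name__ == "__main__":
    # Hypothetical pools with ~400 bp fragments, as in the size selection.
    example = {"pool_A": (2.64, 400), "pool_B": (5.28, 400)}
    for name, volume in equimolar_volumes(example, target_fmol=20).items():
        print(f"{name}: {volume:.2f} ul")
```

A pool that is twice as concentrated contributes half the volume, so every pool adds the same number of molecules to the final library.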
In total, four libraries were created and run simultaneously on four separate lanes of an Illumina HiSeq 2000 sequencer as single-end reads (1 x 101 bp; v3 reagents). Each library and lane contained 80 individuals, with two libraries/lanes per species (two for Dascyllus marginatus and two for D. abudafur).
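The multiplexing layout described above can be checked with a short sketch. It assumes that every individual in a lane is identified by the combination of its pool (one indexing primer per pool) and one of the sixteen adaptors; the numeric labels are hypothetical, and five pools per lane is an inference from 80 individuals in pools of sixteen, not a figure stated in the text.

```python
# Sketch of the pool/adaptor layout for one lane (labels are hypothetical).
ADAPTORS_PER_POOL = 16    # sixteen unique adaptors, from the text
INDEXING_PRIMERS = 12     # twelve unique indexing primers, from the text
INDIVIDUALS_PER_LANE = 80
LANES = 4


def pools_per_lane(individuals, adaptors_per_pool):
    """Number of full pools needed to hold all individuals in one lane."""
    if individuals % adaptors_per_pool != 0:
        raise ValueError("individuals do not fill whole pools")
    return individuals // adaptors_per_pool


def assign_tags(n_individuals, n_adaptors):
    """Give each individual a (pool number, adaptor number) pair, 1-based."""
    return [(i // n_adaptors + 1, i % n_adaptors + 1)
            for i in range(n_individuals)]


if __name__ == "__main__":
    n_pools = pools_per_lane(INDIVIDUALS_PER_LANE, ADAPTORS_PER_POOL)
    tags = assign_tags(INDIVIDUALS_PER_LANE, ADAPTORS_PER_POOL)
    # Every individual must carry a unique pool/adaptor combination.
    assert len(set(tags)) == INDIVIDUALS_PER_LANE
    assert n_pools <= INDEXING_PRIMERS
    print(f"pools per lane: {n_pools}")
    print(f"individuals sequenced in total: {INDIVIDUALS_PER_LANE * LANES}")
```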

De-novo assembly

Sequences were demultiplexed and filtered for quality using the read-processing pipeline of STACKS v.2.5 (Catchen et al., 2011). Individual reads with uncalled bases (-c), a low quality score (-q), or an average Phred score below 20 within a sliding window (-s 20) were discarded. RAD-tags and barcodes were rescued (-r). After demultiplexing, individuals with fewer than 500,000 recovered reads were removed. Loci for the remaining samples were assembled de-novo using the de-novo assembly pipeline of STACKS v.2.5. Different parameter combinations were evaluated, which resulted in different numbers of loci but gave similar results in genetic comparisons (i.e., genetic clustering and pairwise FST among sites). For the final data set, the maximum number of mismatches allowed between stacks within and between individuals (for ustacks and cstacks, respectively) was set to three (-M and -n), similar to the value recommended by Mastretta-Yanes et al. (2014).
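The read- and individual-level filters described above can be sketched as follows. This is an illustration of the logic, not the STACKS implementation. The flags quoted in the text match those of the Stacks `process_radtags` program, but the program name is missing from the text, so that link is an assumption. The Phred+33 encoding and the window size (15% of the read length) are also assumptions.

```python
# Simplified sketch of the quality filters (not the STACKS source code).
# Assumptions: Phred+33 quality encoding; window = 15% of the read length.

def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]


def passes_quality(seq, quality, min_mean=20, window_fraction=0.15):
    """Return True if the read has no uncalled base and no weak window."""
    if "N" in seq.upper():          # uncalled base: discard the read
        return False
    scores = phred_scores(quality)
    window = max(1, int(len(scores) * window_fraction))
    total = sum(scores[:window])
    if total / window < min_mean:
        return False
    # Slide the window one base at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        if total / window < min_mean:
            return False
    return True


def keep_individuals(read_counts, min_reads=500_000):
    """Drop individuals with fewer than min_reads retained reads."""
    return {name: n for name, n in read_counts.items() if n >= min_reads}


if __name__ == "__main__":
    good = passes_quality("ACGT" * 5, "I" * 20)            # all scores are 40
    bad_tail = passes_quality("A" * 20, "I" * 10 + "#" * 10)  # tail scores are 2
    print(good, bad_tail)
    # Hypothetical sample names and read counts.
    print(keep_individuals({"ind_01": 1_200_000, "ind_02": 350_000}))
```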


SNP filtering

The ‘populations’ component of STACKS v.2.5 was used to create and export a VCF file containing SNPs (only the first SNP called per locus, --write-single-snp) that were present in at least 95% of the samples (-R 0.95), had a minimum minor allele frequency of 0.02 (--min-maf), and had a maximum observed heterozygosity of 0.6 (--max-obs-het); it was also used to calculate population statistics (--fstats).