Data from: Phylogeography of the freshwater crab Potamon persicum (Decapoda: Potamidae): an ancestral ring species?
Data files
Aug 08, 2024 version files 125.22 KB
- All_Samples_CO1.fas
- Concatenate_Data_CO1_16S_H3.fas
- README.md
- Specimens_Details.xlsx
Abstract
The Zagros Mountains, characterized by complex topography and three large drainage systems, harbor the endemic freshwater crab Potamon persicum in Iran. Our study delves into the evolutionary history of P. persicum, utilizing two mitochondrial and one nuclear marker. We collected 214 specimens from 24 localities, identifying 21 haplotypes grouped into two major evolutionary lineages. Substantial differentiation exists between drainage systems and lineages. Historical demographic analysis revealed a significant decrease in population size during the late-Holocene, accompanied by a recent population bottleneck. Species distribution modelling has revealed eastward shifts in suitable habitats between the last glacial maximum and the present day. Following the last glacial maximum, habitat fragmentation occurred, resulting in the establishment of small populations. These smaller populations are more vulnerable to climatic and geological events, thereby limiting gene flow and accelerating genetic differentiation within species. Historical biogeographic analysis traced the origin of P. persicum to the western Zagros Mountains, with major genetic divergence occurring during the Pleistocene. Our genetic analyses suggest that P. persicum may have shown a genetic pattern similar to a classical ring species before the Pleistocene. The Namak Lake sub-basin could have served as a contact zone where populations did not interbreed but were connected through gene flow in a geographic ring. Currently, genetic separation is evident between basins, indicating that P. persicum in the Zagros Mountains is not a contemporary ring species. Also, our biogeographical analysis estimated that range evolution may have been driven initially by dispersal, and only during the late Pleistocene by vicariance.
README: Supplementary Data
https://doi.org/10.5061/dryad.m63xsj49d
Description of the data and file structure
All sequences are provided in FASTA format (the .fas files).
An Excel file (Specimens_Details.xlsx) contains details about sampling localities and NCBI accession numbers.
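As a convenience for reusing the sequence files, here is a minimal sketch of a FASTA reader in plain Python. The function names `parse_fasta` and `read_fasta` are illustrative and not part of the dataset. The sketch assumes standard FASTA layout (a `>` header line followed by one or more sequence lines). The header format used inside the .fas files is not described here, so inspect it before relying on it as a key.

```python
from pathlib import Path


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}.

    Assumes each record starts with a '>' header line. If two records
    share a header, the later one overwrites the earlier one.
    """
    records = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


def read_fasta(path):
    """Read a FASTA file from disk and return {header: sequence}."""
    return parse_fasta(Path(path).read_text())
```

For example, `read_fasta("All_Samples_CO1.fas")` would return the CO1 records keyed by header once the file has been downloaded to the working directory.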
Sharing/Access information
Links to other publicly accessible locations of the data:
Data was derived from the following sources:
- All physical specimens are deposited in the Museum of the University of Tehran.
Methods
Sample Collection
A total of 214 specimens of P. persicum (Fig. 1) were collected from 24 localities, encompassing more than 280,000 km² and covering the entire distribution range of the species in Iran. To gain a more precise understanding of geographical structuring, the specimens were taken from three geographically distant drainage systems: the Urmia, Khali Fars-Oman and Markazi basins (Fig. 2).
DNA Extraction and PCR Amplification
Total genomic DNA was extracted from the muscle tissue of walking legs using the Sambio DNA reagent kit. Three markers, mitochondrial Cytochrome Oxidase subunit 1 (CO1), mitochondrial 16S rRNA (16S), and nuclear Histone 3 (H3), were selected for their wide application in crustacean population diversity and structure studies (Klaus et al. 2010, 2014; Jesse et al. 2011; Keikhosravi et al. 2015; Phiri and Daniels 2016; Daniels and Klaus 2018; Parvizi et al. 2018, 2019; Gao et al. 2019; Stark et al. 2021). Final sample sizes differed among markers: we amplified CO1 for all specimens, but because 16S and H3 are conserved, we amplified them for one specimen per population. The following primer pairs were used: for CO1, LCO1490 (5ʹ-GGTCAACAAATCATAAAGATATTGG-3ʹ) with HCO2198 (5ʹ-TAAACTTCAGGGTGACCAAAAAATCA-3ʹ); for 16S, 16L29 (5ʹ-YGCCTGTTTATCAAAAACAT-3ʹ) with 16HLeu (5ʹ-CATATTATCTGCCAAAATAG-3ʹ); and for H3, H3AF1 (5ʹ-ATGGCTCGTACCAAGCAGACVGC-3ʹ) with H3AR1 (5ʹ-ATATCCTTRGGCATRATRGTGAC-3ʹ). PCR conditions included an initial denaturation at 95 °C for 3 min, followed by 37 cycles of denaturation at 95 °C for 35 s, annealing at 49 °C (16S), 51 °C (CO1), or 51.1 °C (H3) for 45 s, and extension at 72 °C for 1 min, with a final extension at 72 °C for 7 min. Products were sequenced using the LCO1490, 16L29 and H3AF1 primers. Sequences were manually edited with ChromasPro version 2.6.6 (http://technelysium.com.au) and BioEdit (version 7.0.5.3; Hall 1999), and aligned using the MUSCLE algorithm with default parameters in MEGA-X (Kumar et al. 2018). All sequences have been submitted to GenBank (see Supplementary material).
Phylogenetic Analyses
Phylogenetic analyses were performed through Bayesian inference (BI) using BEAST 2 version 2.7.3 (Bouckaert et al. 2019) and Maximum Likelihood (ML) using RAxML 8.2 (Kozlov et al. 2019). Models of nucleotide substitution were selected in jModelTest 2.1.10 (Darriba et al. 2012) using the Akaike Information Criterion (AIC). The data were partitioned by gene into three partitions. Trees and clocks were linked, while site models remained unlinked. We used an uncorrelated relaxed lognormal clock as implemented in BEAST 2 version 2.7.3 (Bouckaert et al. 2019). For this analysis we used Potamon ruttneri Pretzmann, 1962 and Potamon gedrosianum Alcock, 1909 as outgroup taxa. We assigned a value of 0.1 Mya for the divergence of P. ruttneri as an external calibration point, based on the fossil-calibrated phylogeny of the genus Potamon by Ghanavi et al. (2023). Four Markov chains were run, with each chain starting from a random tree and run for 10 million generations, sampling every 10,000 generations. This process was repeated four times (overall 40 million generations) to ensure that trees converged on the same topology. Each run was checked for adequate convergence using Tracer v1.7.0, first independently and then together. The effective sample size (ESS) values of all parameters were confirmed to be higher than 200. We then combined the log and tree files of the independent runs using LogCombiner v.2.7.3. The Maximum Clade Credibility (MCC) tree was obtained using TreeAnnotator 2.7.3 after removal of 20% of the trees as burn-in. The maximum likelihood tree was estimated using RAxML 8.2 with 1,000 bootstrap replications (Kozlov et al. 2019). The trees were visualized with FigTree v1.4.4 (Rambaut 2018).
Population Structure
We estimated standard genetic diversity metrics, including number of haplotypes (h), haplotype diversity (Hd), nucleotide diversity (π), and number of segregating sites (s), in DnaSP 6.12.03 (Rozas et al. 2017). A haplotype network was estimated by statistical parsimony (TCS network) in PopART (Clement et al. 2002). Analyses of molecular variance (AMOVA) were performed in Arlequin v3.5.2.2 (Excoffier and Lischer 2010) to evaluate the hierarchical subdivision of genetic diversity within populations (FST), among populations in the same geographic region (FSC), and between geographically separated populations (FCT). Additionally, genetic divergences among drainage systems and lineages were obtained using the p-distance model with 1,000 bootstrap replicates in MEGA-X (Kumar et al. 2018). A Mantel test was conducted to assess the correlation between geographical distance and genetic divergence, using Alleles In Space ver. 1.0 (Miller 2005) with 10,000 permutations. The Mantel index ranges between -1 and +1. A higher positive and significant value indicates a stronger positive (direct) correlation between genetic and geographical distances. Conversely, a more negative and significant value indicates a stronger negative (inverse) correlation between the genetic and geographical distances among populations (Miller 2005).
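To illustrate the logic of the Mantel test described above, the following is a minimal permutation-based sketch in plain Python. It is not the Alleles In Space implementation. The function names, the one-sided p-value, and the example matrices are assumptions made for illustration only. Both inputs are assumed to be symmetric distance matrices over the same populations, in the same order, with non-constant off-diagonal values.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mantel(genetic, geographic, permutations=10000, seed=0):
    """Return (r, p): the Mantel correlation and a one-sided p-value.

    Rows and columns of the genetic matrix are permuted together, which
    keeps the matrix a valid distance matrix under the null hypothesis.
    """
    rng = random.Random(seed)
    n = len(genetic)
    geo_vec = upper_triangle(geographic)
    observed = pearson(upper_triangle(genetic), geo_vec)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[genetic[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), geo_vec) >= observed:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical distances among four populations (not from the dataset).
geo = [[0, 10, 20, 30], [10, 0, 15, 25], [20, 15, 0, 12], [30, 25, 12, 0]]
gen = [[0, 0.01, 0.03, 0.05], [0.01, 0, 0.02, 0.04],
       [0.03, 0.02, 0, 0.02], [0.05, 0.04, 0.02, 0]]
r, p = mantel(gen, geo, permutations=999)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

With only a handful of populations the number of distinct permutations is small, so the p-value has coarse resolution. The 24 localities sampled here allow a far larger permutation space.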
Historical Demography
We used Arlequin v3.5.2.2 to conduct neutrality tests (Tajima’s D and Fu’s Fs statistics) and mismatch distribution (MMD) analyses, with 10,000 permutations per analysis. These analyses aimed to identify deviations from the null hypothesis of neutral evolution and to investigate recent population expansions. The onset of population expansion was estimated using the formula t = τ/(2u), where t is the time since expansion, τ is the expansion parameter tau, and u is the evolutionary rate per generation. The expansion parameter τ was estimated for the CO1 sequences in Arlequin v3.5.2.2. For CO1 we used a mutation rate of 2.33% substitutions per site per million years (Schubart et al. 1998).
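The formula t = τ/(2u) can be worked through as in the sketch below. In the standard mismatch-distribution convention, u is the mutation rate of the whole sequence per generation, so the per-site rate must be scaled by sequence length and generation time. How the rate was scaled in the original analysis is not stated here, so this scaling is an assumption. The τ value, alignment length, and generation time in the example are hypothetical placeholders, not results from the study.

```python
def expansion_time(tau, rate_per_site_per_myr, seq_length_bp,
                   generation_time_years=1.0):
    """Return (generations, years) since expansion from t = tau / (2u).

    u is taken as the per-sequence mutation rate per generation:
    per-site rate per year * sequence length * generation time.
    """
    rate_per_site_per_year = rate_per_site_per_myr / 1e6
    u = rate_per_site_per_year * seq_length_bp * generation_time_years
    t_generations = tau / (2 * u)
    return t_generations, t_generations * generation_time_years


# Hypothetical inputs: tau = 1.5, a 600 bp alignment, 1-year generations.
# 0.0233 is the 2.33% per site per million years rate quoted in the text.
generations, years = expansion_time(tau=1.5,
                                    rate_per_site_per_myr=0.0233,
                                    seq_length_bp=600)
print(f"{generations:.0f} generations, about {years:.0f} years")
```

Because generation time multiplies u and then converts generations back to years, it cancels out of the estimate in years. It only affects the estimate expressed in generations.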
Species Distribution Modelling
We used Maxent 3.4.4 (Phillips et al. 2006) to estimate the probability of occurrence of the species from presence data and environmental variables, because it performs well with small sample sizes and with presence-only data (Elith et al. 2006; Hernandez et al. 2006; Rhoden et al. 2017). Nineteen bioclimatic variables were obtained from the WorldClim series (Hijmans et al. 2005; http://www.worldclim.org/). We ran the models for current conditions (CC) and the last glacial maximum (LGM). The spatial resolution was 30 arc-seconds (approximately 1 km²) for current conditions and 2.5 arc-minutes (approximately 5 km²) for the LGM. The Maxent model was run with 90% of presence records used for training and 10% for random testing. The procedure was repeated fifteen times, and the number of iterations was set to 5,000. The accuracy of the models was tested by receiver operating characteristic (ROC) analysis. The area under the curve (AUC) derived from the ROC plot ranges between 0 and 1; an AUC value higher than 0.75 indicates an acceptable and robust model, while values of 0.5 or lower indicate a prediction no better than random (Elith et al. 2006). The importance of each climatic variable for explaining the species distribution was determined by a jack-knife procedure (Sillero and Carretero 2012).
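To make the AUC thresholds above concrete, here is a minimal sketch of how an AUC can be computed from model scores. It uses the rank-based definition: the probability that a randomly chosen presence point receives a higher suitability score than a randomly chosen background point, with ties counted as one half. This is a generic illustration, not Maxent's own code, and the scores shown are invented.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: P(presence score > background score), ties = 0.5."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores for held-out test points.
test_presences = [0.91, 0.78, 0.66, 0.40]
background = [0.55, 0.30, 0.22, 0.10, 0.05]
print(f"AUC = {auc(test_presences, background):.2f}")
```

A model that ranks every presence above every background point scores 1, and one that assigns scores at random is expected to score about 0.5, which is why 0.5 serves as the baseline for a random prediction.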
Historical Biogeography
Ancestral biogeography was reconstructed using the Dispersal-Extinction-Cladogenesis (DEC) model in the BioGeoBEARS package (Ree et al. 2005; Matzke 2013; Ree and Sanmartín 2018). The distribution range of P. persicum in Iran included six sub-basins (Fig. 8A). The ingroup subtree was extracted from the time-calibrated tree produced in BEAST 2 version 2.7.3 (Bouckaert et al. 2019), and 3,604 trees (after 20% burn-in) from the Bayesian inference analyses were used to estimate the posterior probabilities of ancestral areas at each node.