Understudied regions and messy taxonomy: Geography, not taxonomy, is the best predictor for genetic divergence of the Poecilimon bosphoricus species group
Data files
Oct 19, 2023 version files 1.64 MB
Abstract
The complex and dynamic history of the Anatolian Peninsula during the Pleistocene set the stage for species diversification. However, the evolutionary history of biodiversity in the region is shrouded by the challenges of studying species divergence in the recent, dynamic past. Here we study the Poecilimon bosphoricus (PB) species group to understand how the bush crickets’ diversification and the region’s complex history are coupled. Specifically, using sequences of two mitochondrial and two nuclear gene segments from over 500 individuals for a comprehensive set of taxa with extensive geographic sampling, we infer the phylogenetic and geographic setting of species divergence. In addition, we use the molecular data to examine hypothesized species boundaries that were defined morphologically. Our analyses of the timing of divergence confirm the recent origin of the PB complex, indicating its diversification coincided with the dynamic geology and climate of the Pleistocene. Moreover, the geography of divergence suggests a history of fragmentation followed by admixture of populations, suggestive of a ring species. However, the evolutionary history based on genetic divergence conflicts with morphologically defined species boundaries, raising the prospect that incipient species divergences may be relatively ephemeral. As such, the morphological differences observed in the PB complex may not be sufficient to have prevented homogenizing gene flow in the past. Alternatively, with the recent origin of the complex, the lack of time for lineage sorting may underlie the discord between morphological species boundaries and genetic differentiation. Under either hypothesis, geography – not taxonomy – is the best predictor of genetic divergence.
README: Datasets and trees of P. bosphoricus group
https://doi.org/10.5061/dryad.8pk0p2nv2
The unique haplotypes of the COI, ND2, and ITS markers are provided here. Each sequence is labelled with the species and population names used in the article. These matrices were used for the phylogenetic analyses (ML and BI) and the molecular clock analysis (BEAST). The resulting trees are also deposited here.
Methods
Individuals from almost all of the currently recognized species of the P. bosphoricus species group (Orthoptera, Tettigoniidae; Phaneropterinae) were collected across the entire geographic range of the species group during 2015–2019. The only exception is P. athos, which is restricted to the Athos Peninsula, Greece.
DNA was extracted from hind femur muscle by proteinase K digestion followed by the salt/isopropanol protocol (Aljanabi and Martinez, 1997). We amplified and sequenced the mitochondrial genes cytochrome c oxidase subunit I (COI) and NADH dehydrogenase subunit 2 (ND2), and two nuclear ribosomal internal transcribed spacers of 5.8S rDNA, hereafter referred to as ITS; for details see supplementary material. Both strands were sequenced on a 23 ABI 3730XL DNA analyser by Macrogen Europe (Macrogen Inc., Amsterdam, the Netherlands). GenBank accession numbers of the sequences are given in the article.
Contigs from the forward and reverse sequences were visualized using SEQUENCHER v.4.1.4 (Gene-Codes Corp.) and aligned using the online version of MAFFT v.7.245 (Katoh and Standley, 2013) (http://align.bmr.kyushuu.ac.jp/mafft/online/server/) with the default settings (FFT-NS-i strategy, scoring matrix 200PAM (k = 2), gap opening penalty = 1.53). The likelihood that the COI and ND2 sequences were nuclear mitochondrial pseudogenes (numts) was assessed following the criteria of Kaya and Çıplak (2018).
Because not every individual was sequenced for every locus, the datasets for each mitochondrial gene (COI and ND2) and for the nuclear locus (ITS) were analysed separately; concatenating across loci is not possible without a significant reduction in the number of individuals and in geographic coverage (e.g., some localities were sequenced for COI but not ND2, and some individuals had only ITS sequences). For example, 565, 849 and 678 individuals were sequenced for COI, ND2 and ITS, respectively, versus 381 individuals with sequences of both COI and ND2 (for details about sequences see Table S1 and Table S3). We recognize this is not ideal; however, because our study focuses on the geography of divergence, our priority was to include all individuals, so analyses were run separately for each gene. Moreover, concatenating data with a systematic pattern of missing data would introduce bias.
Unique haplotypes and their frequencies were identified with DNASP v.5 (Librado and Rozas, 2009), and the nucleotide composition, the number of variable sites, and indels were calculated with MEGA v.X (Kumar et al., 2016) for each locus. Substitution saturation in each single-gene dataset was assessed using DAMBE v.7 (Xia, 2018). Phylogenetic relationships were estimated by maximum likelihood (ML) using RAxML (Stamatakis, 2006) via CIPRES with 1000 replicates, and by Bayesian inference using MRBAYES v.3.2.2 (Ronquist et al., 2012).
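The haplotype and variable-site steps can be illustrated with a minimal Python sketch. It is not the DNASP or MEGA implementation: it assumes sequences are already aligned to equal length, treats two sequences as the same haplotype only if their aligned strings are identical, and ignores gaps and N when counting variable sites. The function names and the toy alignment are hypothetical.

```python
def collapse_haplotypes(aligned):
    """Group individuals that share an identical aligned sequence.

    aligned: dict mapping individual ID -> aligned sequence string.
    Returns a dict mapping each unique sequence -> list of individual IDs.
    """
    groups = {}
    for seq_id, seq in aligned.items():
        groups.setdefault(seq.upper(), []).append(seq_id)
    return groups


def variable_sites(aligned):
    """Return 0-based alignment columns with more than one nucleotide state.

    Gaps ('-') and ambiguous 'N' are ignored (a simplifying assumption).
    """
    seqs = [s.upper() for s in aligned.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [
        i for i in range(length)
        if len({s[i] for s in seqs} - {"-", "N"}) > 1
    ]


# Hypothetical toy alignment, not data from this study.
toy = {
    "ind1": "ACGTAC",
    "ind2": "ACGTAC",
    "ind3": "ACGAAC",
    "ind4": "ACGTAT",
}

haplotypes = collapse_haplotypes(toy)
frequencies = {seq: len(ids) for seq, ids in haplotypes.items()}
sites = variable_sites(toy)
```

Here `frequencies` gives the number of individuals carrying each unique haplotype and `sites` lists the variable alignment columns. Dedicated tools handle missing data and indels more carefully than this sketch does.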
Divergence times were estimated under a molecular clock in BEAST v.2.6.1 (Bouckaert et al., 2019). Each BEAST analysis was run for 100 million generations, sampling every 10,000 generations, with a relaxed lognormal clock, a linked GTR + gamma site model with four discrete gamma categories, and a linked Yule tree model. Maximum clade credibility trees were built using TREEANNOTATOR, distributed with BEAST, discarding the first 25% of samples as burn-in.
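As a quick check on what these chain settings imply, the number of sampled and retained trees can be worked out with a short Python sketch. The helper name is hypothetical, and the count ignores the initial state, so the number of trees in an actual log file may differ by one.

```python
def mcmc_sample_counts(chain_length, sample_every, burnin_fraction):
    """Return (total, discarded, retained) sample counts for an MCMC run."""
    total = chain_length // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


# Settings stated above: 100 million generations, sampling every 10,000,
# with the first 25% of samples discarded as burn-in.
total, discarded, retained = mcmc_sample_counts(100_000_000, 10_000, 0.25)
print(total, discarded, retained)  # 10000 2500 7500
```

Under these settings each maximum clade credibility tree summarises roughly 7,500 post-burn-in trees.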