Data from: Altai Mountains – cradle of hybrids and introgressants: A case study in Veronica subg. Pseudolysimachium (Plantaginaceae)
Data files
Nov 03, 2025 version (114.15 MB in total):
Altai-1148-varsites.phy (52.84 MB)
Altai-1148.recode.vcf (15.61 MB)
Altai-1148.str (8.38 MB)
Altai-174.phy (37.32 MB)
README.md (665 B)
Abstract
Mountains form a diverse mosaic of microhabitats over small distances, created by changes in climate, soil, and water availability. A key to the adaptation of plants to such microhabitats is genetic variation; however, the natural accumulation of genetic variation through mutation is slow and often insufficient on its own. Adaptive introgression via hybridization is an alternative source of genetic variation. Here, we investigate hybridization and discuss its adaptive role in Veronica subg. Pseudolysimachium across its distribution in the Altai Mountains. To support our hypotheses of frequent hybridization, we genotyped thousands of SNPs for 233 individuals from 10 species and 7 putative hybrids previously described on the basis of morphology. We employed Bayesian and likelihood statistical models and supported our results with morphometric analysis and genomic in situ hybridization (GISH). The results suggest that almost all individuals of the putative hybrids are F1 hybrids. In one case, the GISH investigation strongly supports homoploid hybridization (the origin of V. ×schmakovii from V. longifolia and V. porphyriana). Divergence times of the Altai Veronica species are estimated at 1–2 million years ago, with a high probability of gene flow throughout that period. Our results also demonstrate that gene flow is directed mainly from the locally endemic V. porphyriana. We hypothesize that the large Siberian plains and the topographically diverse foreland of the Altai Mountains provide an ideal setting for hybridization, with the potential for adaptive introgression of alleles conferring tolerance to cooler climates into lowland species migrating into the Altai Mountains.
The dataset comprises:
Phylogenetic alignments (.phy files) used for tree reconstruction and evolutionary inference.
Variant call format file (.vcf) containing SNP data recoded for population genetic analyses.
Structure input file (.str) formatted for Bayesian clustering and admixture analysis.
These files were derived from high-throughput sequencing of 1148 loci across multiple individuals. The data support findings of reticulate evolution and complex genetic relationships among taxa in this biodiversity hotspot.
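For readers who want to load the alignments programmatically, the following is a minimal sketch of a reader for the .phy files. It assumes, without confirmation from this README, that they are sequential "relaxed" PHYLIP: a header line giving the number of taxa and sites, followed by one line per sample holding a whitespace-free name and its sequence. The function name read_relaxed_phylip is hypothetical, and interleaved PHYLIP is not handled.

```python
from pathlib import Path


def read_relaxed_phylip(path):
    """Parse a sequential, relaxed PHYLIP alignment into {name: sequence}.

    Assumed layout (not confirmed for these files): a header line
    "<n_taxa> <n_sites>" followed by one "<name> <sequence>" line per taxon.
    Interleaved PHYLIP is not supported.
    """
    lines = [ln for ln in Path(path).read_text().splitlines() if ln.strip()]
    n_taxa, n_sites = map(int, lines[0].split())
    alignment = {}
    for line in lines[1:1 + n_taxa]:
        name, seq = line.split(maxsplit=1)
        seq = "".join(seq.split())  # tolerate stray spaces inside the sequence
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, found {len(seq)}")
        alignment[name] = seq
    if len(alignment) != n_taxa:
        raise ValueError(f"expected {n_taxa} taxa, found {len(alignment)}")
    return alignment
```

Called as read_relaxed_phylip("Altai-174.phy"), it would return a dictionary keyed by sample name if the assumed layout holds; the length checks fail loudly instead of silently misreading an interleaved file.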
We used GBS-SNP-CROP (Melo & al., 2016) to generate polyploid-aware bi-allelic SNPs. GBS-SNP-CROP is explicitly designed for sample sets that include individuals of different ploidy levels. It genotypes bi-allelic SNPs and excludes multi-allelic variants by imposing a population-level allele frequency filter controlled by a user-defined Alternative Allele Strength parameter. For each potential SNP position, this parameter considers the total read depth of all four bases across the whole population, ranked from primary (the allele with the highest depth at that position) to quaternary (the allele with the lowest depth). A potential SNP is retained for downstream analysis only if it is strongly bi-allelic.
The GBS-SNP-CROP workflow first filters the raw GBS data to remove noisy and low-quality sequences; second, builds a mock reference (when no reference genome is available); third, maps the high-quality reads to produce standardized alignment files; and finally, calls the SNPs. The pipeline consists of seven Perl scripts and uses VSEARCH (Rognes & al., 2016) to cluster reads and PEAR (Zhang & al., 2014) to merge paired-end reads. To build the mock reference, we used only diploid individuals (ploidy confirmed by flow cytometry) with a high number of reads after quality filtering (suppl. Table S1).
In total, we obtained 233,987 SNPs at 99.9999% confidence (i.e., an error rate of 0.000001), with the Alternative Allele Strength parameter set to 0.90. For phylogeny estimation, we used all 233,987 SNPs with a 75% missing-data allowance, applied before the post-processing step in VCFtools v.3.0. For the STRUCTURE analyses, we used only unlinked bi-allelic SNPs, again requiring each SNP to be present in at least 75% of the individuals and to have a minor allele frequency of at least 0.05, filtered with VCFtools (Danecek & al., 2011).
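The two SNP filters described above can be sketched as follows. This is an illustration under stated assumptions, not GBS-SNP-CROP or VCFtools code: the helper names is_strongly_biallelic and passes_site_filters are hypothetical, and the Alternative Allele Strength ratio (secondary-allele depth divided by all non-primary depth) is our reading of how the GBS-SNP-CROP documentation describes the parameter. The thresholds (0.90, 75% presence, minor allele frequency 0.05) are taken from the text; selecting unlinked SNPs is not attempted here.

```python
from collections import Counter

BASES = "ACGT"


def is_strongly_biallelic(depths, alt_strength=0.90):
    """Return True if a candidate SNP passes an Alternative Allele Strength check.

    `depths` maps each base to its read depth summed over the whole population.
    Bases are ranked primary..quaternary by depth; the site is kept only if the
    secondary allele supplies at least `alt_strength` of all non-primary reads.
    This ratio is an ASSUMED reading of the parameter, not GBS-SNP-CROP code.
    """
    _primary, secondary, tertiary, quaternary = sorted(
        (depths.get(base, 0) for base in BASES), reverse=True
    )
    if secondary == 0:
        return False  # no alternative allele observed: monomorphic site
    non_primary = secondary + tertiary + quaternary
    return secondary / non_primary >= alt_strength


def passes_site_filters(genotypes, min_present=0.75, min_maf=0.05):
    """Return True if a SNP is called in at least `min_present` of individuals
    and its minor allele frequency is at least `min_maf`.

    `genotypes` holds one VCF-style call per individual, e.g. "0/1" (diploid)
    or "0/0/1/1" (tetraploid); calls containing "." count as missing. Allele
    counts are pooled over individuals, so mixed ploidy is handled.
    Linkage pruning (the "unlinked" requirement) is NOT done here.
    """
    if not genotypes:
        return False
    called = [g for g in genotypes if "." not in g]
    if len(called) / len(genotypes) < min_present:
        return False
    alleles = Counter(a for g in called for a in g.replace("|", "/").split("/"))
    if len(alleles) != 2:
        return False  # keep strictly bi-allelic sites only
    minor_count = min(alleles.values())
    return minor_count / sum(alleles.values()) >= min_maf
```

For instance, a site called as 0/0, 0/1, 1/1 and ./. passes (75% of individuals called, minor allele frequency 0.5), whereas a site where 19 of 20 diploids are 0/0 and one is 0/1 fails the frequency filter (1/40 = 0.025). Pooling allele counts rather than genotype classes is a design choice that keeps the frequency meaningful when diploids and polyploids are analysed together.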
- Khan, Gulzar; Mayland‐Quellhorst, Eike; Kosachev, Petr A. et al. (2024). Altai Mountains – cradle of hybrids and introgressants: A case study in Veronica subg. Pseudolysimachium (Plantaginaceae). TAXON. https://doi.org/10.1002/tax.13176
