Data from: Biogeography and diversification patterns in the Irano-Turanian biodiversity hotspot inferred from a molecular phylogeny of the subendemic Iris subgenus Scorpiris (Iridaceae)
Data files
Mar 05, 2026 version files 951.97 KB
- 108Samples.phy (879.99 KB)
- Bayesian_Tree.con.tre (48.08 KB)
- rasp.xlsx (16.24 KB)
- README.md (5.69 KB)
- scorpiris-ClAr.csv (1.96 KB)
Abstract
The Irano-Turanian Floristic Region harbors a rich flora, but our understanding of the development of this diversity is limited by a lack of data on phylogenetic relationships and biogeographic patterns of endemic and more widespread plants. Hypotheses of in situ diversification versus allopatric diversification were tested using Iris subgen. Scorpiris, a species-rich group that is widely distributed in this region. Phylogenetic relationships of the subgenus were inferred using a comprehensive sampling strategy that incorporated newly collected accessions that represent previously under-sampled clades and underrepresented geographical regions. Included was I. drepanophylla, the type species for a major clade that had not been included in previous Iris studies. Six markers were used, thus increasing plastid region sampling compared to previous studies. The historical biogeography of resolved clades was explored to determine patterns of origin, dispersal, vicariance, and divergence, while the RelTime method was used to estimate times of divergence. This study confirmed major clades previously identified in the subgenus, suggested monophyly or non-monophyly of several species, and revealed unrecognized diversity in two species. Ages of major clades date from the Miocene to the Pliocene, while diversification continued into the Pleistocene. Our biogeographic inferences indicate that the subgenus and most major clades of I. subgen. Scorpiris originated and diversified in the Pamir-Alay, with subsequent diversification after expansion into the Tian Shan and Irano-Anatolian regions. Species from one clade dispersed to the Mediterranean and Caucasian regions from the Irano-Turanian region with relatively little diversification. We conclude that diversification of I. subgen. Scorpiris mainly followed the model of dispersal, then allopatric diversification, mostly through founder events and/or ecological speciation facilitated by general aridification across Eurasia and mountain uplifts that created rain shadows. We hypothesize that their multi-scaled bulbs facilitated adaptation to seasonally dry conditions that developed with climatic and geological changes.
Dataset DOI: 10.5061/dryad.r2280gbqt
Description of the data and file structure
Nucleotide sequence data were gathered to produce a phylogenetic tree for hypotheses of relationships in Iris subgenus Scorpiris. This dataset was analyzed using two phylogenetic approaches. Maximum parsimony (MP) analyses were performed with PAUP, and Bayesian Inference (BI) analyses were performed with MrBayes.
Areas of occurrence were determined for each taxon included in the study to reconstruct ancestral areas for the resolved clades. These reconstructions were used to hypothesize patterns of dispersal within the subgenus using the program RASP. A secondary dating analysis using MEGA 11 was also performed to provide estimates for the timing of dispersal events suggested by the ancestral reconstruction.
Files and variables
File: rasp.xlsx
Description: The Rasp Excel file was developed to assign geographic areas to taxa included in the study. The distributional range of I. subgen. Scorpiris was subdivided into five units based on floristic regions: Mediterranean, Caucasus (including northern parts of Turkey and four provinces of Iran), Pamir-Alay mountain system excluding eastern Pamir (north-eastern Afghanistan is the southern part of western Pamirs), Irano-Anatolian region, and northern and western Tian-Shan (including Fergana range).
File: 108Samples.phy
Description: Nucleotide sequence data from six plastid markers: matK/trnK, trnL–trnF, psbJ–petA, rpl32–trnL, rpoB–trnC, and rpl14–rpl36. The dataset includes 108 samples and is 8137 bp long, with 22.6% missing data (including indels), 976 variable sites, and 567 parsimony-informative sites. The only codes used in this dataset are the IUPAC codes for nucleotides, and missing data are represented by an N. The dataset included in this submission is in PHYLIP format (.phy). Taxon names for acronyms used in the dataset are given in the RASP file (rasp.xlsx).
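The summary statistics quoted for the alignment (variable and parsimony-informative sites) can be recomputed along the following lines. This is a minimal Python sketch, not the authors' workflow: it assumes a relaxed PHYLIP layout with one taxon per line, and it assumes that only unambiguous A, C, G, and T count as observed states, so N, gaps, and ambiguity codes are ignored. The function names and the toy alignment are hypothetical, and counts from other software may differ if it treats ambiguity codes differently.

```python
from collections import Counter


def parse_relaxed_phylip(text):
    """Parse a one-line-per-taxon (relaxed) PHYLIP alignment into {name: sequence}.

    Assumption: the file is not interleaved and names contain no spaces.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa, n_sites = (int(x) for x in lines[0].split()[:2])
    alignment = {}
    for line in lines[1:]:
        name, seq = line.split(None, 1)
        # Drop any internal whitespace and normalise case.
        alignment[name] = "".join(seq.split()).upper()
    if len(alignment) != n_taxa or any(len(s) != n_sites for s in alignment.values()):
        raise ValueError("alignment does not match the header dimensions")
    return alignment


def site_summary(alignment):
    """Return (variable_sites, parsimony_informative_sites).

    A site is variable if it shows at least two different bases, and
    parsimony-informative if at least two bases each occur in two or more taxa.
    """
    variable = informative = 0
    for column in zip(*alignment.values()):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) >= 2:
            variable += 1
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return variable, informative


# Hypothetical toy alignment (not taken from 108Samples.phy).
toy = """4 6
t1 ACGTAN
t2 ACGTCA
t3 ATGTCA
t4 ATGT-A
"""
print(site_summary(parse_relaxed_phylip(toy)))  # (2, 1)
```

To apply it to the deposited alignment, read 108Samples.phy into a string and pass it to `parse_relaxed_phylip`, provided the file follows the layout assumed here.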
File: Bayesian_Tree.con.tre
Description: This is the phylogenetic tree of Iris subgenus Scorpiris resulting from the Bayesian analysis. It is in Newick format and includes posterior probabilities and branch lengths.
File: scorpiris-ClAr.csv
Description: This file lists the geographic area of occurrence for samples used in the study, by clade. It was used in R to map nine climate variables on sites where 79 samples occurred to determine climatic niches of clades resolved in the phylogenetic study.
Code/software
Phylogenetic Analyses using a fasta dataset
PartitionFinder 2.1.1
The dataset was partitioned and modeled using PartitionFinder 2.1.1 with a greedy search and a comparison of AICc scores. The final partitioning scheme assigned a separate partition to each marker. The best-fit substitution model was a GTR model with rate heterogeneity described by a gamma distribution for markers rpoB-trnC, rpl14-rpl36, and matK/trnK, a proportion of invariant sites for psbJ-petA, and both gamma and invariant site parameters for trnL-trnF and rpl32-trnL.
PAUP 4.0b10
This program was run using a heuristic search with 1000 random sequence-addition replicates, tree bisection–reconnection branch swapping, saving multiple trees, treating indels as missing data, and the modeling scheme from PartitionFinder. Branch support was estimated via non-parametric bootstrap (BS) with 1000 bootstrap replicates using the same search settings as in the MP analysis.
MrBayes 3.1.2
Analyses were run with substitution model parameters unlinked across all partitions and with default priors. Two independent Markov chain Monte Carlo runs, each with one cold and three heated chains using the default temperature, were run for 10^6 generations and sampled every 500th generation. Stationarity was assumed when the average standard deviation of split frequencies was < 0.01 and ESS values were > 200. After discarding the first 50% of generations of each run as burn-in, posterior probability (PP) distributions were estimated from the combined set of trees.
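The size of the posterior sample implied by these settings can be worked out with a few lines of arithmetic. The Python sketch below assumes a run length of 10^6 generations and assumes that the starting state (generation 0) is recorded as a sample; the function name and the rounding of the burn-in count are illustrative choices, and the exact number of trees a given program discards may differ by one.

```python
def mcmc_sample_counts(generations, sample_every, burnin_fraction, runs,
                       include_start=True):
    """Count sampled, discarded, and retained trees for independent MCMC runs.

    Assumption: include_start=True means generation 0 is written as a sample.
    """
    per_run = generations // sample_every + (1 if include_start else 0)
    discarded = int(per_run * burnin_fraction)  # rounded down (illustrative)
    kept = per_run - discarded
    return {
        "per_run": per_run,
        "discarded_per_run": discarded,
        "kept_per_run": kept,
        "kept_total": kept * runs,
    }


counts = mcmc_sample_counts(generations=10**6, sample_every=500,
                            burnin_fraction=0.5, runs=2)
print(counts)
# {'per_run': 2001, 'discarded_per_run': 1000, 'kept_per_run': 1001, 'kept_total': 2002}
```

Under these assumptions, roughly 2000 trees from the two runs combined contribute to the posterior probabilities.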
Ancestral Reconstruction using the Rasp and Fasta files
RASP 4.2
The likelihood-based BioGeoBEARS module was used in these analyses. To find the best-fitting model, we compared the corrected Akaike Information Criterion (AICc) scores and AICc weights of the DEC, DEC+J, DIVALIKE, DIVALIKE+J, BAYAREALIKE, and BAYAREALIKE+J models. For all models, the maximum number of areas was set to five, and outgroups were removed from our time-calibrated consensus tree.
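The model comparison rests on two standard formulas: AICc = -2 lnL + 2k + 2k(k + 1)/(n - k - 1), and Akaike weights obtained by normalising exp(-Δ/2), where Δ is each model's AICc minus the lowest AICc. The Python sketch below illustrates the calculation only. The log-likelihoods are hypothetical placeholders and not results from this study, and the choice of sample size n is an assumption. The parameter counts (two free parameters for the base models, three with +J) follow the usual BioGeoBEARS setup.

```python
import math


def aicc(log_lik, k, n):
    """Small-sample corrected AIC for log-likelihood log_lik, k parameters, sample size n."""
    if n - k - 1 <= 0:
        raise ValueError("n must be greater than k + 1")
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert {model: AICc} into {model: weight}; weights sum to 1."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Hypothetical log-likelihoods for illustration only (NOT values from the study).
models = {
    "DEC": (-120.0, 2),
    "DEC+J": (-110.0, 3),
    "DIVALIKE": (-125.0, 2),
    "DIVALIKE+J": (-112.0, 3),
    "BAYAREALIKE": (-140.0, 2),
    "BAYAREALIKE+J": (-115.0, 3),
}
n = 100  # assumed sample size; substitute the value used in the actual analysis

scores = {name: aicc(lnl, k, n) for name, (lnl, k) in models.items()}
weights = akaike_weights(scores)
for name in sorted(scores, key=scores.get):
    print(f"{name:15s} AICc = {scores[name]:8.2f}  weight = {weights[name]:.3f}")
```

The model with the lowest AICc receives the largest weight, and a weight close to 1 indicates that the alternatives have little relative support.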
MEGA 11
The previously estimated crown age of I. subgen. Scorpiris, 8.67–17.94 Ma, was used to calibrate the root node of the subgenus. Divergence dates for the main nodes within I. subgen. Scorpiris were then estimated with the RelTime method from the BI tree with branch lengths.
Ecological Niche Modeling
RStudio v. 4.3.1
For each clade, we used the sample location data and the nine selected climate variables to produce box-and-whisker plots showing the range, median, and upper and lower quartiles.
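The quantities drawn in a box-and-whisker plot can also be tabulated directly. The sketch below is written in Python rather than R and is not the authors' script. The column names `clade` and `bio1` and the values are hypothetical and do not reflect the actual layout of scorpiris-ClAr.csv. Quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`, which can differ slightly from the hinges that R's `boxplot` draws.

```python
import csv
import io
from collections import defaultdict
from statistics import quantiles


def box_stats(values):
    """Five-number summary of a list of numbers (needs at least two values)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


def summarize_by_clade(csv_text, clade_col, value_col):
    """Group one numeric column by clade and summarise each group."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[clade_col]].append(float(row[value_col]))
    return {clade: box_stats(vals) for clade, vals in groups.items()}


# Hypothetical example data; column names and values are placeholders.
example = """clade,bio1
A,1
A,3
B,2
B,4
B,6
"""
for clade, stats in summarize_by_clade(example, "clade", "bio1").items():
    print(clade, stats)
```

Running the same summary once per climate variable gives the numbers behind each panel of box plots.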
Viewing the Bayesian Tree
FigTree v. 1.4.4
Access information
Other publicly accessible locations of the data:
- Nucleotide data are available from GenBank (https://www.ncbi.nlm.nih.gov/nucleotide/). Accession numbers of newly collected sequence data are LC700120–LC700212.
Data was derived from the following sources:
- None
