Phylogeny of species, infraspecific taxa, and forms in Iris subgenus Xiphium (Iridaceae) that have centers of diversity in the Mediterranean Basin biodiversity hotspot

Cite this dataset

Wilson, Carol et al. (2023). Phylogeny of species, infraspecific taxa, and forms in Iris subgenus Xiphium (Iridaceae) that have centers of diversity in the Mediterranean Basin biodiversity hotspot [Dataset]. Dryad.


Iris subgenus Xiphium is a small group of taxa that mostly occur in the Mediterranean Basin, a long-recognized biodiversity hotspot. Phylogenetic relationships among these Iris were reconstructed based on sequence data from 110 nuclear markers and whole plastomes using Bayesian inference and maximum likelihood methods. Best trees based on plastome and combined datasets resolved subgenus Xiphium and I. xiphium as not monophyletic, while nuclear data resolved the subgenus as monophyletic but I. xiphium as not monophyletic. Topology tests indicated that the alternative hypothesis of a monophyletic subgenus cannot be rejected, while a monophyletic I. xiphium can be rejected. We hypothesize that the subgenus is monophyletic based on these analyses, morphology, and biogeography, and that uneven patterns of missing data are a likely reason for topological incongruence among datasets. A previously suggested informal group within the subgenus was supported. Patterns of relationships among species suggest multiple exchanges between the African and European continents but also the importance of the Strait of Gibraltar as a barrier to genetic exchange.

Keywords: Bayesian analyses, biodiversity hotspots, constraint trees, Iberian Peninsula, Iridaceae, maximum likelihood analyses, Mediterranean Basin, missing data, North Africa, nuclear markers, Strait of Gibraltar, targeted enrichment, whole plastomes


Genomic DNA was isolated from silica-dried leaf material using protocols modified from the CTAB method of Doyle and Doyle (1987). Modifications to this procedure included RNase treatment and an ethanol precipitation with ammonium acetate following the initial isopropanol precipitation. Prior to library preparation, extracted DNA (0.4–1.2 µg) was fragmented to an average length of 500 bp by sonication (Bioruptor UDC-200, Diagenode, Denville, NJ, USA). Single-index library construction and DNA enrichment followed Meyer and Kircher (2010).

Targeted markers were captured and enriched with a custom myBaits-hyb capture kit designed for Iris (Daicel Arbor Biosciences, Ann Arbor, MI, USA) following the manufacturer's recommendations, except that kit blockers were replaced with blocking oligos from xGen (Integrated DNA Technologies Inc., Coralville, IA, USA) and SeqCap plant capture enhancer (Roche, Burgess Hill, UK). Data for targeted markers and un-enriched plastomes were obtained using NGS 100 bp paired-end read sequencing run on an Illumina 4000 (Illumina Inc., San Diego, CA, USA). DNA extraction, library preparation, and sequencing were performed at the University of California, Berkeley, California, USA.

A pipeline (available from the communicating author) was developed and executed on the Savio high-performance computing cluster at the University of California, Berkeley. The pipeline used Trimmomatic (Bolger et al. 2014) to filter reads and remove index sequences, with parameter settings of 40 bp minimum length and a 10:20 sliding window. Using a minimum read depth of four and quality of 20%, plastomes were assembled against the I. gatesii R.C. Foster plastome (Wilson 2014) and nuclear markers were assembled against the 635 markers developed from exome data that are described above. Data for each nuclear marker were examined in Geneious 9.14 (Biomatters, Ltd., New Zealand) to select markers > 900 bp in length with < 25% missing data for each sample. Final datasets were also assembled in Geneious 9.14 and included plastomes, 110 selected nuclear exome markers, combined nuclear markers (nuclear), combined plastome and nuclear markers (combined), and combined nuclear and plastid coding regions (all-genes). Sites at which > 50% of the bases were N's were excluded from the datasets; within coding sequences, reading frames were preserved by excluding base pairs in multiples of three.
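The marker selection and site exclusion rules described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which was run in Geneious and on the Savio cluster. The function names, the choice to count N, ? and gap characters as missing data, and the rule of dropping a whole codon when any of its three columns exceeds the N threshold are all assumptions made for illustration.

```python
def missing_fraction(seq):
    """Fraction of positions in one aligned sequence treated as missing.

    Assumption: N, ? and gap characters all count as missing data.
    """
    if not seq:
        return 1.0
    return sum(ch in "N?-" for ch in seq.upper()) / len(seq)


def select_marker(alignment, min_len=900, max_missing=0.25):
    """Keep a marker if it is > min_len bp and every sample has < max_missing missing data.

    alignment: dict mapping sample name -> aligned sequence (equal lengths).
    """
    length = len(next(iter(alignment.values())))
    return length > min_len and all(
        missing_fraction(seq) < max_missing for seq in alignment.values()
    )


def drop_n_rich_sites(alignment, max_n=0.5, coding=False):
    """Remove columns in which more than max_n of the samples have an N.

    Assumption: for coding alignments (frame starting at column 0), the whole
    codon is removed if any of its columns is N-rich, so the reading frame
    is preserved.
    """
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    length = len(seqs[0])
    n_rich = [
        sum(seq[i] == "N" for seq in seqs) / len(seqs) > max_n
        for i in range(length)
    ]
    step = 3 if coding else 1
    keep = []
    for start in range(0, length, step):
        block = range(start, min(start + step, length))
        if not any(n_rich[i] for i in block):
            keep.extend(block)
    return {name: "".join(alignment[name][i] for i in keep) for name in names}
```

With `coding=False` only the offending columns are removed; with `coding=True` the alignment length always shrinks by a multiple of three, which matches the frame-preserving behaviour described in the text.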

Plastomes were partitioned by coding, intron, and intergenic spacer regions, and each coding region of the plastome and nuclear datasets was partitioned by codon position, resulting in 379 and 330 partitions, respectively. Partitions were merged and modeled for nuclear, plastome, all-gene, and combined datasets using PartitionFinder (Lanfear et al. 2012) executed in IQ-TREE v2.1.3 (Nguyen et al. 2015), resulting in 12, 12, 18, and 25 partitions, respectively. Maximum likelihood (ML; Felsenstein 1981) and ML bootstrap (Felsenstein 1985) analyses were performed on the nuclear, plastome, all-gene, and combined datasets with RAxML v. 8 (Stamatakis 2014) and IQ-TREE v2.1.3 (Nguyen et al. 2015), with 1,000 replicates for each bootstrap analysis. MrBayes Version 3.1.2 (Huelsenbeck and Ronquist 2001) was used to perform Bayesian inference (BI), which was run for four million generations, with two runs and six chains that were sampled every 1,000 generations with a burn-in fraction of 0.25.
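As an illustration of the codon-position partitioning and the Bayesian sampling arithmetic described above, the sketch below writes RAxML-style partition definitions for one coding region and counts the samples retained after burn-in. The gene name and coordinates are hypothetical, the partition files in the dataset may use a different format, and the sample count ignores the initial state that MCMC software may also record.

```python
def codon_partitions(gene, start, end):
    """Return RAxML-style partition lines for the three codon positions.

    start and end are 1-based inclusive alignment coordinates of a coding
    region whose length must be a multiple of three.
    """
    if (end - start + 1) % 3 != 0:
        raise ValueError("coding region length must be a multiple of three")
    return [
        f"DNA, {gene}_pos{k + 1} = {start + k}-{end}\\3" for k in range(3)
    ]


def retained_samples(generations, sample_every, burnin_fraction, runs):
    """Number of post-burn-in samples pooled across independent runs."""
    per_run = generations // sample_every
    discarded = int(per_run * burnin_fraction)
    return (per_run - discarded) * runs


# Hypothetical 900 bp coding region called geneA.
for line in codon_partitions("geneA", 1, 900):
    print(line)

# Settings stated in the text: 4 million generations, sampled every 1,000,
# burn-in fraction 0.25, two runs.
print(retained_samples(4_000_000, 1_000, 0.25, 2))
```

Under these assumptions each run yields 4,000 samples, of which 1,000 are discarded as burn-in, leaving 6,000 samples pooled over the two runs.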

Usage notes

The uploaded dataset includes the plastome, nuclear, combined, and all-gene datasets. Also included are partition files for each dataset and constraint trees that test monophyly of Iris xiphium, a monophyletic subgenus with a nested monophyletic I. xiphium, and a monophyletic Xiphium group + I. juncea. Final trees include those based on the combined, all-gene, plastome, and nuclear datasets. Tables 1-3 provide information on datasets and trees.


American Iris Foundation