Data from: A new species in Iris section Pseudoregelia (Iridaceae) from Gansu, China
Data files
Dec 06, 2024 version files 1.23 MB
- README.md (2.24 KB)
- sequence_data.zip (1.23 MB)
Abstract
Iris longnanensis Z.F. Bai, Y.E. Xiao, F.Y. Yu (Iridaceae: subgenus Iris section Pseudoregelia), a novel herbaceous species native to the arid and thermal valleys of Longnan city, Gansu province, China, is herein described and illustrated. Morphologically akin to I. leptophylla Lingelsheim, I. longnanensis is distinguished by its dense, persistent foliage and strikingly short new leaves (6.6–15.1 cm) during the blooming season, along with its distinctive purplish-brown flowering stem. Phylogenetic analyses based on chloroplast DNA sequences corroborate the classification of I. longnanensis within section Pseudoregelia, highlighting the significance of this new species discovery for understanding the evolution and diversity within the genus.
README: A new species in Iris section Pseudoregelia (Iridaceae) from Gansu, China
https://doi.org/10.5061/dryad.70rxwdc7h
Description of the data and file structure
Genomic DNA was extracted from silica-dried leaf material of the Longnan species (two individuals) and the other species using a commercial plant genomic DNA extraction kit (CW0531; CWBIO, Jiangsu, China). We constructed a library with an average insert length of 350 bp using the Illumina DNA Library Preparation Kit (E7370L; NEB, USA), which was then sequenced on the NovaSeq 6000 platform (Illumina, San Diego, CA). De novo assembly was performed on the clean data using SPAdes v.3.14.1 with a k-mer value of 105 to obtain the assembly graph file. Redundant contigs were then removed by visual inspection in Bandage, and the remaining sequence was edited into circular form. The resulting circular sequence is the complete chloroplast genome sequence.
Files and variables
File: sequence_data.zip
Description: This file contains all the complete chloroplast genome sequence data. The sequences of the Longnan species and 12 species/varieties in Iris subg. Iris (I. bloudowii, I. cuniculiformis, I. goniocarpa, I. goniocarpa var. tenella, I. leptophylla, I. mandshurica, I. sichuanensis, I. pandurata, I. tigridia, I. mandshurica, and I. bloudowii) were obtained by our own sequencing. The sequences of I. domestica, I. dichotoma and I. ensata were downloaded from GenBank.
Code/software
Genomic Data Analysis:
- Software Options: NCBI's BLAST, Artemis, and GBRAP (GenBank Retrieving, Analyzing, and Parsing software) for handling .gbk/.gb files. A maximum likelihood (ML) tree was constructed using RAxML-HPC v.8 (Stamatakis 2014) with 1000 bootstrap replicates under the best substitution model (GTR+GAMMA). The Bayesian inference of the phylogeny was performed using MrBayes v.3.2.7a (Ronquist et al. 2012).
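As an illustration of the ML settings listed above, the following minimal sketch assembles a RAxML v8 command line with the GTRGAMMA model and 1000 bootstrap replicates. The executable name `raxmlHPC`, the rapid-bootstrap mode (`-f a`), the seed value, and the file names are assumptions made for the example; the dataset does not state which invocation was actually used.

```python
import shlex


def raxml_command(alignment, run_name, replicates=1000, seed=12345,
                  executable="raxmlHPC"):
    """Return a RAxML v8 argument list (hypothetical invocation).

    -f a : rapid bootstrap plus best-scoring ML tree search (assumed mode)
    -m   : substitution model, GTRGAMMA as stated in the text
    -p/-x: random seeds for parsimony starts and rapid bootstrapping
    -N   : number of bootstrap replicates
    -s/-n: input alignment and run name
    """
    return [
        executable,
        "-f", "a",
        "-m", "GTRGAMMA",
        "-p", str(seed),
        "-x", str(seed),
        "-N", str(replicates),
        "-s", alignment,
        "-n", run_name,
    ]


if __name__ == "__main__":
    # File and run names are placeholders, not files shipped with the dataset.
    print(shlex.join(raxml_command("cds_concat.phy", "iris_ml")))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.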
Data was derived from the following sources:
- The sequence data of I. domestica, I. dichotoma and I. ensata were downloaded from GenBank. The GenBank accession numbers are NC_050833.1 (I. domestica), NC_056172.1 (I. dichotoma) and NC_056173.1 (I. ensata).
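For readers who want to retrieve the three outgroup records themselves, the sketch below builds NCBI E-utilities `efetch` URLs for the accession numbers listed above. It only constructs the URLs and performs no network request; the choice of the GenBank flat-file format (`rettype=gb`) is an assumption, and the helper name `efetch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Accession numbers as listed in this README.
OUTGROUP_ACCESSIONS = {
    "Iris domestica": "NC_050833.1",
    "Iris dichotoma": "NC_056172.1",
    "Iris ensata": "NC_056173.1",
}

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="gb"):
    """Build an efetch URL for one record in the nucleotide database."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,   # "gb" = GenBank flat file, "fasta" = FASTA
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    for species, accession in OUTGROUP_ACCESSIONS.items():
        print(species, efetch_url(accession))
```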
Methods
Living specimens and vouchers were examined from mid-March to mid-June 2023. Ten randomly selected individuals of the new species were used for the morphometric surveys (Table 1). Additionally, the new species was compared with herbarium sheets at the herbaria E, BNU, CAS, NAS, CDBI, EE, HNWP, IATM, IBSC, IFP, KUN, PE and SG, and with the original descriptions of I. dolichosiphon, I. dolichosiphon subsp. orientalis, I. cuniculiformis, I. goniocarpa, I. goniocarpa var. tenella, I. hookeriana, I. leptophylla, I. narcissiflora, I. pandurata, I. sichuanensis, I. kemaonensis, and I. tigridia. (No specimen of I. sikkimensis was available.)
Taxon sampling for the phylogenetic analysis
The Longnan species and 11 species/varieties in Iris subg. Iris (I. bloudowii, I. cuniculiformis, I. goniocarpa, I. goniocarpa var. tenella, I. leptophylla, I. mandshurica, I. sichuanensis, I. pandurata, I. tigridia, I. mandshurica, and I. bloudowii), two sister species to I. subgen. Iris (I. domestica and I. dichotoma; Wilson, 2017), and one species in I. subgen. Limniris (I. ensata) were used to construct phylogenetic trees based on complete chloroplast genome data. Iris domestica, I. dichotoma, and I. ensata were selected as the outgroup for the phylogenetic analysis. All samples were obtained from the conservation nursery of Shanghai Botanical Garden. Iris kemaonensis, I. dolichosiphon, I. dolichosiphon subsp. orientalis, I. hookeriana, I. narcissiflora, and I. sikkimensis were not available for the phylogenetic study. Iris sikkimensis is not known from the wild, so it is not considered here.
Genomic DNA was extracted from silica-dried leaf material of the Longnan species (two individuals) and the other species using a commercial plant genomic DNA extraction kit (CW0531; CWBIO, Jiangsu, China). We constructed a library with an average insert length of 350 bp using the Illumina DNA Library Preparation Kit (E7370L; NEB, USA), which was then sequenced on the NovaSeq 6000 platform (Illumina, San Diego, CA). Low-quality reads and adapters were trimmed using FastQC v.0.11.9 (Andrews, 2010). The raw data of the 10 species was >3.30G, and the clean data was >3.28G after quality control processing, resulting in a >496-fold depth of coverage of the chloroplast genome. The GC content of the cleaned data from the 11 species was 43.34% (I. bloudowii), 41.96% (I. cuniculiformis), 42.68% (I. goniocarpa), 42.59% (I. goniocarpa var. tenella), 40.49% (I. leptophylla), 40.77% (I. mandshurica), 40.29% (I. sichuanensis), 41.46% (I. pandurata), 41.46% (I. tigridia), and 41.40% and 40.80% (I. longnanensis). The Q20 value was >95.77% and the Q30 value was >89.77%, indicating that the quality of the chloroplast genome sequencing and assembly results was very high. De novo assembly was performed on the clean data using SPAdes v.3.14.1 with a k-mer value of 105 to obtain the assembly graph file. Redundant contigs were then removed by visual inspection in Bandage, and the remaining sequence was edited into circular form. The resulting circular sequence is the complete chloroplast genome sequence. It was annotated with PGA (Qu et al., 2019) using Iris dichotoma (OK448492) as the reference genome. IR regions were identified with IRscope (Amiryousefi et al. 2018), which detected the IRa and IRb regions along with the single-copy regions (LSC and SSC). The identified IRs were manually validated by aligning IRa and IRb to confirm sequence similarity. Boundaries of the IR regions were annotated by locating conserved genes, such as ycf1 and rps19. Finally, the complete plastome map, including the IR, LSC, and SSC regions, was visualized using OGDraw to confirm the IR boundaries and genome structure.
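The manual IR validation described above can be approximated in a few lines. This is a minimal sketch, not the procedure used for the dataset: it assumes IRa and IRb have already been extracted as plain strings of equal length, and it compares IRa with the reverse complement of IRb position by position without gapped alignment. The function names are hypothetical.

```python
# Translation table for unambiguous DNA bases (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def ir_identity(ira, irb):
    """Fraction of positions where IRa matches the reverse complement of IRb.

    Assumes both regions have the same length; indels would require a
    proper pairwise alignment instead of this position-wise comparison.
    """
    ira_upper = ira.upper()
    irb_rc = reverse_complement(irb).upper()
    if not ira_upper or len(ira_upper) != len(irb_rc):
        raise ValueError("IRa and IRb must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ira_upper, irb_rc))
    return matches / len(ira_upper)


if __name__ == "__main__":
    # Toy sequences for illustration only, not taken from the plastomes.
    toy_ira = "ATGCCGTAAC"
    toy_irb = reverse_complement(toy_ira)
    print(ir_identity(toy_ira, toy_irb))
```

An identity of 1.0 indicates that the two copies are exact inverted repeats; lower values point to positions worth inspecting in the assembly.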
We used a dataset comprising the sequences of 115 coding genes to construct the phylogenetic tree. A maximum likelihood (ML) tree was constructed using RAxML-HPC v.8 (Stamatakis 2014) with 1000 bootstrap replicates under the best substitution model (GTR+GAMMA). The Bayesian inference of the phylogeny was performed using MrBayes v.3.2.7a (Ronquist et al. 2012). The Markov chain Monte Carlo chain length was set to 20 million generations with sampling every 2,000 generations, in two independent runs. The final average standard deviation of split frequencies was less than 0.01.
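The chain settings above translate directly into a MrBayes command block. The sketch below generates such a block and computes how many samples each run yields. The `lset nst=6 rates=gamma` line mirrors the GTR+GAMMA model reported for the ML analysis and is an assumption, since the text does not state the model used for the Bayesian analysis; burn-in and prior settings are likewise not given and are left out.

```python
def mrbayes_block(ngen=20_000_000, samplefreq=2_000, nruns=2):
    """Return a MrBayes block reflecting the stated chain settings."""
    lines = [
        "begin mrbayes;",
        # Assumption: GTR + gamma, mirroring the ML model in the text.
        "  lset nst=6 rates=gamma;",
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nruns={nruns};",
        "end;",
    ]
    return "\n".join(lines)


def samples_per_run(ngen=20_000_000, samplefreq=2_000):
    """Number of sampled states per run, not counting the starting state."""
    return ngen // samplefreq


if __name__ == "__main__":
    print(mrbayes_block())
    print(samples_per_run())
```

With 20 million generations sampled every 2,000 generations, each of the two runs contributes 10,000 samples before any burn-in is discarded.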
Distribution and mapping
The survey sites for the new species were located on a GPS map (GARMIN GPSMAP 639csx). Maps were produced using QGIS v.3.32 based on the free vector and raster map data available online (https://www.naturalearthdata.com).