Data from: Exploring the potential of Angiosperms353 markers for species identification of Eastern Mediterranean orchids
Data files
Aug 27, 2025 version files 8.30 MB
- Greek_orchids_Angiosperms353.zip (8.30 MB)
- README.md (4.52 KB)
Abstract
Tuberous orchids are ecologically vulnerable species, threatened by a range of environmental pressures such as overharvesting, grazing and land use change. Conservation efforts require accurate species identification, but are impeded by limited phylogenetic resolution of traditional genetic markers, which is exacerbated by widespread taxonomic conflict regarding the classification of orchids. Target enrichment holds promise to resolve both these challenges by offering a large set of nuclear loci with which to increase phylogenetic resolution and evaluate competing species models. Here, we evaluate the effectiveness of the Angiosperms353 markers for distinguishing over 50 tuberous orchid species native to Greece and we explore the possibility of narrowing these markers to a smaller set that could function as a minimal probe set. Our methodology consists of a three-tiered approach: 1) generating a species-level phylogeny using all Angiosperms353 loci with sufficient target recovery, 2) evaluating competing species models based on “splitter” and “lumper” classifications through Bayes Factor species delimitation, and 3) ranking the potential of Angiosperms353 loci to discriminate representatives of lineages with different divergence times based on their phylogenetic informativeness. While the inferred multi-species coalescent phylogeny had overall high support, Bayes Factor delimitation revealed mixed outcomes, favouring splitting in Serapias, and in Ophrys favouring splitting in basal clades and lumping in more recently diverged clades. A molecular clock analysis of Ophrys confirms rapid and recent radiation in clades marked by phylogenetic uncertainty, suggesting the need for additional loci to fully resolve this genus. Finally, we found 30 loci to be highly phylogenetically informative across four epochs of Orchidinae evolution; we suggest these are promising candidates for future marker development. 
Our findings enhance the Plant and Fungal Tree of Life (PAFTOL) by contributing additional phylogenomic data for species that were previously underrepresented, while shedding light on the ongoing “splitter”-vs-“lumper” debate and offering new directions for species identification of tuberous orchids, a group with distinct taxonomic and conservation challenges.
https://doi.org/10.5061/dryad.qrfj6q5sb
Description of the data and file structure
This dataset contains the exon sequences, gene alignments and gene trees generated for 165 orchid samples through hybridisation capture and sequencing with the Angiosperms353 baits. The samples were collected in Greece and represent 56 taxa of the Greek native orchid flora. Target sequences were assembled with HybPiper (Johnson et al., 2016) by mapping sequencing reads against a custom reference file including CDS sequences of the 353 target loci from 7 different orchid transcriptomes (McLay et al., 2021) and building contigs from the mapped reads. The exons of 148 of the 165 individuals were aligned with MAFFT (Katoh & Standley, 2013), and the alignments were trimmed with trimAl (Capella-Gutiérrez et al., 2009) prior to building gene trees with IQ-TREE 2 (Minh et al., 2020). The raw sequencing data used to generate this dataset are available on the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1167703.
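The alignment, trimming and gene-tree steps described above can be sketched as a small command builder. This is an illustrative sketch only, not the authors' exact invocation: the README names the tools and, for IQ-TREE 2, states model selection, 1000 bootstrap replicates and 10,000 iterations, but it does not give the command lines. The MAFFT `--auto` strategy, the trimAl `-automated1` mode, the choice of ultrafast bootstrap (`-B`) rather than standard bootstrap (`-b`), the mapping of "10,000 iterations" to `-nm`, and all file and locus names are assumptions.

```python
import shlex


def gene_tree_commands(locus, exon_fasta, out_dir="."):
    """Build (argv, stdout_path) pairs for one locus: align, trim, infer tree.

    All paths and the locus name are hypothetical placeholders.
    """
    aligned = f"{out_dir}/{locus}.aligned.fasta"
    trimmed = f"{out_dir}/{locus}.trimmed.fasta"
    return [
        # MAFFT writes the alignment to stdout, so it is redirected to a file.
        # "--auto" is an assumed strategy; the README does not state one.
        (["mafft", "--auto", exon_fasta], aligned),
        # trimAl trimming mode is an assumption ("-automated1").
        (["trimal", "-in", aligned, "-out", trimmed, "-automated1"], None),
        # IQ-TREE 2: "-m MFP" runs model selection followed by tree search,
        # "-B 1000" requests 1000 ultrafast bootstrap replicates (assumed type),
        # "-nm 10000" sets the maximum number of iterations (assumed mapping).
        (
            ["iqtree2", "-s", trimmed, "-m", "MFP", "-B", "1000",
             "-nm", "10000", "--prefix", f"{out_dir}/{locus}"],
            None,
        ),
    ]


def as_shell(steps):
    """Render the steps as shell command lines without running anything."""
    lines = []
    for argv, stdout_path in steps:
        line = shlex.join(argv)
        if stdout_path is not None:
            line += " > " + shlex.quote(stdout_path)
        lines.append(line)
    return lines
```

Calling `as_shell(gene_tree_commands("locus1", "locus1.fasta", "out"))` returns three command strings that can be reviewed before being run by hand or passed to `subprocess.run`. Building the commands as data rather than executing them directly keeps the assumed parameters easy to inspect and change.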
Files and variables
File: Greek_orchids_Angiosperms353.zip
Description: This archive contains four folders:
- reference: this directory contains the target reference file (orchids_mega353.fasta) that was used to assemble the reads and recover target sequences. The Angiosperms353 target reference file was enriched with transcriptome sequences of seven orchid species to enhance target recovery, as described by McLay et al. (2021). These species and their abbreviations are listed in orchids_mega353.csv.
- exons: this directory contains the recovered exon sequences for all 165 samples after running the HybPiper pipeline (Johnson et al., 2016). These sequences have not been curated.
- alignments: this directory contains the exon alignments for 148 samples after removing individuals with poor recovery/taxon concordance, realignment with MAFFT (Katoh & Standley, 2013) and trimming with trimAl (Capella-Gutiérrez et al., 2009).
- treefiles: this directory contains the gene trees that were constructed for each locus with IQ-TREE 2 (Minh et al., 2020) based on the trimmed exon alignments. The treefiles represent the trees with the best likelihood score after running IQ-TREE 2 with model selection, 1000 bootstrap replicates and 10,000 iterations.
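As a starting point for checking the unzipped folders, the following minimal Python sketch uses only the standard library to summarise FASTA files and list the tip labels of a Newick gene tree. The file naming inside each folder is not documented here, so the glob pattern and any paths passed in are assumptions. The Newick helper is a simple regular-expression approach that assumes unquoted labels without spaces.

```python
import re
from pathlib import Path


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def alignment_summary(records):
    """Count sequences and report whether they all share one length."""
    lengths = {len(seq) for seq in records.values()}
    return {
        "n_sequences": len(records),
        "length": max(lengths, default=0),
        "equal_length": len(lengths) <= 1,
    }


def summarise_directory(directory, pattern="*"):
    """Summarise every file matching `pattern` (an assumed pattern) in a folder."""
    summaries = {}
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            summaries[path.name] = alignment_summary(parse_fasta(path.read_text()))
    return summaries


def newick_tip_labels(newick):
    """Extract tip labels: in Newick, tips directly follow '(' or ','.

    Labels after ')' are internal node labels (e.g. support values) and are
    skipped. Quoted labels or labels with spaces are not handled.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)
```

For example, `summarise_directory("Greek_orchids_Angiosperms353/alignments")` (a hypothetical path after unzipping) would return one summary per alignment file. Files in the alignments folder are expected to report `equal_length` as true, whereas the uncurated, unaligned files in the exons folder generally will not.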
Access information
Other publicly accessible locations of the data:
- Raw sequencing data and sample metadata are available at: http://www.ncbi.nlm.nih.gov/bioproject/1167703
Data was derived from the following sources:
- Detailed instructions on how to create the custom target reference file are available here: https://github.com/chrisjackson-pellicle/NewTargets
References
- Capella-Gutiérrez, S., Silla-Martínez, J. M., & Gabaldón, T. (2009). trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25, 1972–1973. https://doi.org/10.1093/bioinformatics/btp348
- Johnson, M. G., Gardner, E. M., Liu, Y., Medina, R., Goffinet, B., Shaw, A. J., Zerega, N. J. C., & Wickett, N. J. (2016). HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment. Applications in Plant Sciences, 4(7), 1600016. https://doi.org/10.3732/apps.1600016
- Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution, 30(4), 772–780.
- McLay, T. G. B., Birch, J. L., Gunn, B. F., Ning, W., Tate, J. A., Nauheimer, L., Joyce, E. M., Simpson, L., Schmidt-Lebuhn, A. N., Baker, W. J., Forest, F., & Jackson, C. J. (2021). New targets acquired: Improving locus recovery from the Angiosperms353 probe set. Applications in Plant Sciences. https://doi.org/10.1002/aps3.11420
- Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., von Haeseler, A., & Lanfear, R. (2020). IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution, 37(5), 1530–1534.
We collected 165 orchid samples representing 56 unique taxa in Greece, and sequenced these following targeted capture and enrichment with the Angiosperms353 baits. The recovered exon sequences were aligned and trimmed to generate gene trees for 335 out of 353 target genes.
