Dryad

Data and code from: Unsupervised machine learning for species discovery in Eurytoma and Phylloxeroxenus (Hymenoptera: Eurytomidae) parasitoids of oak gall wasps

Data files

May 12, 2026 version files (119.08 MB)

Abstract

Species discovery (inferring species limits de novo, without a priori hypotheses) from genetic data has become more common as molecular tools have expanded, and it has been a helpful initial step in tackling the taxonomic impediment for small insects. Species discovery often relies on a single locus (e.g., mtCOI), but the accessibility of techniques for large subgenomic sequencing projects (1000s of loci) makes it possible to approach molecular species discovery with more robust datasets. Here, we test unsupervised machine learning (UML) methods for species discovery on a set of ultraconserved element (UCE) loci for a large collection of parasitic wasps reared from North American oak galls, all initially thought to be in the genus Eurytoma Illiger. UML methods produced species hypotheses that largely aligned with those from a commonly used mtCOI-based species partitioning method and that also tended to match existing species descriptions. Results revealed a new genus-level association with oak galls (Phylloxeroxenus Ashmead) hidden among the Eurytoma; two distinct lineages of Eurytoma, including a new lineage more closely related to the South American genus Kavayva Zhang, Gates, & Silvestre; evidence for one or more cryptic Eurytoma species; and a mix of generalist and specialist host ranges. We make recommendations for how best to apply UML methods to similar datasets.