Data from: Phylogenomics of prominent moths (Lepidoptera: Notodontidae): A subfamily-level reclassification
Data files
Jun 17, 2025 version files 51.01 MB
- datasets.zip (51 MB)
- README.md (7.74 KB)
Abstract
We present the first taxonomically comprehensive phylogenomic dataset for Notodontidae to stabilize subfamily classification. Conflicting classifications of Notodontidae have confounded the understanding of this diverse group (over 4,700 described species), partly because there has been limited taxonomic and geographic sampling, and morphological homoplasy is widespread. We note less rampant homoplasy in larvae and highlight larval characters wherever possible in addressing subfamily diagnostics. This study incorporates as many as 854 anchored hybrid enrichment loci from 150 species. Our dataset is the first to include taxa from all continents where Notodontidae occur, as well as type genera and (when possible) type species of all previously recognized subfamilies. Our genomic dataset is analyzed using maximum likelihood, multispecies coalescent, and parsimony phylogenetic methods. We recognize 21 subfamilies, four of which are given new names based on phylogenomic analyses corroborated by morphological diagnoses: Chadisrinae St Laurent and Schintlmeister, subfam. nov., Peratodontinae St Laurent and Goldstein, subfam. nov., Teleclitinae St Laurent and Goldstein, subfam. nov., and Shachiinae St Laurent and Goldstein, subfam. nov.; an additional two subfamilies are elevated from tribes of Heterocampinae: Lusurinae Thiaucourt, stat. nov. and Hapigiinae Franclemont and Miller, stat. nov. Heterocampinae Neumoegen and Dyar is recognized to include two cosmopolitan tribes: Neodrymoniaini Kobayashi stat. rev. and Heterocampini Neumoegen and Dyar. The following family-group names are synonymized: Ptilodontinae Grote and Robinson, syn. rev. and Ptilophorinae Matsumura, syn. rev. with Notodontinae Stephens, and Rosemini Forbes, syn. nov. with Hemiceratinae Guenée. The largely diurnal group Dioptinae Walker, syn. nov. is nested within Nystaleinae Forbes. The rarely used, and often overlooked, name and authorship Anaphinae Sharpe, 1890, are maintained.
Our results also support the following generic changes: Eutrotonotus Gaede, syn. nov. (a synonym of Clostera Samouelle), Erconholda Kiriakoff, stat. rev. (formerly subgenus of Phalera Hübner), and Antheua atrata (Grünberg), comb. nov. (formerly placed in Phalera). Two further genera are newly synonymized with Scevesia: Narriocampa Thiaucourt, syn. nov. and Haxairella Thiaucourt, syn. nov. We provide a checklist of 659 valid genera for all Notodontidae, with subfamily-level assignments.
The article is freely available from: https://smithsonian.figshare.com/articles/book/Phylogenomics_of_Prominent_Moths_Lepidoptera_Notodontidae_A_Subfamily-Level_Reclassification/28912337?file=54125117
Description of the data and file structure
This repository is associated with St Laurent et al. 2025, Smithsonian Contributions to Zoology, Phylogenomics of Prominent Moths (Lepidoptera: Notodontidae): a subfamily-level reclassification.
In the study, we gathered genomic data from published sources and from samples newly sequenced by target capture (anchored hybrid enrichment, AHE) using the LEP1 probe set. All raw reads from this and associated studies are available on NCBI SRA: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1153979.
In this Dryad repository, we have provided all downstream molecular data after processing and phylogenetic analyses according to recent revisions to a pipeline first published by Breinholt et al. 2018: https://doi.org/10.1093/sysbio/syx048 and further developed for Noctuoidea systematics in St Laurent et al. 2023a: https://doi.org/10.1093/isd/ixad004 and St Laurent et al. 2023b: https://doi.org/10.1111/syen.12614
Directory "datasets"
Description: This directory contains machine-readable outputs from all analyses.
Subdirectories and files within "datasets"
datasets/658_loci
Description: All 658 AHE (anchored hybrid enrichment) locus nucleotide alignments are provided in FASTA format. Each locus was aligned individually with MAFFT version 7.407 using the --adjustdirectionaccurately option, which corrects sequence orientation and so maximizes compatibility between AHE data derived from target capture and sequences extracted from previously published full genomes or transcriptomes. These alignments correspond to the loci recovered from at least 60% of the taxa sampled in the study, and collectively comprise the dataset referred to as "N_60p" in the manuscript.
Following alignment, consensus sequences were constructed using FASconCAT-G version 1.02, which was also used to concatenate the individual locus alignments into a single supermatrix. Applying the same procedure to every alignment supports reproducibility and facilitates downstream phylogenomic analyses.
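As a rough illustration of the two steps described above (keeping loci recovered from at least 60% of taxa, then joining them into a supermatrix), here is a minimal Python sketch. It is not FASconCAT-G and does not reproduce the authors' pipeline. The function names, the toy taxa, and the rule that an all-gap sequence counts as a missing taxon are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def parse_fasta(text: str) -> Alignment:
    """Parse FASTA text into {taxon: sequence}, joining wrapped lines."""
    records: Alignment = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def occupancy(aln: Alignment, n_taxa: int) -> float:
    """Fraction of sampled taxa with at least one informative character.

    Assumption: a sequence made only of '-', '?' or 'N' counts as absent.
    """
    present = sum(1 for seq in aln.values() if seq.strip("-?Nn"))
    return present / n_taxa


def concatenate(
    loci: Dict[str, Alignment], taxa: List[str]
) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Join locus alignments into one supermatrix.

    Taxa missing from a locus are padded with gaps. Also returns the
    1-based, inclusive (locus, start, end) coordinates of each locus.
    """
    pieces: Dict[str, List[str]] = {t: [] for t in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for locus in sorted(loci):
        aln = loci[locus]
        length = len(next(iter(aln.values())))
        for t in taxa:
            pieces[t].append(aln.get(t, "-" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    return {t: "".join(p) for t, p in pieces.items()}, partitions


# Toy data (hypothetical taxa and loci, not taken from the dataset).
taxa = ["A", "B", "C"]
loci = {
    "L1": parse_fasta(">A\nACGT\n>B\nAC-T\n>C\n----\n"),
    "L2": parse_fasta(">A\nGG\n>B\nGA\n"),
}
# Keep loci recovered from at least 60% of taxa, as for the "N_60p" dataset.
kept = {name: aln for name, aln in loci.items() if occupancy(aln, len(taxa)) >= 0.6}
supermatrix, partitions = concatenate(kept, taxa)
```

The partition coordinates returned alongside the supermatrix are the kind of information a locus-partitioned analysis needs, which is why the sketch tracks them during concatenation.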
datasets/ASTRAL_Tree.tre
Description: The final species tree was inferred using ASTRAL-III and is provided in Newick format. Specifically, ASTRAL-III version 5.7.8 was used to infer the species tree under the multi-species coalescent (MSC) model. The input for ASTRAL consisted of unpartitioned gene trees, each independently estimated using IQ-TREE, a maximum likelihood-based phylogenetic inference software. The resulting ASTRAL tree represents the consensus topology that best reconciles the set of gene trees under the MSC framework.
Branch support values in the ASTRAL output are presented as local posterior probabilities, a statistical measure that reflects the confidence in each bipartition of the species tree given the gene trees. Values greater than 0.95 are interpreted as strong support for the corresponding node, indicating a high level of confidence in the inferred relationships. This high-resolution tree serves as a robust framework for interpreting evolutionary relationships among the taxa sampled in this study.
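To show how the 0.95 threshold described above might be applied to the Newick file, the following Python sketch pulls internal-node labels out of a tree string and counts those above the cutoff. It assumes each internal node carries a single numeric label (the local posterior probability), which is how ASTRAL annotates trees by default as far as we are aware; richer annotation settings would need a different parser. The example tree and function names are hypothetical.

```python
import re
from typing import List, Tuple

# In Newick, an internal-node label follows a closing parenthesis,
# e.g. ")0.98:0.12". Assumption: the label is a single plain number.
SUPPORT_RE = re.compile(r"\)(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def node_supports(newick: str) -> List[float]:
    """Return all numeric internal-node labels found in a Newick string."""
    return [float(value) for value in SUPPORT_RE.findall(newick)]


def count_strong(newick: str, threshold: float = 0.95) -> Tuple[int, int]:
    """Return (nodes with support > threshold, labelled nodes in total)."""
    values = node_supports(newick)
    return sum(1 for v in values if v > threshold), len(values)


# Hypothetical five-taxon tree, not taken from ASTRAL_Tree.tre.
example = "((A:1.0,B:1.0)0.99:0.5,(C:1.0,D:1.0)0.62:0.4,E:1.0);"
strong, total = count_strong(example)
```

For real work, a tree library that handles quoted taxon names and comments would be more robust than a regular expression.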
datasets/ASTRAL_treefiles
Description: All input gene trees used for ASTRAL species tree inference were individually reconstructed using IQ-TREE. For each gene, maximum likelihood phylogenetic inference was performed using unpartitioned alignments, meaning that the entire alignment for each locus was treated as a single evolutionary unit without subdividing it into separate partitions. This approach maintains consistency across gene tree estimations and aligns with standard practices for input preparation in species tree reconstruction under the multi-species coalescent model. The resulting set of gene trees served as the input for ASTRAL-III to infer the final species tree.
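ASTRAL reads its gene trees from a single file containing one Newick tree per line. The sketch below shows one way to assemble such a file from a directory of per-locus tree files. The `*.treefile` suffix (IQ-TREE's usual output name), the output file name, and the assumption of exactly one tree per input file are ours; the files in this directory may be named or organized differently.

```python
from pathlib import Path
from typing import List


def collect_gene_trees(tree_dir: str, out_file: str, pattern: str = "*.treefile") -> int:
    """Write one Newick tree per line to out_file; return how many were written.

    Assumptions: each matching file holds exactly one tree, and out_file
    does not itself match `pattern`.
    """
    trees: List[str] = []
    for path in sorted(Path(tree_dir).glob(pattern)):
        text = path.read_text().strip()
        if text:
            # Collapse line breaks and repeated whitespace so each tree is one line.
            trees.append(" ".join(text.split()))
    Path(out_file).write_text("\n".join(trees) + "\n")
    return len(trees)
```

The combined file could then typically be passed to ASTRAL with something like `java -jar astral.5.7.8.jar -i all_gene_trees.tre -o species_tree.tre`, where both file names are placeholders.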
datasets/IQTREE
Description: The best of 100 IQ-TREE results (according to log-likelihood) for each dataset type (amino acid or nucleotide, partitioned by locus or codon position, and loci selected by genesortR), together with their input supermatrices and the partitioning schemes with models selected by ModelFinder, are provided in this directory. For each analysis, 1,000 ultrafast bootstrap (UFBoot) and 1,000 Shimodaira-Hasegawa approximate likelihood ratio test (SH-aLRT) replicates were calculated for support. The -bnni option was used in all cases to reduce branch support overestimation. Before the 100 IQ-TREE runs, -m TESTNEWMERGEONLY was used to call ModelFinder to identify the best partitioning scheme and model of nucleotide evolution for the datasets partitioned a priori by locus or by codon position. The selected models and partitioning scheme were then used for each of the 100 replicates of each dataset. Models and partitions are provided in NEXUS format in this directory.
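Choosing the best of many replicate runs comes down to comparing their final log-likelihoods and keeping the highest (least negative) one. The Python sketch below illustrates that comparison. It assumes each run left an IQ-TREE report file (`*.iqtree`) containing a line of the form `Log-likelihood of the tree: -12345.678 (s.e. ...)`. The directory layout and function names are hypothetical, and the authors may have compared runs in another way.

```python
import re
from pathlib import Path
from typing import Dict, Optional, Tuple

# Assumed format of the score line in an IQ-TREE ".iqtree" report.
LOGL_RE = re.compile(r"Log-likelihood of the tree:\s*(-?\d+(?:\.\d+)?)")


def read_log_likelihood(report_text: str) -> Optional[float]:
    """Extract the tree log-likelihood from report text, or None if absent."""
    match = LOGL_RE.search(report_text)
    return float(match.group(1)) if match else None


def best_run(run_dir: str, pattern: str = "*.iqtree") -> Tuple[str, float]:
    """Return (file name, log-likelihood) of the highest-scoring run."""
    scores: Dict[str, float] = {}
    for path in Path(run_dir).glob(pattern):
        logl = read_log_likelihood(path.read_text())
        if logl is not None:
            scores[path.name] = logl
    if not scores:
        raise ValueError(f"no log-likelihoods found in {run_dir}")
    name = max(scores, key=lambda key: scores[key])
    return name, scores[name]
```

Reports without a recognizable score line are skipped rather than treated as errors, so an interrupted run does not stop the comparison.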
Zenodo-hosted supplemental files
Description: PDF versions of the tree files (.tre) available in the Dryad package, as well as the supplemental tables referenced in the published article, are hosted on Zenodo. Descriptions of individual files are below.
file: AA_658.tre.pdf
Description: The best IQTREE from the 60% recovery amino acid dataset partitioned by locus
file: AA_genesortR.tre.pdf
Description: The best IQTREE from the genesortR selected amino acid dataset partitioned by locus
file: ASTRAL_Tree.tree.pdf
Description: The result of the ASTRAL analysis
file: codon_658.tre.pdf
Description: The best IQTREE from the 60% recovery nucleotide dataset partitioned by codon
file: codon_all.tre.pdf
Description: The best IQTREE from the nucleotide dataset where all 854 loci were used, partitioned by codon
file: genesortR_locus.tre.pdf
Description: The best IQTREE from the genesortR selected nucleotide dataset partitioned by locus
file: iqtree_all_AA.tre.pdf
Description: The best IQTREE from the amino acid dataset where all 854 loci were used, partitioned by locus
file: iqtree_all_N_locus.tre.pdf
Description: The best IQTREE from the nucleotide dataset where all 854 loci were used, partitioned by locus
file: N_658.tre.pdf
Description: The best IQTREE from the 60% recovery nucleotide dataset partitioned by locus
file: Table_S1.xlsx
Description: A table listing all samples used in the study and their sources, including up-to-date identifications with the classification scheme presented in this article
file: Table_S2.xlsx
Description: A table format of the morphological matrix used (in part) to establish diagnoses. This table is meant to be a general overview of how morphology was examined and recorded as we built diagnoses, not for formal morphological phylogenetic analyses, which we did not perform
file: Table_S3.xlsx
Description: A table form of the checklist of genera of Notodontidae presented at the end of the article, provided to permit the development of machine-readable updates to Notodontidae classification
Sequence data are derived from published full genomes, previously published anchored hybrid enrichment (AHE) datasets, and specimens newly sequenced for AHE.
All downstream data processing follows the modified AHE pipeline originally developed by Breinholt et al. (2018): https://academic.oup.com/sysbio/article/67/1/78/3796843
Raw reads are available from NCBI SRA under BioProject PRJNA1153979.
- St Laurent, Ryan A; Goldstein, Paul Z; Miller, James S et al. (2023). Phylogenetic systematics, diversification, and biogeography of Cerurinae (Lepidoptera: Notodontidae) and a description of a new genus. Insect Systematics and Diversity. https://doi.org/10.1093/isd/ixad004
- St Laurent, Ryan A.; Goldstein, Paul Z.; Prada-Lara, Liliana et al. (2025). Phylogenomics of Prominent Moths (Lepidoptera: Notodontidae): A Subfamily-Level Reclassification. Smithsonian Contributions to Zoology. Smithsonian Institution Scholarly Press. https://doi.org/10.5479/si.28912337
