UCE phylogenomics improves the classification of the cosmopolitan pit-building antlion tribe Myrmeleontini (Neuroptera: Myrmeleontidae: Myrmeleontinae)
Data files
Feb 26, 2026 version files (16.31 GB total):
- Myrmeleontini_upload.zip (16.31 GB)
- README.md (19.36 KB)
Abstract
This dataset provides raw genome assemblies, a lineage-specific ultraconserved element (UCE) probe set, and derived phylogenomic data for the antlion tribe Myrmeleontini (Neuroptera: Myrmeleontidae). The files include scaffold-level genome assemblies used for probe design, the UCE probe set Myrmeleontidae_v1 (156,272 probes targeting 8,537 loci), UCE loci filtered to 90% taxon occupancy (uceMYR90; 3,027 loci), a concatenated supermatrix alignment, and inputs/outputs for concatenated maximum-likelihood and coalescent-based (ASTRAL) phylogenetic inference, including partitioning and support-assessment results. Support and conflict are summarized using IQ-TREE concordance factors (gCF and sCF) and Quartet Sampling metrics (QC, QD, QI).
Dataset DOI: 10.5061/dryad.sn02v6xkc
Description of the data and file structure
README – Phylogenomic dataset for Myrmeleontini (Neuroptera: Myrmeleontidae)
1. Overview
This Dryad submission provides data used for phylogenomic analyses of Myrmeleontini based on ultraconserved elements (UCEs). The dataset includes (all in Myrmeleontini_upload.zip):
(1) scaffold-level genome assemblies used for probe design (“original scaffold”)
(2) a lineage-specific UCE probe set (“Myrmeleontidae_v1.fasta”)
(3) UCE loci filtered for 90% completeness (“90 UCEs data”; uceMYR90)
(4) a concatenated UCE supermatrix (“FcC_supermatrix.fas”)
(5) files for concatenated maximum-likelihood (ML) inference and partitioning (“ML”)
(6) original ASTRAL inputs/outputs (“astral”)
(7) support and conflict assessment results for ML and ASTRAL trees (“16-1-concordane_factors”, “17-quetetsampling”, “16-2-as-concordane_factors”, “17-as-quetetsampling”)
All files are provided as plain text or standard phylogenetic formats (FASTA, Newick/Nexus, and program output text files).
2. Directory and file inventory
2.1 original scaffold/
Contents: de novo assembled genomes (scaffold-level assemblies) used to design the UCE probe set.
File type: FASTA (.fa/.fasta; may be gzip-compressed).
Each file in the “original scaffold” folder is a raw genome assembly, and the file name corresponds to the voucher/specimen ID (the associated species information for each voucher is provided in Table S1 of the manuscript).
2.2 Myrmeleontidae_v1.fasta
Contents: Lineage-specific UCE probe sequences used for locus extraction.
File type: FASTA.
Key values: 156,272 probes targeting 8,537 loci.
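As a quick integrity check after download, the number of probe sequences can be compared with the figure reported above. The following is a minimal sketch, assuming the probe file is a standard FASTA file with one ">" header line per probe; the function names and the file location after unzipping are illustrative assumptions, not part of the dataset.

```python
from pathlib import Path


def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_probes(fasta_path):
    """Open a FASTA file and return the number of records it contains."""
    with Path(fasta_path).open() as handle:
        return count_fasta_records(handle)


if __name__ == "__main__":
    # Hypothetical location after unzipping Myrmeleontini_upload.zip.
    probe_file = Path("Myrmeleontidae_v1.fasta")
    if probe_file.exists():
        print(count_probes(probe_file), "probes found (README reports 156,272)")
```

Counting loci would additionally require knowing how probe headers encode the locus name, which this README does not specify, so the sketch stops at the probe count.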
2.3 90 UCEs data/
Contents: UCE loci filtered to 90% taxon occupancy (uceMYR90; 3,027 loci), including locus alignments used for downstream analyses.
Typical file types: aligned loci in FASTA/PHYLIP (depending on export), and/or per-locus alignment files.
File naming convention (90% UCE loci)
Files in the folder “90 UCEs data/” are named automatically by the pipeline and follow this pattern:
uce-[ID].treeshrink.[ext]
where:
- [ID] is an internal, sequential UCE locus identifier assigned by the extraction/processing scripts. It is used only to uniquely label and track loci across files and does not encode biological meaning.
- “treeshrink” indicates that the locus alignment was processed with TreeShrink to identify and remove outlier long-branch sequences.
- [ext] indicates the file format (e.g., “fas” = FASTA alignment).
Example:
- uce-340.treeshrink.fas
FASTA alignment for UCE locus 340 after TreeShrink filtering.
Notes: Alignments were produced with MAFFT and trimmed with ClipKit; potential alignment errors were filtered using TAPER; outlier sequences were filtered using TreeShrink prior to matrix generation.
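The file naming convention above can be parsed programmatically when iterating over the locus folder. This is a minimal sketch, assuming the locus identifier is always numeric (as in the example uce-340.treeshrink.fas); the function name is hypothetical.

```python
import re

# Pattern inferred from the README example "uce-340.treeshrink.fas".
# Assumption: the locus identifier consists of digits only.
UCE_NAME = re.compile(r"^uce-(\d+)\.treeshrink\.(\w+)$")


def parse_uce_filename(name):
    """Return (locus_id, extension), or None if the name does not match."""
    match = UCE_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(parse_uce_filename("uce-340.treeshrink.fas"))  # (340, 'fas')
```

Files that do not follow the pattern return None, which makes it easy to skip unrelated files in the same folder.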
2.4 ML/
Contents:
Files for the concatenated maximum-likelihood (ML) analysis based on the 90% completeness UCE supermatrix.
ML/partition/
Folder containing files used for partitioning the concatenated alignment.
- FcC_supermatri.csv: Partition table (see file for details) used to define/record partition blocks for the concatenated matrix.
- FcC_supermatri_entropy_partition_finder.cfg: Configuration file for entropy-based partitioning of UCE loci (used to generate/guide the partition scheme).
ML/FcC_supermatrix.fas
Concatenated alignment (supermatrix) of UCE loci filtered to 90% taxon occupancy.
How to reuse: This file can be used directly in concatenated phylogenetic inference programs (e.g., IQ-TREE), together with the corresponding partition file(s) provided under the “ML” folder.
ML/ML.tre
Maximum-likelihood phylogenetic tree inferred from the concatenated supermatrix under the partitioned model.
2.6 astral/
Contents:
1) all.gene.tre
Plain-text Newick gene trees used as the ASTRAL input. This file contains 3,027 gene trees (one tree per UCE locus; one tree per line). Internal node labels are bootstrap-like support values (0–100) and branch lengths are included.
2) astral.tre
The ASTRAL species tree output in Newick format. This is the direct ASTRAL output tree with branch lengths and node support values (as written in the file; values are in the 0–1 range).
3) species_tree.tre
The same ASTRAL species tree as in astral.tre, written with an alternative rooting/ordering for readability (Apexz is placed as the outgroup in this representation). Branch lengths and the same node support values are retained.
4) log.txt
Console log of the ASTRAL run, including the program/version information, input summary (e.g., number of gene trees and species), threading, and the final inferred tree/score reported by ASTRAL.
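Because all.gene.tre is described as holding one Newick tree per line, its tree count can be checked against the 3,027 loci reported above. The sketch below assumes each tree occupies exactly one line and ends with a semicolon; the function name and the path are illustrative.

```python
from pathlib import Path


def count_newick_trees(lines):
    """Count non-empty lines ending with ';' (one Newick tree per line)."""
    return sum(1 for line in lines if line.strip().endswith(";"))


if __name__ == "__main__":
    # Hypothetical location after unzipping Myrmeleontini_upload.zip.
    tree_file = Path("astral") / "all.gene.tre"
    if tree_file.exists():
        with tree_file.open() as handle:
            print(count_newick_trees(handle), "gene trees (README reports 3,027)")
```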
2.7 16-1-concordane_factors/ (ML support assessment)
Contents: IQ-TREE concordance factor results computed on the concatenated-tree framework.
Typical outputs include tables reporting gCF and sCF values per internal branch.
Definitions:
- gCF (gene concordance factor): percentage of gene trees supporting a given internal branch of the reference tree.
- sCF (site concordance factor): percentage of alignment sites supporting a given internal branch of the reference tree.
Concordance factors (gCF/sCF) – ML-tree based (IQ-TREE)
FcC_supermatrix_partition_myr.contree
The partitioned maximum-likelihood (ML) tree inferred from the concatenated UCE supermatrix. This tree was used as the reference tree for concordance factor calculations.
rerooted.tre
The same ML reference tree after re-rooting using the designated outgroup taxa, provided for visualization and consistency in downstream interpretation.
all.gene.tre
Input gene trees used for concordance factor (gCF) calculations. This file contains the set of per-locus ML gene trees corresponding to the UCE loci retained in the 90% occupancy dataset (one tree per locus; Newick format).
concord90.log
IQ-TREE log file for the concordance factor run, recording the command line, settings, and run summary.
concord90.cf.tre
The reference tree annotated with concordance factor values produced by IQ-TREE (Newick format with node/branch annotations).
concord90.cf.tre.nex
NEXUS-format version of the concordance-factor annotated tree (same information as concord90.cf.tre), provided for compatibility with software that prefers NEXUS.
concord90.cf.branch
Tab-delimited branch-by-branch concordance factor output from IQ-TREE, listing gCF/sCF and related branch information.
concord90.cf.stat
Final summary table of concordance factors (gCF and sCF) for the ML reference tree. This is the definitive results file for downstream use and interpretation.
sCF analysis files (site concordance factor: percentage of alignment sites supporting a given internal branch):
alignment/
Folder containing the filtered and aligned UCE locus sequences used for concordance factor calculations (one alignment per locus; generated from the 90% occupancy UCE dataset after filtering and alignment).
rerooted.tre
The reference maximum-likelihood tree after re-rooting using the designated outgroup taxa, used for visualization and consistent interpretation.
concord.best_scheme
Best-fit partitioning scheme selected/used by IQ-TREE for the concordance factor analysis (plain-text summary).
concord.best_scheme.nex
NEXUS-format representation of the best-fit partitioning scheme (same information as concord.best_scheme), provided for compatibility.
concord.best_model.nex
NEXUS-format model/partition summary produced by IQ-TREE.
concord.model.gz
Compressed file containing model/partition-related outputs generated by IQ-TREE (gzip-compressed).
concord.ckp.gz
Checkpoint file produced by IQ-TREE, which can be used to resume the run if needed.
concord.treefile
Reference tree file produced/used by IQ-TREE for this run (Newick format).
concord.cf.tre
Reference tree annotated with concordance factor values produced by IQ-TREE (Newick format with node/branch annotations).
concord.cf.tre.nex
NEXUS-format version of concord.cf.tre (same concordance-factor annotations), provided for compatibility.
concord.cf.branch
Tab-delimited branch-by-branch concordance factor output from IQ-TREE, listing sCF (and associated statistics) for each branch.
concord.cf.stat
Final summary table of site concordance factor results for the reference tree. This is the definitive results file for downstream use and interpretation.
concord.iqtree
IQ-TREE report file summarizing run settings, chosen models/partitions, and key results.
concord.log
Plain-text log file for the concordance factor run, recording the command line and run progress.
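The .cf.stat tables described above can be loaded for downstream filtering. The sketch below assumes the layout IQ-TREE typically writes for these files: tab-delimited text, optional comment lines starting with "#", and a header row that includes a gCF column. The column name, the threshold of 50, and the folder path are assumptions to adjust after inspecting the actual file.

```python
import csv
from pathlib import Path


def read_cf_table(lines):
    """Parse a tab-delimited table, skipping blank lines and '#' comments."""
    rows = (line for line in lines if line.strip() and not line.startswith("#"))
    return list(csv.DictReader(rows, delimiter="\t"))


def branches_below(records, threshold=50.0, column="gCF"):
    """Return records whose numeric value in `column` is below `threshold`."""
    selected = []
    for record in records:
        try:
            value = float(record.get(column))
        except (TypeError, ValueError):
            continue  # missing or non-numeric entries (e.g. "NA") are skipped
        if value < threshold:
            selected.append(record)
    return selected


if __name__ == "__main__":
    # Hypothetical location after unzipping Myrmeleontini_upload.zip.
    stat_file = Path("16-1-concordane_factors") / "concord90.cf.stat"
    if stat_file.exists():
        with stat_file.open() as handle:
            table = read_cf_table(handle)
        print(len(branches_below(table)), "branches with gCF below 50")
```

Passing column="sCF" applies the same filter to site concordance factors, provided the file contains a column with that name.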
2.8 16-2-as-concordane_factors/ (ASTRAL support assessment)
Contents: Concordance-factor summaries mapped to the ASTRAL species-tree framework.
Definitions: as in Section 2.7.
- gCF (gene concordance factor): percentage of gene trees supporting a given internal branch.
- sCF (site concordance factor): percentage of alignment sites supporting a given internal branch.
Concordance factors (gCF/sCF) – ASTRAL species-tree based
species_tree.tre
The ASTRAL-inferred species tree (Newick format) used as the reference tree for concordance factor calculations in the ASTRAL framework.
asrerooted.tre
The ASTRAL species tree after re-rooting using the designated outgroup taxa, provided for visualization and consistent interpretation.
all.gene.tre
Input gene trees used for concordance factor calculations under the ASTRAL framework (one gene tree per UCE locus; Newick format). These are the per-locus trees summarized by ASTRAL to infer the species tree.
concord90.log
Log file for the concordance factor run, recording the command line, settings, and run summary.
concord90.cf.tree
The ASTRAL reference tree annotated with concordance factor values (Newick format with node/branch annotations).
concord90.cf.tree.nex
NEXUS-format version of concord90.cf.tree (same concordance-factor annotations), provided for compatibility.
concord90.cf.branch
Tab-delimited branch-by-branch concordance factor output, listing concordance factor values and related statistics for each branch of the ASTRAL reference tree.
concord90.cf.stat
Final summary table of concordance factor results for the ASTRAL reference tree. This is the definitive results file for downstream use and interpretation.
sCF analysis files (site concordance factor: percentage of alignment sites supporting a given internal branch):
alignment/
Folder containing the filtered and aligned UCE locus sequences used for site concordance factor calculations (one alignment per locus; derived from the 90% occupancy UCE dataset).
asrerooted.tre
The ASTRAL species tree after re-rooting using the designated outgroup taxa, used as the reference tree for this sCF analysis and for visualization.
concord.best_scheme
Best-fit partitioning scheme selected/used by IQ-TREE for this concordance factor run (plain-text summary).
concord.best_scheme.nex
NEXUS-format version of the best-fit partitioning scheme (same information as concord.best_scheme), provided for compatibility.
concord.ckp
Checkpoint file produced by IQ-TREE, which can be used to resume the run if needed.
concord.best_model.nex
NEXUS-format model/partition summary produced by IQ-TREE.
concord.model.gz
Compressed file containing model/partition-related outputs generated by IQ-TREE (gzip-compressed).
concord.treefile
Reference tree file used by IQ-TREE for the sCF computation (Newick format).
concord.cf.tree
ASTRAL reference tree annotated with site concordance factor values (Newick format with node/branch annotations).
concord.cf.tree.nex
NEXUS-format version of concord.cf.tree (same sCF annotations), provided for compatibility.
concord.cf.branch
Tab-delimited branch-by-branch site concordance factor output, reporting sCF (and associated statistics) for each branch of the reference tree.
concord.cf.stat
Final summary table of site concordance factor results for the ASTRAL reference tree. This is the definitive results file for downstream use and interpretation.
concord.iqtree
IQ-TREE report file summarizing run settings, chosen models/partitions, and key results.
concord.ckp.gz
Compressed checkpoint file produced by IQ-TREE (gzip-compressed), enabling the run to be resumed if needed.
concord.log
Plain-text log file for the concordance factor run, recording the command line and run progress.
2.9 17-quetetsampling/ (ML support assessment)
Contents: Quartet Sampling outputs computed on the ML/concatenated tree.
Notes:
- QC (quartet concordance): values near 1 indicate strong concordance; values near -1 indicate strong discordance.
- QD (quartet differential): values near 1 indicate that the discordant alternatives are similarly frequent; values near 0 indicate that one discordant alternative dominates.
- QI (quartet informativeness): proportion of informative replicates; 1 means all replicates are informative, 0 means none are.
Quartet Sampling (QC/QD/QI) – input and output files
FcC_supermatrix_partition_myr.txt
Partition file used in the phylogenetic analysis of the concatenated UCE supermatrix. This file defines the partition blocks (and associated models/settings, if specified) used for tree inference.
rerooted.tre
The phylogenetic tree used as the reference for Quartet Sampling, re-rooted using the designated outgroup taxa for consistent interpretation.
RESULT.node.counts.csv
Quartet Sampling output table reporting, for each internal node, the counts/frequencies of concordant and discordant quartet topologies across replicates.
RESULT.node.scores.csv
Final Quartet Sampling results table reporting node-level scores. This is the definitive results file to use for downstream interpretation (including QC, QD, and QI values).
RESULT.labeled.figtree
Tree file formatted for visualization in FigTree, with Quartet Sampling results mapped/annotated on the reference tree.
RESULT.labeled.tre
Newick tree annotated with Quartet Sampling results (node labels/metadata), suitable for downstream plotting or inspection.
RESULT.labeled.tre.freq
Supplementary output containing quartet topology frequencies associated with the annotated tree.
RESULT.labeled.tre.qc
Supplementary output containing QC (quartet concordance) values mapped to the annotated tree.
RESULT.labeled.tre.qd
Supplementary output containing QD (quartet differential) values mapped to the annotated tree.
RESULT.labeled.tre.qi
Supplementary output containing QI (quartet informativeness) values mapped to the annotated tree.
RESULT.run.stats
Run statistics and summary information produced during Quartet Sampling (e.g., number of replicates and run diagnostics).
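The QC/QD/QI notes above can be applied to RESULT.node.scores.csv to list nodes where discordant quartets outnumber concordant ones. This is a minimal sketch, assuming the file is comma-delimited with a header row containing columns named node_label and qc; those column names and the cutoff of 0 are assumptions and should be checked against the file header.

```python
import csv
from pathlib import Path


def read_scores(lines):
    """Parse a comma-delimited score table with a header row."""
    return list(csv.DictReader(lines))


def to_float(text):
    """Convert text to float; return None for missing or non-numeric values."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def nodes_below_qc(records, cutoff=0.0, node_col="node_label", qc_col="qc"):
    """Return (node, QC) pairs whose QC value is below `cutoff`."""
    flagged = []
    for record in records:
        qc = to_float(record.get(qc_col))
        if qc is not None and qc < cutoff:
            flagged.append((record.get(node_col), qc))
    return flagged


if __name__ == "__main__":
    # Hypothetical location after unzipping Myrmeleontini_upload.zip.
    scores_file = Path("17-quetetsampling") / "RESULT.node.scores.csv"
    if scores_file.exists():
        with scores_file.open(newline="") as handle:
            for node, qc in nodes_below_qc(read_scores(handle)):
                print(node, qc)
```

The same functions can be pointed at the corresponding file in 17-as-quetetsampling/ to inspect the ASTRAL-based results.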
2.10 17-as-quetetsampling/ (ASTRAL support assessment)
Contents: Quartet Sampling outputs mapped to the ASTRAL species tree. Score definitions (QC, QD, QI) are as in Section 2.9.
FcC_supermatrix_partition_myr.txt
Partition file used in the phylogenetic analysis of the concatenated UCE supermatrix. This file defines the partition blocks (and associated models/settings, if specified) used for tree inference.
asrerooted.tre
The ASTRAL species tree used as the reference for Quartet Sampling, re-rooted using the designated outgroup taxa for consistent interpretation.
RESULT.node.counts.csv
Quartet Sampling output table reporting, for each internal node, the counts/frequencies of concordant and discordant quartet topologies across replicates.
RESULT.node.scores.csv
Final Quartet Sampling results table reporting node-level scores. This is the definitive results file to use for downstream interpretation (including QC, QD, and QI values).
RESULT.labeled.figtree
Tree file formatted for visualization in FigTree, with Quartet Sampling results mapped/annotated on the reference tree.
RESULT.labeled.tre
Newick tree annotated with Quartet Sampling results (node labels/metadata), suitable for downstream plotting or inspection.
RESULT.labeled.tre.freq
Supplementary output containing quartet topology frequencies associated with the annotated tree.
RESULT.labeled.tre.qc
Supplementary output containing QC (quartet concordance) values mapped to the annotated tree.
RESULT.labeled.tre.qd
Supplementary output containing QD (quartet differential) values mapped to the annotated tree.
RESULT.labeled.tre.qi
Supplementary output containing QI (quartet informativeness) values mapped to the annotated tree.
RESULT.run.stats
Run statistics and summary information produced during Quartet Sampling (e.g., number of replicates and run diagnostics).
3. Recommended free/open software to view and reuse the data
General file viewing:
- Any text editor (e.g., VS Code, Notepad++, TextEdit)
- gzip/gunzip or 7-Zip (for .gz compressed files)
Sequence and alignment viewing:
- AliView, MEGA, or UGENE (free tools for FASTA/alignment viewing)
Phylogenetic inference and tree viewing:
- IQ-TREE 2 (concatenated ML inference; reading partition files and producing output trees)
- ASTRAL (species-tree inference from gene trees)
- FigTree or iTOL (tree visualization; iTOL is web-based)
Alignment utilities (if reprocessing is desired):
- MAFFT (alignment)
- ClipKit (trimming)
- TAPER (alignment error filtering)
- TreeShrink (outlier long-branch filtering)
4. Notes on data reuse
Users can reproduce the main phylogenomic analyses by:
(1) using “FcC_supermatrix.fas” with the partition file(s) in “ML/” to rerun concatenated ML analyses (IQ-TREE)
(2) using the per-locus gene trees in “astral/” to rerun ASTRAL and compare species-tree estimates
(3) using concordance-factor and Quartet Sampling folders to quantify support and discordance across alternative trees
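The reuse steps above could be scripted roughly as follows. This sketch only assembles command lines; it does not reproduce the authors' exact settings, which are described in the manuscript. The flags shown (-s, -p, -t, -T, --gcf, --scf, --prefix for IQ-TREE 2; -i and -o for ASTRAL) are standard options of those programs, while the executable names, the jar path, the output prefixes, the partition file placeholder, and the number of sCF quartets are assumptions.

```python
import shlex


def iqtree_ml_command(alignment, partition, prefix, threads="AUTO", exe="iqtree2"):
    """Partitioned concatenated ML analysis with IQ-TREE 2."""
    return [exe, "-s", alignment, "-p", partition, "-T", threads, "--prefix", prefix]


def iqtree_cf_command(ref_tree, gene_trees, alignment, prefix,
                      scf_quartets=100, exe="iqtree2"):
    """gCF and sCF computed on a fixed reference tree with IQ-TREE 2."""
    return [exe, "-t", ref_tree, "--gcf", gene_trees, "-s", alignment,
            "--scf", str(scf_quartets), "--prefix", prefix]


def astral_command(gene_trees, output, jar="astral.jar"):
    """Species-tree inference from gene trees with ASTRAL."""
    return ["java", "-jar", jar, "-i", gene_trees, "-o", output]


if __name__ == "__main__":
    commands = [
        # PARTITION_FILE is a placeholder for the partition file in ML/.
        iqtree_ml_command("ML/FcC_supermatrix.fas", "PARTITION_FILE", "rerun_ml"),
        astral_command("astral/all.gene.tre", "rerun_astral.tre"),
        iqtree_cf_command("ML/ML.tre", "astral/all.gene.tre",
                          "ML/FcC_supermatrix.fas", "rerun_concord"),
    ]
    for command in commands:
        print(shlex.join(command))  # inspect first, then run with subprocess.run
```

Printing the commands before running them keeps the step inspectable; each list can be passed to subprocess.run once the paths and settings have been confirmed.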
5. Legal and ethical considerations
This dataset contains no human subject data. Specimens represent insects (Myrmeleontidae). Information on specimen provenance, sampling permissions, museum loans, and any applicable legal considerations (e.g., collecting permits and access and benefit-sharing where relevant) is documented in the associated manuscript and supporting materials.
6. Contact
For questions about the dataset, please contact:
Name: Yuchen Zheng
Email: zhengyuchenantlion@gmail.com
Affiliation: Institute of Zoology, Chinese Academy of Sciences
Code/software
Please see the Materials and Methods section of the associated manuscript.
Access information
Other publicly accessible locations of the data:
- NCBI (BioProject: PRJNA1403738; upload in progress)
Data was derived from the following sources:
- The associated article, "UCE phylogenomics improves the classification of the cosmopolitan pit-building antlion tribe Myrmeleontini (Neuroptera: Myrmeleontidae: Myrmeleontinae)" (in proof)
