Standardized nuclear markers improve and homogenize species delimitation in Metazoa

Cite this dataset

Dietz, Lars et al. (2022). Standardized nuclear markers improve and homogenize species delimitation in Metazoa [Dataset]. Dryad.


Species are the fundamental units of life and evolution. Their recognition is essential for science and society. Molecular methods have been increasingly employed for the identification of animal species, despite several challenges. 

Here, we explore with genomic data from nine animal lineages a set of nuclear markers, namely metazoan-level universal single-copy orthologs (metazoan USCOs), for their use in species delimitation. Our data sets include arthropods and vertebrates. We use various data assembly strategies and employ coalescent-based species inference as well as population admixture analyses and phenetic methods.

We demonstrate that metazoan USCOs distinguish closely related morphospecies well and consistently outperform classical mitochondrial DNA barcoding in discriminating closely related species in different animal taxa. USCOs overcome the general shortcomings of mitochondrial DNA barcodes and, due to standardization across Metazoa, also those of other approaches. They accurately assign samples not only to lower but also to higher taxonomic levels.

Metazoan USCOs provide a powerful and unifying framework for DNA-based species delimitation and taxonomy in animals and their employment could result in a more efficient use of research data and resources.


Sequences were obtained from targeted enrichment of universal single-copy orthologous loci (USCOs) from nine different animal genera. Reads were assembled and aligned with seven different approaches (see Supplementary Information of the paper for details). In each dataset, alignment positions present in fewer than three individuals were excluded. Additionally, reduced datasets were generated that include only positions present in all individuals. Phylogenetic trees were created with IQ-TREE both for concatenated alignments and for individual genes, and coalescent-based analyses were done with ASTRAL based on the individual gene trees. Species delimitation analyses were conducted with BPP based on the ASTRAL trees. SNPs were extracted from the alignments with snp-sites. CO1 sequences were obtained by Sanger sequencing (arthropods) or extracted from USCO reads (frogs), and phylogenetic trees based on them were calculated with PhyML.
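The column filter described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `filter_columns`, the `min_present` parameter, and the set of symbols counted as missing data (`-`, `?`, `N`) are assumptions made for illustration only.

```python
# Assumption: these symbols mean "no data" at an alignment position.
MISSING = set("-?Nn")


def filter_columns(alignment, min_present=3):
    """Keep only columns with data in at least `min_present` sequences.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    length = len(seqs[0]) if seqs else 0

    # Indices of columns where enough individuals have a called base.
    keep = [
        i
        for i in range(length)
        if sum(s[i] not in MISSING for s in seqs) >= min_present
    ]
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}
```

Calling the function with `min_present=len(alignment)` would correspond to the reduced datasets, which retain only positions present in all individuals.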

Usage notes

All alignments, including SNP datasets, are in FASTA format and can be opened with standard alignment viewers. In alignments for approaches A1 and A2, ambiguity codes (R, Y, W, S, M, K) stand for positions inferred to be heterozygous.
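As a small illustration of how the ambiguity codes can be read, the following Python sketch maps each two-base IUPAC code to its pair of alleles and counts heterozygous positions in a sequence. The code-to-base mapping is the standard IUPAC one; the helper names `alleles` and `count_heterozygous` are hypothetical and not part of the dataset.

```python
# Standard IUPAC two-base ambiguity codes, read here as heterozygous calls.
HET_CODES = {
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "W": ("A", "T"),
    "S": ("C", "G"),
    "M": ("A", "C"),
    "K": ("G", "T"),
}
PLAIN_BASES = {"A", "C", "G", "T"}


def alleles(base):
    """Return the two alleles implied by one alignment character.

    Unambiguous bases are returned as a homozygous pair; anything else
    (gaps, N, other symbols) yields None.
    """
    base = base.upper()
    if base in HET_CODES:
        return HET_CODES[base]
    if base in PLAIN_BASES:
        return (base, base)
    return None


def count_heterozygous(seq):
    """Count positions in `seq` encoded with a two-base ambiguity code."""
    return sum(ch.upper() in HET_CODES for ch in seq)
```

For example, `alleles("R")` returns `("A", "G")`, while a gap character returns `None`.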

Phylogenetic trees are in NEWICK format; files including multiple named trees are in NEXUS format. Both can be opened in a standard phylogenetic tree viewer such as FigTree. Individual gene trees are unrooted; trees based on whole USCO datasets are rooted.

SNP datasets recoded for NMDS analysis are tab-delimited text files in which the numbers have the following meaning: 0: homozygous for the more common allele, 1: heterozygous, 2: homozygous for the less common allele. Unknown positions are represented by empty cells.
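A recoded SNP table of this kind could be loaded with a short Python sketch like the one below. The file layout is an assumption, not something stated here: the sketch expects one row per individual with the sample name in the first column and, optionally, a header row. The function name `read_recoded_snps` is hypothetical. Empty cells are mapped to `None` so that unknown positions stay distinguishable from genotype 0.

```python
import csv
import io


def read_recoded_snps(handle, has_header=True):
    """Read a tab-delimited 0/1/2 genotype table into a dict.

    Assumed layout: first column is the sample name, remaining columns
    are SNP positions. Empty cells (unknown positions) become None.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if present
    table = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, cells = row[0], row[1:]
        genotypes = []
        for cell in cells:
            cell = cell.strip()
            if cell == "":
                genotypes.append(None)
            elif cell in ("0", "1", "2"):
                genotypes.append(int(cell))
            else:
                raise ValueError(f"unexpected genotype code {cell!r} for {name}")
        table[name] = genotypes
    return table


# Usage with a small made-up table (not data from this dataset).
example = "sample\tsnp1\tsnp2\tsnp3\nind1\t0\t\t2\nind2\t1\t2\t\n"
genotypes = read_recoded_snps(io.StringIO(example))
```

If the actual files are oriented differently (for example, SNPs as rows), the row and column handling would need to be swapped accordingly.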


Deutsche Forschungsgemeinschaft, Award: AH175/3-1

Deutsche Forschungsgemeinschaft, Award: AH175/6-1

Deutsche Forschungsgemeinschaft, Award: AH175/6-2

Deutsche Forschungsgemeinschaft, Award: MI649/18-1

Deutsche Forschungsgemeinschaft, Award: NI1387/6-1

Deutsche Forschungsgemeinschaft, Award: NI1387/7-1

National Science Foundation, Award: DEB-1256742