Data from: Copepod phylogenomics supports Canuelloida as a valid order separate from Harpacticoida
Data files
Feb 27, 2025 version; files total 55.68 MB
- matrices.zip (38.94 MB)
- orthologs.zip (16.70 MB)
- README.md (3.38 KB)
- species_tree_files.zip (40.38 KB)
Abstract
Copepods are small crustaceans that are ubiquitous in aquatic environments. They are particularly abundant in marine and freshwater plankton, marine sediments, and as parasites or commensals of other aquatic organisms. Despite their abundance and importance, phylogenetic relationships among copepods are poorly resolved. The validity of higher-level taxa, including several orders, has continued to be controversial throughout the 21st century. This study has two main goals: first, to use phylogenomic data to assess relationships among the four major copepod orders (Calanoida, Cyclopoida, Harpacticoida, and Siphonostomatoida), which together include more than 98% of copepod species diversity, and second, to test the validity of the recently proposed order Canuelloida. Towards these goals, we sampled 28 copepod transcriptomes and genomes spanning 20 families and 5 orders, including the first transcriptome of a representative of Canuelloida. We identified 2,527 single-copy protein-coding genes comprising 939,460 amino acid (aa) positions and 530,269 informative sites. All phylogenetic analyses support a monophyletic Podoplea (i.e., the superorder comprising all copepod orders except for Calanoida and Platycopioida) with Calanoida as its sister taxon. We find robust support across all methods for Canuelloida as a distinct order separate from the traditionally recognized Harpacticoida (Oligoarthra). Contrary to several recent studies of smaller sets of nuclear genes or mitochondrial genomes, we recover Cyclopoida and Harpacticoida as sister taxa and find that gene-tree discordance analysis rejects the alternative topologies. Transcriptomic data are promising for resolving the backbone of the copepod phylogeny, but collecting and sequencing the nearly 15,000 species of copepods, many of which are infrequently encountered and less than 1 mm in size, remains a major hurdle.
This dataset accompanies Bernot et al. (2025), "Copepod phylogenomics supports Canuelloida as a valid order separate from Harpacticoida". It contains the associated phylogenetic data: concatenated alignments, individual ortholog alignments, and the resulting species tree files.
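The individual ortholog alignments are FASTA files, and the Gblocks-trimmed versions were checked for taxa left with nothing but gaps after trimming. The following is a minimal Python sketch of that kind of check, not the authors' script; it assumes plain FASTA text with `-` as the only gap character, and the function names (`parse_fasta`, `drop_all_gap_taxa`, `to_fasta`) are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into an insertion-ordered {name: sequence} dict.

    The name is the first whitespace-delimited word of each '>' header.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(parts) for k, parts in records.items()}


def drop_all_gap_taxa(records, gap_chars="-"):
    """Remove taxa whose aligned sequence contains only gap characters."""
    # str.strip(gap_chars) returns an empty (falsy) string for all-gap sequences
    return {k: seq for k, seq in records.items() if seq.strip(gap_chars)}


def to_fasta(records, width=60):
    """Serialize records back to FASTA, wrapping sequences at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


aln = parse_fasta(">A\nMK-L\n>B\nMK\nAL\n>C\n----\n")
print(list(drop_all_gap_taxa(aln)))  # ['A', 'B']
```

Taxon C is dropped because Gblocks-style trimming left it with only gap characters; the remaining taxa keep their original order.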
Description of the data and file structure
Overview: the dataset includes the following zip files:
- matrices # supermatrices used in this study, built at the different taxon-occupancy levels described in Bernot et al. (2025)
- orthologs # individual ortholog alignments before and after trimming with Gblocks
- species_tree_files # Newick species tree files resulting from phylogenetic analyses of the 50% matrix (and, for the partitioned IQ-TREE 2 runs, the 73% and 80% matrices) under the different models described in Bernot et al. (2025)
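The supermatrices in matrices.zip are provided as PHYLIP (.phy) and NEXUS (.nex) files. A minimal Python sketch for loading a .phy file and checking it against its header follows; it assumes a relaxed, sequential PHYLIP layout (an "ntax nchar" header, then one "name sequence" line per taxon), which should be confirmed against the actual files. The helper names are hypothetical, and `occupancy` only reproduces the minimum-taxa thresholds encoded in the file names (15, 22, or 24 of 30 taxa).

```python
def parse_sequential_phylip(text):
    """Parse relaxed, sequential PHYLIP text into {taxon: sequence}.

    Assumes an 'ntax nchar' header followed by one 'name sequence' line
    per taxon (no interleaved blocks); whitespace inside a sequence is removed.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    seqs = {}
    for ln in lines[1:]:
        name, seq = ln.split(None, 1)
        seqs[name] = "".join(seq.split())
    # Sanity checks against the header
    if len(seqs) != ntax:
        raise ValueError(f"header says {ntax} taxa, found {len(seqs)}")
    for name, seq in seqs.items():
        if len(seq) != nchar:
            raise ValueError(f"{name}: header says {nchar} sites, found {len(seq)}")
    return seqs


def missing_fraction(seqs, missing="-?Xx"):
    """Fraction of matrix cells that are gaps or unknown residues.

    Counting X (unknown amino acid) as missing is an assumption.
    """
    total = sum(len(s) for s in seqs.values())
    n_missing = sum(sum(c in missing for c in s) for s in seqs.values())
    return n_missing / total


def occupancy(min_taxa, total_taxa=30):
    """Minimum taxon occupancy implied by a 'mintaxaN' threshold."""
    return min_taxa / total_taxa


# Reading a real file might look like:
# from pathlib import Path
# seqs = parse_sequential_phylip(Path("supermatrix_100aa_mintaxa15.phy").read_text())
for n in (15, 22, 24):
    print(n, f"{occupancy(n):.0%}")  # prints: 15 50% / 22 73% / 24 80%
```

The header checks fail loudly if the file turns out to be interleaved rather than sequential, which is a cheap way to test the layout assumption.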
Detailed Table of Contents
- matrices # concatenated matrices built at the different taxon-occupancy levels described in Bernot et al. (2025)
- supermatrix_100aa_mintaxa15.phy # PHYLIP file of the 50% taxon-occupancy matrix (15/30 taxa)
- supermatrix_100aa_mintaxa15_reformatted.nex # NEXUS file of the 50% taxon-occupancy matrix (15/30 taxa)
- supermatrix_100aa_mintaxa22.phy # PHYLIP file of the 73% taxon-occupancy matrix (22/30 taxa)
- supermatrix_100aa_mintaxa22_reformated.nex # NEXUS file of the 73% taxon-occupancy matrix (22/30 taxa)
- supermatrix_100aa_mintaxa24.phy # PHYLIP file of the 80% taxon-occupancy matrix (24/30 taxa)
- supermatrix_100aa_mintaxa24_reformated.nex # NEXUS file of the 80% taxon-occupancy matrix (24/30 taxa)
- orthologs # orthologs.zip contains 2 folders with the individual alignments of the 2,686 orthologs used in this study. File naming follows the convention of https://bitbucket.org/yangya/phylogenomic_dataset_construction/src/master/
- raw_fastas # MAFFT-aligned FASTA files for the 2,686 orthologs in the 50% matrix.
- gblocks_alignments # MAFFT alignments of the same 2,686 orthologs after trimming with Gblocks. Each file carries the same unique identifier as its counterpart in raw_fastas. The "ngaps" file ending was appended after each alignment was checked and any taxon consisting only of gaps after Gblocks trimming (which occasionally happens) was removed.
- species_tree_files # tree files resulting from analyses under the different models listed below, mostly of the 50% taxon matrix (the partitioned IQ-TREE 2 runs also cover the 73% and 80% matrices).
- C20_mintax15_goodnames.treefile # tree file from the IQ-TREE 2 C20+LG+F+G analysis of the 50% matrix with 100 bootstrap replicates
- iqtree_auto_UFBoot_mintax15_goodnames.iqtree # tree file from the IQ-TREE 2 partitioned analysis of the 50% matrix, as described in the manuscript
- iqtree_auto_UFBoot_mintax24_goodnames.iqtree # tree file from the IQ-TREE 2 partitioned analysis of the 80% matrix (24/30 taxa), as described in the manuscript
- iqtree_auto_UFBoot_mintax22_goodnames.iqtree # tree file from the IQ-TREE 2 partitioned analysis of the 73% matrix (22/30 taxa), as described in the manuscript
- ASTRAL_mintax22_raw_goodnames.tre # tree file from the ASTRAL analysis of the 50% taxon matrix using raw (uncollapsed) gene trees
- ASTRAL_mintax22_BS30_goodnames.tre # tree file from the ASTRAL analysis of the 50% taxon matrix using gene trees in which branches with <30% ultrafast bootstrap (UFBoot) support were collapsed
- ASTRAL_mintax22_BS20_goodnames.tre # tree file from the ASTRAL analysis of the 50% taxon matrix using gene trees in which branches with <20% ultrafast bootstrap (UFBoot) support were collapsed
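The BS30 and BS20 ASTRAL runs used gene trees in which weakly supported branches were collapsed. The sketch below illustrates that preprocessing step in plain Python; it is not necessarily the tool the authors used. It assumes simple Newick strings without quoted labels or comments, with a single numeric support value (e.g., UFBoot) as the internal node label; labels that cannot be read as numbers (such as combined "SH-aLRT/UFBoot" values) are left uncollapsed. All names are hypothetical.

```python
import re

PUNCT = {"(", ")", ",", ":", ";"}
TOKEN = re.compile(r"\s*([(),:;]|[^(),:;\s]+)")


class Node:
    def __init__(self):
        self.children = []
        self.label = ""     # taxon name (tips) or support value (internal nodes)
        self.length = None  # branch length, kept as the original string


def parse_newick(text):
    """Parse a simple Newick string (no quoted labels, no [comments])."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        node = Node()
        if tokens[pos] == "(":
            pos += 1
            node.children.append(parse_node())
            while tokens[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            pos += 1  # consume ")"
        if pos < len(tokens) and tokens[pos] not in PUNCT:
            node.label = tokens[pos]
            pos += 1
        if pos < len(tokens) and tokens[pos] == ":":
            node.length = tokens[pos + 1]
            pos += 2
        return node

    return parse_node()


def support(node):
    """Numeric support of a node label, or None if it is not a number."""
    try:
        return float(node.label)
    except ValueError:
        return None


def collapse_low_support(node, threshold):
    """Contract internal branches with support < threshold (post-order).

    Children of a contracted node are attached to its parent; the
    contracted branch's length is discarded.
    """
    kept = []
    for child in node.children:
        collapse_low_support(child, threshold)
        s = support(child)
        if child.children and s is not None and s < threshold:
            kept.extend(child.children)
        else:
            kept.append(child)
    node.children = kept
    return node


def to_newick(node):
    """Serialize a Node tree back to a Newick string."""
    def fmt(n):
        out = ""
        if n.children:
            out += "(" + ",".join(fmt(c) for c in n.children) + ")"
        out += n.label
        if n.length is not None:
            out += ":" + n.length
        return out
    return fmt(node) + ";"


tree = parse_newick("((A:1,B:2)90:0.5,(C:1,D:1)20:0.3,E:1);")
print(to_newick(collapse_low_support(tree, 30)))
# ((A:1,B:2)90:0.5,C:1,D:1,E:1);
```

Collapsing is done bottom-up, so a chain of weak nodes dissolves into a single polytomy. Discarding the length of a collapsed branch does not matter for a standard ASTRAL analysis, which uses only gene-tree topologies.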
