Data from: Genomic data provide insights into the classification of extant termites

Cite this dataset

Hellemans, Simon; Wang, Menglin; Kaymak, Esra; Bourguignon, Thomas (2024). Data from: Genomic data provide insights into the classification of extant termites [Dataset]. Dryad. https://doi.org/10.5061/dryad.02v6wwqbm

Abstract

The higher classification of termites requires substantial revision as the Neoisoptera, the most diverse termite lineage, comprise many paraphyletic and polyphyletic higher taxa. Here, we produced an updated termite classification using genomic-scale analyses. We reconstructed phylogenies under diverse substitution models with ultraconserved elements analyzed as concatenated matrices or within the multi-species coalescence framework. Our classification is further supported by analyses controlling for rogue loci and taxa, and topological tests. We show that the Neoisoptera are composed of seven family-level monophyletic lineages, including the Heterotermitidae Froggatt, Psammotermitidae Holmgren, and Termitogetonidae Holmgren, raised from subfamilial rank. The species-rich Termitidae are composed of 18 subfamily-level monophyletic lineages, including the new subfamilies Crepititermitinae, Cylindrotermitinae, Forficulitermitinae, Neocapritermitinae, Protohamitermitinae, and Promirotermitinae, and the revived Amitermitinae Kemner, Microcerotermitinae Holmgren, and Mirocapritermitinae Kemner. Building an updated taxonomic classification on the foundation of unambiguously supported monophyletic lineages makes it highly resilient to potential destabilization caused by the future availability of novel phylogenetic markers and methods. The taxonomic stability is further guaranteed by the modularity of the new termite classification, designed to accommodate as-yet undescribed species with uncertain affinities to the herein delimited monophyletic lineages in the form of new families or subfamilies.

README: Supplementary Data from: Genomic data provide insights into the classification of extant termites

Authors: Simon Hellemans, Mauricio M. Rocha, Menglin Wang, Johanna Romero Arias, Duur K. Aanen, Anne-Geneviève Bagnères, Aleš Buček, Tiago F. Carrijo, Thomas Chouvenc, Carolina Cuezzo, Joice Paulo Constantini, Reginaldo Constantino, Franck Dedeine, Jean Deligne, Paul Eggleton, Theodore A. Evans, Robert Hanus, Mark C. Harrison, Myriam Harry, Guy Josens, Corentin Jouault, Chicknayakanahalli M. Kalleshwaraswamy, Esra Kaymak, Judith Korb, Chow-Yang Lee, Frédéric Legendre, Hou-Feng Li, Nathan Lo, Tomer Lu, Kenji Matsuura, Kiyoto Maekawa, Dino McMahon, Nobuaki Mizumoto, Danilo E. Oliveira, Michael Poulsen, David Sillam-Dussès, Nan-Yao Su, Gaku Tokuda, Edward Vargo, Jessica L. Ware, Jan Šobotník, Rudolf H. Scheffrahn, Eliana Cancello, Yves Roisin, Michael S. Engel, and Thomas Bourguignon

File 1: "ids_to_species.txt"

Specimens and corresponding identification codes in the database (TER-X-UCEDB).

File 2: "TER_UCE_DB_CONTRIB_4.fasta.gz"

New UCE dataset presented in this study.
UCEs were extracted from all samples using the bait set produced by Hellemans et al. (2022; doi: 10.1016/j.ympev.2022.107520).
This is contribution #4 for the Termite UCE Database.
Each sample was assigned a unique identification code (TER-X-UCEDB).
The database is maintained at: https://github.com/sihellem/TER-UCE-DB/.
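
A quick way to inspect the gzip-compressed FASTA file is to stream it record by record. The following is a minimal sketch using only the Python standard library; the function names `read_fasta` and `count_records` are illustrative and not part of the dataset, and no assumption is made about the header layout, which this README does not document.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # header text without the leading ">"
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def count_records(path: str) -> int:
    """Count the records in a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in read_fasta(handle))
```

For example, `count_records("TER_UCE_DB_CONTRIB_4.fasta.gz")` would return the number of sequences in the file, assuming it has been downloaded to the working directory.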

File 3: "supermatrices.tar"

Supermatrices used in this study.
Refer to Supplementary Data 4, published alongside the main text, for relevant metrics.
The tar archive contains the following supermatrices and partitions:
1_internal_unfiltered.charsets
1_internal_unfiltered.nexus
2_internal_cogenic.charsets
2_internal_cogenic.nexus
3_internal_intergenic.charsets
3_internal_intergenic.nexus
4_edge_no_stop_nuc.charsets
4_edge_no_stop_nuc.nexus
4_edge_no_stop_nuc_no3rd.charsets
5_edge_two_stops_nuc.charsets
5_edge_two_stops_nuc.nexus
5_edge_two_stops_nuc_no3rd.charsets
6_edge_no_stop_codon.charsets
6_edge_no_stop_codon.nexus
7_edge_two_stops_codon.charsets
7_edge_two_stops_codon.nexus
8_edge_no_stop_prot.charsets
8_edge_no_stop_prot.fasta
9_edge_two_stops_prot.charsets
9_edge_two_stops_prot.fasta
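
In the listing above, most `.charsets` partition files sit next to an alignment with the same base name, but two of them do not. The sketch below shows one way to check this pairing on a list of member names; the helper names are hypothetical, and the rule that an alignment shares its base name with its partition file is an assumption drawn from the listing, not something this README states.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Alignment extensions that appear in the listing.
ALIGNMENT_SUFFIXES = {".nexus", ".fasta"}


def group_by_stem(names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each base name (without extension) to its sorted extensions."""
    groups = defaultdict(list)
    for name in names:
        path = PurePosixPath(name)  # ignores any leading directory
        groups[path.stem].append(path.suffix)
    return {stem: sorted(suffixes) for stem, suffixes in groups.items()}


def partitions_without_alignment(names: Iterable[str]) -> List[str]:
    """Return base names that have a .charsets file but no alignment."""
    return sorted(
        stem
        for stem, suffixes in group_by_stem(names).items()
        if ".charsets" in suffixes and not ALIGNMENT_SUFFIXES & set(suffixes)
    )
```

Member names can be obtained with `tarfile.open("supermatrices.tar").getnames()`. Applied to the names listed above, the check reports `4_edge_no_stop_nuc_no3rd` and `5_edge_two_stops_nuc_no3rd`; the README does not say which alignment these two partition files apply to.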

File 4: "trees.tar"

Phylogenetic trees reconstructed in this study.
Prefix of treefiles is as follows: tree#_software_analysis#_...
Refer to Supplementary Data 5, published alongside the main text, for more information on analysis#.
Trees 1-30 were reconstructed with IQ-TREE (30 trees), trees 31-51 with ASTRAL-III (21 trees), and trees 52-72 with TreeShrink followed by ASTRAL-III (21 trees). Trees 73-75 are majority-rule consensus trees, as their file names indicate.
The tar archive contains the following trees:

# Trees from IQ-TREE
01_iqtree_01_DNA_70_unfilt_GTR_aln.nexus.treefile.tree
02_iqtree_02_DNA_70_unfilt_TESTMERGE_aln.nexus.treefile.tree
03_iqtree_03_DNA_70_cogenic_GTR_aln.nexus.treefile.tree
04_iqtree_04_DNA_70_cogenic_TESTMERGE_aln.nexus.treefile.tree
05_iqtree_05_DNA_70_intergenic_GTR_aln.nexus.treefile.tree
06_iqtree_06_DNA_70_intergenic_TESTMERGE_aln.nexus.treefile.tree
07_iqtree_07_DNA_70_no-stop-pal_NUCLEO_no_3rd_GTR+I+G_bb_aln.nexus.treefile.tree
08_iqtree_08_DNA_70_no-stop-pal_NUCLEO_no_3rd_TESTMERGE_rcluster_aln.nexus.treefile.tree
09_iqtree_09_DNA_70_max-two-stops-pal_NUCLEO_no_3rd_GTR+I+G_bb_aln.nexus.treefile.tree
10_iqtree_10_DNA_70_max-two-stops-pal_NUCLEO_no_3rd_TESTMERGE_rcluster_aln.nexus.treefile.tree
11_iqtree_11_COD_70_no-stop-pal_NUCLEO_all_partitions_GTR+I+G_bb_aln.nexus.treefile.tree
12_iqtree_12_COD_70_no-stop-pal_NUCLEO_all_partitions_TESTMERGE_rcluster_aln.nexus.treefile.tree
13_iqtree_13_COD_70_max-two-stops-pal_NUCLEO_all_partitions_GTR+I+G_bb_aln.nexus.treefile.tree
14_iqtree_14_COD_70_max-two-stops-pal_NUCLEO_all_partitions_TESTMERGE_rcluster_aln.nexus.treefile.tree
15_iqtree_15_COD_70_no-stop-pal_NUCLEO_coding_ECMK07_rcluster_aln.nexus.treefile.tree
16_iqtree_16_COD_70_no-stop-pal_NUCLEO_coding_GY2K_rcluster_aln.nexus.treefile.tree
17_iqtree_17_COD_70_no-stop-pal_NUCLEO_coding_MG2K_rcluster_aln.nexus.treefile.tree
18_iqtree_18_COD_70_max-two-stops-pal_NUCLEO_coding_ECMK07_rcluster_aln.nexus.treefile.tree
19_iqtree_19_COD_70_max-two-stops-pal_NUCLEO_coding_GY2K_rcluster_aln.nexus.treefile.tree
20_iqtree_20_COD_70_max-two-stops-pal_NUCLEO_coding_MG2K_rcluster_aln.nexus.treefile.tree
21_iqtree_21_COD_70_no-stops-aln_PROT_LG+F+G_rcluster_aln.fasta.treefile.tree
22_iqtree_22_COD_70_no-stops-aln_PROT_LG+C20+F+G_rcluster_aln.fasta.treefile.tree
23_iqtree_23_COD_70_no-stops-aln_PROT_Q.insect+F+G_rcluster_aln.fasta.treefile.tree
24_iqtree_24_COD_70_no-stops-aln_PROT_DCMut_rcluster_aln.fasta.treefile.tree
25_iqtree_25_COD_70_no-stops-aln_PROT_TESTMERGE_rcluster_aln.fasta.treefile.tree
26_iqtree_26_COD_70_max-two-stops-aln_PROT_LG+F+G_rcluster_aln.fasta.treefile.tree
27_iqtree_27_COD_70_max-two-stops-aln_PROT_LG+C20+F+G_rcluster_aln.fasta.treefile.tree
28_iqtree_28_COD_70_max-two-stops-aln_PROT_Q.insect+F+G_rcluster_aln.fasta.treefile.tree
29_iqtree_29_COD_70_max-two-stops-aln_PROT_DCMut_rcluster_aln.fasta.treefile.tree
30_iqtree_30_COD_70_max-two-stops-aln_PROT_TESTMERGE_rcluster_aln.fasta.treefile.tree

# Trees from ASTRAL without TreeShrink
31_astral_01_sptree-renamed.tree
32_astral_03_sptree-renamed.tree
33_astral_05_sptree-renamed.tree
34_astral_07_sptree-renamed.tree
35_astral_09_sptree-renamed.tree
36_astral_11_sptree-renamed.tree
37_astral_13_sptree-renamed.tree
38_astral_15_sptree-renamed.tree
39_astral_16_sptree-renamed.tree
40_astral_17_sptree-renamed.tree
41_astral_18_sptree-renamed.tree
42_astral_19_sptree-renamed.tree
43_astral_20_sptree-renamed.tree
44_astral_21_sptree-renamed.tree
45_astral_22_sptree-renamed.tree
46_astral_23_sptree-renamed.tree
47_astral_24_sptree-renamed.tree
48_astral_26_sptree-renamed.tree
49_astral_27_sptree-renamed.tree
50_astral_28_sptree-renamed.tree
51_astral_29_sptree-renamed.tree

# Trees from ASTRAL with TreeShrink
52_astral_treeshrink_01_sptree-renamed.tree
53_astral_treeshrink_03_sptree-renamed.tree
54_astral_treeshrink_05_sptree-renamed.tree
55_astral_treeshrink_07_sptree-renamed.tree
56_astral_treeshrink_09_sptree-renamed.tree
57_astral_treeshrink_11_sptree-renamed.tree
58_astral_treeshrink_13_sptree-renamed.tree
59_astral_treeshrink_15_sptree-renamed.tree
60_astral_treeshrink_16_sptree-renamed.tree
61_astral_treeshrink_17_sptree-renamed.tree
62_astral_treeshrink_18_sptree-renamed.tree
63_astral_treeshrink_19_sptree-renamed.tree
64_astral_treeshrink_20_sptree-renamed.tree
65_astral_treeshrink_21_sptree-renamed.tree
66_astral_treeshrink_22_sptree-renamed.tree
67_astral_treeshrink_23_sptree-renamed.tree
68_astral_treeshrink_24_sptree-renamed.tree
69_astral_treeshrink_26_sptree-renamed.tree
70_astral_treeshrink_27_sptree-renamed.tree
71_astral_treeshrink_28_sptree-renamed.tree
72_astral_treeshrink_29_sptree-renamed.tree

# Consensus trees
73_iqtree_majority_rule_phytools_updated.tree
74_astral_majority_rule_phytools_updated.tree
75_astral_treeshrink_majority_rule_phytools_updated.tree
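
The file-name prefix described above (tree number, software, analysis number) can be split programmatically. This is a minimal sketch based only on the names in the listing; the `TreeName` type and `parse_tree_name` function are hypothetical, and the pattern treats the analysis number as optional because the consensus trees (73-75) do not carry one.

```python
import re
from typing import NamedTuple, Optional


class TreeName(NamedTuple):
    tree_number: int
    software: str            # "iqtree", "astral", or "astral_treeshrink"
    analysis: Optional[int]  # None for the consensus trees
    rest: str                # remainder of the name, without ".tree"


# "astral_treeshrink" must be tried before "astral" so the longer label wins.
_PATTERN = re.compile(
    r"(\d+)_(iqtree|astral_treeshrink|astral)_(?:(\d+)_)?(.*)\.tree"
)


def parse_tree_name(name: str) -> TreeName:
    """Split a tree file name from the listing into its prefix fields."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised tree file name: {name!r}")
    number, software, analysis, rest = match.groups()
    return TreeName(
        int(number),
        software,
        int(analysis) if analysis is not None else None,
        rest,
    )
```

Grouping the parsed names by `software` gives a simple way to select, for instance, only the ASTRAL trees estimated after TreeShrink filtering.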

File 5: "termitidae_diagnosing_database_v1.fasta.gz"

Database containing the top 20 UCE loci diagnostic of subfamilies of Termitidae.
This file was designed with all samples of Termitidae available in the Termite UCE Database (Contributions 1-4; i.e., 227 samples).
The database file will be updated with each new contribution (maintained at: https://github.com/sihellem/TER-UCE-DB/).
The presently deposited file (version 1) is not diagnostic for subfamilies represented by only one sample (i.e., Crepititermitinae, Forficulitermitinae, and Protohamitermitinae).

Funding

Okinawa Institute of Science and Technology Graduate University, Subsidiary funding

Czech Science Foundation, Award: 20-20548S