### Supplementary Materials from:
# Molecular Phylogenetics and Evolution, doi: 10.1016/j.ympev.2022.107520
# Title: Using ultraconserved elements to reconstruct the termite tree of life
# Authors: Simon Hellemans, Menglin Wang, Nonno Hasegawa, Jan Šobotník, Rudolf H. Scheffrahn, Thomas Bourguignon

### Supplementary Data 1:
SQLite database containing the putative orthologs found in the four termite genomes, obtained with ‘phyluce_probe_get_multi_merge_table’, with positions on the base genome of M. natalensis.

### Supplementary Data 2:
Conserved sequences with equally distributed buffer, obtained with ‘phyluce_probe_get_genome_sequences_from_bed’, designed for the base genome of M. natalensis.

### Supplementary Data 3:
Deduplicated probe set (minimum identity threshold of 80% and minimum coverage of 83%) cleaned from sequences with ambiguous base calls and GC-content above 70% or below 30%, designed for the base genome of M. natalensis (240,602 baits targeting 170,648 loci). This deduplicated set was obtained with ‘phyluce_probe_get_tiled_probes’, ‘phyluce_probe_easy_lastz’ and ‘phyluce_probe_remove_duplicate_hits_from_probes_using_lastz’.

### Supplementary Data 4:
Conserved sequences with equally distributed buffer and database table, obtained respectively with ‘phyluce_probe_slice_sequence_from_genomes’ and ‘phyluce_probe_get_multi_fasta_table’, for all four considered termite genomes (53,422 loci). Slices of (A) Macrotermes natalensis, (B) Zootermopsis nevadensis, (C) Cryptotermes secundus, and (D) Coptotermes formosanus. (E) SQLite database.

### Supplementary Data 5:
Final deduplicated probe set (minimum identity threshold of 80% and a minimum coverage of 83%) cleaned from sequences with ambiguous base calls and GC-content above 70% or below 30%, designed for all four considered termite genomes (397,910 baits targeting 50,616 loci). This deduplicated set was obtained with ‘phyluce_probe_get_tiled_probes’, ‘phyluce_probe_easy_lastz’ and ‘phyluce_probe_remove_duplicate_hits_from_probes_using_lastz’.

### Supplementary Data 6:
Termite UCE Database. Extracted UCEs from all samples using the final deduplicated probe set (Supplementary Data 5) in which each sample was assigned a unique identification code (TER-X-UCEDB; see Supplementary Table 1). The database is maintained at: https://github.com/oist/TER-UCE-DB/.

### Supplementary Data 7:
Alignments produced for the 5,934 loci for which data missingness was below 25%. (A) Nexus alignment file. (B) Character sets file.

### Supplementary Data 8:
Bait set reduced to 5,934 loci (47,091 baits), corresponding to the loci kept in the 75% sample matrix (Supplementary Data 7).

### Supplementary Data 9:
Tentatively annotated UCEs, based on the GFF file (NCBI Annotation Release 100) from the Z. nevadensis genome assembly (GCF_000696155), and the UCEs extracted using ‘phyluce_probe_slice_sequence_from_genomes’ (Supplementary Data 4B). (Sheet 1) Annotation overview of the 50,616 UCEs, annotated into protein-coding or intergenic regions. (Sheet 2) All possible annotations for each of the 50,616 UCEs. (Sheet 3) Curated annotation of the 40,966 singly-annotated protein-coding loci into exons and introns.
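
### Illustrative sketch: probe cleaning criteria
The cleaning criteria stated for Supplementary Data 3 and 5 (no ambiguous base calls, GC-content between 30% and 70%) can be illustrated with the short Python sketch below. This is not the phyluce code used to build the probe sets. The function names, the treatment of a probe at exactly 30% or 70% GC as retained, and the example sequences are all assumptions made for illustration only.

```python
# Minimal sketch of the stated cleaning criteria; NOT the phyluce implementation.
# All names below are hypothetical and chosen for illustration.

UNAMBIGUOUS = set("ACGT")


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def passes_filters(seq: str, gc_min: float = 0.30, gc_max: float = 0.70) -> bool:
    """Keep a probe only if it has no ambiguous bases and its GC content
    lies within [gc_min, gc_max] (boundary handling is an assumption)."""
    seq = seq.upper()  # soft-masked (lowercase) bases are treated like any other
    if not seq or set(seq) - UNAMBIGUOUS:
        return False
    return gc_min <= gc_fraction(seq) <= gc_max


def filter_probes(probes: dict) -> dict:
    """Return the subset of {name: sequence} that passes both criteria."""
    return {name: seq for name, seq in probes.items() if passes_filters(seq)}


if __name__ == "__main__":
    # Made-up toy sequences, not taken from the probe sets.
    example = {
        "probe_ok": "ACGTACGTAC",
        "probe_ambiguous": "ACGTNACGTA",
        "probe_high_gc": "GGGGCCCCGA",
        "probe_low_gc": "ATATATATAC",
    }
    print(sorted(filter_probes(example)))
```

Only the probe with unambiguous bases and intermediate GC-content survives this toy filter; the actual probe sets should be taken from the supplementary files rather than regenerated with this sketch.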