
Genomic characterization and curation of UCEs improves species tree reconstruction: Supplementary Material S1

Cite this dataset

Van Dam, Matthew (2020). Genomic characterization and curation of UCEs improves species tree reconstruction: Supplementary Material S1 [Dataset]. Dryad.


Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes does not require prior knowledge of genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here we characterized UCEs from 11 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated 4 different sets of UCE markers by genomic category from 5 different studies including: birds, mammals, fish, Hymenoptera (ants, wasps, and bees) and Coleoptera (beetles). Of genes captured by UCEs, we find that many are represented by 2 or more UCEs, corresponding to non-overlapping segments of a single gene. We considered these UCEs to be non-independent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the subsequent effect of merging co-genic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all datasets, apparently driven by the increase in locus length. Additionally, we conducted simulations and found that gene trees generated from merged UCEs were more accurate than those generated by unmerged UCEs. As locus length improves gene tree accuracy, this modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses.
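
The merging of co-genic UCEs described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the input structures (`alignments`, `uce_to_gene`), the function name, the concatenation order (sorted by UCE identifier), and the gap-padding of taxa missing from a UCE are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_cogenic_uces(alignments, uce_to_gene):
    """Concatenate UCE alignments that fall within the same gene.

    alignments:  {uce_id: {taxon: aligned_sequence}}  (hypothetical layout)
    uce_to_gene: {uce_id: gene_id}; UCEs absent from this mapping are
                 treated as not assigned to a gene and are left unmerged.
    Returns {locus_id: {taxon: sequence}}, where locus_id is the gene_id
    for merged UCEs and the original uce_id otherwise.
    """
    # Group UCE ids by gene; unassigned UCEs form their own group.
    groups = defaultdict(list)
    for uce_id in sorted(alignments):
        gene = uce_to_gene.get(uce_id)
        groups[gene if gene is not None else uce_id].append(uce_id)

    merged = {}
    for locus_id, uce_ids in groups.items():
        taxa = sorted({t for u in uce_ids for t in alignments[u]})
        merged[locus_id] = {}
        for taxon in taxa:
            parts = []
            for u in uce_ids:
                aln = alignments[u]
                length = len(next(iter(aln.values())))
                # Assumption: a taxon missing from one UCE is padded with gaps.
                parts.append(aln.get(taxon, "-" * length))
            merged[locus_id][taxon] = "".join(parts)
    return merged


if __name__ == "__main__":
    alignments = {
        "uce-1": {"A": "ACGT", "B": "ACGA"},
        "uce-2": {"A": "TT", "C": "TA"},
        "uce-3": {"A": "GG", "B": "GC"},
    }
    uce_to_gene = {"uce-1": "geneX", "uce-2": "geneX"}
    for locus, seqs in merge_cogenic_uces(alignments, uce_to_gene).items():
        print(locus, seqs)
```

Each merged locus is longer than any of its component UCEs, which is the property the abstract links to improved gene tree support.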

Usage notes

Supplementary Material Appendix S1

This folder contains the following:

  • 1)  Folder containing box plots of gene tree spectral distance comparisons and normalized RF distances (“GENE TREE DISTANCES”). These compare the unmerged and merged gene trees. There are two plots for each taxon: peakedness and asymmetry.

  • 2)  Folder containing the tree files for all the species trees (“SPECIES TREES - unmerged/merged”). They are divided into two subfolders: one for the genic and intergenic species trees with bootstrap support values, and a second, the “Quartet Support” folder, with quartet support values at the nodes.

  • 3)  A PDF labeled “Comparison of unmerged to merged species trees”. It displays the species trees generated in ASTRAL-III: the unmerged ASTRAL tree is on the left, and the tree from merged genic loci plus all remaining loci is on the right. Node values indicate bootstrap support. Changes in topology are circled on the merged species tree. Increases in ABS values are in red; decreases are in blue.

  • 4)  Folder containing the species trees from the simulations. The true tree replicates all end with “s_tree.trees”. Loci run individually end with “merged_trees_all_single”. Combinations of merging end with names such as “single4_5_merged1-3.tree”, where loci 4 and 5 were left unmerged (“single”) and loci 1-3 were merged. All trees ending with “RANDO_OUT” are the randomly merged sets of loci.
  • 5)  A table for the simulation parameters used (“5 Simulation Parametersxlsx”).
  • 6)  A figure with an overview of the simulation procedure (“6 Simulation procedure overviewpdf”)
  • 7)  A table with pairwise comparisons of species tree distances: Robinson–Foulds (RF) and Kuhner–Felsenstein (KF) distances.

  • 8)  A table listing the UCE set and the corresponding NCBI reference numbers for the base-taxon.
  • 9)  Bar-plot of the number of UCEs that can be found within a gene
  • 10)  Table of UCE totals by genomic class
  • 11)  Table of taxa used and UCE count by taxon
  • 12)  Table of results from GLM analyses
  • 13)  A boxplot of the “Nearest Neighbor Distance” between each intragenic UCE and its nearest neighboring intergenic UCE
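
The file-name endings listed in item 4 can be used to sort the simulation tree files by type. The sketch below is one possible way to do this. The category labels, the function name, and the regular expression for partially merged sets (generalized from the single example “single4_5_merged1-3.tree”) are assumptions, as are the `rep1_` prefixes in the usage lines.

```python
import re

# Assumption: partially merged sets follow the pattern of the one example
# given in item 4, i.e. "single<loci>_merged<loci>.tree".
PARTIAL_MERGE = re.compile(r"single[\d_]+_merged[\d-]+\.tree$")


def classify_tree_file(name):
    """Return a (hypothetical) category label for a simulation tree file."""
    if name.endswith("s_tree.trees"):
        return "true_tree"        # true species tree replicates
    if name.endswith("merged_trees_all_single"):
        return "unmerged_loci"    # every locus run individually
    if name.endswith("RANDO_OUT"):
        return "random_merge"     # randomly merged sets of loci
    if PARTIAL_MERGE.search(name):
        return "partial_merge"    # some loci merged, others left single
    return "unknown"


if __name__ == "__main__":
    # File-name prefixes here are made up for illustration.
    for name in (
        "rep1_s_tree.trees",
        "rep1_merged_trees_all_single",
        "rep1_single4_5_merged1-3.tree",
        "rep1_RANDO_OUT",
    ):
        print(name, "->", classify_tree_file(name))
```

Files that match none of the documented endings are labeled "unknown" rather than guessed at, so unexpected names in the folder are easy to spot.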