Supplementary information for: Using networks to identify structure in phylogenetic tree sets
Cite this dataset
Brown, Jeremy et al. (2020). Supplementary information for: Using networks to identify structure in phylogenetic tree sets [Dataset]. Dryad. https://doi.org/10.5061/dryad.612jm642f
Abstract
Modern phylogenomic studies produce large sets of trees that can represent variation in inferred phylogenies across genes, uncertainty in estimated phylogenies for a given gene, or both. Standard practice is to condense this variation down to a small set of point estimates or consensus trees in order to facilitate display and interpretation. However, doing so results in the loss of enormous amounts of information about the structure of the underlying tree set. Here, we propose new approaches to explore and detect structure in the tree set itself. These approaches rely on the well-developed mathematical foundations of community detection in networks and leverage two different network types. The first type uses nodes to represent trees and connects these nodes with edges whose weights are determined by the similarity (affinity) of the trees. The second type uses nodes to represent bipartitions and connects nodes with edges whose weights represent the covariance in bipartition presence/absence across trees in the set. These two network types carry information that is complementary, but not identical. A variety of methods may be applied to both networks in order to identify interesting community structure. These community detection approaches provide a rich view of the information contained in phylogenomic data sets and facilitate investigation into the forces driving inferred phylogenetic variation across genomes.
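The two network types described in the abstract can be illustrated with a minimal sketch on a toy tree set. Everything in it is an assumption made for illustration and is not taken from the dataset: the five taxa, the four trees, the helper names (`split`, `tree_affinity`, `covariance`), and in particular the choice of affinity as one minus the normalized Robinson-Foulds distance. The authors may define affinity differently. Each tree is written directly as its set of non-trivial bipartitions, so no tree-parsing library is needed.

```python
from itertools import combinations

# Hypothetical toy data: five taxa and four unrooted trees.
TAXA = frozenset("ABCDE")

def split(*taxa):
    """Canonical form of a bipartition: the side that does not contain 'A'."""
    side = frozenset(taxa)
    return TAXA - side if "A" in side else side

# Each tree is represented by its set of non-trivial bipartitions.
trees = [
    {split("A", "B"), split("D", "E")},  # ((A,B),C,(D,E))
    {split("A", "B"), split("D", "E")},  # same topology as the first tree
    {split("A", "C"), split("D", "E")},  # ((A,C),B,(D,E))
    {split("A", "B"), split("C", "D")},  # ((A,B),E,(C,D))
]

def tree_affinity(t1, t2):
    """Assumed affinity: 1 minus the normalized Robinson-Foulds distance."""
    rf = len(t1 ^ t2)            # bipartitions found in exactly one tree
    max_rf = len(t1) + len(t2)   # largest possible RF distance for this pair
    return 1.0 - rf / max_rf

# Network type 1: nodes are trees, edge weights are pairwise affinities.
tree_net = {
    (i, j): tree_affinity(trees[i], trees[j])
    for i, j in combinations(range(len(trees)), 2)
}

def covariance(x, y):
    """Population covariance of two equal-length 0/1 vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

# Presence/absence vector of every observed bipartition across the tree set.
splits = sorted(set().union(*trees), key=sorted)
presence = {s: [1 if s in t else 0 for t in trees] for s in splits}

# Network type 2: nodes are bipartitions, edge weights are covariances.
bip_net = {
    frozenset({s1, s2}): covariance(presence[s1], presence[s2])
    for s1, s2 in combinations(splits, 2)
}

for (i, j), w in tree_net.items():
    print(f"tree {i} - tree {j}: affinity {w:.2f}")
for pair, w in bip_net.items():
    a, b = (",".join(sorted(s)) for s in sorted(pair, key=sorted))
    print(f"{{{a}}} - {{{b}}}: covariance {w:+.4f}")
```

In this toy set the two conflicting bipartitions that place A with B versus A with C never occur in the same tree, so their covariance is negative, while the two identical trees have affinity 1. Either weighted edge list could then be passed to a community detection method of the reader's choice.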
Funding
National Science Foundation, Award: DBI-1262571
National Science Foundation, Award: DBI-1934156
National Science Foundation, Award: DBI-1262476
National Science Foundation, Award: DBI-1934182
National Science Foundation, Award: DBI-1934157