
Supplementary information for: Using networks to identify structure in phylogenetic tree sets

Cite this dataset

Brown, Jeremy et al. (2020). Supplementary information for: Using networks to identify structure in phylogenetic tree sets [Dataset]. Dryad. https://doi.org/10.5061/dryad.612jm642f

Abstract

Modern phylogenomic studies produce large sets of trees that can represent variation in inferred phylogenies across genes, uncertainty in estimated phylogenies for a given gene, or both. Standard practice is to condense this variation down to a small set of point estimates or consensus trees in order to facilitate display and interpretation. However, doing so results in the loss of enormous amounts of information about the structure of the underlying tree set. Here, we propose new approaches to explore and detect structure in the tree set itself. These approaches rely on the well-developed mathematical foundations of community detection in networks and leverage two different network types. The first type uses nodes to represent trees and connects these nodes with edges whose weights are determined by the similarity (affinity) of the trees. The second type uses nodes to represent bipartitions and connects nodes with edges whose weights represent the covariance in bipartition presence/absence across trees in the set. These two network types carry information that is complementary, but not identical. A variety of methods may be applied to both networks in order to identify interesting community structure. These community detection approaches provide a rich view of the information contained in phylogenomic data sets and facilitate investigation into the forces driving inferred phylogenetic variation across genomes.
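The two network types described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes each tree is already reduced to its set of non-trivial bipartitions, uses a hypothetical five-taxon toy tree set, takes affinity to be 1 / (1 + Robinson-Foulds distance) as one possible similarity measure, and uses the population covariance of presence/absence indicators. All function names are illustrative.

```python
from itertools import combinations

# Hypothetical toy tree set on taxa A-E. Each tree is the set of its
# non-trivial bipartitions, each written as the taxa on one side.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
]

def rf_distance(t1, t2):
    """Robinson-Foulds distance: bipartitions present in exactly one tree."""
    return len(t1 ^ t2)

def tree_network(trees):
    """Nodes are trees; edge weight is an affinity between two trees.

    The affinity 1 / (1 + RF) is an assumed choice for illustration.
    """
    return {
        (i, j): 1.0 / (1.0 + rf_distance(trees[i], trees[j]))
        for i, j in combinations(range(len(trees)), 2)
    }

def bipartition_network(trees):
    """Nodes are bipartitions; edge weight is the covariance of their
    presence/absence across the tree set (population covariance, assumed)."""
    biparts = sorted({b for t in trees for b in t}, key=lambda b: sorted(b))
    n = len(trees)
    presence = {b: [1 if b in t else 0 for t in trees] for b in biparts}
    mean = {b: sum(v) / n for b, v in presence.items()}
    weights = {}
    for a, b in combinations(biparts, 2):
        weights[(a, b)] = sum(
            (x - mean[a]) * (y - mean[b])
            for x, y in zip(presence[a], presence[b])
        ) / n
    return weights

tree_edges = tree_network(trees)
bipart_edges = bipartition_network(trees)
```

In this toy set, identical trees receive the maximum affinity, and bipartitions that cannot occur in the same tree (such as AB and AC) receive negative covariance. Either dictionary of weighted edges could then be passed to a community detection method of the reader's choice.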

Funding

National Science Foundation, Award: DBI-1262571

National Science Foundation, Award: DBI-1934156

National Science Foundation, Award: DBI-1262476

National Science Foundation, Award: DBI-1934182

National Science Foundation, Award: DBI-1934157