Dryad

The Community Coevolution Model with application to the study of evolutionary relationships between genes based on phylogenetic profiles

Cite this dataset

Liu, Chaoyue; Kenney, Toby; Beiko, Robert; Gu, Hong (2022). The Community Coevolution Model with application to the study of evolutionary relationships between genes based on phylogenetic profiles [Dataset]. Dryad. https://doi.org/10.5061/dryad.p8cz8w9rd

Abstract

Organismal traits can evolve in a coordinated way, with correlated patterns of gains and losses reflecting important evolutionary associations. Discovering these associations can reveal important information about the functional and ecological linkages among traits. Phylogenetic profiles treat individual genes as traits distributed across sets of genomes and can provide a fine-grained view of the genetic underpinnings of evolutionary processes in a set of genomes. Phylogenetic profiling has been used to identify genes that are functionally linked, and to identify common patterns of lateral gene transfer in microorganisms. However, comparative analysis of phylogenetic profiles and other trait distributions should take into account the phylogenetic relationships among the organisms under consideration.

Here we propose the Community Coevolution Model (CCM), a new coevolutionary model to analyze the evolutionary associations among traits, with a focus on phylogenetic profiles. In the CCM, traits are considered to evolve as a community with interactions, and the transition rate for each trait depends on the current states of other traits. In addition to outperforming other comparative methods for pairwise trait analysis, the CCM can examine multiple traits as a community to reveal more dependency relationships. We also develop a simulation procedure to generate phylogenetic profiles with correlated evolutionary patterns that can be used as benchmark data for evaluation purposes.

A simulation study demonstrates that the CCM is more accurate than other methods, including the Jaccard Index and three tree-aware methods. The parameterization of the CCM makes the interpretation of the relations between genes more direct, so that Darwin's scenario can be identified easily from the estimated parameters. We show that the CCM is more efficient and fits real data better than other methods, resulting in higher likelihood scores with fewer parameters. An examination of 3786 phylogenetic profiles across a set of 659 bacterial genomes highlights linkages between genes with common functions, including many patterns that would not have been identified under a non-phylogenetic model of common distribution. We also applied the CCM to 44 proteins in the well-studied Mitochondrial Respiratory Complex I and recovered associations that mapped well onto the structural associations that exist in the complex.

Methods

The Community Coevolution Model (CCM) is a new coevolutionary model to analyze the evolutionary associations among binary traits.
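
The abstract describes the core idea of the CCM as a transition rate for each trait that depends on the current states of the other traits. The toy sketch below illustrates that idea only. The log-linear form, the function name `transition_rate`, and the parameters `intrinsic` and `interaction` are assumptions made for illustration and are not taken from the CCM paper, which should be consulted for the actual parameterization.

```python
import math


def transition_rate(i, state, intrinsic, interaction):
    """Toy rate at which binary trait i changes state (hypothetical form).

    state       : list of 0/1 values, the current state of every trait
    intrinsic   : list of per-trait baseline terms (assumed parameter)
    interaction : square matrix; interaction[i][j] is the assumed effect of
                  trait j being present on trait i
    """
    # Linear predictor: baseline plus contributions from the other traits
    # that are currently present.
    eta = intrinsic[i] + sum(
        interaction[i][j] * state[j] for j in range(len(state)) if j != i
    )
    # If trait i is absent, this is a gain rate, exp(eta).
    # If trait i is present, this is a loss rate, exp(-eta).
    sign = 1.0 if state[i] == 0 else -1.0
    return math.exp(sign * eta)


# Two traits with a positive (assumed) interaction in both directions.
intrinsic = [0.0, 0.0]
interaction = [[0.0, 1.0], [1.0, 0.0]]

gain_alone = transition_rate(0, [0, 0], intrinsic, interaction)
gain_with_partner = transition_rate(0, [0, 1], intrinsic, interaction)
loss_with_partner = transition_rate(0, [1, 1], intrinsic, interaction)
```

Under this assumed form, a positive interaction makes trait 0 more likely to be gained and less likely to be lost while trait 1 is present, which is the kind of correlated gain and loss pattern the model is designed to detect.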

Supplementary Figures and Tables files:

  • Supplementary Figures and Tables
  • Table S3_Predictions_Unannotated_Genes_LZ

The files contain the figures and tables that are mentioned in the CCM paper.

LZ data sets:

  • Data file: phylogenetic tree (LZ)
  • Data file: profile matrix (LZ)

The bacterium “Lachnospiraceae bacterium 3-1-57FAACT1” (abbreviated as LZ), represented here by its draft assembly, was isolated from a biopsy retrieved from the transverse colon of a female Crohn’s Disease patient at the time of colonoscopy (Liu et al. 2018). For the comparative analysis of LZ, 658 completed and draft genomes from class Clostridia were retrieved from the National Center for Biotechnology Information (NCBI). The phylogenetic tree was built from their concatenated, conserved protein sequences using the AMPHORA2 pipeline (Wu and Scott 2012) and RAxML-HPC (Stamatakis 2006); another set of eight outgroup genomes from class Bacilli and phyla Actinobacteria and Proteobacteria was used for rooting. The phylogenetic profiles were constructed by comparing the complete set of LZ against all other genomes using rapsearch (Ye et al. 2011).

Usage notes

  • Supplementary Figures and Tables (.pdf)
  • Table S3_Predictions_Unannotated_Genes_LZ (.csv)
  • Data file: phylogenetic tree (LZ) (.nex)
  • Data file: profile matrix (LZ) (.csv)

The tree files (.nex, .nwk) can be opened with FigTree, the online tool iTOL, etc.

For more details, please refer to the README file.
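
As a rough illustration of working with a profile matrix in .csv form, the sketch below parses a small presence/absence table and computes the Jaccard Index between two profiles, the non-phylogenetic baseline mentioned in the abstract. The layout (one gene per row, one genome per column, a header row of genome names) and the names `load_profiles`, `jaccard`, `geneA` and `geneB` are assumptions for illustration; the README describes the actual layout of the LZ file.

```python
import csv
import io


def load_profiles(handle):
    """Read a presence/absence matrix from an open CSV handle.

    Assumed layout: header row with genome names, then one row per gene
    with the gene ID in the first column and 0/1 values after it.
    """
    reader = csv.reader(handle)
    genomes = next(reader)[1:]  # drop the corner cell of the header row
    profiles = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        profiles[row[0]] = [int(value) for value in row[1:]]
    return genomes, profiles


def jaccard(a, b):
    """Jaccard Index of two binary profiles: shared presences / any presence."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Tiny in-memory stand-in for the real file (made-up values).
example = io.StringIO(
    "gene,G1,G2,G3,G4\n"
    "geneA,1,1,0,0\n"
    "geneB,1,0,0,1\n"
)
genomes, profiles = load_profiles(example)
score = jaccard(profiles["geneA"], profiles["geneB"])
```

To read a file on disk, pass an open file handle in place of the `io.StringIO` object. Note that the Jaccard Index ignores the phylogenetic tree, which is the limitation the abstract raises for methods that are not tree-aware.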

Funding

Natural Sciences and Engineering Research Council