Total Ortholog Median Matrix (TOMM): an alternative unsupervised approach for phylogenomics based on evolutionary distance between protein coding genes

Cite this dataset

Maruyama, Sandra R. et al. (2020). Total Ortholog Median Matrix (TOMM): an alternative unsupervised approach for phylogenomics based on evolutionary distance between protein coding genes [Dataset]. Dryad. https://doi.org/10.5061/dryad.b1k526g

Abstract

The increasing amount of available genomic data has allowed the development of phylogenomic analytical tools. Current methods compile information from single-gene phylogenies, whether based on topologies or on multiple sequence alignments. Generally, phylogenomic analyses select gene families or genomic regions to construct phylogenomic trees. Here, we present an alternative approach for phylogenomics, named TOMM (Total Ortholog Median Matrix), which constructs a representative phylogram from amino acid distance measures of all pairwise orthologous protein sequences from the desired species within a group of organisms. The procedure is divided into two main steps: (1) ortholog detection and (2) creation of a matrix with the median amino acid distance measures of all pairwise orthologous sequences. We tested this approach on three different groups of organisms: Kinetoplastida protozoa, hematophagous Diptera vectors, and Primates. Our approach was robust and effective in reconstructing the phylogenetic relationships of the three groups. Moreover, it recovered novel branch topologies, providing insights into the phylogenetic relationships among some taxa.
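
Step (2) of the procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that ortholog detection and distance calculation have already been done, and that the per-pair amino acid distances are available as plain lists. The function name median_distance_matrix, the species labels, and the distance values are all hypothetical and chosen only for illustration.

```python
from statistics import median


def median_distance_matrix(species, pair_distances):
    """Build a symmetric matrix of median ortholog distances.

    species: list of species names, defining row and column order.
    pair_distances: dict mapping a (species_a, species_b) tuple to a list
        of amino acid distances, one value per orthologous protein pair.
    """
    index = {name: i for i, name in enumerate(species)}
    n = len(species)
    # The diagonal stays at 0.0: a species has zero distance to itself.
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), distances in pair_distances.items():
        if not distances:
            raise ValueError(f"no ortholog distances for pair ({a}, {b})")
        m = median(distances)
        i, j = index[a], index[b]
        # Fill both halves so the matrix is symmetric.
        matrix[i][j] = m
        matrix[j][i] = m
    return matrix


# Toy input (hypothetical values, not taken from the dataset).
species = ["sp_A", "sp_B", "sp_C"]
pair_distances = {
    ("sp_A", "sp_B"): [0.10, 0.20, 0.30],
    ("sp_A", "sp_C"): [0.40, 0.60],
    ("sp_B", "sp_C"): [0.50, 0.70, 0.90, 1.10],
}

matrix = median_distance_matrix(species, pair_distances)
for name, row in zip(species, matrix):
    print(name, row)
```

Each cell summarizes every ortholog pair shared by two species with a single median value, so no particular gene family has to be selected in advance. The resulting matrix would then be passed to a tree-building step, which is not shown here.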

Usage notes

Funding

São Paulo Research Foundation, Award: 2016/20258-0

Intramural Research Program of the National Institute of Allergy and Infectious Diseases