The increasing amount of available genomic data has enabled the development of phylogenomic analytical tools. Current methods compile information from single-gene phylogenies, based either on topologies or on multiple sequence alignments. Generally, phylogenomic analyses select gene families or genomic regions to construct phylogenomic trees. Here, we present an alternative approach to phylogenomics, named TOMM (Total Ortholog Median Matrix), which constructs a representative phylogram from the amino acid distances of all orthologous protein sequence pairs identified between each pair of selected species within a group of organisms. The procedure is divided into two main steps: (1) ortholog detection and (2) creation of a matrix with the median amino acid distance of all pairwise orthologous sequences. We tested this approach on three different groups of organisms: Kinetoplastida protozoa, hematophagous Diptera vectors and Primates. Our approach robustly and effectively reconstructed the phylogenetic relationships of all three groups. Moreover, some novel branch topologies were obtained, providing insights into the phylogenetic relationships among certain taxa.
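The matrix-building step (2) can be illustrated with a minimal Python sketch. It assumes that ortholog detection has already produced, for each species pair, a list of amino acid distances (one per ortholog pair); the input layout and the function name `median_distance_matrix` are hypothetical and do not reproduce the authors' pipeline. Each off-diagonal cell holds the median of that species pair's distances, and the diagonal is zero.

```python
from statistics import median


def median_distance_matrix(species, pair_distances):
    """Build a symmetric species-by-species matrix of median ortholog distances.

    species: ordered list of species names (rows/columns of the matrix).
    pair_distances: hypothetical input mapping a (species_a, species_b) tuple
        to the amino acid distances of all ortholog pairs found between them.
    """
    index = {name: i for i, name in enumerate(species)}
    n = len(species)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays 0 (self-distance)
    seen = set()
    for (a, b), distances in pair_distances.items():
        if not distances:
            raise ValueError(f"no orthologs recorded for {a} vs {b}")
        i, j = index[a], index[b]
        # The median is taken over every ortholog pair of this species pair.
        matrix[i][j] = matrix[j][i] = median(distances)
        seen.add(frozenset((a, b)))
    if len(seen) != n * (n - 1) // 2:
        raise ValueError("some species pairs have no distance list")
    return matrix


# Toy example with made-up distances for three species
species = ["A", "B", "C"]
pairs = {
    ("A", "B"): [0.1, 0.3, 0.2],
    ("A", "C"): [0.5, 0.75],
    ("B", "C"): [0.4, 0.5, 0.6],
}
for row in median_distance_matrix(species, pairs):
    print(row)
```

The median, rather than the mean, keeps a handful of unusually divergent ortholog pairs from dominating a cell.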
Supplemental Figures: S1, S2, S3, S4, S5, S6 and S7
S1-S6: Phylograms built using different percentiles of the ranked ortholog pairs of 46 Kinetoplastida protozoa species (S1, S2 and S4), different E-value cutoffs (S3) and different orthology inference methods (S5 and S6).
S7: Bar graph showing the total number of orthologs identified by the RSD and OrthoMCL algorithms for the 78 species-pair combinations used in the analysis, based on 13 species whose sequences were retrieved from the TriTryp database, as indicated in Table 1 ("Proteins sequence source" column). Intersections (shared orthologs) and unique orthologs were calculated from gene ID lists using the Venn diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/).
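The intersections in S7 were computed with the web-based Venn tool; the same counts could in principle be reproduced locally with set operations, as in this sketch. It assumes each ortholog is represented by a comparable identifier (for example, a gene ID string); `compare_ortholog_sets` and `species_pairs` are hypothetical helpers. The second helper also makes explicit that 13 species yield (13 × 12) / 2 = 78 unordered pairs.

```python
from itertools import combinations


def species_pairs(species):
    """All unordered species pairs; n species give n * (n - 1) / 2 pairs."""
    return list(combinations(species, 2))


def compare_ortholog_sets(rsd_ids, orthomcl_ids):
    """Split one species pair's ortholog IDs into shared and method-specific sets."""
    rsd, omcl = set(rsd_ids), set(orthomcl_ids)
    return {
        "shared": rsd & omcl,          # found by both methods
        "rsd_only": rsd - omcl,        # unique to RSD
        "orthomcl_only": omcl - rsd,   # unique to OrthoMCL
    }


# Hypothetical gene IDs for one species pair
result = compare_ortholog_sets(["g1", "g2", "g3"], ["g2", "g3", "g4"])
print({name: len(ids) for name, ids in result.items()})
# {'shared': 2, 'rsd_only': 1, 'orthomcl_only': 1}
```

Running `compare_ortholog_sets` once per element of `species_pairs(...)` would give one bar group per species pair.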
File name: Supplemental_Figures_S1_S2_S3_S4_S5_S6_S7.pdf
Supplemental Table 1: Kinetoplastida pairwise matrices
Excel spreadsheet containing the resulting tables of pairwise ortholog data (pairwise matrices). Sheet "AA distance": amino acid distance, obtained as the median of the values calculated by the RSD algorithm. Sheet "Number-50": total number of orthologs identified by the RSD algorithm with a BLAST E-value acceptance threshold of 0.01 and a minimum length ratio of 0.8 between the shorter and the longer sequence.
SupplementalTable_S1.xlsx
Supplemental Table 2: Hematophagous Diptera pairwise matrices
Excel spreadsheet containing the resulting tables of pairwise ortholog data (pairwise matrices). Sheet "AA distance": amino acid distance, obtained as the median of the values calculated by the RSD algorithm. Sheet "Number-50": total number of orthologs identified by the RSD algorithm with a BLAST E-value acceptance threshold of 0.1 and a minimum length ratio of 0.8 between the shorter and the longer sequence.
SupplementalTable_S2.xlsx
Supplemental Table 3: Primates pairwise matrices
Excel spreadsheet containing the resulting tables of pairwise ortholog data (pairwise matrices). Sheet "AA distance": amino acid distance, obtained as the median of the values calculated by the RSD algorithm. Sheet "Number-50": total number of orthologs identified by the RSD algorithm with a BLAST E-value acceptance threshold of 0.01 and a minimum length ratio of 0.8 between the shorter and the longer sequence.
SupplementalTable_S3.xlsx
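Supplemental Tables 1-3 report ortholog counts obtained with two RSD thresholds: a BLAST E-value acceptance cutoff (0.01 for Kinetoplastida and Primates, 0.1 for hematophagous Diptera) and a minimum length ratio of 0.8 between the shorter and the longer sequence. The sketch below applies those two criteria, as worded here, to a candidate hit. It is a simplified illustration, not the RSD implementation: the `CandidateHit` record is hypothetical, and treating the E-value cutoff as inclusive (`<=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CandidateHit:
    """Hypothetical record for one BLAST hit between two proteins."""
    query_id: str
    subject_id: str
    evalue: float
    query_len: int
    subject_len: int


def passes_thresholds(hit, max_evalue=0.01, min_len_ratio=0.8):
    """Accept a hit if its E-value is within the cutoff (assumed inclusive)
    and the shorter sequence is at least min_len_ratio of the longer one."""
    shorter = min(hit.query_len, hit.subject_len)
    longer = max(hit.query_len, hit.subject_len)
    return hit.evalue <= max_evalue and shorter / longer >= min_len_ratio


hits = [
    CandidateHit("q1", "s1", 1e-30, 400, 450),  # ratio ~0.89: kept
    CandidateHit("q1", "s2", 1e-30, 300, 450),  # ratio ~0.67: rejected
    CandidateHit("q2", "s3", 0.05, 400, 420),   # E-value above 0.01, below 0.1
]
print([h.subject_id for h in hits if passes_thresholds(h)])  # 0.01 cutoff
print([h.subject_id for h in hits if passes_thresholds(h, max_evalue=0.1)])  # 0.1 cutoff
```

Loosening the E-value cutoff from 0.01 to 0.1 admits the third hit, which is the kind of difference that separates the Diptera settings from the other two groups.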
Supplemental File 1: R scripts
Scripts using hclust (R stats package) and the pvclust and ape R packages to build phylograms from amino acid distance matrices.
SupplementalFile1.txt
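Supplemental File 1 builds phylograms in R. As a language-neutral illustration of turning a distance matrix such as the "AA distance" sheets into a tree, here is average-linkage clustering (UPGMA) in plain Python, returning a Newick string. UPGMA corresponds to `method = "average"` in R's `hclust`; whether the authors' scripts use this method, or neighbor joining via ape's `nj`, is not stated here, so treat the choice as an assumption, and `upgma` as a hypothetical helper.

```python
def upgma(names, dist):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Returns a Newick string. Each merge places the new node at half the
    merge distance, so the tree is ultrametric (clock-like).
    """
    def pair(a, b):
        return (a, b) if a < b else (b, a)

    n = len(names)
    # Active clusters: id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest clusters
        del d[(i, j)]
        li, ni, hi = clusters.pop(i)
        lj, nj, hj = clusters.pop(j)
        height = dij / 2
        subtree = f"({li}:{height - hi:g},{lj}:{height - hj:g})"
        # New distance = leaf-count-weighted mean of the two old distances
        for k in clusters:
            dik, djk = d.pop(pair(i, k)), d.pop(pair(j, k))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = (subtree, ni + nj, height)
        next_id += 1
    (tree, _, _), = clusters.values()
    return tree + ";"


# Toy matrix, e.g. median distances for three species
print(upgma(["A", "B", "C"], [[0, 0.2, 0.6], [0.2, 0, 0.6], [0.6, 0.6, 0]]))
# (C:0.3,(A:0.1,B:0.1):0.2);
```

Because UPGMA assumes a roughly constant evolutionary rate across lineages, neighbor joining is a common alternative when rates may vary.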
Supplemental File 2: RSD result files (gene IDs) in compressed folders