Eukaryotic gene trees with 150 leaves
Data files
Mar 17, 2024 version files 38.36 KB
Abstract
Gene loss is an important process in gene and genome evolution. If a gene is present at the root of a rooted binary phylogenetic tree and can be lost in one descendant lineage, it can be lost in other descendant lineages as well, and potentially in all of them, leading to extinction of the gene on the tree. In that case, just before the gene goes extinct in the rooted phylogeny, there will be one lineage that still retains the gene for some period of time, representing a ‘last-one-out’ distribution. If one clade of a phylogenetic tree has many (hundreds of) leaves, yet only one leaf possesses the gene, the pattern will look like the result of a recent gene acquisition, even though the distribution at the tips was generated by loss. Here we derive the probability of observing last-one-out distributions under a Markovian loss model with a given gene loss rate μ. We find that this probability can be calculated mathematically and can be surprisingly high, depending on the tree and the rate of gene loss. Examples from real data show that gene loss can readily account for the observed frequency of last-one-out gene distribution patterns that might otherwise be attributed to lateral gene transfer.
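To make the idea of a last-one-out probability concrete, the following is a minimal sketch and not necessarily the derivation used in the associated study. It assumes that the gene is present at the root, that loss is irreversible, and that the gene survives a branch of length t with probability exp(-μt), independently on each branch. The nested-list tree encoding and the function name `zero_one` are hypothetical choices made for this illustration.

```python
import math

def zero_one(node, mu):
    """Return (P0, P1) for the subtree below `node`, given the gene is present at `node`.

    P0: probability that no leaf below `node` retains the gene.
    P1: probability that exactly one leaf retains it (a last-one-out pattern).
    A leaf is a string; an internal node is a list of (child, branch_length) pairs.
    Assumption: irreversible loss, survival probability exp(-mu * length) per branch.
    """
    if isinstance(node, str):
        return 0.0, 1.0  # the gene is present at this leaf by assumption
    p0, p1 = 1.0, 0.0
    for child, length in node:
        survive = math.exp(-mu * length)
        c0, c1 = zero_one(child, mu)
        e0 = (1.0 - survive) + survive * c0  # no carrier below this child edge
        e1 = survive * c1                    # exactly one carrier below this child edge
        # Combine with the children processed so far (p1 must be updated before p0).
        p1 = p1 * e0 + p0 * e1
        p0 = p0 * e0
    return p0, p1

# Two leaves, each on a branch of length ln(2), so survival per branch is 0.5 for mu = 1.
cherry = [("A", math.log(2)), ("B", math.log(2))]
p_none, p_last_one_out = zero_one(cherry, mu=1.0)
```

For this two-leaf tree, the probability of exactly one carrier is 2 × 0.5 × 0.5 = 0.5, and the probability that the gene is extinct at the tips is 0.25.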
README: Eukaryotic gene trees with 150 leaves
https://doi.org/10.5061/dryad.612jm649v
This dataset consists of 10 eukaryotic gene trees that were used for the analysis.
Methods
The trees were extracted from a clustering of proteins from 150 eukaryotic genomes. Protein sequences were compared in an all-vs-all BLAST search. Reciprocal best hits were extracted and their global identity was determined. Reciprocal best hits with an E-value smaller than 1e-10 and a global identity of 25% or more were used for the clustering with MCL.
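The filtering thresholds described above can be sketched as follows. This is an illustration only: the `Hit` record, its field names, and the example values are hypothetical and do not reflect the actual pipeline files or formats.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    """One reciprocal best hit pair (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    identity: float  # global identity in percent

def filter_pairs(hits, max_evalue=1e-10, min_identity=25.0):
    """Keep pairs with E-value smaller than max_evalue and identity of min_identity or more."""
    return [h for h in hits if h.evalue < max_evalue and h.identity >= min_identity]

# Made-up example values, for illustration only.
example_hits = [
    Hit("p1", "p2", 1e-50, 62.0),  # passes both thresholds
    Hit("p3", "p4", 1e-5, 80.0),   # fails the E-value threshold
    Hit("p5", "p6", 1e-30, 20.0),  # fails the identity threshold
    Hit("p7", "p8", 1e-12, 25.0),  # passes; identity is exactly at the cutoff
]
kept = filter_pairs(example_hits)
```

Here `kept` contains the first and last pairs. The strict comparison on the E-value and the inclusive comparison on identity follow the wording "smaller than 1e-10" and "25% or more".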
Trees were constructed from the resulting clusters using IQ-TREE.
The 10 trees in this dataset were randomly selected from all trees with exactly 150 leaves.
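A selection of this kind could look like the sketch below. It assumes that trees are available as Newick strings whose labels and comments contain no commas, in which case the number of leaves equals the number of commas plus one. The function names and the seed are hypothetical; the seed actually used for this dataset, if any, is not stated.

```python
import random

def count_leaves(newick: str) -> int:
    """Count leaves in a Newick string, assuming no commas inside labels or comments."""
    return newick.count(",") + 1

def pick_trees(trees, n_leaves=150, k=10, seed=0):
    """Randomly select up to k trees that have exactly n_leaves leaves."""
    eligible = [t for t in trees if count_leaves(t) == n_leaves]
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return rng.sample(eligible, min(k, len(eligible)))

# Toy input with small trees; the real analysis would use n_leaves=150 and k=10.
toy_trees = ["(A,B);", "((A,B),C);", "(A,(B,C));", "((A,B),(C,D));"]
chosen = pick_trees(toy_trees, n_leaves=3, k=1)
```

With the toy input, `chosen` holds one of the two three-leaf trees.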