Phylogenomic analyses have increasingly adopted species tree reconstruction methods that account for gene tree discordance, using pipelines that require both human effort and computational resources. As the number of available genomes continues to grow, researchers face a new problem: whenever more species become available, the whole process has to be repeated from the beginning because updating species trees is currently not possible. Yet de novo inference can be prohibitively costly in human effort or machine time. In this paper, we introduce INSTRAL, a method that extends ASTRAL to enable phylogenetic placement. INSTRAL is designed to place a new species on an existing species tree after sequences from the new species have already been added to gene trees; thus, INSTRAL is complementary to existing placement methods that update gene trees.
EPA-ng results for concatenation
The EPA-ng output files for the concatenation tree and the concatenated alignments
concatenation-epa.tar.gz
True Species Trees
The true species trees for all model conditions simulated using SimPhy
true-specis-trees.tar.bz
Estimated gene trees
Estimated gene trees for the three model conditions with a speciation rate of 0.000001. There are three folders, one per ILS level; each folder holds 50 replicates, and each replicate contains 1000 gene trees in Newick format.
estimatedgenetrees.0.000001.tar.gz
Estimated gene trees
Estimated gene trees for the three model conditions with a speciation rate of 0.0000001. There are three folders, one per ILS level; each folder holds 50 replicates, and each replicate contains 1000 gene trees in Newick format. The name 'model.200.500000.0.0000001' means that the number of species is 200, the number of generations is 500000, and the speciation rate is 0.0000001.
estimatedgenetrees.0.0000001.tar.gz
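The naming convention described above ('model.<species>.<generations>.<speciation rate>') can be split into its parts with a few lines of Python. This is a minimal sketch, assuming folder names follow exactly that pattern; the function name `parse_model_condition` and the `ModelCondition` type are hypothetical helpers and are not part of the dataset or of INSTRAL.

```python
from typing import NamedTuple


class ModelCondition(NamedTuple):
    """Fields encoded in a model-condition name (hypothetical helper type)."""
    num_species: int
    generations: int
    speciation_rate: float


def parse_model_condition(name: str) -> ModelCondition:
    """Parse a name such as 'model.200.500000.0.0000001'.

    Assumes the layout 'model.<species>.<generations>.<rate>'. The rate
    itself contains a dot, so the split is limited to the first three dots.
    """
    parts = name.split(".", 3)
    if len(parts) != 4 or parts[0] != "model":
        raise ValueError(f"unexpected model-condition name: {name!r}")
    _, species, generations, rate = parts
    return ModelCondition(int(species), int(generations), float(rate))


# Example with the name quoted in the description above.
condition = parse_model_condition("model.200.500000.0.0000001")
print(condition.num_species, condition.generations, condition.speciation_rate)
```

Limiting the split to three dots keeps the speciation rate intact as a single field, which a plain split on every dot would break apart.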
gene trees for 10K dataset
Estimated gene trees (part 1) for the very large dataset, used for running time analysis
10k.genetrees.1.tar.gz
gene trees for 10K dataset
Estimated gene trees (part 2) for the very large dataset, used for running time analysis
10k.genetrees.2.tar.gz
Biological datasets
The estimated gene trees and species trees for the three biological datasets are included. The intermediate and final trees for ordered placement are also included in 'hierarchical-placement'.
biological-datasets.tar.gz
Backbone trees for ordered placement
Backbone trees used for ordered placement on the simulated dataset with 201-species trees. The backbones are estimated species trees built from 50, 200, and 1000 gene trees, from which 50, 100, or 150 leaves have been pruned; the number of pruned leaves is given by 'x.pruned' in the file names.
ordered-placement-backbones.tar.gz
EPA-ng gene trees (low ILS)
Estimated gene trees for the simulated dataset with low ILS. For each set of gene trees (200 genes), one species has been pruned and then placed back using EPA-ng. The number in the folder name (e.g., sp100) is the label of the species that was pruned and then placed back.
epa_genetrees.model.200.10000000.0.000001.tar.gz
EPA-ng gene trees (medium ILS)
Estimated gene trees for the simulated dataset with medium ILS. For each set of gene trees (200 genes), one species has been pruned and then placed back using EPA-ng. The number in the folder name (e.g., sp100) is the label of the species that was pruned and then placed back.
epa_genetrees.model.200.2000000.0.000001.tar.gz
EPA-ng gene trees (high ILS)
Estimated gene trees for the simulated dataset with high ILS. For each set of gene trees (200 genes), one species has been pruned and then placed back using EPA-ng. The number in the folder name (e.g., sp100) is the label of the species that was pruned and then placed back.
epa_genetrees.model.200.500000.0.000001.tar.gz