Phylogenomic analyses increasingly reconstruct species trees with methods that account for gene tree discordance, using pipelines that require both human effort and computational resources. As the number of available genomes continues to grow, researchers face a new problem: whenever more species become available, the whole analysis has to be repeated from the beginning, because updating an existing species tree is currently not possible. Yet de novo inference can be prohibitively costly in human effort or machine time. In this paper, we introduce INSTRAL, a method that extends ASTRAL to enable phylogenetic placement. INSTRAL is designed to place a new species on an existing species tree after sequences from the new species have already been added to gene trees; thus, INSTRAL is complementary to existing placement methods that update gene trees.

EPA-ng results for concatenation

The EPA-ng output files for the concatenation tree and the concatenated alignments

concatenation-epa.tar.gz

True Species Trees

The true species trees for all model conditions simulated using SimPhy

true-specis-trees.tar.bz

Estimated gene trees

estimatedgenetrees.0.000001.tar.gz

Estimated gene trees

Estimated gene trees for the three model conditions with a speciation rate of 0.0000001. There are three folders corresponding to the three ILS levels, and each folder holds 50 replicates, each containing 1000 gene trees in Newick format. The name 'model.200.500000.0.0000001' means that the number of species is 200, the number of generations is 500000, and the speciation rate is 0.0000001.

estimatedgenetrees.0.0000001.tar.gz
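
The folder naming convention described above can be decoded programmatically. The following is a minimal sketch, assuming every model condition name follows the 'model.<species>.<generations>.<rate>' pattern of the example; `ModelCondition` and `parse_model_name` are hypothetical helper names and are not part of the dataset.

```python
from typing import NamedTuple


class ModelCondition(NamedTuple):
    """Fields encoded in a model condition name (hypothetical helper type)."""
    num_species: int
    num_generations: int
    speciation_rate: float


def parse_model_name(name: str) -> ModelCondition:
    """Split a name such as 'model.200.500000.0.0000001' into its fields."""
    # Split on the first three dots only, so the decimal point inside the
    # speciation rate stays intact.
    prefix, species, generations, rate = name.split(".", 3)
    if prefix != "model":
        raise ValueError(f"unexpected model condition name: {name!r}")
    return ModelCondition(int(species), int(generations), float(rate))


if __name__ == "__main__":
    print(parse_model_name("model.200.500000.0.0000001"))
```

Limiting the split to three dots is what keeps the rate readable as a single decimal number; names that do not match the assumed pattern raise a `ValueError` instead of being silently misread.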

gene trees for 10K dataset

Estimated gene trees for the very large dataset, used for running time analysis

10k.genetrees.1.tar.gz

gene trees for 10K dataset

Estimated gene trees (part 2) for the very large dataset, used for running time analysis

10k.genetrees.2.tar.gz

Biological datasets

The estimated gene trees and species trees for the three biological datasets are included. The intermediate and final trees for ordered placement are also included in 'hierarchical-placement'.

biological-datasets.tar.gz

Backbone trees for ordered placement

Backbone trees used for ordered placement on the simulated dataset with 201 species. The backbones are estimated species trees created from 50, 200, and 1000 gene trees, from which 50, 100, or 150 leaves have been pruned; the number of pruned leaves is specified by 'x.pruned' in the names.

ordered-placement-backbones.tar.gz

EPA-ng gene trees (low ILS)

Estimated gene trees for the simulated dataset with low ILS. For each set of gene trees (200 genes), one species has been pruned and then placed back using EPA-ng. The number in the folder name (e.g., sp100) is the label of the species that was pruned and then placed back.

epa_genetrees.model.200.10000000.0.000001.tar.gz

EPA-ng gene trees (medium ILS)

Estimated gene trees for the simulated dataset with medium ILS. For each set of gene trees (200 genes), one species has been pruned and then placed back using EPA-ng. The number in the folder name (e.g., sp100) is the label of the species that was pruned and then placed back.

epa_genetrees.model.200.2000000.0.000001.tar.gz

EPA-ng gene trees (high ILS)

Estimated gene trees for the simulated dataset with high ILS. For each set of gene trees (200 genes), one species has been pruned and then placed back using EPA-ng. The number in the folder name (e.g., sp100) is the label of the species that was pruned and then placed back.

epa_genetrees.model.200.500000.0.000001.tar.gz
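
For the three EPA-ng gene tree archives, the label of the pruned species can be read back from the folder name. This is a minimal sketch, assuming each folder is named exactly 'sp' followed by digits, as in the 'sp100' example; `pruned_species_label` is a hypothetical helper name and is not shipped with the data.

```python
import re


def pruned_species_label(folder_name: str) -> int:
    """Return the numeric species label encoded in a folder name like 'sp100'."""
    # Assumption: the whole folder name is 'sp' plus one or more digits.
    match = re.fullmatch(r"sp(\d+)", folder_name)
    if match is None:
        raise ValueError(f"unexpected folder name: {folder_name!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(pruned_species_label("sp100"))
```

Matching the full name, instead of searching for digits anywhere in it, avoids picking up numbers from unrelated folders inside the same archive.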

Data from: INSTRAL: discordance-aware phylogenetic placement using quartet scores


Supplementary-material
