
DEPP: Deep learning enables extending species trees using single genes


Jiang, Yueyu; Balaban, Metin; Zhu, Qiyun; Mirarab, Siavash (2022), DEPP: Deep learning enables extending species trees using single genes, Dryad, Dataset,


Placing new sequences onto reference phylogenies is increasingly used for analyzing environmental samples, especially microbiomes. However, existing placement methods have a fundamental limitation: they assume that query sequences have evolved under specific models directly on the reference phylogeny. Thus, they can only place single-gene data (e.g., 16S rRNA amplicons) onto their own gene tree. This practice is a proxy for a more ambitious goal: extending a (genome-wide) species tree given data from individual genes. No algorithm currently addresses this challenging problem. Here, we introduce Deep-learning Enabled Phylogenetic Placement (DEPP), an algorithm that learns to extend species trees using single genes without pre-specified models. We show that DEPP updates the multi-locus microbial tree-of-life with single genes with high accuracy. We further demonstrate that DEPP can achieve the long-standing goal of combining 16S and metagenomic data onto a single tree, enabling community structure analyses that were previously impossible and producing robust patterns.

Usage Notes

Please note: this dataset is the most recent version of a duplicate dataset, available via this link: (published February 4, 2022).