Reconstructing the phylogenetic relationships between species is one of the most formidable tasks in evolutionary biology. Multiple methods exist to reconstruct phylogenetic trees, each with its own strengths and weaknesses. Both simulation and empirical studies have identified several “zones” of parameter space where the accuracy of some methods can plummet, even for four-taxon trees. Further, some methods can have undesirable statistical properties such as statistical inconsistency and/or the tendency to be positively misleading (i.e., to assert strong support for the incorrect tree topology). Recently, deep learning techniques have made inroads on a number of both new and longstanding problems in biological research. Here we designed a deep convolutional neural network (CNN) to infer quartet topologies from multiple sequence alignments. This CNN can readily be trained to make inferences using both gapped and ungapped data. We show that our approach is highly accurate on simulated data, often outperforming traditional methods, and is remarkably robust to bias-inducing regions of parameter space such as the Felsenstein zone and the Farris zone. We also demonstrate that the confidence scores produced by our CNN can assess support for the chosen topology more accurately than bootstrap and posterior probability scores from traditional methods. While numerous practical challenges remain, these findings suggest that deep learning approaches such as ours have the potential to produce more accurate phylogenetic inferences.
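To make the inference task concrete, the following is a minimal sketch of how a four-taxon alignment might be turned into numeric input and how the three possible unrooted quartet topologies can be enumerated as class labels. The abstract does not describe the network's input format, so everything here is an assumption made for illustration: the `ALPHABET` ordering, the use of a separate gap channel, and the helper names `one_hot_encode` and `topology_label` are hypothetical and not taken from the authors' code.

```python
# Illustrative sketch only: the encoding below is an assumption, not the
# authors' documented input format.

ALPHABET = "ACGT-"  # assumed channels: four nucleotides plus a gap symbol

# The three possible unrooted topologies for four taxa indexed 0..3,
# each written as the pair of sister groups it implies.
QUARTET_TOPOLOGIES = (
    ((0, 1), (2, 3)),
    ((0, 2), (1, 3)),
    ((0, 3), (1, 2)),
)


def one_hot_encode(alignment):
    """Encode four aligned sequences as a nested list [taxon][site][channel]."""
    if len(alignment) != 4:
        raise ValueError("expected exactly four aligned sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    encoded = []
    for seq in alignment:
        sites = []
        for ch in seq.upper():
            # Symbols outside ALPHABET (e.g. N) become an all-zero row.
            sites.append([1 if ch == sym else 0 for sym in ALPHABET])
        encoded.append(sites)
    return encoded


def topology_label(index):
    """Return a Newick-style string for the topology with the given class index."""
    (a, b), (c, d) = QUARTET_TOPOLOGIES[index]
    return "(({},{}),({},{}));".format(a, b, c, d)


if __name__ == "__main__":
    x = one_hot_encode(["ACGT", "A-GT", "ACGN", "TTTT"])
    print(len(x), len(x[0]), len(x[0][0]))
    for i in range(len(QUARTET_TOPOLOGIES)):
        print(i, topology_label(i))
```

Under these assumptions, an ungapped data set would simply never activate the gap channel, and a classifier's output would be a score for each of the three class indices.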

Supplementary_Text_v1: Supplementary Text
FigS1_Suppl_BL_space: Figure S1 (S1_Suppl_BL_space.pdf)
FigS2_Suppl_3DSurface: Figure S2 (S2_Suppl_3DSurface.pdf)
FigS3_Suppl_Bias_gap: Figure S3 (S3_Suppl_Bias_gap.jpeg)
FigS4_Suppl_Bias_nogap: Figure S4 (S4_Suppl_Bias_nogap.jpeg)
FigS5_Suppl_Precision_recall: Figure S5 (S5_Suppl_Precision_recall.pdf)
TabS1_Suppl_500regionaccuracy: Table S1 (S1_Suppl_tab_500regionaccuracy.docx)
TabS2_Suppl_consistency: Table S2 (S2_Suppl_tab_consistency.docx)
Suppl_fig_tab_legends_v1: Supplementary Figure and Table legends

Data from: Accurate inference of tree topologies from multiple sequence alignments using deep learning
