Improving Phylogenies Based on Average Nucleotide Identity, Incorporating Saturation Correction and Non-Parametric Bootstrap Support

Gosselin, Sean¹; Fullmer, Matthew²; Feng, Yutien¹; Gogarten, J Peter¹

Published Jul 21, 2021; Updated Jul 29, 2021 on Dryad. https://doi.org/10.5061/dryad.jwstqjq85

Data files

Jul 21, 2021 version files (325.06 MB total)

Genomes_by_Dataset.tar.xz (325.06 MB)

Abstract

Whole-genome comparisons based on Average Nucleotide Identity (ANI) and the Genome-to-Genome Distance Calculator have risen to prominence as tools for rapidly classifying prokaryotic taxa from whole-genome sequences. Some implementations have even been proposed as a new standard for species classification and have become a common technique in papers describing newly sequenced genomes. However, attempts to apply whole-genome divergence data to the delineation of higher taxonomic units and to phylogenetic inference have struggled to match the results of more complex phylogenetic methods. We present a novel method for generating statistically supported phylogenies of archaeal and bacterial groups using a combined ANI and alignment fraction-based metric. For the test cases to which we applied this approach, we obtained results comparable with those of other methodologies up to at least the family level. The method uses non-parametric bootstrapping to gauge support for inferred groups. It offers the opportunity to use whole-genome comparison data that are already being generated to quickly produce phylogenies, including support for inferred groups. Additionally, the developed ANI methodology can assist the classification of higher taxonomic groups.
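The abstract combines ANI with the alignment fraction (AF), applies a saturation correction, and gauges support with non-parametric bootstrapping. The Python sketch below shows one way these pieces could fit together for a single genome pair. It rests on assumptions not stated in this record: the input is a list of per-fragment identities (as produced by fragment-based ANI tools), the combined metric is taken to be ANI × AF, the saturation correction is taken to be d = -ln(ANI × AF), and bootstrap replicates resample fragments with replacement. The function names are hypothetical, and this is not the authors' published implementation.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical input: one entry per query-genome fragment. A float in (0, 1]
# is that fragment's identity to the reference genome; None means the fragment
# did not align, so it lowers the alignment fraction.
Fragments = Sequence[Optional[float]]


def ani_and_af(fragments: Fragments) -> Tuple[float, float]:
    """Return (ANI, AF): mean identity of aligned fragments and fraction aligned."""
    aligned = [x for x in fragments if x is not None]
    if not aligned:
        raise ValueError("no aligned fragments; distance is undefined")
    return sum(aligned) / len(aligned), len(aligned) / len(fragments)


def corrected_distance(fragments: Fragments) -> float:
    """Assumed metric: d = -ln(ANI * AF), a log (saturation) correction of ANI x AF."""
    ani, af = ani_and_af(fragments)
    return -math.log(ani * af)


def bootstrap_distances(fragments: Fragments, replicates: int = 100,
                        seed: int = 0) -> List[float]:
    """Non-parametric bootstrap: resample fragments with replacement per replicate."""
    rng = random.Random(seed)
    out = []
    for _ in range(replicates):
        sample = rng.choices(fragments, k=len(fragments))
        try:
            out.append(corrected_distance(sample))
        except ValueError:
            # A replicate that drew only unaligned fragments has no finite distance.
            out.append(math.inf)
    return out
```

For example, fragments [0.98, 0.95, None, 0.97] give ANI ≈ 0.967 and AF = 0.75, so ANI × AF = 0.725 and d = -ln(0.725) ≈ 0.32. To obtain support values, one would compute such distances for every genome pair in each replicate, build a tree per replicate matrix, and count how often each group recurs; that tree-building step is not shown here.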

Included here are the supplemental materials and all whole-genome datasets used in this work.
