Dryad

Improving Phylogenies Based on Average Nucleotide Identity, Incorporating Saturation Correction and Non-Parametric Bootstrap Support

Cite this dataset

Gosselin, Sean; Fullmer, Matthew; Feng, Yutien; Gogarten, J Peter (2021). Improving Phylogenies Based on Average Nucleotide Identity, Incorporating Saturation Correction and Non-Parametric Bootstrap Support [Dataset]. Dryad. https://doi.org/10.5061/dryad.jwstqjq85

Abstract

Whole-genome comparisons based on Average Nucleotide Identity (ANI) and the Genome-to-Genome Distance Calculator (GGDC) have risen to prominence as a way to rapidly classify prokaryotic taxa from whole-genome sequences. Some implementations have even been proposed as a new standard for species classification, and they have become a common technique in papers describing newly sequenced genomes. However, attempts to apply whole-genome divergence data to the delineation of higher taxonomic units and to phylogenetic inference have struggled to match the results of more complex phylogenetic methods. We present a novel method for generating statistically supported phylogenies of archaeal and bacterial groups using a combined ANI- and alignment fraction-based metric. For the test cases to which we applied this approach, we obtained results comparable to those of other methodologies up to at least the family level. The method uses non-parametric bootstrapping to gauge support for inferred groups. It makes use of whole-genome comparison data that are already being generated to quickly produce phylogenies with support values for inferred groups. Additionally, the developed ANI methodology can assist in the classification of higher taxonomic groups.
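The abstract does not give the formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a BLAST-style ANI in which the query genome is cut into fragments and each fragment either yields a percent identity or fails to align; it combines ANI and alignment fraction (AF) as their product, applies a Jukes-Cantor correction as a stand-in saturation correction, and bootstraps by resampling fragments with replacement. The product ANI × AF, the Jukes-Cantor choice, and the names `ani_af`, `corrected_distance` and `bootstrap_distances` are all hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def ani_af(fragments: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Return (ANI, AF) as fractions from per-fragment identities.

    Each entry is the percent identity (0-100) of one query fragment,
    or None if that fragment produced no qualifying alignment (assumed input).
    """
    aligned = [f for f in fragments if f is not None]
    if not aligned:
        return 0.0, 0.0
    ani = sum(aligned) / len(aligned) / 100.0  # mean identity of aligned fragments
    af = len(aligned) / len(fragments)         # fraction of fragments that aligned
    return ani, af


def corrected_distance(ani: float, af: float) -> float:
    """Hypothetical combined distance: Jukes-Cantor correction of 1 - ANI*AF."""
    p = 1.0 - ani * af                # uncorrected dissimilarity
    arg = 1.0 - (4.0 / 3.0) * p
    if arg <= 0.0:
        return math.inf               # beyond saturation: JC distance is undefined
    return -0.75 * math.log(arg)


def bootstrap_distances(fragments: Sequence[Optional[float]],
                        replicates: int = 100,
                        seed: int = 0) -> List[float]:
    """Non-parametric bootstrap: resample fragments with replacement."""
    rng = random.Random(seed)
    n = len(fragments)
    out = []
    for _ in range(replicates):
        sample = [fragments[rng.randrange(n)] for _ in range(n)]
        out.append(corrected_distance(*ani_af(sample)))
    return out


if __name__ == "__main__":
    frags = [99.0, 98.0, None, 97.0]  # toy per-fragment identities
    print(corrected_distance(*ani_af(frags)))
    print(bootstrap_distances(frags, replicates=5, seed=1))
```

In a full pipeline, one replicate distance per genome pair would be assembled into a replicate distance matrix, each matrix turned into a tree (for example by neighbor joining), and the support for a group counted as the fraction of replicate trees that contain it.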

This dataset includes the supplemental materials and all whole-genome datasets used in this work.

Methods

Unless otherwise noted in Table S1, all genomes were downloaded from NCBI using the accession numbers listed in Table S1. All genomes are provided in FASTA format.
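A minimal, dependency-free sketch for loading the provided FASTA files in Python. It assumes uncompressed files and keys each record by the first word of its header line (typically the sequence accession). `parse_fasta` and `read_fasta` are illustrative helpers, not part of the dataset; Biopython's `SeqIO.parse(handle, "fasta")` is a common alternative.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {first header word: sequence}.

    Multi-line sequences are joined; blank lines are skipped. A later
    record with the same first header word overwrites an earlier one.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            words = line[1:].split()
            header = words[0] if words else ""  # keep the first word (e.g. accession)
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def read_fasta(path: str) -> Dict[str, str]:
    """Read an uncompressed FASTA file from disk."""
    with open(path) as handle:
        return parse_fasta(handle)
```

For a multi-contig genome, `sum(len(s) for s in read_fasta(path).values())` gives the total assembly length.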