Dryad

Fitness effects of mutations: An assessment of PROVEAN predictions using mutation accumulation data

Data files

Feb 07, 2022 version files 28.35 GB

Abstract

Predicting fitness in natural populations is a major challenge in biology. It may be possible to leverage fast-accumulating genomic datasets to infer the fitness effects of mutant alleles, allowing evolutionary questions to be addressed in any organism. In this paper, we investigate the utility of one such tool, called PROVEAN. This program compares a query sequence with existing data to provide an alignment-based score for any protein variant, with scores categorized as neutral or deleterious based on a preset threshold. PROVEAN has been used widely in evolutionary studies, e.g., to estimate mutation load in natural populations, but has not been formally tested as a predictor of aggregate mutational effects on fitness. Using three large, published datasets on the genome sequences of laboratory mutation accumulation lines, we assessed how well PROVEAN predicted the actual fitness patterns observed, relative to other metrics. In most cases, we find that a simple count of the total number of mutant proteins is a better predictor of fitness than the number of variants scored as deleterious by PROVEAN. We also find that the sum of all mutant protein scores explains variation in fitness better than the number of mutant proteins in one of the datasets. We discuss the implications of these results for studies of populations in the wild.
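The three predictors compared in the abstract (the number of mutant proteins, the number of variants scored as deleterious, and the sum of scores) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function name `summarize_line`, the input layout of `(protein_id, score)` pairs, and the choice to sum scores over individual variants are assumptions made for illustration. The cutoff of -2.5 is PROVEAN's commonly cited default; the abstract itself says only "a preset threshold".

```python
# Assumption: -2.5 is PROVEAN's default cutoff; the abstract does not state a value.
DELETERIOUS_CUTOFF = -2.5


def summarize_line(variants, cutoff=DELETERIOUS_CUTOFF):
    """Summarize one mutation accumulation line.

    `variants` is a hypothetical list of (protein_id, provean_score) pairs,
    one pair per protein-altering mutation found in the line.
    """
    scores = [score for _, score in variants]
    return {
        # Simple count of distinct proteins carrying at least one mutation.
        "n_mutant_proteins": len({protein for protein, _ in variants}),
        # Variants with a score at or below the cutoff are called deleterious.
        "n_deleterious": sum(1 for s in scores if s <= cutoff),
        # Sum of all variant scores (more negative = larger predicted effect).
        "score_sum": sum(scores),
    }


# Made-up example values, not data from the study.
example = [("P1", -3.0), ("P1", -1.0), ("P2", -2.5), ("P3", 0.5)]
summary = summarize_line(example)
```

Each of the three values could then be used as a per-line predictor of measured fitness, which is the comparison the abstract describes.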