Dryad

Data files associated with: Evolution of the mutation spectrum across a mammalian phylogeny

Cite this dataset

Beichman, Annabel et al. (2023). Data files associated with: Evolution of the mutation spectrum across a mammalian phylogeny [Dataset]. Dryad. https://doi.org/10.5068/D1339F

Abstract

Although evolutionary biologists have long theorized that variation in DNA repair efficacy might explain some of the diversity of lifespan and cancer incidence across species, we have little data on the variability of normal germline mutagenesis outside of humans. Here, we shed light on the spectrum and etiology of mutagenesis across mammals by quantifying mutational sequence context biases using polymorphism data from thirteen species of mice, apes, bears, wolves, and cetaceans. After normalizing the mutation spectrum for reference genome accessibility and k-mer content, we use the Mantel test to deduce that mutation spectrum divergence is highly correlated with genetic divergence between species, whereas life history traits like reproductive age are weaker predictors of mutation spectrum divergence. Potential bioinformatic confounders are only weakly related to a small set of mutation spectrum features. We find that clocklike mutational signatures previously inferred from human cancers cannot explain the phylogenetic signal exhibited by the mammalian mutation spectrum, despite the ability of these clocklike signatures to fit each species’ 3-mer spectrum with high cosine similarity. In contrast, parental aging signatures inferred from human de novo mutation data appear to explain much of the mutation spectrum’s phylogenetic signal when fit to non-context-dependent mutation spectrum data in combination with a novel mutational signature. We posit that future models purporting to explain the etiology of mammalian mutagenesis need to capture the fact that more closely related species have more similar mutation spectra; a model that fits each marginal spectrum with high cosine similarity is not guaranteed to capture this hierarchy of mutation spectrum variation among species.
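The abstract's central test correlates two species-by-species distance matrices (mutation spectrum divergence versus genetic divergence) with the Mantel test. Below is a minimal sketch of a one-sided permutation Mantel test using NumPy. It is not the code from the project's repository. The function name `mantel_test`, the permutation count, and the toy four-species matrices are hypothetical. The paper's SI Methods describe how spectrum and genetic distances were actually computed.

```python
import numpy as np


def mantel_test(dist_a, dist_b, n_perm=9999, seed=0):
    """Hypothetical helper: one-sided permutation Mantel test.

    Returns (r, p), where r is the Pearson correlation between the
    upper-triangular entries of two square, symmetric distance matrices
    and p is the permutation p-value for a positive association.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    if a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("distance matrices must be square and the same shape")
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)  # each species pair is counted once
    x = a[iu]
    r_obs = np.corrcoef(x, b[iu])[0, 1]

    rng = np.random.default_rng(seed)
    n_as_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Relabel species in B by permuting rows and columns together. This keeps
        # B's internal structure but breaks its pairing with A.
        b_perm = b[np.ix_(perm, perm)]
        if np.corrcoef(x, b_perm[iu])[0, 1] >= r_obs:
            n_as_extreme += 1
    # Count the observed labeling as one of the permutations.
    p = (n_as_extreme + 1) / (n_perm + 1)
    return r_obs, p


# Toy example with made-up values for four species (not data from this dataset).
genetic_dist = np.array([
    [0.0, 0.1, 0.5, 0.6],
    [0.1, 0.0, 0.5, 0.6],
    [0.5, 0.5, 0.0, 0.2],
    [0.6, 0.6, 0.2, 0.0],
])
spectrum_dist = np.array([
    [0.00, 0.02, 0.09, 0.11],
    [0.02, 0.00, 0.08, 0.10],
    [0.09, 0.08, 0.00, 0.03],
    [0.11, 0.10, 0.03, 0.00],
])
r, p = mantel_test(genetic_dist, spectrum_dist, n_perm=999)
print(f"Mantel r = {r:.3f}, one-sided p = {p:.3f}")
```

Permuting species labels, rather than individual matrix entries, respects the fact that all pairwise distances involving one species are not independent. With only four species there are just 24 relabelings, so the p-value cannot fall much below 1/24. With 13 species there are about 6.2 billion, which is why random permutations are drawn instead of enumerating them.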

Methods

Mutation spectra were generated from publicly available whole-genome sequencing polymorphism data (VCF format) from 13 mammal species (house mouse, Algerian mouse, humans, Bornean orangutan, Sumatran orangutan, chimpanzee, gorilla, bonobo, gray wolf, polar bear, brown bear, vaquita porpoise, and fin whale).

Spectra were generated for each species at the 1-mer, 3-mer, 5-mer and 7-mer level using the program mutyper and a pipeline that is described extensively in the paper's SI Methods section and on the project's GitHub repository (https://github.com/harrispopgen/mammal_mutation_spectra/). Data files are described in depth in the Dryad repository's README file.
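A k-mer mutation spectrum labels each polymorphism by its ancestral sequence context. The abstract notes that each spectrum is then normalized for k-mer content. The following plain-Python sketch illustrates that bookkeeping. It is not mutyper's implementation or command-line interface. It rests on three assumptions: SNVs arrive as hypothetical `(ancestral k-mer, derived base)` pairs, strands are collapsed so the central ancestral base is A or C, and each mutation type's count is divided by its ancestral k-mer's count in the accessible genome before rescaling to proportions. The helper names `mutation_type` and `normalized_spectrum` are illustrative.

```python
from collections import Counter

# Complement table for upper-case DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def collapse_kmer(kmer):
    """Assumed convention: report the strand whose central base is A or C."""
    kmer = kmer.upper()
    return revcomp(kmer) if kmer[len(kmer) // 2] in "GT" else kmer


def mutation_type(anc_kmer, derived_base):
    """Label an SNV by its ancestral context, e.g. ('TCC', 'T') -> 'TCC>TTC'."""
    anc, der = anc_kmer.upper(), derived_base.upper()
    if len(anc) % 2 == 0:
        raise ValueError("k must be odd so the mutated base is centered")
    mid = len(anc) // 2
    if der == anc[mid]:
        raise ValueError("derived base must differ from the ancestral base")
    if anc[mid] in "GT":
        # Describe the same mutation on the opposite strand.
        anc, der = revcomp(anc), der.translate(COMPLEMENT)
    return f"{anc}>{anc[:mid]}{der}{anc[mid + 1:]}"


def normalized_spectrum(snvs, target_kmer_counts):
    """Return {mutation type: proportion} after dividing by target k-mer counts.

    snvs: iterable of (ancestral_kmer, derived_base) pairs (hypothetical format).
    target_kmer_counts: {kmer: count} over the accessible genome, either strand.
    """
    counts = Counter(mutation_type(a, d) for a, d in snvs)
    targets = Counter()
    for kmer, n in target_kmer_counts.items():
        targets[collapse_kmer(kmer)] += n
    rates = {}
    for mtype, n in counts.items():
        anc = mtype.split(">")[0]
        if targets[anc] == 0:
            raise ValueError(f"no target count for ancestral k-mer {anc}")
        rates[mtype] = n / targets[anc]
    total = sum(rates.values())
    return {m: r / total for m, r in rates.items()} if total else {}


# Toy usage with made-up SNVs and target counts (not data from this dataset).
snvs = [("TCC", "T"), ("GGA", "A"), ("ACG", "T")]
targets = {"TCC": 10, "GGA": 10, "ACG": 5}
print(normalized_spectrum(snvs, targets))
```

Here `GGA>GAA` is the reverse complement of `TCC>TTC`, so the two SNVs pool into one type. Dividing by target counts keeps a species whose accessible genome simply contains more of a given k-mer from appearing to mutate that k-mer more often.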

Usage notes

All code used to process the data, carry out analyses, and generate figures is in the project's GitHub repository (https://github.com/harrispopgen/mammal_mutation_spectra/).

Funding

National Institute of General Medical Sciences, Award: R35GM133428

National Institute on Aging, Award: T32 AG066574

Burroughs Wellcome Fund

Kinship Conservation Fellows

Pew Charitable Trusts

Alfred P. Sloan Foundation