Dryad

Pedigree-based and phylogenetic methods support surprising patterns of mutation rate and spectrum in the gray mouse lemur

Cite this dataset

Campbell, Ryan et al. (2021). Pedigree-based and phylogenetic methods support surprising patterns of mutation rate and spectrum in the gray mouse lemur [Dataset]. Dryad. https://doi.org/10.5061/dryad.8pk0p2njx

Abstract

Mutations are the raw material on which evolution acts, and knowledge of their frequency and genomic distribution is crucial for understanding how evolution operates at both long and short timescales. At present, the rate and spectrum of de novo mutations have been directly characterized in relatively few lineages. Our study provides the first direct mutation rate estimate for a strepsirrhine (i.e., the lemurs and lorises), which comprise nearly half of the primate clade. Using high-coverage linked-read sequencing for a focal quartet of gray mouse lemurs (Microcebus murinus), we estimated the mutation rate to be 1.52 × 10⁻⁸ (95% credible interval: 1.28 × 10⁻⁸ to 1.78 × 10⁻⁸) mutations/site/generation, a rate among the highest calculated for a mammal. Further, we found an unexpectedly low count of paternal mutations, and only a modest overrepresentation of mutations at CpG-sites. Despite the surprising nature of these results, we found both the rate and spectrum to be robust to the manipulation of a wide range of computational filtering criteria. We also sequenced a technical replicate to estimate a false negative and false positive rate for our data and show that any point estimate of a de novo mutation rate should be considered with a large degree of uncertainty. To validate these observations, we conducted an independent analysis of context-dependent substitution types for gray mouse lemur and five additional primate species for which de novo mutation rates have also been estimated. These comparisons revealed general consistency of the mutation spectrum between the pedigree-based and the substitution rate analyses for all species compared.

Methods

Mutation rate estimation was done with a single pedigree using 10x linked-read data. Alignment to the Mmur3.0 reference genome was done with LongRanger, with initial variant calling by GATK. Mutations were called with two programs: (1) DeNovoGear and (2) VarScan2. VCFs produced by both methods are provided.
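Because call sets from two callers are provided, a user may want to see where they agree. The following is a minimal sketch of one way to do that, not the procedure used in the manuscript: it keys each record on chromosome, position, reference allele and alternate allele, and reports shared and caller-specific sites. The function names and the toy records are hypothetical and stand in for the provided VCFs.

```python
def read_vcf_sites(lines):
    """Collect (chrom, pos, ref, alt) keys from an iterable of VCF text lines."""
    sites = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # one key per alternate allele
            sites.add((chrom, pos, ref, alt))
    return sites


def compare_call_sets(calls_a, calls_b):
    """Split two call sets into shared and caller-specific sites."""
    return {
        "shared": calls_a & calls_b,
        "only_a": calls_a - calls_b,
        "only_b": calls_b - calls_a,
    }


def toy_record(chrom, pos, ref, alt):
    """Build a minimal tab-separated VCF data line (hypothetical data)."""
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", "PASS", "."])


# Toy input standing in for the two callers' VCFs (not real calls).
caller_a_lines = [
    "##fileformat=VCFv4.2",
    toy_record("1", 100, "A", "G"),
    toy_record("2", 250, "C", "T"),
]
caller_b_lines = [
    "##fileformat=VCFv4.2",
    toy_record("1", 100, "A", "G"),
    toy_record("3", 75, "G", "A"),
]

comparison = compare_call_sets(
    read_vcf_sites(caller_a_lines), read_vcf_sites(caller_b_lines)
)
print({name: sorted(sites) for name, sites in comparison.items()})
```

For the real files, an open text handle can be passed to `read_vcf_sites` in place of the list; gzip-compressed VCFs would need `gzip.open(path, "rt")`.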

Context-dependent substitution rate analyses were done with MULTIDIVTIME, based on stochastic mappings from PhyloBayes. Analyses were based on whole-genome alignments from Ensembl, which included the Mmur3.0 reference genome. A concatenated whole-genome alignment was used to produce 10 randomly selected alignments, each 1 Mb in length. Scripts are provided for analysing data with MCMCTREE and for analysing, with MULTIDIVTIME, the variance-covariance matrices resulting from stochastic mappings (https://github.com/HuiJieLee/ParsePhyloBayes).
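The subsampling step can be illustrated with a short sketch. How the 1 Mb alignments were drawn is not specified here, so the code below makes two assumptions: each subsample is a contiguous window, and windows are chosen from non-overlapping, block-aligned positions. The function names, the seed, and the toy alignment are hypothetical; the toy sizes are scaled down so the example runs instantly.

```python
import random


def sample_window_starts(alignment_length, window, n_windows, seed=0):
    """Pick start columns for n non-overlapping, block-aligned windows."""
    n_blocks = alignment_length // window
    if n_windows > n_blocks:
        raise ValueError("alignment too short for the requested windows")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    blocks = rng.sample(range(n_blocks), n_windows)  # sampling without replacement
    return sorted(block * window for block in blocks)


def slice_alignment(alignment, start, window):
    """Cut the same column range out of every sequence in the alignment."""
    return {taxon: seq[start:start + window] for taxon, seq in alignment.items()}


# Toy alignment: 3 taxa x 50 columns (the dataset used 1 Mb windows).
toy_alignment = {
    "taxon_a": "ACGTACGTAC" * 5,
    "taxon_b": "ACGTTCGTAC" * 5,
    "taxon_c": "ACGAACGTAC" * 5,
}
starts = sample_window_starts(alignment_length=50, window=5, n_windows=3, seed=1)
subsamples = [slice_alignment(toy_alignment, s, 5) for s in starts]
print(starts)
```

With real data the call would use `window=1_000_000` and `n_windows=10`. Restricting starts to block boundaries is a simplification that guarantees the windows never overlap.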

Usage notes

Three folders are available, each with a readme file describing its contents:

1) VCF - VCF files resulting from mutation-calling software

2) SubstitutionRates - Contains folders for analyses with MCMCTREE and MULTIDIVTIME. Includes alignments used for analyses of context-dependent substitution rates.

3) MutationList - Contains a simple csv file with all of the final mutations analyzed in the manuscript and a readme describing the fields.

10x reads are deposited in NCBI's SRA database as BAM files with identifiers SRR10130788-SRR10130796. Additional metadata and pedigree information are available in the supplementary material.
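For scripted downloads it can help to expand the accession range into an explicit list. The sketch below assumes the range is contiguous and inclusive, as the notation suggests; the helper name is hypothetical.

```python
def expand_sra_range(first, last):
    """Expand an inclusive range of run accessions sharing a 3-letter prefix."""
    prefix = first[:3]
    if last[:3] != prefix or len(first) != len(last):
        raise ValueError("accessions must share a prefix and length")
    width = len(first) - len(prefix)  # keep any zero padding intact
    lo, hi = int(first[3:]), int(last[3:])
    if lo > hi:
        raise ValueError("first accession must not exceed last")
    return [f"{prefix}{number:0{width}d}" for number in range(lo, hi + 1)]


accessions = expand_sra_range("SRR10130788", "SRR10130796")
for accession in accessions:
    print(accession)
```

The resulting list can then be passed, one accession at a time, to whichever SRA download tool is in use.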

Funding

National Science Foundation, Award: DEB-1354610

National Science Foundation, Award: DEB-1754142

John Simon Guggenheim Memorial Foundation

Alexander von Humboldt Foundation