
A relatively higher mutation rate in African humans is the dominant mechanism causing Neanderthals to appear closer to non-Africans, not introgression

Cite this dataset

Amos, Bill (2022). A relatively higher mutation rate in African humans is the dominant mechanism causing Neanderthals to appear closer to non-Africans, not introgression [Dataset]. Dryad. https://doi.org/10.5061/dryad.q2bvq83n9

Abstract

It is widely thought that humans carry genetic legacies due to inter-breeding with Neanderthals, but all methods used to infer legacies ignore recurrent mutations and assume a constant mutation rate. These assumptions cause automatic rejection of a second hypothesis, where a higher mutation rate in Africans caused increased divergence from Neanderthals. Any fair test should strive to treat both hypotheses equally. Here I use mutation spectra, the relative frequencies of different mutating three-base combinations, to compare contrasting expectations from the two hypotheses. I find that putative introgressed bases are strongly enriched for recurrent mutations and lie in regions with unusually high mutation rates, distorted mutation spectra and unusually large African minus non-African heterozygosity differences. Moreover, putative introgressed bases should be absent from Africa and rare outside, yet almost the entire signal of introgression is carried by sites where putative Neanderthal alleles are fixed in non-Africans and at high frequency in Africans. Together, these observations support a model where signals of introgression are driven mostly or even entirely by mutation rate differences between human populations. This new model helps to explain why introgression is inferred ubiquitously, including in scenarios involving great apes where inter-breeding is biologically implausible. 

Methods

These data comprise compressed, processed master files for individual chromosomes, containing aligned sequence data for humans (1000 Genomes Phase 3, either the low-coverage GRCh37 or the high-coverage GRCh38 data), Neanderthals (the three high-coverage genomes), the Denisovan and the chimpanzee. The files were compiled using custom C++ scripts (Archaic Masters) and vcftools (Master files). Full details of the file format are given in the ReadMe file. All raw source files are available in public data repositories.

Usage notes

These master files are designed to be processed using the C++ scripts given in the Electronic Supplementary Materials of the paper. Bioinformaticians (which I am not!) may well be able to analyse the public domain data rapidly and directly, but I choose to use these derivative files to speed up analyses. To conduct all analyses presented in the paper, you will need to download the individual GRCh38 (hg38) FASTA files for each human chromosome (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/) and edit the C++ scripts so that all file paths are valid on the local computer. I am more than happy to advise, as far as I am able, on any problems encountered.
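
Before editing the paths in the scripts, it can help to check that every chromosome FASTA file is actually present locally. The following is a minimal sketch in C++17, not part of the supplementary scripts. The function names (`expectedFastaNames`, `downloadUrl`, `missingFiles`) are hypothetical, and the set of chromosomes (1 to 22, X and Y) and the compressed `.fa.gz` file names are assumptions based on the UCSC download directory; the analysis scripts may expect a different subset or uncompressed `.fa` files.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// File names as listed in the UCSC hg38 "chromosomes" directory.
// ASSUMPTION: autosomes 1-22 plus X and Y; adjust to what the scripts need.
std::vector<std::string> expectedFastaNames() {
    std::vector<std::string> names;
    for (int i = 1; i <= 22; ++i) {
        names.push_back("chr" + std::to_string(i) + ".fa.gz");
    }
    names.push_back("chrX.fa.gz");
    names.push_back("chrY.fa.gz");
    return names;
}

// Full download URL for one file, built from the base URL given above.
std::string downloadUrl(const std::string& name) {
    return "https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/" + name;
}

// Names of expected FASTA files that are not found in the directory `dir`.
std::vector<std::string> missingFiles(const std::filesystem::path& dir) {
    std::vector<std::string> missing;
    for (const auto& name : expectedFastaNames()) {
        if (!std::filesystem::exists(dir / name)) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

Calling `missingFiles` with the local FASTA directory returns the files still to be fetched, and passing each name to `downloadUrl` gives the address to download it from. If the scripts read uncompressed files, decompress after download and drop the `.gz` suffix in `expectedFastaNames`.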