The impact of genetic adaptation on chimpanzee subspecies differentiation
Data files
Nov 08, 2019 version files 648.96 MB
- allchr_pbs4nj.rds
- allchr.20mam.nochimpbon.most_conserved.tc0.125.el20.4dsites.panTro4mapped_assembled.autosomes.bed.gz
- allchr.20mam.nochimpbon2.tc0.125.el20.genic.snps.bed.gz
- allchr.derived.allele.counts.humanAA.bed.gz
- chimp.genes.human.names.homologs.and.amb.homology_mar.17.gtf
- chimp.genes.human.names.homologs.and.amb.homology_mar.17.human.go.gene.set.txt
- chr_all_western_chimp_allchr_LDhat_map_pantro4.sort.bedno_dup.bed_genetic_map_consistent.bed.gz
- PBSnj_function.R
Abstract
Chimpanzees, humans’ closest relatives, are in danger of extinction. Aside from direct human impacts such as hunting and habitat destruction, a key threat is transmissible disease. As humans continue to encroach upon their habitats, which shrink in size and grow in density, the risk of inter-population and cross-species viral transmission increases, a point dramatically made in the reverse with the global HIV/AIDS pandemic. Inhabiting central Africa, the four subspecies of chimpanzees differ in demographic history and geographical range, and are likely differentially adapted to their particular local environments. To quantitatively explore genetic adaptation, we investigated the genic enrichment for SNPs highly differentiated between chimpanzee subspecies. Previous analyses of such patterns in human populations exhibited limited evidence of adaptation. In contrast, chimpanzees show evidence of recent positive selection, with differences among subspecies. Specifically, we observe strong evidence of recent selection in eastern chimpanzees, with highly differentiated SNPs being uniquely enriched in genic sites in a way that is expected under recent adaptation but not under neutral evolution or background selection. These sites are enriched for genes involved in immune responses to pathogens, and for genes inferred to differentiate the immune response to infection by simian immunodeficiency virus (SIV) in natural vs. non-natural host species. Conversely, central chimpanzees exhibit an enrichment of signatures of positive selection only at cytokine receptors, due to selective sweeps in CCR3, CCR9 and CXCR6, paralogs of CCR5 and CXCR4, the two major receptors utilized by HIV to enter human cells. Thus, our results suggest that positive selection has contributed to the genetic and phenotypic differentiation of chimpanzee subspecies, and that viruses likely play a predominant role in this differentiation, with SIV being a likely selective agent.
Interestingly, our results suggest that SIV has elicited distinctive adaptive responses in these two chimpanzee subspecies.
Usage notes
PBSnj_function.R
R function to calculate scaled PBSnj statistics. Requires the R packages data.table and ape. Input data is a complete FST matrix for 4 taxa as columns; rows are SNPs/variant positions.
Read code comments for simple usage.
allchr.derived.allele.counts.humanAA.bed.gz
This file contains the derived allele counts for all autosomal variants. Sample sizes for each subspecies: eastern = 19; central = 19; Nigeria-Cameroon = 10; western = 11. These data were originally described by de Manuel et al. 2016. We excluded the hybrid Donald, a nominally western chimpanzee. This is a tab-delimited file. Columns: chr, start, end, ref allele, alt allele, human reference allele, derived eastern allele count, n eastern allele count, d central, n central, d Nigeria-Cameroon, n Nigeria-Cameroon, d western, n western.
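The column layout above can be read with a few lines of code. The following is a minimal Python sketch, not part of the deposited files. It assumes the file has no header row, that every count is an integer, and that each "n" column is the number of alleles called in that subspecies. The column labels in `COLUMNS` and the example record are invented for illustration.

```python
import gzip

# Hypothetical labels for the 14 columns described above (assumption:
# the file itself carries no header row).
COLUMNS = [
    "chr", "start", "end", "ref", "alt", "human_ref",
    "d_eastern", "n_eastern", "d_central", "n_central",
    "d_nigeria_cameroon", "n_nigeria_cameroon", "d_western", "n_western",
]
SUBSPECIES = ("eastern", "central", "nigeria_cameroon", "western")


def parse_record(line):
    """Split one tab-delimited line into a dict keyed by COLUMNS."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    for key in ("start", "end"):
        record[key] = int(record[key])
    for name in SUBSPECIES:
        record["d_" + name] = int(record["d_" + name])
        record["n_" + name] = int(record["n_" + name])
    return record


def derived_frequencies(record):
    """Derived allele frequency per subspecies (None when n is zero)."""
    freqs = {}
    for name in SUBSPECIES:
        n = record["n_" + name]
        freqs[name] = record["d_" + name] / n if n > 0 else None
    return freqs


def read_records(path):
    """Yield parsed records from the gzip-compressed file at `path`."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_record(line)


# Invented example record, not taken from the dataset.
example = "chr1\t1000\t1001\tA\tG\tA\t10\t38\t0\t38\t5\t20\t22\t22\n"
record = parse_record(example)
freqs = derived_frequencies(record)
```

If the real file encodes missing data with a non-numeric token, `parse_record` would need an explicit check before the integer conversion.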
chr_all_western_chimp_allchr_LDhat_map_pantro4.sort.bedno_dup.bed_genetic_map_consistent.bed.gz
This is the modified Pan genetic map, originally inferred by Auton et al. 2012. These map data were inferred from sequences aligned to the pantro2.1.2 genome, so we used two successive liftover steps to convert the coordinates of the sites used to infer the genetic map to pantro2.1.4 coordinates: pantro2.1.2 to pantro2.1.3, then pantro2.1.3 to pantro2.1.4. Two steps are required because no liftover chains relate pantro2.1.2 directly to pantro2.1.4. Of the 5,323,278 autosomal markers, 33,263 could not be lifted from pantro2.1.2 to pantro2.1.3. The remaining 5,290,015 were all successfully converted to pantro2.1.4 coordinates. After liftover, we removed sites that, after the two steps, mapped to unassigned scaffolds or the X chromosome, which left 5,289,844 SNPs. Next, we sorted loci by position to correct cases where their relative order was scrambled. This left a final number of 5,289,460 autosomal SNPs. Recombination rates were then recalculated by linear interpolation between consecutive markers (marker x, marker y) using the average of their estimated recombination rates (rate x, rate y).
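The last step, assigning each interval between consecutive markers the average of the two flanking rate estimates, can be illustrated as follows. This is a minimal Python sketch under stated assumptions, not the code used to build the map: the function name `interval_rates` is hypothetical, markers are assumed to be already sorted with strictly increasing positions on one chromosome, and the rate units are simply whatever units the input map uses.

```python
def interval_rates(markers):
    """Assign each interval between consecutive markers the mean of the
    two flanking rate estimates.

    markers: list of (position, rate) tuples for a single chromosome,
             sorted by strictly increasing position (assumption).
    Returns a list of (start, end, mean_rate) tuples, one per interval.
    """
    intervals = []
    for (pos_x, rate_x), (pos_y, rate_y) in zip(markers, markers[1:]):
        if pos_y <= pos_x:
            raise ValueError("markers must be sorted with unique positions")
        intervals.append((pos_x, pos_y, (rate_x + rate_y) / 2.0))
    return intervals


# Invented toy markers, not taken from the genetic map.
toy_markers = [(100, 1.0), (200, 3.0), (400, 5.0)]
toy_intervals = interval_rates(toy_markers)
# toy_intervals == [(100, 200, 2.0), (200, 400, 4.0)]
```

The check on marker order mirrors the sorting step described above: averaging only makes sense once consecutive markers are in positional order.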
allchr_pbs4nj.rds
This is an R data.table object. Each row (SNP) contains the scaled (0-1 scale) PBSnj for the eastern, central, Nigeria-Cameroon, western, and internal branches. These were generated from the full pairwise matrix of FST values, per SNP/site. Requires the R package "data.table". Saved with saveRDS, with 'xz' compression.
allchr.20mam.nochimpbon.most_conserved.tc0.125.el20.4dsites.panTro4mapped_assembled.autosomes.bed.gz
These are phastCons predicted conserved elements, in pantro2.1.4 coordinates. We used the 20-way multiz mammalian alignment from UCSC. To reduce the chance that polymorphism in chimpanzees affects inference of conservation, we removed both the chimp and bonobo reference genomes from these alignments. We estimated the phylogenetic models from fourfold degenerate (nonconserved model) and codon first position sites (conserved model). We then predicted base conservation scores and conserved fragments using the following options: --target-coverage 0.25 --expected-length 30. Resultant conserved elements covered 69.24% of the human exome, or an enrichment of 17.27. We note that although we attempted to remove the Pan branch from our estimates, it is impossible to completely avoid the use of these genomes, for example, when converting predicted conserved elements from hg38 to pantro2.1.4.
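A common use of such a file is to ask whether a given SNP falls inside a conserved element. The following is a minimal Python sketch of that lookup, not part of the deposited files. It assumes the standard BED convention (0-based start, end-exclusive) and that elements on a chromosome do not overlap; the function names `build_index` and `is_conserved` and the toy intervals are hypothetical.

```python
import bisect
from collections import defaultdict


def build_index(elements):
    """Group BED intervals by chromosome and sort them by start.

    elements: iterable of (chrom, start, end), 0-based and end-exclusive.
    Returns {chrom: (sorted_starts, matching_ends)}.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in elements:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def is_conserved(index, chrom, pos0):
    """True if the 0-based position pos0 lies in [start, end) of an element.

    Assumes elements on a chromosome do not overlap, so only the element
    with the largest start <= pos0 needs to be checked.
    """
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


# Invented toy elements, not taken from the dataset.
toy_index = build_index([
    ("chr1", 100, 200),
    ("chr1", 300, 350),
    ("chr2", 10, 20),
])
```

For a SNP stored as a BED record of length one, the value to pass as `pos0` is its start coordinate.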
allchr.20mam.nochimpbon2.tc0.125.el20.genic.snps.bed.gz
These are phastCons scores for all genic SNPs, in pantro2.1.4 coordinates. We used the 20-way multiz mammalian alignment from UCSC. To reduce the chance that polymorphism in chimpanzees affects inference of conservation, we removed both the chimp and bonobo reference genomes from these alignments. We estimated the phylogenetic models from fourfold degenerate (nonconserved model) and codon first position sites (conserved model). We then predicted base conservation scores and conserved fragments using the following options: --target-coverage 0.25 --expected-length 30. Resultant conserved elements covered 69.24% of the human exome, or an enrichment of 17.27. We note that although we attempted to remove the Pan branch from our estimates, it is impossible to completely avoid the use of these genomes, for example, when converting predicted conserved elements from hg38 to pantro2.1.4.
chimp.genes.human.names.homologs.and.amb.homology_mar.17.gtf
This is a manually curated set of genes that are 1-1 homologs between chimpanzee and humans. ENSEMBL homology information was used to generate this dataset.
chimp.genes.human.names.homologs.and.amb.homology_mar.17.human.go.gene.set.txt
This is a manually curated set of Gene Ontology annotations, corresponding to those chimpanzee genes (pantro2.1.4 assembly) defined as being 1-1 homologs. GO terms were taken from human homologs.