Splice altering variant predictions in four archaic hominin genomes

Cite this dataset

Brand, Colin; Colbran, Laura; Capra, John Anthony (2022). Splice altering variant predictions in four archaic hominin genomes [Dataset]. Dryad. https://doi.org/10.7272/Q6H993F9

Abstract

This file contains high-quality autosomal single-nucleotide variants (SNVs) that occur among four high-coverage archaic genomes aligned to the hg19/GRCh37 reference genome. Each entry corresponds to a single variant with a distinct GENCODE (Human Release 24) annotation per genomic position. Data per variant include the genomic position, reference and alternate alleles, archaic genotypes, gene annotation, and additional data relevant to the analysis of splicing variants:

  • SpliceAI annotations
  • gene constraint measured using data from gnomAD
  • variant conservation measured using phyloP
  • allele origin
  • allele frequencies in modern humans from the 1000 Genomes Project and gnomAD
  • introgression metadata
  • sQTL data from GTEx

Methods

All data in this file are publicly available (see below). Archaic variants were filtered using bcftools to retain high-quality sites and high-quality genotypes. Missing data and irrelevant fields per variant are marked as "n/a". Only variants matching a filtered archaic variant were included from the other datasets (see below). The dataframe was created using Pandas in a Python Jupyter notebook.
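
The matching step described above (keeping only external records that correspond to a filtered archaic variant, and marking absent values as "n/a") can be illustrated with a left join in Pandas. This is a minimal sketch, not the notebook used to build the file: the column names (`chrom`, `pos`, `ref`, `alt`, `phylop`) and the toy values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical key columns that identify a variant.
KEYS = ["chrom", "pos", "ref", "alt"]

# Toy stand-in for the filtered archaic variants.
archaic = pd.DataFrame({
    "chrom": ["1", "1"],
    "pos": [100, 200],
    "ref": ["A", "C"],
    "alt": ["G", "T"],
})

# Toy stand-in for an external annotation table (values are made up).
external = pd.DataFrame({
    "chrom": ["1", "1"],
    "pos": [100, 300],
    "ref": ["A", "G"],
    "alt": ["G", "A"],
    "phylop": [2.1, -0.4],
})

# A left join keeps every archaic variant and drops external records
# that match no archaic variant (here, the variant at position 300).
merged = archaic.merge(external, on=KEYS, how="left")

# Mark annotations that are absent for a variant as "n/a".
has_value = merged["phylop"].notna()
merged["phylop"] = merged["phylop"].astype(object).where(has_value, "n/a")

print(merged)
```

A left join is used here because the archaic variants define the rows of the final table; an inner join would instead discard archaic variants that lack an annotation.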

Data used in this notebook:

Usage notes

Any text editor can be used to open this file. Because the file is large, we recommend using software that handles large dataframes well, such as R or Python.
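
As a minimal sketch of loading the file in Python, the example below reads a delimited table in chunks with Pandas and treats "n/a" as missing. The tab delimiter, the column names, and the in-memory sample are assumptions made for illustration; check the actual file for its delimiter and header before adapting this.

```python
import io

import pandas as pd


def load_variants(source, sep="\t", chunksize=100_000):
    """Read a delimited variant table in chunks and return one DataFrame.

    `source` is a file path or file-like object. The separator and the
    `chrom` column name are assumptions, not taken from the dataset.
    """
    chunks = pd.read_csv(
        source,
        sep=sep,
        na_values=["n/a"],     # the dataset marks missing fields as "n/a"
        dtype={"chrom": str},  # keep chromosome labels as text
        chunksize=chunksize,   # limits how many rows are parsed at once
    )
    return pd.concat(chunks, ignore_index=True)


# Small in-memory stand-in for the real file (hypothetical columns and values).
sample = io.StringIO(
    "chrom\tpos\tref\talt\tspliceai_max\n"
    "1\t12345\tA\tG\t0.62\n"
    "2\t67890\tC\tT\tn/a\n"
)

df = load_variants(sample, chunksize=1)
print(df)
```

To read the real file, pass its path in place of `sample`. If the full table does not fit in memory, iterate over the chunks and filter each one instead of concatenating them.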

Funding

National Institute of General Medical Sciences, Award: R35GM127087