Dryad

Plink files from NYGC 1KG to be used for Cancer GWAS project QBIO475

Data files

Oct 30, 2025 version; files total 29.26 GB


Abstract

Cancer risk is influenced by both genetic variation and environment. However, allele frequencies for many cancer-associated variants remain poorly characterized across global populations. To address this gap and provide a framework for teaching population genetics using real human genomic data, undergraduate researchers analyzed population-level allele frequency variation for a curated set of cancer-associated single nucleotide polymorphisms (SNPs). We assembled the variant set from the GWAS Catalog, selecting variants with reported associations with hereditary cancers, including breast, ovarian, colorectal, and lung cancers. Allele frequencies were extracted from the 1000 Genomes Project for five major continental groups, and students quantified how these frequencies differ among the groups. The dataset includes curated PLINK files that may be used for future research or educational purposes in human genetics and bioinformatics.
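As an illustration of the kind of per-population comparison described in the abstract, the following is a minimal sketch in Python, not the project's actual pipeline. It assumes genotypes for a single biallelic SNP have already been read out of the PLINK files as alternate-allele counts (0, 1, or 2 per sample). The function names, the toy genotypes, and the use of the standard 1000 Genomes superpopulation codes (AFR, AMR, EAS, EUR, SAS) as group labels are assumptions made for this example only.

```python
from collections import defaultdict


def allele_frequencies(dosages, populations):
    """Return {population: alternate-allele frequency} for one biallelic SNP.

    dosages: alternate-allele count per sample (0, 1, 2), or None if missing.
    populations: group label per sample, in the same order as dosages.
    """
    alt_count = defaultdict(int)
    allele_total = defaultdict(int)
    for dosage, pop in zip(dosages, populations):
        if dosage is None:
            continue  # skip missing genotypes
        alt_count[pop] += dosage
        allele_total[pop] += 2  # each diploid sample contributes two alleles
    return {pop: alt_count[pop] / allele_total[pop] for pop in allele_total}


def max_frequency_difference(freqs):
    """Largest gap in allele frequency between any two groups."""
    values = list(freqs.values())
    return max(values) - min(values)


# Hypothetical toy genotypes, NOT taken from the dataset.
dosages = [0, 1, 2, None, 1, 1, 0, 0]
populations = ["AFR", "AFR", "EUR", "EUR", "EAS", "EAS", "SAS", "AMR"]

freqs = allele_frequencies(dosages, populations)
for pop in sorted(freqs):
    print(pop, freqs[pop])
print("max difference:", max_frequency_difference(freqs))
```

The range between the highest and lowest group frequency is only one simple summary; other measures of population differentiation could be substituted without changing the counting step.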