Dryad

Data from: Cannabis labelling is associated with genetic variation in terpene synthase genes

Cite this dataset

Watts, Sophie et al. (2021). Data from: Cannabis labelling is associated with genetic variation in terpene synthase genes [Dataset]. Dryad. https://doi.org/10.5061/dryad.gqnk98smm

Abstract

This dataset contains genetic data consisting of more than 100,000 single nucleotide polymorphisms (SNPs), collected using genotyping-by-sequencing from 137 drug-type Cannabis samples from the Netherlands. These genetic data, along with terpene and cannabinoid content data collected with GC-FID, were used to analyze Cannabis labelling and to perform a genome-wide association study.

Methods

The DNA sequence data are available as NCBI BioProject PRJNA713792. Single nucleotide polymorphisms (SNPs) were called in TASSEL (version 5.0) by aligning reads to the CBDRx reference genome. The SNP data were filtered using PLINK to exclude SNPs with a minor allele frequency (MAF) <0.05 and SNPs with excess heterozygosity. The final SNP data set used for the genome-wide association study (GWAS) consisted of 116,296 SNPs from 137 samples. For principal component analysis (PCA), 1,257 unanchored SNPs were removed and the remaining 115,039 SNPs were pruned for linkage disequilibrium (LD) using PLINK, resulting in 80,939 SNPs.
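
As a rough illustration of the two per-SNP statistics behind the filtering step above (which was run in PLINK, not with this code), the Python sketch below computes minor allele frequency and observed heterozygosity for a single SNP. It assumes genotypes are coded as 0, 1 or 2 copies of one allele, with None for missing calls. The function names and the max_het cutoff are hypothetical; the heterozygosity threshold actually used is not stated here, so it is left as a required argument.

```python
from typing import Optional, Sequence

Genotypes = Sequence[Optional[int]]  # 0, 1 or 2 allele copies; None = missing call


def minor_allele_frequency(genotypes: Genotypes) -> Optional[float]:
    """Return the frequency of the rarer allele, or None if no calls exist."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each diploid sample contributes two alleles.
    allele_freq = sum(called) / (2 * len(called))
    return min(allele_freq, 1.0 - allele_freq)


def observed_heterozygosity(genotypes: Genotypes) -> Optional[float]:
    """Return the fraction of called samples that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(1 for g in called if g == 1) / len(called)


def passes_filters(genotypes: Genotypes, max_het: float, min_maf: float = 0.05) -> bool:
    """Keep a SNP only if MAF >= min_maf and heterozygosity <= max_het.

    max_het is a hypothetical cutoff chosen by the caller; it is an
    assumption of this sketch, not a value taken from the dataset.
    """
    maf = minor_allele_frequency(genotypes)
    het = observed_heterozygosity(genotypes)
    if maf is None or het is None:
        return False
    return maf >= min_maf and het <= max_het
```

For example, a SNP with a single heterozygous call among 20 samples has a MAF of 1/40 = 0.025 and would be excluded by the 0.05 cutoff.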

Usage notes

Please note that the chromosomes within these genetic data files are numbered based on the old numbering system. The chromosomes will need to be re-numbered according to the new chromosome numbering system for the CBDRx reference genome that was adopted in 2020 (https://www.ncbi.nlm.nih.gov/assembly/GCF_900626175.2).
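
The renumbering could be sketched in Python as below. This assumes the genetic data files are tab-delimited text with the chromosome code in the first column (as in a PLINK .bim or .map file), which should be checked against the actual files. The function name is hypothetical, and the mapping in the usage example contains placeholder values only; the real old-to-new correspondence must be taken from the NCBI assembly record linked above.

```python
from typing import Dict, Iterable, Iterator


def renumber_chromosomes(lines: Iterable[str], old_to_new: Dict[str, str]) -> Iterator[str]:
    """Rewrite the first tab-delimited column of each line using old_to_new.

    Raises KeyError for any chromosome code missing from the mapping, so
    that no record is silently left on the old numbering.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped.strip():
            # Pass blank lines through unchanged.
            yield line
            continue
        fields = stripped.split("\t")
        old_code = fields[0]
        if old_code not in old_to_new:
            raise KeyError(f"no new chromosome number for code {old_code!r}")
        fields[0] = old_to_new[old_code]
        yield "\t".join(fields) + "\n"


# Usage with PLACEHOLDER values (not the real CBDRx correspondence):
example_mapping = {"1": "7", "2": "4"}
example_lines = ["1\tsnpA\t0\t100\tA\tG\n", "2\tsnpB\t0\t250\tC\tT\n"]
renumbered = list(renumber_chromosomes(example_lines, example_mapping))
```

Raising an error on unmapped codes is a deliberate choice here: any code absent from the mapping (for instance, whatever code the files use for unanchored SNPs) has to be handled explicitly.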