Genotyping by sequencing data for thirty-six modern and two historic populations of Arctic Bell-Heather (Cassiope tetragona)
Data files
Feb 07, 2026 version, 8.66 GB total
- Cassiope_Population_information_Dryad.txt (6.66 KB)
- README.md (2.50 KB)
- reference.fasta (5.07 MB)
- TotalRawSNPs.vcf (8.65 GB)
Abstract
We sequenced C. tetragona from 36 extant pan-Arctic populations, two historic (250- to 500-year-old) populations sampled from under glacial ice in the Canadian High Arctic, and an outgroup (C. mertensiana) with genotyping by sequencing. We found 26,350 single-nucleotide polymorphisms (SNPs) (shared in a vcf file) when calling genetic variants from a de novo reference genome that we assembled.
https://doi.org/10.5061/dryad.pg4f4qrxq
Data Summary
This dataset contains the raw single-nucleotide polymorphisms (SNPs) called from genotyping-by-sequencing (GBS) libraries of Cassiope tetragona plants from 36 circumpolar Arctic research sites and from ~250- to 500-year-old samples collected under glacial ice on Ellesmere Island, Canada. The data were used in "Multiple Pleistocene refugia for Arctic Bell-Heather revealed with genomic analyses of modern and historic plants" to determine the demographic history of Cassiope tetragona populations around the Arctic.
Data processing
All code for generating and analyzing this vcf file is available on GitHub: https://github.com/celphin/Population_genomics_Cassiope
Description of the data and file structure
- Cassiope_Population_information_Dryad.txt: Text file including sample details for each sample in the vcf file. Columns include:
  - Sampling location code
  - Location latitude and longitude
  - Number of samples from that location
- TotalRawSNPs.vcf: Variant call format (vcf) file with all SNPs found prior to any filtering. Built using dDocent. The vcf file includes the following columns standard to the format:
  - #CHROM: Contig number from the de novo reference.fasta assembly (see below)
  - POS: Position of the SNP on the contig
  - ID: Identifier of the SNP
  - REF: Reference nucleotide
  - ALT: Alternate nucleotide(s)
  - QUAL: Quality score of the SNP
  - FILTER: Filtering information
  - INFO: Additional information (e.g., allele frequency, number of samples)
  - Sample columns: One per individual, containing the genotype information for that individual. Individuals are identified by their location ID from Cassiope_Population_information_Dryad.txt.
- reference.fasta: De novo GBS reference assembly (fasta file) made in Rainbow (through dDocent).
  - Fasta files have a header line for every contig sequence, starting with ">", that contains the contig name.
  - The line(s) directly following the header contain the nucleotide sequence for that contig.
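The fasta and vcf layouts described above can be read with a few lines of standard-library Python. The following is a minimal sketch, not the authors' pipeline (their code is in the GitHub repository linked above). The function names `read_fasta` and `read_vcf`, the contig name `contig_1`, and the sample names `SITE_1` and `SITE_2` are hypothetical and used only for illustration. The sketch also assumes the usual VCF convention that a FORMAT column sits between INFO and the per-sample columns, and it streams the file line by line because TotalRawSNPs.vcf is several gigabytes.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Return {contig_name: sequence}; a sequence may span several lines."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: keep the first whitespace-delimited token as the name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {contig: "".join(chunks) for contig, chunks in parts.items()}


def read_vcf(handle: TextIO) -> Tuple[List[str], Iterator[Dict[str, object]]]:
    """Return (sample_names, lazy iterator over variant records)."""
    samples: List[str] = []
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed (CHROM..INFO, FORMAT); the rest are samples.
            samples = line.rstrip("\n").split("\t")[9:]
            break

    def records() -> Iterator[Dict[str, object]]:
        for row in handle:  # continues right after the #CHROM header line
            fields = row.rstrip("\n").split("\t")
            if len(fields) < 10:
                continue  # skip blank or malformed lines
            keys = fields[8].split(":")
            gt = keys.index("GT") if "GT" in keys else None
            genotypes = {
                sample: (col.split(":")[gt] if gt is not None else None)
                for sample, col in zip(samples, fields[9:])
            }
            yield {
                "chrom": fields[0],
                "pos": int(fields[1]),
                "ref": fields[3],
                "alt": fields[4].split(","),
                "genotypes": genotypes,
            }

    return samples, records()


# Tiny made-up inputs that mimic the two file layouts (not real data).
demo_fasta = ">contig_1 example\nACGT\nTTGA\n"
demo_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSITE_1\tSITE_2\n"
    "contig_1\t3\t.\tG\tA\t50\t.\tNS=2\tGT:DP\t0/1:12\t1/1:9\n"
)

contigs = read_fasta(io.StringIO(demo_fasta))
samples, records = read_vcf(io.StringIO(demo_vcf))
first = next(records)
print(len(contigs["contig_1"]), samples, first["genotypes"])
```

To use this on the real files, pass handles from `open("reference.fasta")` and `open("TotalRawSNPs.vcf")` instead of the in-memory strings, and keep the vcf file open while iterating over the records, since they are produced lazily.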
Sharing/Access information
Raw sequence data are available in the GenBank/SRA database under accession number SUB11222726 and BioProject ID PRJNA824830.
Related manuscript: https://doi.org/10.1111/jbi.14961
We built genotyping-by-sequencing (GBS) libraries using Cassiope tetragona tissue from 36 Arctic locations, including two ∼250-500-year-old populations collected under glacial ice on Ellesmere Island, Canada. We assembled a de novo GBS reference and called variants in dDocent.
Detailed methods can be found here: https://doi.org/10.1111/jbi.14961
- Elphinstone, Cassandra; Hernández, Fernando; Todesco, Marco et al. (2024). Multiple Pleistocene refugia for Arctic Bell‐Heather revealed with genomic analyses of modern and historic plants. Journal of Biogeography. https://doi.org/10.1111/jbi.14961
- Elphinstone, Cassandra; Hernández, Fernando; Todesco, Marco et al. (2023). Multiple Pleistocene refugia for Arctic White Heather (Cassiope tetragona) supported by population genomics analyses of contemporary and Little-Ice-Age samples [Preprint]. openRxiv. https://doi.org/10.1101/2023.07.05.547859
