Dryad

Gasterosteus aculeatus gynogenetic reference genome and functional annotations version 1 and raw PacBio and Illumina data

Data files

Jan 20, 2023 version files 47.12 GB

Abstract

Whole genome sequencing enables us to ask fundamental questions about the genetic basis of adaptation, population structure, and epigenetic mechanisms, but usually requires a suitable reference genome for making sense of the sequence data. While the availability of reference genomes has improved significantly in both taxonomic coverage and overall quality, this poses a challenge for researchers in determining which reference genome best suits their data. Here we compare the use of two different reference genomes for the three-spined stickleback (Gasterosteus aculeatus): a novel genome from a European individual and the published reference genome of a North American individual. Specifically, we investigate the impact of using a local reference versus one generated from a differentiated population on several commonly used metrics in population genomics. By mapping genome resequencing data of 60 sticklebacks from across Europe and North America, we confirmed that genome quality is an important factor in choosing a reference genome. A local reference genome offered increased mapping efficiency and genotyping accuracy, likely stemming from the higher similarity in genome sequence and synteny. Despite comparable distributions of the metrics generated across the genome using SNP data (i.e., π, Tajima’s D, and FST), window-based statistics using different references resulted in different outlier genes and enriched gene functions. In contrast, the marker-based analysis utilising DNA methylation distributions had a considerably higher overlap in outlier genes and functions when using different reference genomes. Overall, our results highlight how using a local reference genome can increase the resolution of genome scans when multiple similar-quality reference genomes are available. Such results have implications for the detection of signatures of selection.