Resources for gasAcu1-4, a new stickleback reference genome
Data files
Nov 30, 2020 version, 117.84 MB in total:
- gasAcu1-4.2bit
- gasAcu1-4ToGasAcu1.over.chain.gz
- gasAcu1ToGasAcu1-4.over.chain.gz
Abstract
gasAcu1-4 is a new version of the stickleback reference genome. It differs only slightly from the 2017 Hi-C guided assembly by Peichel, Sullivan, Liachko, and White (https://doi.org/10.1093/jhered/esx058), with changes that improve the subtelomeric Pitx1 locus and the mitochondrial genome. Here we present basic resources for using this version of the reference genome: the FASTA sequence of the assembly and liftOver chains for converting coordinates between this version and the original (Broad S1) gasAcu1 assembly (https://www.nature.com/articles/nature10944).
The assembly can be visualized in the UCSC Genome Browser, and other data can be downloaded through its Table Browser, by copying the following track hub URL into the “My Hubs” tab at https://genome.ucsc.edu/cgi-bin/hgHubConnect: https://sbwdev.stanford.edu/kingsleyAssemblyHub/hub.txt.
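As a rough illustration of how the files listed under Data files might be used locally, the sketch below assembles command lines for the UCSC command-line utilities liftOver (coordinate conversion with a chain file) and twoBitToFa (extracting FASTA from a .2bit file). It assumes both utilities are installed and on the PATH. The BED and FASTA file names are hypothetical placeholders, and the helper function names are ours, not part of any UCSC package.

```python
import shlex


def liftover_command(bed_in, chain, bed_out, unmapped):
    """Build the argument list for UCSC liftOver.

    Argument order follows the tool's usage:
    liftOver oldFile map.chain newFile unMapped
    """
    return ["liftOver", bed_in, chain, bed_out, unmapped]


def two_bit_to_fa_command(two_bit, fasta_out):
    """Build the argument list for UCSC twoBitToFa (input.2bit output.fa)."""
    return ["twoBitToFa", two_bit, fasta_out]


# Hypothetical example: move intervals from gasAcu1 to gasAcu1-4 coordinates.
# The chain file name encodes the direction (gasAcu1 -> gasAcu1-4).
# If your liftOver build does not read gzip-compressed chains, decompress first.
lift = liftover_command(
    "peaks.gasAcu1.bed",                 # hypothetical input BED file
    "gasAcu1ToGasAcu1-4.over.chain.gz",  # chain file from this dataset
    "peaks.gasAcu1-4.bed",               # hypothetical output BED file
    "peaks.unmapped.bed",                # intervals that could not be converted
)
extract = two_bit_to_fa_command("gasAcu1-4.2bit", "gasAcu1-4.fa")

# Print shell-ready commands; pass the lists to subprocess.run(..., check=True)
# to execute them once the tools are available.
print(shlex.join(lift))
print(shlex.join(extract))
```

Converting in the opposite direction would use gasAcu1-4ToGasAcu1.over.chain.gz instead. Intervals written to the unmapped file are worth inspecting, since regions that were added or removed in gasAcu1-4 have no counterpart in gasAcu1.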
Methods
The 2017 Hi-C guided assembly by Peichel, Sullivan, Liachko, and White (https://doi.org/10.1093/jhered/esx058) was modified to address two issues that have persisted through several versions of the stickleback reference genome:
First, the subtelomeric region of chrVII is of great biological interest because of its well-documented role in controlling pelvic spine development (Shapiro et al. 2004). However, because it is highly repetitive, it is extremely difficult to assemble: many important sequences, including the key gene Pitx1, are missing entirely from existing genome assemblies, and other sequences in the region are scattered across small unassembled scaffolds. We address this issue by including the sequence from Salmon River BAC clones (GenBank GU130435) (Chan et al. 2010) as chrP and removing overlapping fragmented sequences from chrUn and the end of chrVII. Note that chrP is derived from a marine population, whereas the rest of the genome is from a freshwater population (Bear Paw Lake), so all analyses involving this chromosome must be interpreted with care.
Second, the mitochondrial genome was previously split into two fragments buried within chrUn, while a separate mitochondrial genome sequence from Northern Japan was added as chrM. We corrected these issues by removing the duplicated Bear Paw Lake mitochondrial genome from chrUn and using it to replace the exogenous chrM sequence, resulting in a single copy of the mitochondrial genome derived entirely from Bear Paw Lake.
The Peichel et al. 2017 reference genome was not modified in any other way.
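Because chrP comes from a different population than the rest of the assembly, downstream pipelines may want to set aside results on that chromosome for separate review. The following is a minimal sketch of one way to do this for BED-style records. The function name and the example intervals are hypothetical, and the only thing taken from this dataset is the chromosome name chrP.

```python
def split_by_chromosome(bed_lines, flagged=("chrP",)):
    """Separate BED-style records into (flagged, other) by chromosome name.

    Blank lines, comments, and track/browser header lines are skipped.
    The chromosome is taken as the first whitespace-delimited field.
    """
    flagged_records, other_records = [], []
    for line in bed_lines:
        record = line.rstrip("\n")
        if not record.strip() or record.startswith(("#", "track", "browser")):
            continue
        chrom = record.split()[0]
        if chrom in flagged:
            flagged_records.append(record)
        else:
            other_records.append(record)
    return flagged_records, other_records


# Hypothetical intervals in gasAcu1-4 coordinates.
example = [
    "# illustrative records only",
    "chrP\t100\t200\tpeak1",
    "chrVII\t5\t50\tpeak2",
    "chrM\t0\t10\tpeak3",
]
needs_review, remaining = split_by_chromosome(example)
print(len(needs_review), "record(s) on chrP set aside for separate interpretation")
```

Keeping the flagged records rather than discarding them preserves the Pitx1 locus for analysis while making it explicit that its sequence is of marine rather than Bear Paw Lake origin.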