Data from: A de novo chromosome-level genome assembly of Coregonus sp. “Balchen”: one representative of the Swiss Alpine whitefish radiation
Cite this dataset
Feulner, Philine; De-Kayne, Rishi; Zoller, Stefan (2020). Data from: A de novo chromosome-level genome assembly of Coregonus sp. “Balchen”: one representative of the Swiss Alpine whitefish radiation [Dataset]. Dryad. https://doi.org/10.5061/dryad.xd2547ddf
Salmonids are of particular interest to evolutionary biologists due to their incredible diversity of life-history strategies and the speed at which many salmonid species have diversified. In Switzerland alone, over 30 species of Alpine whitefish from the subfamily Coregoninae have evolved since the last glacial maximum, with species exhibiting a diverse range of morphological and behavioural phenotypes. This, combined with the whole genome duplication which occurred in the ancestor of all salmonids, makes the Alpine whitefish radiation a particularly interesting system in which to study the genetic basis of adaptation and speciation and the impacts of ploidy changes and subsequent rediploidization on genome evolution. Although well-curated genome assemblies exist for many species within Salmonidae, genomic resources for the subfamily Coregoninae are lacking. To assemble a whitefish reference genome, we carried out PacBio sequencing from one wild-caught Coregonus sp. “Balchen” from Lake Thun to ~90x coverage. PacBio reads were assembled independently using three different assemblers (Falcon, Canu and wtdbg2) and subsequently scaffolded with additional Hi-C data. All three assemblies were highly contiguous and had strong synteny to a previously published Coregonus linkage map, and when additional short-read data were mapped to each of the assemblies, coverage was fairly even across most chromosome-scale scaffolds. Here, we present the first de novo genome assembly for the salmonid subfamily Coregoninae. The final 2.2 Gb wtdbg2 assembly included 40 scaffolds, had an N50 of 51.9 Mb, and was 93.3% complete for BUSCOs. The assembly consisted of ~52% TEs and contained 44,525 genes.
For detailed methods please see the associated publication.
This repository contains supplementary files for De-Kayne, Zoller and Feulner (2020).
- Coregonus sp. “Balchen” genome assembly GCA_902810595.1
- The main assembly was produced with wtdbg2 and is on ENA: https://www.ebi.ac.uk/ena/browser/view/GCA_902810595.1
- The files included are:
- AWG_v2.AED0.6_pseudo_contigs.embl.gz - the EMBL-format annotation for unscaffolded contigs
- AWG_v2_GO_annotation.out - the functional 'GO' annotation for the full GCA_902810595.1 assembly (made using Pannzer2)
- AWG_v2_MakerOutput.gff - the unfiltered/raw Maker annotation for the full GCA_902810595.1 assembly
- Canu.AWG.fasta.gz - the scaffolded Canu assembly containing 40 scaffolds and 3513 contigs
- Falcon.AWG.fasta.gz - the scaffolded Falcon assembly containing 40 scaffolds and 3705 contigs
- WF_RepeatLibrary.lib - the repeat library used to identify repeats in the whitefish genome (methods can be found in the paper)
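
As a quick sanity check after downloading one of the FASTA files listed above, the sequence count and N50 can be recomputed locally. The following is a minimal sketch, not the method used in the publication: the helper names `fasta_lengths` and `n50` are hypothetical, and it assumes a standard (optionally gzipped) FASTA file. Note that it computes N50 over all sequences in the file, scaffolds and unscaffolded contigs together, so the result need not match figures reported for a different assembly or for scaffolds only.

```python
import gzip
from typing import Iterable, Iterator


def fasta_lengths(path: str) -> Iterator[int]:
    """Yield the length of each sequence in a FASTA file (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    length = None  # None until the first header line is seen
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line.strip())
    if length is not None:
        yield length


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for seq_len in ordered:
        running += seq_len
        if running >= half_total:
            return seq_len
    return 0  # empty input


if __name__ == "__main__":
    # File name taken from the list above; assumes it is in the working directory.
    lengths = list(fasta_lengths("Canu.AWG.fasta.gz"))
    print("sequences:", len(lengths))
    print("total length (bp):", sum(lengths))
    print("N50 (bp):", n50(lengths))
```

The same functions can be pointed at Falcon.AWG.fasta.gz to compare the two alternative assemblies on equal terms.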
Swiss National Science Foundation, Award: 31003A_163446/1