Data from: The genome of the cryopelagic Antarctic bald notothen, Trematomus borchgrevinki
Data files
Nov 06, 2024 version files (295.97 MB total):
- README.md (2.56 KB)
- tbor.afgp.gtf (2.64 KB)
- tbor.agp.gz (76.67 KB)
- tbor.cds.fa.gz (10.76 MB)
- tbor.fa.gz (272.25 MB)
- tbor.gtf.gz (6.38 MB)
- tbor.protein.fa.gz (6.49 MB)
Abstract
The Antarctic bald notothen, Trematomus borchgrevinki (Nototheniidae), occupies a high-latitude, ice-laden environment and represents an extreme example of cold specialization among fishes. We present the first high-quality, long-read genome assembly of a female T. borchgrevinki, comprising 23 putative chromosomes, the largest of which is 65 megabase pairs (Mbp) in length. The genome totals 935.13 Mbp across 2,095 scaffolds, with a scaffold N50 of 42.80 Mbp. Annotation yielded 22,567 protein-coding genes, and 54.75% of the genome is occupied by repetitive elements; an analysis of these repeats revealed a recent expansion. Conserved synteny analysis showed that the genome architecture of T. borchgrevinki is largely shared with other members of the notothenioid clade, although several significant translocations and inversions are present, including the fusion of orthologous chromosomes 8 and 11 into a single element. This genome will serve as a cold-specialized model for comparisons with other members of the notothenioid adaptive radiation.
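The scaffold N50 quoted above is the length L such that scaffolds of length L or longer together contain at least half of the assembled sequence. A minimal Python sketch of that definition follows; the function name `n50` is illustrative, and the toy lengths are made up rather than taken from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly length.
    """
    lengths = list(lengths)  # accept any iterable, including generators
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example (hypothetical lengths): total = 150, and 50 + 40 = 90 >= 75.
print(n50([50, 40, 30, 20, 10]))  # prints 40
```

Sorting once and accumulating keeps the sketch simple; for a few thousand scaffolds, as in this assembly, the cost is negligible.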
https://doi.org/10.5061/dryad.h44j0zpv7
This dataset contains the genome assembly and associated annotation for the Antarctic bald notothen, Trematomus borchgrevinki. It is associated with the following publication:
Niraj Rayamajhi, Angel G. Rivera-Colón, Bushra Fazal Minhas, C.-H. Christina Cheng, Julian M. Catchen. (2024) The genome of the cryopelagic Antarctic bald notothen, Trematomus borchgrevinki.
Description of the data and file structure
File | Description |
---|---|
tbor.fa.gz | Genome assembly in FASTA format. The file contains 2,094 sequences, including 23 chromosome-scale scaffolds, with a total length of 935.13 megabase pairs (Mbp) and a scaffold N50 of 42.8 Mbp. |
tbor.agp.gz | Assembly structure file in AGP format. |
tbor.gtf.gz | Genome annotation in GTF format. |
tbor.afgp.gtf | Manual annotation of the antifreeze glycoprotein (AFGP) genes in GTF format. |
tbor.cds.fa.gz | Coding sequences (CDS) of all annotated protein-coding genes in FASTA format. |
tbor.protein.fa.gz | Amino acid sequences of all annotated protein-coding genes in FASTA format. |
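The released files are standard gzip-compressed FASTA and GTF, so they can be summarized without third-party tools. The sketch below is a minimal, standard-library-only example, assuming the files sit in the working directory; the helper names `fasta_lengths` and `gtf_feature_counts` are hypothetical and not part of the dataset.

```python
import gzip
from collections import Counter


def _open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_lengths(path):
    """Yield (sequence_id, length) for every record in a FASTA file."""
    seq_id, length = None, 0
    with _open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, length
                # Use the first whitespace-delimited token of the header as the ID.
                seq_id, length = (line[1:].split() or [""])[0], 0
            else:
                length += len(line)
    if seq_id is not None:
        yield seq_id, length


def gtf_feature_counts(path):
    """Count GTF records by feature type (column 3), skipping comment lines."""
    counts = Counter()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 9:  # GTF records have nine tab-separated columns
                counts[fields[2]] += 1
    return counts
```

For example, `sum(n for _, n in fasta_lengths("tbor.fa.gz"))` gives the total assembled length in base pairs, which can be checked against the figures in the table, and `gtf_feature_counts("tbor.gtf.gz")` tallies records per feature type (such as `gene`, `transcript`, or `CDS`). Which feature types appear depends on how the annotation was exported, so a raw `gene` count may not match the reported protein-coding gene total exactly.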
Sharing/Access information
A version of this data is also available on NCBI under BioProject PRJNA907802. Raw PacBio and Hi-C reads can also be found in the NCBI Sequence Read Archive under accessions SRX18476836 and SRX24931301, respectively.
A female T. borchgrevinki individual was sampled from McMurdo Sound, Antarctica (77.5°S, 165°E). The fish was caught by hook and line through holes drilled in the annual sea ice and transported to the aquarium facility at McMurdo Station. The fish was anesthetized, and tissues were dissected on ice, flash-frozen in liquid nitrogen, and stored at -80 °C until use. High molecular weight (HMW) DNA was prepared from frozen muscle using the Nanobind HMW Tissue DNA Kit following the vendor's instructions.
PacBio continuous long-read (CLR) library preparation and sequencing were carried out at the University of Oregon Genomics & Cell Characterization Core Facility. The HMW DNA was lightly sheared to a target length of ~75 kbp, and the library was constructed using the PacBio SMRTbell Express Template Prep Kit 2.0. The resulting library was size-selected for inserts longer than approximately 25 kbp with the BluePippin (Sage Science) and sequenced on one SMRT Cell 8M on a Sequel II with 30 hours of data capture.
A Hi-C library was constructed from the liver of the same T. borchgrevinki individual by Phase Genomics, Inc. using their Proximo Hi-C kit and then sequenced on an Illumina NovaSeq 6000 to generate 2 × 150 bp paired-end reads.
Raw CLR reads were filtered with yacrd and fpa and then assembled with the Flye assembler. The resulting contigs were scaffolded with Juicer and manually curated using conserved synteny analysis. Genes were annotated using the BRAKER3 pipeline, and functions were assigned using InterProScan. Gene names were assigned based on orthology to genes in the zebrafish (Danio rerio) genome.
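The contig-to-scaffold layout that results from the scaffolding and curation steps is distributed as `tbor.agp.gz`. Below is a minimal sketch for tallying contigs and gaps per scaffold, assuming the file follows the standard nine-column, tab-separated AGP layout in which component types `N` and `U` mark gaps and column 6 of a gap line holds the gap length; `summarize_agp` is a hypothetical helper name, not a tool used by the authors.

```python
import gzip
from collections import defaultdict

# AGP component types that denote gaps rather than sequence.
GAP_TYPES = {"N", "U"}


def summarize_agp(path):
    """Summarize an AGP file per object (scaffold or chromosome).

    Returns {object_id: {"length", "components", "gaps", "gap_bp"}}.
    """
    opener = gzip.open if path.endswith(".gz") else open
    summary = defaultdict(
        lambda: {"length": 0, "components": 0, "gaps": 0, "gap_bp": 0}
    )
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header/comment lines such as "##agp-version"
            cols = line.rstrip("\n").split("\t")
            obj, obj_end, comp_type = cols[0], int(cols[2]), cols[4]
            rec = summary[obj]
            # The object length is the largest end coordinate seen for it.
            rec["length"] = max(rec["length"], obj_end)
            if comp_type in GAP_TYPES:
                rec["gaps"] += 1
                rec["gap_bp"] += int(cols[5])  # column 6: gap length
            else:
                rec["components"] += 1
    return dict(summary)
```

Sorting the result by `"length"` is one quick way to pick out the 23 chromosome-scale scaffolds from the many small unplaced ones.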