Kellet's whelk genome and transcriptome assembly
Data files
Nov 08, 2023 version files (2.83 GB)
- final_transcriptome.fasta
- README.md
- Scaffolds_pass1.fa
- transcripts.fasta.transdecoder.genome.gff3
Jan 25, 2024 version files (2.83 GB)
- final_transcriptome.fasta
- README.md
- Scaffolds_pass1.fa
- transcripts.fasta.transdecoder.genome.gff3
Abstract
Understanding genomic characteristics of non-model organisms can help bridge gaps in ecology and evolutionary sciences, but lack of a reference genome and transcriptome for these species challenges their study. We advance this goal by conducting the first full genome and transcriptome sequence assembly and analysis of the non-model organism Kellet’s whelk, Kelletia kelletii, a marine gastropod and fisheries species exhibiting a northern range expansion along the US west coast that is potentially driven by climate change. We used a combination of Oxford Nanopore Technologies, PacBio, and Illumina platforms for sequencing, and integrated a set of bioinformatic pipelines to create a comprehensive and contiguous de novo genome assembly. Our results represent the most complete and continuous documented genome among the Buccinoidea superfamily to date. Genome validation revealed its relatively high completeness with low missing metazoan BUSCOs, and an average coverage of ~70x for all contigs, indicating a robust assembly. Characteristics of the K. kelletii genome showed that short-read data contributed significantly to genome coverage and accuracy; however, long-read data was imperative to the completeness and continuity of the genome assembly. Genome annotation identified a large number of protein-coding genes compared to other closely related species, suggesting the presence of a complex genome structure. We conducted the transcriptome assembly and analysis of individuals during their period of peak embryonic development, and revealed highly expressed genes associated with specific GO terms and metabolic pathways, most notably lipid, carbohydrate, glycan, and phospholipid metabolism. We also identified numerous heat shock proteins (HSPs) in the transcriptome and genome with a potential association between the transcriptional expansion of HSP families and the marine environment experienced by the sessile life history stage of the developing embryo. 
This study offers a valuable reference genome and transcriptome for conducting comprehensive bioinformatic analyses of the non-model organism K. kelletii. Such resources will enhance our understanding of its ecology and evolution, as well as that of other coastal marine species facing environmental changes.
README: Kellet's whelk genome and transcriptome assembly
https://doi.org/10.5061/dryad.w0vt4b8zn
Description of the data and file structure
See https://github.com/bndaniel/Kellets-whelk-genome-assembly for methods used for the assembly and analysis of the genome and transcriptome.
Files
Genome assembly: Scaffolds_pass1.fa
Genome annotation: transcripts.fasta.transdecoder.genome.gff3
Transcriptome assembly: final_transcriptome.fasta
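As a quick sanity check after download, the files listed above can be inspected with a few lines of Python. The following is a minimal sketch using only the standard library, not part of the authors' pipeline: the helper names (`fasta_lengths`, `n50`, `gff3_feature_counts`, `summarize`) are hypothetical, and it assumes plain-text (uncompressed) FASTA files and a tab-delimited, nine-column GFF3 file.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (record_id, sequence_length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            # The record ID is the first whitespace-delimited token of the header.
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def gff3_feature_counts(lines: Iterable[str]) -> Counter:
    """Count feature types (column 3) in a GFF3 stream, skipping comments."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts


def summarize(fasta_path: str, gff3_path: str) -> Dict[str, object]:
    """Collect basic assembly and annotation statistics from local files."""
    with open(fasta_path) as handle:
        lengths = [n for _, n in fasta_lengths(handle)]
    with open(gff3_path) as handle:
        features = gff3_feature_counts(handle)
    return {
        "sequences": len(lengths),
        "total_bp": sum(lengths),
        "n50": n50(lengths),
        "features": dict(features),
    }
```

For example, `summarize("Scaffolds_pass1.fa", "transcripts.fasta.transdecoder.genome.gff3")` would report the scaffold count, total assembly size, scaffold N50, and the number of annotated features per type; adjust the paths to match the file names as downloaded. The same `fasta_lengths` helper can be pointed at `final_transcriptome.fasta` to summarize transcript lengths.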
Sharing/Access information
Data was derived from the following sources:
- All raw sequence data, including the PacBio Sequel II, Nanopore MinION, and Illumina NovaSeq DNA sequencing, as well as the Illumina NovaSeq RNA sequencing, are deposited in the NCBI Sequence Read Archive (SRA) under PRJNA999368: https://www.ncbi.nlm.nih.gov/sra/PRJNA999368 and PRJNA1000198: https://www.ncbi.nlm.nih.gov/sra/PRJNA1000198.
Code/Software
See methods and commands used at https://github.com/bndaniel/Kellets-whelk-genome-assembly/blob/main/genome_assembly/commands