Data for: Heterosigma akashiwo transcriptome gene annotations
Data files
Mar 13, 2023 version, files 35.25 MB
- Dryad_REaDME_Dataset_Heterosigmakashiwo_transcriptome.txt (2.68 KB)
- ICRV01_blast-match.txt (1.17 MB)
- ICRV01_GO.txt (25.43 MB)
- ICRV01_pfam.txt (8.64 MB)
Abstract
Heterosigma akashiwo is a cosmopolitan, unicellular eukaryotic alga (class: Raphidophyceae) that produces fish-killing blooms. There is substantial scientific and practical interest in the ecophysiological characteristics that determine its bloom dynamics and its adaptation to broad climate zones. Well-annotated genomic and genetic sequence information enables researchers to characterize organisms using modern molecular technology. In the present study, we performed RNA sequencing of H. akashiwo and a de novo transcriptome assembly from 84,693,530 high-quality, deduplicated short reads. The reads were assembled with the Trinity assembler, yielding 144,777 contigs with an N50 of 1,085. The raw data were deposited in the NCBI SRA database (BioProject PRJDB6241 and PRJDB15108), and the assembly is available in the NCBI TSA database (ICRV01). In total, 60,877 open reading frames of 150 bp or longer were predicted. Here, the top Gene Ontology terms, the Pfam hits, and the BLAST hits for all predicted genes are shared as text files.
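For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed from a list of contig lengths: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the summed assembly length. The function name `n50` and the toy lengths are hypothetical and are not taken from the dataset or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the cumulative length first reaches at least half
    of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical, for illustration only).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

With the toy lengths the total is 1,500, so the cumulative sum first reaches 750 at the second-longest contig.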
For functional characterization of the predicted gene models, the transcriptome (ICRV01, ICRV01000001-ICRV01144777, https://www.ncbi.nlm.nih.gov/Traces/wgs/ICRV01) was subjected to Gene Ontology (GO) analysis, a BLASTP search, and a Pfam domain search.
The GO terms were assigned to the predicted peptides in a two-step process. First, the best-match homolog of each H. akashiwo peptide was identified by a BLASTP search (E-value < 1) against a custom database composed of RefSeq gene models of Arabidopsis thaliana, Homo sapiens, Mus musculus, and Saccharomyces cerevisiae (S288C). Second, each H. akashiwo peptide was annotated with the GO terms (http://geneontology.org) assigned to its best-match homolog. The Pfam database (http://pfam.xfam.org) was used to predict the domains in the H. akashiwo gene models.
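The two-step GO assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the tuple layout (query, subject, E-value, bit score), the function names, the peptide and reference identifiers, and the choice of "lowest E-value, ties broken by higher bit score" as the definition of best match are all hypothetical. Only the E-value < 1 cutoff comes from the text.

```python
E_VALUE_CUTOFF = 1.0  # threshold stated in the text (E-value < 1)


def best_hits(blast_rows, cutoff=E_VALUE_CUTOFF):
    """Step 1: pick one best-match homolog per query peptide.

    blast_rows is an iterable of (query, subject, evalue, bitscore) tuples
    (an assumed layout). Hits at or above the cutoff are discarded; among the
    rest, the lowest E-value wins and ties go to the higher bit score
    (an assumed tie-break rule).
    """
    best = {}
    for query, subject, evalue, bitscore in blast_rows:
        if evalue >= cutoff:
            continue
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return {query: hit[0] for query, hit in best.items()}


def transfer_go(query_to_subject, subject_to_go):
    """Step 2: annotate each query with the GO terms of its best-match homolog."""
    return {
        query: sorted(subject_to_go.get(subject, ()))
        for query, subject in query_to_subject.items()
    }


# Hypothetical toy input: identifiers and scores are made up for illustration.
toy_rows = [
    ("pep1", "refA", 1e-30, 120.0),
    ("pep1", "refB", 1e-5, 50.0),
    ("pep2", "refB", 2.0, 20.0),   # fails the E-value < 1 cutoff
    ("pep3", "refC", 0.5, 30.0),   # passes, but refC has no GO terms here
]
toy_go = {
    "refA": {"GO:0005524"},
    "refB": {"GO:0003677"},
}

homologs = best_hits(toy_rows)
annotations = transfer_go(homologs, toy_go)
print(annotations)
```

Keeping the two steps as separate functions mirrors the description in the text and makes it easy to swap in a different best-match rule without touching the GO transfer.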
The provided data are all in text format.