Data from: Gene modelling and annotation for the Hawaiian bobtail squid, Euprymna scolopes
Data files
Dec 18, 2023 version files 258.70 MB
- eupsc_models_v2.2_cds.fa
- eupsc_models_v2.2_interproscan.tsv
- eupsc_models_v2.2_prot.fa
- eupsc_models_v2.2.gtf
- eupsc_models_v2.2.tags.gtf
- README.md
Abstract
Coleoid cephalopods possess numerous complex, species-specific morphological and behavioural adaptations, e.g., a uniquely structured nervous system that is the largest among the invertebrates. The Hawaiian bobtail squid Euprymna scolopes is one of the most established cephalopod species. With the recent publication of its chromosomal-scale genome assembly and regulatory genomic data, it is also emerging as a key model for cephalopod gene regulation and evolution. However, the latest genome assembly has lacked a native gene model set. Our manuscript describes the generation of new long-read transcriptomic data and their combination with a plethora of available transcriptomic datasets to produce a new reference annotation for E. scolopes.
README
Euprymna scolopes gene annotation for V2 genome
This repo contains the gene annotation (new BRAKER2 models, combined with some genes from Belcaid et al. 2019) for the genome of the Hawaiian bobtail squid, Euprymna scolopes (Schmidbaur et al., 2022). It is part of the following manuscript, in press at Scientific Data: Towards a comprehensive gene annotation for the Hawaiian bobtail squid, Euprymna scolopes. Thea F. Rogers, Gözde Yalçın, John Briseno, Nidhi Vijayan, Spencer V. Nyholm, Oleg Simakov.
The annotation files are as follows:
eupsc_models_v2.2.gtf #Gene annotation GTF
eupsc_models_v2.2_cds.fa #Coding sequence file
eupsc_models_v2.2_prot.fa #Protein sequence file
eupsc_models_v2.2_interproscan.tsv #Protein annotation
eupsc_models_v2.2.tags.gtf #Same gene annotation GTF as eupsc_models_v2.2.gtf, but with explicit exon lines
Information on column headers for eupsc_models_v2.2.gtf and eupsc_models_v2.2.tags.gtf is as follows:
- seqname - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important note: the seqname must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly.
- source - name of the program that generated this feature, or the data source (database or project name)
- feature - feature type name, e.g. Gene, Variation, Similarity
- start - Start position of the feature, with sequence numbering starting at 1.
- end - End position of the feature, with sequence numbering starting at 1.
- score - A floating point value.
- strand - defined as + (forward) or - (reverse).
- frame - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
- attribute - A semicolon-separated list of tag-value pairs, providing additional information about each feature.
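As a minimal sketch of how the nine columns above map onto a record, the following Python splits one GTF line and converts the numeric fields. The example line is made up for illustration (it is not taken from eupsc_models_v2.2.gtf), and the parser assumes attributes of the form key "value"; with no semicolons inside quoted values. A bare token without a value is kept as a key with an empty string.

```python
from typing import Dict, NamedTuple, Optional


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int               # 1-based
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the column is "."
    strand: str
    frame: Optional[int]     # None when the column is "."
    attributes: Dict[str, str]


def parse_attributes(field: str) -> Dict[str, str]:
    """Split the attribute column into tag-value pairs (assumes: key "value";)."""
    attributes = {}
    for part in field.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


def parse_gtf_line(line: str) -> GtfRecord:
    """Parse one tab-separated GTF feature line into a GtfRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = cols
    return GtfRecord(
        seqname,
        source,
        feature,
        int(start),
        int(end),
        None if score == "." else float(score),
        strand,
        None if frame == "." else int(frame),
        parse_attributes(attribute),
    )


# Hypothetical example line, for illustration only.
EXAMPLE_LINE = (
    'scaffold_1\tAUGUSTUS\tCDS\t100\t250\t.\t+\t0\t'
    'transcript_id "g1.t1"; gene_id "g1";'
)
record = parse_gtf_line(EXAMPLE_LINE)
print(record.feature, record.start, record.end, record.attributes["gene_id"])
```

When reading a whole file, lines starting with # would need to be skipped before calling parse_gtf_line.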
Information on column headers for eupsc_models_v2.2_interproscan.tsv is as follows:
- Protein accession (e.g. P51587)
- Sequence MD5 digest (e.g. 14086411a2cdf1c4cba63020e1622579)
- Sequence length (e.g. 3418)
- Analysis (e.g. Pfam / PRINTS / Gene3D)
- Signature accession (e.g. PF09103 / G3DSA:2.40.50.140)
- Signature description (e.g. BRCA2 repeat profile)
- Start location
- Stop location
- Score - the e-value (or score) of the match reported by the member database method (e.g. 3.1E-52)
- Status - the status of the match (T: true)
- Date - the date of the run
- InterPro annotations - accession (e.g. IPR002093)
- InterPro annotations - description (e.g. BRCA2 repeat)
- GO annotations with their source(s), e.g. GO:0005515(InterPro)|GO:0006302(PANTHER)|GO:0007195(InterPro,PANTHER). This is an optional column; only displayed if the --goterms option is switched on
- Pathways annotations, e.g. REACT_71. This is an optional column; only displayed if the --pathways option is switched on
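The columns listed above could be read with Python's standard csv module, roughly as sketched below. The column names are labels chosen here for illustration (the file has no header row), and the sample row reuses the example values from the list, with made-up start, stop and date values. Rows lacking the optional trailing columns are padded with empty strings.

```python
import csv
import io

# Illustrative labels for the columns, in file order (not an official header).
COLUMNS = [
    "protein_accession", "md5", "length", "analysis",
    "signature_accession", "signature_description",
    "start", "stop", "score", "status", "date",
    "interpro_accession", "interpro_description",
    "go_terms", "pathways",
]


def read_interproscan(handle):
    """Yield one dict per row of an InterProScan-style TSV file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        # Optional trailing columns may be absent; pad so every key exists.
        row = row + [""] * (len(COLUMNS) - len(row))
        rec = dict(zip(COLUMNS, row))
        rec["length"] = int(rec["length"])
        rec["start"] = int(rec["start"])
        rec["stop"] = int(rec["stop"])
        # Some analyses report "-" instead of a numeric score.
        rec["score"] = None if rec["score"] in ("", "-") else float(rec["score"])
        go = rec["go_terms"]
        rec["go_terms"] = [] if go in ("", "-") else go.split("|")
        yield rec


# Hypothetical sample row; start, stop and date values are made up.
SAMPLE = (
    "P51587\t14086411a2cdf1c4cba63020e1622579\t3418\tPfam\tPF09103\t"
    "BRCA2 repeat profile\t1002\t1036\t3.1E-52\tT\t15-04-2013\t"
    "IPR002093\tBRCA2 repeat\tGO:0005515(InterPro)|GO:0006302(PANTHER)\n"
)
records = list(read_interproscan(io.StringIO(SAMPLE)))
print(records[0]["protein_accession"], records[0]["go_terms"])
```

For the real file, the io.StringIO object would be replaced by open("eupsc_models_v2.2_interproscan.tsv", newline="").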
List of commands and scripts used to generate the annotation as well as resources provided to reviewers can be found in GitHub under:
https://github.com/TheaFrances/E.scolopes-V2.2-BRAKER2-gene-annotation
Methods
See methods in: Gene modelling and annotation for the Hawaiian bobtail squid, Euprymna scolopes, Rogers et al. (2023) Scientific Data.
Usage notes
See methods in: Gene modelling and annotation for the Hawaiian bobtail squid, Euprymna scolopes, Rogers et al. (2023) Scientific Data.