mRNA editing analysis of Doryteuthis pealeii


Medina Ruiz, Sofia et al. (2022), mRNA editing analysis of Doryteuthis pealeii, Dryad, Dataset,


Cephalopods are known for their large nervous systems, complex behaviors and morphological innovations. To investigate the genomic underpinnings of these features, we assembled the chromosomes of the Boston market squid Doryteuthis (Loligo) pealeii and the California two-spot octopus, Octopus bimaculoides, and compared them with those of the Hawaiian bobtail squid, Euprymna scolopes. The genomes of the soft-bodied (coleoid) cephalopods are highly rearranged relative to other extant molluscs, indicating an intense, early burst of genome restructuring. The coleoid genomes feature multi-megabase, tandem arrays of genes associated with brain development and cephalopod-specific innovations. We find that another coleoid hallmark, extensive A-to-I mRNA editing, displays two fundamentally distinct patterns: one exclusive to the nervous system and concentrated in genic sequences, the other widespread and directed toward repetitive elements. We conclude that coleoid novelty is mediated in part by substantial genome reorganization, gene family expansion, and tissue-dependent mRNA editing.


The files in this repository are the inputs for the RNA editing analysis of Doryteuthis (Loligo) pealeii.

Usage Notes

Files derived from the raw transcriptome sequence data used in the current study (SRA BioProject PRJNA641326) carry the 'Albertin' tag in the file name (reference, alternate, depth, and annotation files). Files from the analysis of a separate Doryteuthis pealeii specimen[1], processed with our analysis pipeline, carry the 'Alon' tag.

All files presented on this repository are tabulated.


File Description
Main RNA editing annotation table used to create all the manuscript figures (edit sites overlapping genic features that do not overlap with genomic variants). The columns integrate the annotations from PFAM, TMHMM, and repeat overlap. Each row corresponds to a unique edit site identified by {Chr:Position}. Find the column description: README_Source_Editing_Albertin.txt
Annotation of ADAR target sites on the Doryteuthis pealeii reference genome.
Edit frequencies for the A>G edit sites (rows) and individual tissue samples (columns). Refer to
Number of reads with reference nucleotide 'A' for all edit sites (rows) and individual tissue samples (columns). Refer to
Number of reads with alternate/edited nucleotide 'G' for all edit sites (rows) and individual tissue samples (columns). Refer to
Sum of reads with 'A' or 'G' for a given edit site (rows) and individual tissue samples (columns). Refer to
Weighted average edit frequencies and read depth for Neural and Non-neural samples for each edit target (rows).
Number of adenosines in genic regions, categorized by genomic feature (3', 5', Intron, Rec/recoding, SJ/splice junction, and Syn/synonymous) and subcategorized by overlap with repeats (True: overlapping a repeat; False: not overlapping a repeat). Gene strand orientation was taken into account for these calculations. The numbers were obtained from the genomic variant calls.
Edit_prot_pos_intersect.bed bedtools intersect output of edited amino acids and PFAM domains. Columns = ['GeneID', 'Prot_pos', 'Prot_pos_v2', 'Gene_ID2', 'PFAM_start', 'PFAM_end', 'PFAM']. The first three columns correspond to the positions of edited amino acids. The remaining columns are the overlap with the PFAM table.
Edit_prot_pos_TMHMM_intersect.bed bedtools intersect output of edited amino acids and the TMHMM v.2 transmembrane annotation. Columns = ['GeneID', 'Prot_pos', 'Prot_pos_v2', 'Gene_ID2', 'TMHMM_start', 'TMHMM_end', 'Location']. Note: the duplicated Prot_pos columns give the position of the edited amino acid on the protein. 'Location' is the location of the amino acid with respect to the TMHMM annotation (outside, inside, or TMhelix protein segments).
Genotypes of D. opalescens and H. bleekeri at sites where Doryteuthis pealeii is 'Adenosine' in coding regions. Genotypes derive from high-confidence genomic calls obtained by aligning genomic shotgun sequence reads against the reference D. pealeii genome.
Dpealeiiv2.gene_exons.filt_Chr.gff3 GFF3 file of filtered exons
DpealeiiV2.filtered.annot.txt Gene description file of filtered genes
featureCount.tophat.M.primary.ignoreDup.filtered_tpm.txt TPM expression table obtained using tophat
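
The count tables described above (reads with 'A', reads with 'G', and their sum) relate to the edit frequency tables by simple arithmetic. The following is a minimal sketch of that relationship, not the pipeline used in the study. It assumes that the per-sample edit frequency is G / (A + G) and that the "weighted average" is weighted by read depth; the repository description does not state the weighting scheme, so that part is an assumption. The function names and the counts are hypothetical.

```python
def edit_frequency(ref_count, alt_count):
    """Per-sample edit frequency: edited reads ('G') over total reads ('A' + 'G').

    Returns None when the site has no coverage in the sample.
    """
    depth = ref_count + alt_count
    if depth == 0:
        return None
    return alt_count / depth


def weighted_edit_frequency(ref_counts, alt_counts):
    """Depth-weighted average edit frequency across a group of samples.

    ASSUMPTION: weighting each sample by its read depth, which is equivalent
    to pooling the reads: sum('G') / sum('A' + 'G').
    """
    total_alt = sum(alt_counts)
    total_depth = sum(ref_counts) + total_alt
    if total_depth == 0:
        return None
    return total_alt / total_depth


# Made-up counts for one edit site in three hypothetical samples.
ref = [30, 10, 0]   # reads with reference 'A'
alt = [10, 10, 0]   # reads with edited 'G'

per_sample = [edit_frequency(r, a) for r, a in zip(ref, alt)]
pooled = weighted_edit_frequency(ref, alt)
```

Under this assumption, a sample with low coverage contributes little to the group value, and a sample with no coverage contributes nothing.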


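The column layout listed for Edit_prot_pos_intersect.bed can be attached to the rows when loading the file. The sketch below uses only the Python standard library and assumes the file is tab-separated with no header row, which the description suggests but does not state explicitly. The helper names, gene identifiers, and domain labels in the example are hypothetical.

```python
import csv
import io

# Column names as listed in the file description.
PFAM_COLUMNS = ["GeneID", "Prot_pos", "Prot_pos_v2", "Gene_ID2",
                "PFAM_start", "PFAM_end", "PFAM"]


def read_intersect(handle, columns):
    """Read a headerless, tab-separated intersect table into a list of dicts.

    ASSUMPTION: one record per line, tab-delimited, no header row.
    All fields are kept as strings.
    """
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:  # skip blank lines
            continue
        rows.append(dict(zip(columns, fields)))
    return rows


def domains_by_gene(rows):
    """Group the PFAM labels overlapping edited amino acids by gene."""
    result = {}
    for row in rows:
        result.setdefault(row["GeneID"], set()).add(row["PFAM"])
    return result


# Made-up records standing in for the real file contents.
example = (
    "geneA\t10\t10\tgeneA\t5\t120\tPF00001\n"
    "geneA\t200\t200\tgeneA\t180\t260\tPF00002\n"
)
rows = read_intersect(io.StringIO(example), PFAM_COLUMNS)
hits = domains_by_gene(rows)
```

To read the actual file, pass an open file handle in place of the `io.StringIO` object. The same reader should work for Edit_prot_pos_TMHMM_intersect.bed if the last three column names are swapped for 'TMHMM_start', 'TMHMM_end', and 'Location'.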

[1] Alon, S. et al. The majority of transcripts in the squid nervous system are extensively recoded by A-to-I RNA editing. eLife (2015) doi:10.7554/eLife.05198.


Austrian Science Fund, Award: P30686-B29

NSF, Award: IOS-1354898

National Institutes of Health, Award: 5UL1TR002389-02

National Institutes of Health, Award: UL1 TR000430

Grass Foundation

Marine Biological Laboratory, Award: Hibbitt Early Career Fellowship

Marine Biological Laboratory, Award: Whitman Fellowship

Chan-Zuckerberg BioHub