Dryad

Data for: Predicting m7G sites used by autoBioSeqpy

Data files

Dec 26, 2022 version files 301.24 KB

Abstract

As a vital post-transcriptional RNA modification, N7-methylguanosine (m7G) plays a key role in regulating gene expression. Precise identification of m7G sites is a crucial first step toward understanding its biological functions and regulatory mechanisms. Although whole-genome sequencing is the gold standard for detecting RNA modification sites, it is time-consuming, expensive, and relatively complex. In recent years, computational approaches have become increasingly popular for this task, especially with the rise of deep learning. Deep learning architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have emerged as versatile approaches for modeling biological sequence data. However, designing a network architecture that performs well remains challenging because it requires substantial domain expertise and a considerable investment of time and effort. We previously developed a tool called autoBioSeqpy to efficiently design and apply deep learning networks for classifying biological sequences. autoBioSeqpy has many distinctive features that make it a practical tool for a broad range of biological questions. In this work, we used autoBioSeqpy to develop, train, evaluate, and analyze sequence-level deep learning models for predicting m7G sites. Here we provide a detailed description of the various models it implements, as well as a step-by-step guide on how to run them. The same strategy can also be applied to other systems to address similar biological questions.
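As a rough illustration of how a sequence-level CNN "reads" an RNA window, the sketch below one-hot encodes a sequence and scans it with a single convolutional filter followed by ReLU and global max pooling. It is plain Python written for illustration only, not autoBioSeqpy's code or its API; the ACGU column order, the filter weights, and the function names (`one_hot`, `conv1d_valid`, `global_max_pool`) are hypothetical assumptions. In a trained model the filter weights would be learned rather than set by hand.

```python
from typing import List

# Hypothetical column order for the one-hot encoding (assumption, not autoBioSeqpy's).
ALPHABET = "ACGU"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a nucleotide sequence as an L x 4 one-hot matrix.

    'T' is mapped to 'U' so DNA-style input also works; unknown bases
    (e.g. 'N') become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")
    matrix = []
    for base in seq:
        row = [0.0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1.0
        matrix.append(row)
    return matrix


def conv1d_valid(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """Slide a k x 4 filter along the sequence (no padding) and apply ReLU."""
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        s = bias
        for i in range(k):
            for c in range(len(ALPHABET)):
                s += x[start + i][c] * kernel[i][c]
        out.append(max(0.0, s))  # ReLU activation
    return out


def global_max_pool(activations: List[float]) -> float:
    """Keep the strongest filter response anywhere in the window."""
    return max(activations) if activations else 0.0


if __name__ == "__main__":
    # Hand-set filter that responds to a "GG" dinucleotide (illustrative only).
    gg_filter = one_hot("GG")
    acts = conv1d_valid(one_hot("ACGGA"), gg_filter)
    print(acts, global_max_pool(acts))
```

Here the hand-set "GG" filter scores each two-base window of `ACGGA` by how many positions match, so the peak response sits on the `GG` window; max pooling then reduces the sequence to a single position-independent feature, which is one common way CNNs turn motif occurrences into inputs for a classifier.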