Dryad

Two-step mixed model approach to analyzing differential alternative RNA splicing: Datasets and R scripts for analysis of alternative splicing

Cite this dataset

Luo, Li et al. (2020). Two-step mixed model approach to analyzing differential alternative RNA splicing: Datasets and R scripts for analysis of alternative splicing [Dataset]. Dryad. https://doi.org/10.5061/dryad.66t1g1k0h

Abstract

Changes in gene expression can correlate with poor disease outcomes in two ways: through changes in relative transcript levels or through alternative RNA splicing leading to changes in the relative abundance of individual transcript isoforms. The objective of this research is to develop new statistical methods for detecting and analyzing both differentially expressed and differentially spliced isoforms. These methods should appropriately account for the dependence between isoforms and apply multiple testing corrections suited to the multi-dimensional structure of the data at both the gene and isoform levels. We developed a linear mixed effects model-based approach for analyzing the complex alternative RNA splicing regulation patterns detected by whole-transcriptome RNA-sequencing technologies. This approach characterizes and differentiates three types of genes related to alternative RNA splicing events, each with a distinct differential expression/splicing pattern. We control the gene-level overall false discovery rate (OFDR) in this multi-dimensional alternative RNA splicing analysis using a two-step hierarchical hypothesis testing framework. In the initial screening test we identify genes that have differentially expressed or spliced isoforms; in the subsequent confirmatory testing stage we examine only the isoforms of genes that have passed the screening test. Comparisons with other methods, through application to a whole-transcriptome RNA-Seq study of adenoid cystic carcinoma and through extensive simulation studies, demonstrate the advantages and improved performance of our method. Our proposed method appropriately controls the gene-level OFDR, maintains statistical power, and is flexible enough to accommodate advanced experimental designs.
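The screening-then-confirmation logic described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their functions are in Tools.R and are written in R); it is a generic two-stage scheme in Python that assumes the gene-level screening p-values and isoform-level p-values have already been computed. The function names `bh_reject` and `two_stage`, the choice of Benjamini-Hochberg for screening, and the Bonferroni correction at level R*q/m for confirmation are all assumptions made for illustration.

```python
def bh_reject(pvalues, q):
    """Return the keys rejected by the Benjamini-Hochberg procedure at level q."""
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ordered)
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= k * q / m.
    for rank, (_, p) in enumerate(ordered, start=1):
        if p <= rank * q / m:
            cutoff_rank = rank
    return {key for key, _ in ordered[:cutoff_rank]}


def two_stage(gene_p, isoform_p, q=0.05):
    """Hypothetical two-stage test: screen genes, then confirm isoforms.

    gene_p:    {gene: screening p-value}
    isoform_p: {gene: {isoform: p-value}}
    Returns {gene: [confirmed isoforms]} for genes that pass screening.
    """
    m = len(gene_p)
    selected = bh_reject(gene_p, q)  # Stage 1: gene-level screening
    if not selected:
        return {}
    # Stage 2: only isoforms of selected genes are tested, at a level
    # shrunk by the fraction of genes selected (assumed here: R * q / m).
    level = len(selected) * q / m
    confirmed = {}
    for gene in sorted(selected):
        iso = isoform_p[gene]
        threshold = level / len(iso)  # Bonferroni within the gene
        confirmed[gene] = sorted(n for n, p in iso.items() if p <= threshold)
    return confirmed


# Toy p-values, invented purely for demonstration.
gene_p = {"geneA": 0.001, "geneB": 0.2, "geneC": 0.01, "geneD": 0.8}
isoform_p = {
    "geneA": {"A.1": 0.0005, "A.2": 0.3},
    "geneB": {"B.1": 0.15, "B.2": 0.4},
    "geneC": {"C.1": 0.02, "C.2": 0.004, "C.3": 0.5},
    "geneD": {"D.1": 0.9},
}
result = two_stage(gene_p, isoform_p, q=0.05)
print(result)
```

In this toy example only geneA and geneC pass screening, so the isoforms of geneB and geneD are never tested. Restricting the second stage to screened genes is what lets the error rate be controlled at the gene level.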

Methods

The data were collected by whole-transcriptome RNA sequencing. The processing method is described in the manuscript.

Usage notes

  • Tools.R: An R script file that includes all the functions for performing the analysis with the method proposed in our manuscript.
  • Tutorial.html: A step-by-step tutorial illustrating how to perform the analyses that we have proposed in our manuscript using the functions included in Tools.R.
  • data4demo.RData: The R data set used in the above tutorial. It consists of a subset of 400 genes from the RNA-Sequencing study of adenoid cystic carcinoma (ACC) reported in our manuscript, selected for ease of computation and demonstration. The data set contains isoform abundances of the 400 genes from 14 ACC patients. Eight of the patients remained free of cancer during the follow-up time, while the remaining six did not.
  • ACC.html: An HTML document explaining the detailed steps and R code for applying our proposed method to an RNA-sequencing study of adenoid cystic carcinoma (ACC).
  • acc.RData: The R data set used in the above analysis of the RNA-Sequencing study of adenoid cystic carcinoma (ACC). The data set contains isoform abundances of 2820 genes from 14 ACC patients. Eight of the patients remained free of cancer during the follow-up time, while the remaining six did not.
  • AML.html: An HTML document explaining the detailed steps and R code for applying our proposed method to an RNA-sequencing study of pediatric acute myeloid leukemia (AML).
  • AML_isoform_outcome.RData: The R data set used in the above analysis of the RNA-Sequencing study of pediatric acute myeloid leukemia (AML).

Funding

National Institute of Dental and Craniofacial Research, Award: R01DE023222

National Cancer Institute, Award: R01CA170250

National Cancer Institute, Award: P30CA118100