Data from: Nonsense-mediated decay of alternative pre-mRNA splicing variants is a major determinant of the Arabidopsis steady state transcriptome

Drechsel, Gabriele¹; Kahles, André²; Kesarwani, Anil K.¹; Stauffer, Eva¹; Behr, Jonas²; Drewe, Philipp²; Rätsch, Gunnar²; Wachter, Andreas¹

Published Oct 07, 2014 on Dryad. https://doi.org/10.5061/dryad.hb7j1

Data files

Oct 07, 2014 version files: 39.56 MB in total

tpc115485SupplementalDS1.xlsx (39.04 MB)
tpc115485SupplementalDS2.xlsx (79.92 KB)
tpc115485SupplementalDS3.xlsx (62.44 KB)
tpc115485SupplementalDS4.xlsx (323.42 KB)
tpc115485SupplementalDS5.xlsx (51.26 KB)

Abstract

The nonsense-mediated decay (NMD) surveillance pathway can recognize erroneous transcripts and physiological mRNAs, such as precursor mRNA alternative splicing (AS) variants. Currently, information on the global extent of coupled AS and NMD remains scarce and is not available for any plant species. To address this, we conducted transcriptome-wide splicing studies using Arabidopsis thaliana mutants in the NMD factor homologs UP FRAMESHIFT1 (UPF1) and UPF3 as well as wild-type samples treated with the translation inhibitor cycloheximide. Our analyses revealed that at least 17.4% of all multi-exon, protein-coding genes produce splicing variants that are targeted by NMD. Moreover, we provide evidence that UPF1 and UPF3 act in a translation-independent mRNA decay pathway. Importantly, 92.3% of the NMD-responsive mRNAs exhibit classical NMD-eliciting features, supporting their authenticity as direct targets. Genes generating NMD-sensitive AS variants function in diverse biological processes, including signaling and protein modification, for which NaCl stress–modulated AS-NMD was found. Besides mRNAs, numerous noncoding RNAs and transcripts derived from intergenic regions were shown to be NMD responsive. In summary, we provide evidence for a major function of AS-coupled NMD in shaping the Arabidopsis transcriptome, with fundamental implications for gene regulation and quality control of transcript processing.

Supplemental Data Set 1. Read Statistics of RNA-Seq Data and Computational Analysis of Transcriptome-Wide AS and Gene Expression.

(A) Alignment statistics of all RNA-seq reads derived from Illumina sequencing. (B)-(E) Event-based alternative splicing analysis for the comparisons WT vs. lba1 upf3-1 double mutant (B), WT vs. lba1 (C) and WT vs. upf3-1 (D) single mutants, and Mock vs. CHX treatment (E). For each AS event and comparison, p and Q values from testing AS variant ratio changes in one (up) or the other (down) direction are provided, together with the minimum (min) values. Each list also includes rankings according to the p values. (F) Combined list of the alternative splicing analyses (based on the single comparisons in (B)-(E)), allowing different datasets to be compared on a per-event basis. Use the matrix in columns R-U to analyze differential gene expression (GE) or AS pattern changes (TE) for the indicated comparisons by changing the thresholds for FDR or p values. The matrix supports single tests as well as combinations, and both maximum (MAX THRESHOLDS) and minimum (MIN THRESHOLDS) cut-off values can be set. The number of significant changes for the given settings is displayed under “SIGNIFICANCE COUNTS”. Results for single events can be viewed and sorted using columns A-P, with values “1” and “0” indicating “FALSE” and “TRUE”, respectively, for fulfilling the criteria set in the matrix in columns R-U. Note that the logic in this spreadsheet only works if the sorting of sheets (B)-(E) is unchanged. (G) Splice index scores (percent spliced in, PSI) of all tested alternative splicing events in all analyzed samples and replicates (R). (H) Internal lookup table used to compute the differential gene expression data; its sorting must not be changed. To analyze differential gene expression, use sheet (I). (I) Differential gene expression analysis for all genes and samples; the values given are p values.
The matrix in columns P-AB can be used to enter gene types and cut-off values for the individual samples and displays the total number of genes (“COUNT”) fulfilling the set criteria. Further information and a detailed description of the computational pipeline are provided in Supplemental Methods.
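The p value bookkeeping in sheets (B)-(F) can be sketched in Python. This is a minimal sketch under stated assumptions, not the spreadsheet's own formulas: `EventTest`, `rank_by_p`, `passes`, and `significance_count` are hypothetical names, the comparison labels and values are invented, a MAX threshold is assumed to cap the minimum p value and a MIN threshold to set a floor on it, and combining comparisons is assumed to mean that an event must satisfy every selected criterion (logical AND). The same logic would apply to Q values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class EventTest:
    """One AS event tested in one comparison (hypothetical record layout)."""
    event_id: str
    p_up: float    # test for a variant ratio change in one direction
    p_down: float  # test for a change in the other direction

    @property
    def p_min(self) -> float:
        # The sheets also report the minimum of the two directional values.
        return min(self.p_up, self.p_down)


def rank_by_p(events: list[EventTest]) -> list[tuple[int, EventTest]]:
    """Rank events by their minimum p value (rank 1 = smallest p)."""
    ordered = sorted(events, key=lambda e: e.p_min)
    return list(enumerate(ordered, start=1))


def passes(e: EventTest, max_p: Optional[float] = None,
           min_p: Optional[float] = None) -> bool:
    """Assumed semantics: MAX keeps p_min <= max_p, MIN keeps p_min >= min_p."""
    if max_p is not None and e.p_min > max_p:
        return False
    if min_p is not None and e.p_min < min_p:
        return False
    return True


def significance_count(tables: dict[str, dict[str, EventTest]],
                       criteria: dict[str, dict[str, float]]) -> int:
    """Count events that meet the criteria of every selected comparison (AND)."""
    shared = set.intersection(*(set(tables[c]) for c in criteria))
    return sum(all(passes(tables[c][i], **criteria[c]) for c in criteria)
               for i in shared)


# Invented values: double mutant vs. WT and CHX vs. Mock.
dm = {"e1": EventTest("e1", 0.001, 0.9),
      "e2": EventTest("e2", 0.2, 0.03),
      "e3": EventTest("e3", 0.5, 0.6)}
chx = {"e1": EventTest("e1", 0.4, 0.7),
       "e2": EventTest("e2", 0.001, 0.8),
       "e3": EventTest("e3", 0.01, 0.9)}

ranked = rank_by_p(list(dm.values()))
# Events changed in the double mutant (p <= 0.05) but not after CHX (p >= 0.05).
count = significance_count({"WT_vs_dm": dm, "Mock_vs_CHX": chx},
                           {"WT_vs_dm": {"max_p": 0.05},
                            "Mock_vs_CHX": {"min_p": 0.05}})
print(count)  # 1 (only e1 qualifies)
```

The minimum threshold is what makes exclusion queries possible, such as selecting events that respond to NMD factor loss but not to translation inhibition.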

tpc115485SupplementalDS1.xlsx

Supplemental Data Set 2. Analysis of NMD Target Features for All Events.

(A) General information on the mapping of all alternative splicing events and of the significantly changed ones. (B) AS event positions relative to the cds of the representative transcript model annotated in TAIR10, for all AS events and for those significantly changed in the different samples. Subsets are as described in Figure 3B. (C) Analysis of NMD target feature frequencies for the AS events significantly changed in the indicated samples. For each AS event, and depending on the direction of the AS ratio change, one of the two corresponding splicing variants was assigned to the control sample (WT or Mock treatment) and the other to the NMD-impaired sample (“Δ NMD”). This assignment made it possible to count how many of these variants contained classical NMD features, with events mapping to the 5’ UTR, cds, and 3’ UTR analyzed separately. The NMD features inspected were upstream open reading frames (uORFs), translation initiation site (TIS)-overlapping uORFs, premature termination codons (PTCs) resulting in 3’ UTRs > 347 nt, splice junctions more than 50 nt downstream of a stop codon, and PTC-independent long 3’ UTRs > 347 nt. (D) Frequency patterns of NMD-eliciting features in different datasets. The occurrence of the NMD features described in (C) was analyzed using the following categories: only the splicing variant assigned to the control has an NMD feature (1,0); only the splicing variant assigned to the NMD impairment has an NMD feature (0,1); both splicing variants have an NMD feature (1,1); and neither splicing variant has an NMD feature (0,0). (E) 3’ UTR length distribution for transcripts assigned to the control or NMD-impaired samples for the indicated subsets; the two splicing variants of each event were assigned as described in (C). (F) 5’ UTR length distribution for transcripts assigned to the control or NMD-impaired samples for the indicated subsets; the two splicing variants of each event were assigned as described in (C).
(G) Numbers of genes containing single or multiple significantly changed AS events for the indicated subsets. For genes with multiple events, transcripts with all possible event combinations were assembled and analyzed for rescue of a PTC introduced by a single event. Further details on data analysis are provided in Supplemental Methods.
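The feature checks listed under (C) can be illustrated with a short Python sketch. It is not the pipeline from Supplemental Methods: `first_stop`, `NmdFeatures`, and `nmd_features` are hypothetical names; a uORF is assumed to be any upstream ATG with an in-frame stop codon; the distance of a splice junction from the stop codon is assumed to be the number of 3’ UTR nucleotides between them in spliced-transcript coordinates; and distinguishing PTC-derived from PTC-independent long 3’ UTRs, which would require comparison with the reference ORF, is left out. Only the 347-nt and 50-nt thresholds come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass

STOPS = {"TAA", "TAG", "TGA"}
LONG_3UTR_NT = 347    # threshold stated in the data set description
EJC_DISTANCE_NT = 50  # threshold stated in the data set description


def first_stop(seq: str, start: int) -> int | None:
    """Index of the first in-frame stop codon at or after `start`, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i
    return None


@dataclass
class NmdFeatures:
    n_uorfs: int
    tis_overlapping_uorf: bool
    utr3_length: int | None
    long_3utr: bool
    junction_downstream_of_stop: bool


def nmd_features(mrna: str, main_start: int, junctions: list[int]) -> NmdFeatures:
    """Flag classical NMD features on a spliced transcript (DNA alphabet).

    main_start -- 0-based index of the A of the main ORF start codon
    junctions  -- 0-based index of the first nucleotide of each downstream exon
    """
    seq = mrna.upper()
    n_uorfs, tis_overlap = 0, False
    for i in range(main_start):
        if seq[i:i + 3] == "ATG":
            stop = first_stop(seq, i)
            if stop is None:
                continue  # no in-frame stop: not counted as a uORF here
            n_uorfs += 1
            if stop + 3 > main_start:  # uORF runs across the main start codon
                tis_overlap = True
    stop = first_stop(seq, main_start)
    if stop is None:
        return NmdFeatures(n_uorfs, tis_overlap, None, False, False)
    stop_end = stop + 3  # first nucleotide of the 3' UTR
    utr3 = len(seq) - stop_end
    ejc = any(j - stop_end > EJC_DISTANCE_NT for j in junctions)
    return NmdFeatures(n_uorfs, tis_overlap, utr3, utr3 > LONG_3UTR_NT, ejc)


# Invented transcript: one uORF (ATG...TGA) in the 5' UTR, a short main ORF,
# a 400-nt 3' UTR, and a junction 78 nt downstream of the stop codon.
five_utr = "CCATGTGACC"
cds = "ATGAAACCCTAA"
tx = five_utr + cds + "G" * 400
feats = nmd_features(tx, main_start=len(five_utr), junctions=[15, 100])
print(feats)
```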

tpc115485SupplementalDS2.xlsx

Supplemental Data Set 3. Analysis of NMD Target Features for Genes Containing Single Events.

(A) AS event positions relative to the cds of the representative transcript model annotated in TAIR10, for all AS events and for those significantly changed in the different samples. Subsets are as described in Figure 3B. (B) Analysis of NMD target feature frequencies for the AS events significantly changed in the indicated samples. For each AS event, and depending on the direction of the AS ratio change, one of the two corresponding splicing variants was assigned to the control sample (WT or Mock treatment) and the other to the NMD-impaired sample (“Δ NMD”). This assignment made it possible to count how many of these variants contained classical NMD features, with events mapping to the 5’ UTR, cds, and 3’ UTR analyzed separately. The NMD features inspected were upstream open reading frames (uORFs), translation initiation site (TIS)-overlapping uORFs, premature termination codons (PTCs) resulting in 3’ UTRs > 347 nt, splice junctions more than 50 nt downstream of a stop codon, and PTC-independent long 3’ UTRs > 347 nt. (C) Frequency patterns of NMD-eliciting features in different datasets. The occurrence of the NMD features described in (B) was analyzed using the following categories: only the splicing variant assigned to the control has an NMD feature (1,0); only the splicing variant assigned to the NMD impairment has an NMD feature (0,1); both splicing variants have an NMD feature (1,1); and neither splicing variant has an NMD feature (0,0). (D) 3’ UTR length distribution for transcripts assigned to the control or NMD-impaired samples for the indicated subsets; the two splicing variants of each event were assigned as described in (B). (E) 5’ UTR length distribution for transcripts assigned to the control or NMD-impaired samples for the indicated subsets; the two splicing variants of each event were assigned as described in (B). Further details on data analysis are provided in Supplemental Methods.
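The four-way classification of NMD feature frequency patterns amounts to tallying two presence/absence flags per event. The Python sketch below assumes that the control/NMD-impaired variant assignment and the feature calls have already been made for one feature type; `pattern_counts` and `pattern_percentages` are hypothetical helpers and the example flags are invented.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Pattern labels as in the data set: (control variant, ΔNMD variant),
# 1 = the variant carries the NMD feature, 0 = it does not.
PATTERNS = [(1, 0), (0, 1), (1, 1), (0, 0)]


def pattern_counts(events: Iterable[tuple[bool, bool]]) -> dict[tuple[int, int], int]:
    """Tally (control has feature, ΔNMD has feature) flags over AS events."""
    counts = Counter((int(ctrl), int(dnmd)) for ctrl, dnmd in events)
    return {p: counts[p] for p in PATTERNS}  # Counter returns 0 for absent keys


def pattern_percentages(counts: dict[tuple[int, int], int]) -> dict[tuple[int, int], float]:
    """Express each pattern as a percentage of all events (0.0 if there are none)."""
    total = sum(counts.values())
    return {p: (100 * c / total if total else 0.0) for p, c in counts.items()}


# Invented flags for five events and a single feature type.
events = [(False, True), (False, True), (True, True), (False, False), (True, False)]
counts = pattern_counts(events)
pct = pattern_percentages(counts)
print(counts)
print(pct)
```

A surplus of (0,1) over (1,0) events is the pattern one would expect if the variants accumulating under NMD impairment are the ones carrying NMD-eliciting features.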

tpc115485SupplementalDS3.xlsx

Supplemental Data Set 4. Categorization of NMD-Regulated and Reference Gene Sets into Functional Subgroups.

(A) Functional categorization of genes from the different subsets described in Figure 3B, of all annotated genes in the TAIR10 release, and of all genes displaying AS evidence in our data (“all AS”). In addition, the indicated subsets of cassette exon (CE)-containing genes were analyzed. Category bins, names, and corresponding counts are listed. (B) Combination of MapMan-based functional classifications into functional subgroups, as shown in the table on the left. The numbers of genes falling into these combined categories are shown for the indicated subsets, and each set is also displayed as a pie chart below it. Hypergeometric tests for differential representation of the combined categories are provided for the individual subsets. (C) Differential gene expression data for annotated splicing factors in NMD-impaired samples versus their respective controls, based on the analysis described in Supplemental Data Set 1I. In addition, expression of the annotated representative isoforms was analyzed in WT and lba1 upf3-1 using rQuant, as described in Supplemental Methods.
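The hypergeometric test for differential representation asks how surprising it is to find k genes of a functional category in a subset of n genes when K of all N reference genes belong to that category. A sketch using only Python's standard library follows; which reference set, tail, and multiple-testing correction were used is not stated here, so the two one-sided tails, the function names, and the example counts are assumptions.

```python
from math import comb


def _term(i: int, N: int, K: int, n: int) -> int:
    # Number of n-gene subsets containing exactly i genes from the category.
    return comb(K, i) * comb(N - K, n - i)


def hypergeom_upper(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k), testing over-representation of the category in the subset."""
    hi = min(K, n)
    return sum(_term(i, N, K, n) for i in range(k, hi + 1)) / comb(N, n)


def hypergeom_lower(k: int, N: int, K: int, n: int) -> float:
    """P(X <= k), testing under-representation of the category in the subset."""
    hi = min(k, K, n)
    return sum(_term(i, N, K, n) for i in range(0, hi + 1)) / comb(N, n)


# Invented counts: N reference genes, K of them in the category,
# n genes in an NMD-responsive subset, k of those in the category.
N, K, n, k = 20000, 500, 1000, 40
print(f"expected {n * K / N:.1f}, observed {k}")
print(f"P(X >= {k}) = {hypergeom_upper(k, N, K, n):.3g}")
print(f"P(X <= {k}) = {hypergeom_lower(k, N, K, n):.3g}")
```

Exact integer arithmetic keeps the sketch dependency-free; for production use, a library implementation such as SciPy's hypergeometric distribution would be the usual choice.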

tpc115485SupplementalDS4.xlsx

Supplemental Data Set 5. Expressed Intergenic Regions and NMD Impairment-Responsive ncRNAs and Pseudogenes.

(A) Differential gene expression data for all annotated ncRNAs, based on the analysis described in Supplemental Data Set 1I. ncRNAs differentially expressed (either up- or downregulated) in the indicated samples are displayed separately. (B) Differential gene expression of annotated pseudogenic transcripts, based on the analysis described in Supplemental Data Set 1I. The matrix provides counts of genes and pseudogenes differentially expressed (p < 0.01) in the indicated NMD-impaired samples relative to their corresponding controls. (C) Expression of identified intergenic regions; total read counts are listed for the indicated subsets and regions. Significantly expressed intergenic regions are marked by an asterisk, and the regions analyzed in Figure 7D are colored green. The numbers of intergenic regions that are more than twofold up- or downregulated in the indicated types of NMD impairment versus their corresponding controls are given below the table. For more detailed information on data analysis, see Supplemental Methods.
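The twofold criterion for intergenic regions can be sketched as a log2 fold change of library-size-normalized read counts. The normalization (counts per million), the pseudocount of 1, the strict more-than-twofold cut-off in both directions, and all names and counts below are assumptions for illustration; the procedure actually used is described in Supplemental Methods.

```python
from __future__ import annotations

import math


def log2_fold_change(ctrl: int, treat: int, lib_ctrl: int, lib_treat: int,
                     pseudocount: float = 1.0) -> float:
    """log2(treated / control) on counts per million; pseudocount avoids log(0)."""
    cpm_ctrl = (ctrl + pseudocount) / lib_ctrl * 1e6
    cpm_treat = (treat + pseudocount) / lib_treat * 1e6
    return math.log2(cpm_treat / cpm_ctrl)


def count_twofold(regions: dict[str, tuple[int, int]], lib_ctrl: int,
                  lib_treat: int, fold: float = 2.0) -> tuple[int, int]:
    """Return (up, down): regions changed more than `fold`-fold in each direction."""
    limit = math.log2(fold)
    up = down = 0
    for ctrl, treat in regions.values():
        lfc = log2_fold_change(ctrl, treat, lib_ctrl, lib_treat)
        if lfc > limit:
            up += 1
        elif lfc < -limit:
            down += 1
    return up, down


# Invented read counts per region: (control, NMD-impaired).
regions = {"region_1": (10, 30), "region_2": (30, 10), "region_3": (10, 15)}
print(count_twofold(regions, lib_ctrl=1_000_000, lib_treat=1_000_000))  # (1, 1)
print(count_twofold(regions, lib_ctrl=1_000_000, lib_treat=2_000_000))  # (0, 1)
```

The second call shows why normalization matters: doubling the library size of the NMD-impaired sample halves its normalized counts, so region_1 no longer passes the twofold cut-off.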

tpc115485SupplementalDS5.xlsx