tRNA anticodon cleavage by target-activated CRISPR-Cas13a effector

Cite this dataset

Ishita, Jain; Kolesnik, Matvey; Semenova, Ekaterina; Severinov, Konstantin (2024). tRNA anticodon cleavage by target-activated CRISPR-Cas13a effector [Dataset]. Dryad. https://doi.org/10.5061/dryad.sqv9s4n9w

Abstract

In this study, we conducted an analysis of RNA cleavage products mediated by the Type VI CRISPR-Cas system from the bacterium Leptotrichia shahii. Previous research has demonstrated that the LshCas13a effector protein, when loaded with CRISPR-RNA, exhibits collateral RNase activity upon recognizing the target transcript. To identify the products resulting from RNA cleavage mediated by the activated Type VI CRISPR-Cas system, we isolated total RNA samples from E. coli cells expressing either activated (targeting samples) or non-activated (non-targeting samples) LshCas13a effectors. Subsequently, the RNA molecules underwent sequencing using high-throughput techniques. Each experiment was performed in three biological replicates. The acquired data underwent processing to eliminate technical sequences and low-quality reads, followed by alignment to the reference genomic sequences. Subsequently, the counts of 5' end positions of the sequenced fragments were determined, and these counts were compared between targeting and non-targeting samples.

README: tRNA anticodon cleavage by target-activated CRISPR-Cas13a effector

https://doi.org/10.5061/dryad.sqv9s4n9w

Abstract

In this study, we conducted an analysis of RNA cleavage products mediated by the Type VI CRISPR-Cas system from the bacterium Leptotrichia shahii. Previous research has demonstrated that the LshCas13a effector protein, when loaded with CRISPR-RNA, exhibits collateral RNase activity upon recognizing the target transcript. To identify the products resulting from RNA cleavage mediated by the activated Type VI CRISPR-Cas system, we isolated total RNA samples from E. coli cells expressing either activated (targeting samples) or non-activated (non-targeting samples) LshCas13a effectors. In the experiments targeting a plasmid-borne transcript, expression of the target was induced by the addition of anhydrotetracycline. In the phage infection experiments, cells with or without a Type VI spacer targeting an M13 transcript were infected with M13 phage. In the in vitro experiments, purified LshCas13a proteins loaded with either targeting or non-targeting CRISPR-RNA were incubated with the target transcript in the presence of total RNA isolated from E. coli cells.

Subsequently, the RNA molecules underwent high-throughput sequencing. Each experiment was performed in three biological replicates. Sequencing was performed in paired-end mode for the experiments with the E. coli Δ10 strain, the E. coli C3000 strain at 60 min post induction, and the in vitro experiments with total RNA. The acquired data were processed to remove technical sequences and low-quality reads and then aligned to the reference genomic sequences. The 5' end positions of the sequenced fragments were then counted, and these counts were compared between targeting and non-targeting samples.

Description of the data and file structure

File name | Sample description | Experimental group
NT1_5min_S4_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 5 min post induction, non-targeting cells, replica 1 | non-targeting
NT2_5min_S5_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 5 min post induction, non-targeting cells, replica 2 | non-targeting
NT3_5min_S6_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 5 min post induction, non-targeting cells, replica 3 | non-targeting
T1_5min_S1_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 5 min post induction, targeting cells, replica 1 | targeting
T2_5min_S2_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 5 min post induction, targeting cells, replica 2 | targeting
T3_5min_S3_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 5 min post induction, targeting cells, replica 3 | targeting
NT1_60min_S16_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 1, forward reads | non-targeting
NT1_60min_S16_R2_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 1, reverse reads | non-targeting
NT2_60min_S17_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 2, forward reads | non-targeting
NT2_60min_S17_R2_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 2, reverse reads | non-targeting
NT3_60min_S18_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 3, forward reads | non-targeting
NT3_60min_S18_R2_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 3, reverse reads | non-targeting
T1_60min_S13_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 1, forward reads | targeting
T1_60min_S13_R2_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 1, reverse reads | targeting
T2_60min_S14_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 2, forward reads | targeting
T2_60min_S14_R2_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 2, reverse reads | targeting
T3_60min_S15_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 3, forward reads | targeting
T3_60min_S15_R2_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 3, reverse reads | targeting
d10_NT1_60min_S4_R1_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 1, forward reads | non-targeting
d10_NT1_60min_S4_R2_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 1, reverse reads | non-targeting
d10_NT2_60min_S5_R1_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 2, forward reads | non-targeting
d10_NT2_60min_S5_R2_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 2, reverse reads | non-targeting
d10_NT3_60min_S6_R1_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 3, forward reads | non-targeting
d10_NT3_60min_S6_R2_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, non-targeting cells, replica 3, reverse reads | non-targeting
d10_T1_60min_S1_R1_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 1, forward reads | targeting
d10_T1_60min_S1_R2_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 1, reverse reads | targeting
d10_T2_60min_S2_R1_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 2, forward reads | targeting
d10_T2_60min_S2_R2_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 2, reverse reads | targeting
d10_T3_60min_S3_R1_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 3, forward reads | targeting
d10_T3_60min_S3_R2_001.fastq.gz | E. coli Δ10 cells encoding L. shahii Type VI CRISPR-Cas locus, 60 min post induction, targeting cells, replica 3, reverse reads | targeting
M13_NT1_S10_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus infected with M13 phage, non-targeting cells, replica 1 | non-targeting
M13_NT2_S11_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus infected with M13 phage, non-targeting cells, replica 2 | non-targeting
M13_NT3_S12_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus infected with M13 phage, non-targeting cells, replica 3 | non-targeting
M13_T1_S7_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus infected with M13 phage, targeting cells, replica 1 | targeting
M13_T2_S8_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus infected with M13 phage, targeting cells, replica 2 | targeting
M13_T3_S9_R1_001.fastq.gz | E. coli C3000 cells encoding L. shahii Type VI CRISPR-Cas locus infected with M13 phage, targeting cells, replica 3 | targeting
Ish_10_S23_R1_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in absence of target transcript, replica 1, forward reads | non-targeting
Ish_10_S23_R2_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in absence of target transcript, replica 1, reverse reads | non-targeting
Ish_11_S24_R1_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in absence of target transcript, replica 2, forward reads | non-targeting
Ish_11_S24_R2_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in absence of target transcript, replica 2, reverse reads | non-targeting
Ish_12_S25_R1_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in absence of target transcript, replica 3, forward reads | non-targeting
Ish_12_S25_R2_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in absence of target transcript, replica 3, reverse reads | non-targeting
Ish_7_S20_R1_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in presence of target transcript, replica 1, forward reads | targeting
Ish_7_S20_R2_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in presence of target transcript, replica 1, reverse reads | targeting
Ish_8_S21_R1_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in presence of target transcript, replica 2, forward reads | targeting
Ish_8_S21_R2_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in presence of target transcript, replica 2, reverse reads | targeting
Ish_9_S22_R1_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in presence of target transcript, replica 3, forward reads | targeting
Ish_9_S22_R2_001.fastq.gz | E. coli total RNA incubated with LshCas13a:CRISPR-RNA ribonucleoprotein in presence of target transcript, replica 3, reverse reads | targeting
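
The file names listed above follow the pattern <sample>_S<index>_R<1|2>_001.fastq.gz, and paired-end samples have both an R1 and an R2 file. The sketch below is not part of the deposited scripts; it is one possible way to group the files by sample and tell single-end from paired-end samples. The names group_fastq_files and is_paired_end are hypothetical helpers introduced for illustration.

```python
import re
from collections import defaultdict

# Pattern assumed from the file names in the table: <sample>_S<index>_R<1|2>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_R(?P<mate>[12])_001\.fastq\.gz$"
)


def group_fastq_files(file_names):
    """Map each sample name to its read files, keyed by mate ('R1' or 'R2')."""
    samples = defaultdict(dict)
    for name in file_names:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"Unexpected FASTQ file name: {name}")
        samples[match["sample"]]["R" + match["mate"]] = name
    return dict(samples)


def is_paired_end(mates):
    """A sample is treated as paired-end when both mate files are present."""
    return "R1" in mates and "R2" in mates


# Three file names taken from the table above.
example = [
    "T1_5min_S1_R1_001.fastq.gz",
    "T1_60min_S13_R1_001.fastq.gz",
    "T1_60min_S13_R2_001.fastq.gz",
]
grouped = group_fastq_files(example)
for sample, mates in grouped.items():
    print(sample, "paired-end" if is_paired_end(mates) else "single-end")
```

Note that the experimental group cannot always be read from the file name: for the Ish_* files, the group has to be taken from the table.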

The scripts used for the analysis can be found in the archive LshCas13a_RNA_cleavage-master.zip. To run the analysis, place the raw fastq files in the Data/Raw subdirectory of the corresponding experiment directory.

Each directory contains data and scripts for the particular experiment:

  • LshCas13a_C3000 - RNA-Seq of total RNA extracted from E. coli C3000 cells carrying activated/nonactivated LshCas13a enzyme;
  • LshCas13a_d10LVM - RNA-Seq of total RNA extracted from E. coli Δ10 cells carrying activated/nonactivated LshCas13a enzyme;
  • LshCas13a_in_vitro_total_RNA - RNA-Seq of total RNA extracted from E. coli C3000 cells after in vitro incubation with activated/nonactivated LshCas13a enzyme;
  • LshCas13a_M13_infection - RNA-Seq of total RNA extracted from E. coli C3000 cells harboring Type VI spacer against M13 phage and infected with M13 phage.

Each directory contains the following subdirectories:

  • Data - directory containing the raw reads data;
  • Annotations - directory containing GFF tables with genomic features;
  • Alignments - directory containing alignments produced with the read_mapping.sh script;
  • Reference_sequences - directory containing FASTA files of the sequences used for read mapping;
  • Scripts - directory containing scripts for data processing;
  • Results - directory containing the results of data processing.

The "Results" directory contains the following subdirectories:

  • Tables
    • Ends_counts - contains files with coordinates of 5' ends of fragments;
    • Fragment_coords - contains files with coordinates of fragments (SeqID - Fragment_start - Fragment_end - Strand);
    • Merged_ends_counts - contains tables with 5' ends counts derived from samples designated for comparison;
    • Read_pairs_TABs - contains tables with coordinates of read pairs.
  • WIG_files - contains wig-files with 5' ends coverage.
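
For readers unfamiliar with the WIG format, the sketch below shows how per-position 5' end counts could be written as a variableStep WIG track. It is an illustration only: the exact layout of the deposited WIG files (track lines, strand handling) is determined by return_fragment_coords_table.py, and the function name write_wig and the track name are assumptions.

```python
import io


def write_wig(counts, seq_id, handle, track_name="5prime_ends"):
    """Write {1-based position: count} as a variableStep WIG track.

    Positions with zero counts are skipped, which variableStep allows.
    """
    handle.write(f'track type=wiggle_0 name="{track_name}"\n')
    handle.write(f"variableStep chrom={seq_id}\n")
    for position in sorted(counts):
        if counts[position] > 0:
            handle.write(f"{position} {counts[position]}\n")


# Toy counts for one strand of the reference sequence.
buffer = io.StringIO()
write_wig({120: 3, 15: 7, 48: 0}, "NC_000913.3", buffer)
wig_text = buffer.getvalue()
print(wig_text)
```

Writing one file per strand is a common choice, because a single WIG track holds one value per position.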

The "Scripts" directory contains a set of scripts for the data processing. There is a "basic" set of scripts which is common for all experiments:

  • raw_data_processing.sh - assesses read quality, removes adapters and discards low-quality reads.
    • Requirements:
      • fastqc, trimmomatic
  • read_mapping.sh - maps paired-end reads to the reference sequences. Since the SAM alignment files are quite large, the output is compressed with gzip.
    • Requirements:
      • bowtie2
  • return_fragment_coords_table.py - takes alignment files (in gzipped SAM format), generates all tables stored in the "Results/Tables" directory (except "Merged_ends_counts") and produces WIG files with 5' ends coverage;
    • Requirements:
      • python3 with gzip and pandas modules
  • merge_ends_count_tables.py - combines 5' ends count tables from different samples into one table;
    • Requirements:
      • python3 with pandas, gzip, re and functools modules
  • TCS_calling.R - performs the statistical test and produces a table of positions with their logFC and p-values.
    • Requirements:
      • R with dplyr, data.table, tidyr and edgeR packages

Methods

Total RNA was isolated from cells pelleted at 1 hour post RFP induction. Cells were lysed with Max Bacterial Enhancement Reagent (Invitrogen) for 4 min and then with TRIzol reagent (Invitrogen) for 5 min. RNA was extracted with chloroform and precipitated with isopropanol. RNA pellets were washed with 70% ethanol, dissolved in nuclease-free water and treated with the Turbo DNA-free kit (Invitrogen).

Total RNA samples were treated with the MICROBExpress Bacterial mRNA Enrichment Kit (Invitrogen) for rRNA depletion prior to library preparation. To capture both primary (5' PPP) and processed (5' P/5' OH) transcripts, RNA samples were treated with RNA 5' Pyrophosphohydrolase (RppH) (NEB) for 30 min at 37 °C, which removes pyrophosphate from the 5' end of triphosphorylated RNA, leaving a 5' monophosphate. Fragmentation was carried out by sonication using the Covaris protocol to obtain fragments of 200 nt. T4 PNK (NEB) treatment was done to obtain 5' P ends for adapter ligation during library preparation. Samples were purified using the Zymo Research Oligo Clean and Concentrator kit. Libraries for RNA sequencing were prepared using the NEBNext Multiplex Small RNA Library Prep Set for Illumina according to the manufacturer’s protocol. BluePippin size selection was done using a 2% agarose gel cassette (Sage Science) to select for 100 - 600 bp products. QC at each step was carried out with both Qubit and fragment analyzer. RNA sequencing was performed using the Illumina NextSeq High-Output kit with 2 × 35 bp paired-end reads.

Raw RNA sequencing reads were quality-filtered, with simultaneous adapter removal, using trimmomatic v. 0.36. The exact parameters of the trimmomatic run can be found in the raw_data_processing.sh file.

Processed reads were mapped onto reference sequences (RefSeq NC_000913.3 supplemented with the pC002 and pC008 plasmids for non-targeting samples, and NC_000913.3 supplemented with the pC003_RFP_spacer and pC008 plasmids for targeting samples) using bowtie2 v. 2.3.4.3, producing the corresponding SAM files. For the data from in vitro experiments on isolated E. coli total RNA, reads were mapped only onto the NC_000913.3 sequence. The exact parameters of the bowtie2 run can be found in the read_mapping.sh file. For each nucleotide position on each strand of the reference sequences, the number of 5' ends of aligned fragments was counted, producing the corresponding tables (see the return_fragment_coords_table.py file for details).
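
The 5' end counting described above can be illustrated with a minimal sketch. This is not the code from return_fragment_coords_table.py. It assumes that the 5' end of a fragment is given by the first mate of a pair (or by the read itself in single-end data) and that read 1 has the same orientation as the original RNA. Both assumptions should be checked against the deposited script. The function names are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_length(cigar):
    """Number of reference bases covered by an alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def count_five_prime_ends(sam_lines):
    """Count fragment 5' ends per (sequence ID, strand, 1-based position)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, seq_id, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue  # unmapped, secondary or supplementary alignment
        if (flag & 0x1) and not (flag & 0x40):
            continue  # paired data: only the first mate marks the 5' end (assumption)
        if flag & 0x10:
            # Reverse-strand alignment: the 5' end is the rightmost aligned base.
            counts[(seq_id, "-", pos + reference_length(cigar) - 1)] += 1
        else:
            counts[(seq_id, "+", pos)] += 1
    return counts


# Toy SAM records: single-end forward, single-end reverse, a proper pair, unmapped.
example_sam = [
    "@SQ\tSN:NC_000913.3\tLN:4641652",
    "r1\t0\tNC_000913.3\t100\t42\t35M\t*\t0\t0\t*\t*",
    "r2\t16\tNC_000913.3\t200\t42\t35M\t*\t0\t0\t*\t*",
    "r3\t99\tNC_000913.3\t100\t42\t35M\t=\t250\t185\t*\t*",
    "r3\t147\tNC_000913.3\t250\t42\t35M\t=\t100\t-185\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
ends = count_five_prime_ends(example_sam)
```

For the gzipped SAM files produced by read_mapping.sh, the lines could be streamed with gzip.open(path, "rt") from the Python standard library.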

The obtained tables were joined using the merge_ends_count_tables.py script. The differences between the numbers of mapped 5' ends in targeting and nontargeting samples were analyzed using the edgeR package v. 3.26.3. Features (here, strand-specific nucleotide positions) with low counts were excluded from the analysis. The TMM normalization method implemented in edgeR was applied, and the edgeR likelihood ratio test was then performed. The obtained p-values were corrected using the Benjamini-Hochberg method, and the result tables containing the analyzed features with assigned log2FC and adjusted p-values were written to separate files (see the TCS_calling.R file for details). The obtained transcript cleavage sites were overlapped with annotated genomic features using the GenomicRanges package.

Genome build: NC_000913.3
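
As a worked illustration of the Benjamini-Hochberg correction mentioned above, the sketch below adjusts a small list of p-values in plain Python. It only shows the arithmetic. The deposited analysis performs the correction in R (see TCS_calling.R), and the function name benjamini_hochberg is introduced here for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Toy p-values for four strand-specific positions.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

Each raw p-value is multiplied by m / rank, where m is the number of tested positions and rank is its position in the ascending ordering. The running minimum then guarantees that a smaller raw p-value never receives a larger adjusted value.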

Funding

National Institute of General Medical Sciences, Award: GM10407

St Petersburg University, Award: 73450983

Rowan University

Rutgers, The State University of New Jersey