A single-parasite transcriptional atlas of Toxoplasma gondii reveals novel control of antigen expression
Data files
Feb 20, 2020 version files 1.78 GB
- 191216_submission_scripts.tar (1.78 GB)
Abstract
Cells were sorted by FACS into 384-well plates. Smart-seq2 and Nextera library preparation were performed as previously described, and the resulting libraries were sequenced on a NovaSeq 6000 using 2x150 bp paired-end sequencing. BCL output files from sequencing were converted into gzip-compressed FASTQ files with a modified bcl2fastq demultiplexer designed to handle the higher throughput per sequencing run. To generate genome references with spike-in sequences, we concatenated ME49 or RH genome references (version 36 on ToxoDB) with ERCC sequences. The raw FASTQ files were aligned to the concatenated genomes with the STAR aligner (version 2.6.0c) using the following settings: “--readFilesCommand zcat --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --outSAMstrandField intronMotif --outSAMtype BAM Unsorted --outSAMattributes NH HI AS NM MD --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4 --clip3pAdapterSeq CTGTCTCTTATACACATCT --outReadsUnmapped Fastx”. Transcripts were counted with a custom htseq-count script (version 0.10.0, https://github.com/simon-anders/htseq) using ME49 or RH GFF3 annotations (version 36 on ToxoDB) concatenated with the ERCC annotation. Instead of discarding reads that mapped to multiple locations, we modified htseq-count to add transcript counts divided by the number of genomic locations with equal alignment score, thus rescuing measurement of duplicated genes in the Toxoplasma genome. Parallel jobs of STAR alignment and htseq-count were requested automatically by Bag of Stars (https://github.com/iosonofabio/bag_of_stars) and run on Stanford's high-performance computing cluster, Sherlock 2.0.
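The rescue of multimapped reads described above can be illustrated with a small sketch. This is not the modified htseq-count script itself: the function name `fractional_counts`, the input tuple layout, and the example read and gene names are hypothetical, and the sketch assumes that only the top-scoring locations of each read share its count.

```python
from collections import defaultdict


def fractional_counts(alignments):
    """Distribute each read over its equally best-scoring locations.

    `alignments` is an iterable of (read_id, feature_id, alignment_score)
    tuples; this layout is an assumption made for illustration only.
    """
    # Group all reported alignments by read.
    by_read = defaultdict(list)
    for read_id, feature_id, score in alignments:
        by_read[read_id].append((feature_id, score))

    counts = defaultdict(float)
    for hits in by_read.values():
        # Keep only the locations that tie for the best alignment score.
        best = max(score for _, score in hits)
        top = [feature for feature, score in hits if score == best]
        # Each tied location receives an equal fraction of the read.
        weight = 1.0 / len(top)
        for feature in top:
            counts[feature] += weight
    return dict(counts)


# Hypothetical example: read2 maps equally well to two duplicated genes.
example = [
    ("read1", "geneA", 98),
    ("read2", "geneB", 95),
    ("read2", "geneC", 95),
    ("read3", "geneA", 97),
    ("read3", "geneB", 60),
]
counts = fractional_counts(example)
```

In this toy input, the read that maps equally well to two genes contributes half a count to each, whereas a read with a single best location contributes a full count there.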
Reads containing exonic and intronic regions were estimated with Velocyto from the BAM output files, with jobs requested automatically by Bag of Velocyto (https://github.com/xuesoso/bag_of_velocyto) on Sherlock 2.0. The gene count matrix was obtained by summing transcripts into genes using a custom Python script. The Scanpy velocyto package was then used to estimate transcriptional velocity on a given reduced-dimension embedding. Parameters used to generate the results are supplied as supplementary Python scripts. Sample code to generate the analysis figures is provided in supplementary Jupyter notebooks.
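The transcript-to-gene summation can be sketched as follows. This is a minimal illustration, not the custom script shipped in the archive: the function name `sum_transcripts_to_genes`, the nested-dictionary layout, and all cell, transcript, and gene identifiers are hypothetical.

```python
from collections import defaultdict


def sum_transcripts_to_genes(transcript_counts, transcript_to_gene):
    """Collapse per-cell transcript counts into per-cell gene counts.

    `transcript_counts` maps cell -> {transcript_id: count} and
    `transcript_to_gene` maps transcript_id -> gene_id; both layouts
    are assumptions made for this illustration.
    """
    gene_counts = {}
    for cell, counts in transcript_counts.items():
        per_gene = defaultdict(float)
        for transcript, value in counts.items():
            # Add every transcript's count to its parent gene.
            per_gene[transcript_to_gene[transcript]] += value
        gene_counts[cell] = dict(per_gene)
    return gene_counts


# Hypothetical identifiers: geneX has two transcripts, geneY has one.
transcript_to_gene = {"geneX-t1": "geneX", "geneX-t2": "geneX", "geneY-t1": "geneY"}
transcript_counts = {
    "cell1": {"geneX-t1": 3.0, "geneX-t2": 1.5, "geneY-t1": 2.0},
    "cell2": {"geneY-t1": 4.0},
}
gene_matrix = sum_transcripts_to_genes(transcript_counts, transcript_to_gene)
```

Counts are kept as floats because the fractional assignment of multimapped reads can produce non-integer transcript counts.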