Data from: Challenges and solutions for analyzing dual RNA-seq data for non-model host/pathogen systems
Cite this dataset
O'Keeffe, Kayleigh R.; Jones, Corbin D. (2018). Data from: Challenges and solutions for analyzing dual RNA-seq data for non-model host/pathogen systems [Dataset]. Dryad. https://doi.org/10.5061/dryad.t40nj78
1. Dual RNA-seq simultaneously profiles the transcriptomes of a host and pathogen during infection and may reveal the mechanisms underlying host-pathogen interactions. Dual RNA-seq data are inherently a mixture of transcripts from at least two species (host and pathogen), so this mixture must be computationally sorted into host and pathogen components. Sorting relies on aligning reads to the respective reference genomes, which may be unavailable for both species in non-model host-pathogen pairs. This lack of genomic resources may present challenges to applying dual RNA-seq to non-model systems.
2. We assessed the accuracy of dual RNA-seq alignments when using the genomic resources of a species closely related to the species of interest by simulating datasets of mixed transcripts from a host and pathogen. Specifically, we compared how different aligners performed across different proportions of pathogen to host transcripts and across variation in the genetic distance between the pathogen genome and the reference genome. We performed extensive analyses for a host plant with a fungal pathogen, and then extended the plant-fungus results by repeating key analyses in vertebrate (human)-fungus and vertebrate-bacterium systems.
3. Aligners that were able to map pathogen transcripts to the reference genome of a species closely related to the pathogen (a “related reference genome”) also mismapped transcripts originating from the host to the pathogen’s related reference genome. As a result, the regions where this mismapping occurred were quantified as overexpressed. If a host reference genome was available, we show that host transcript mismapping can be minimized, while retaining the ability to map pathogen transcripts, by concatenating the host genome with the pathogen’s related reference genome and then mapping transcripts to the concatenated genomes. If a host genome was unavailable, assembling reads de novo prior to aligning substantially decreased host read mismapping, while retaining the ability to map pathogen transcripts to a related reference genome.
4. The application of dual RNA-seq to organisms without reference genomes is currently limited. We propose an analytical workflow that leverages the genomic resources of species closely related to the species of interest to facilitate the application of dual RNA-seq to reveal the mechanisms of host-pathogen interactions across a wider array of systems.
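The concatenated-reference strategy described in point 3 can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `host`/`pathogen` tags, and the `|` separator are all hypothetical choices made for this example. It assumes the references are plain FASTA text and that an aligner has already produced (read ID, reference name) pairs, with `*` marking unmapped reads as in the SAM format.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def tag_fasta(lines: Iterable[str], tag: str) -> List[str]:
    """Prefix every FASTA header with a species tag (hypothetical convention)."""
    tagged = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # ">chr1" becomes ">host|chr1" so the species can be recovered later
            tagged.append(f">{tag}|{line[1:]}")
        elif line:
            tagged.append(line)
    return tagged


def concatenate_references(
    host_lines: Iterable[str],
    pathogen_lines: Iterable[str],
    host_tag: str = "host",
    pathogen_tag: str = "pathogen",
) -> List[str]:
    """Build one combined reference from the host genome and the
    pathogen's related reference genome."""
    return tag_fasta(host_lines, host_tag) + tag_fasta(pathogen_lines, pathogen_tag)


def count_reads_by_species(
    alignments: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, int]:
    """Count reads per species from (read_id, reference_name) pairs.

    A reference name of None or "*" is treated as unmapped.
    """
    counts: Dict[str, int] = {}
    for _read_id, ref_name in alignments:
        if ref_name is None or ref_name == "*":
            species = "unmapped"
        else:
            # The species tag is everything before the first "|"
            species = ref_name.split("|", 1)[0]
        counts[species] = counts.get(species, 0) + 1
    return counts
```

Because every read is aligned once against both genomes together, a host read can be placed on its true host locus instead of being forced onto the pathogen's related reference; the tag in the reference name then lets the alignments be split back into host and pathogen components.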