1. Dual RNA-seq simultaneously profiles the transcriptomes of a host and a pathogen during infection and may reveal the mechanisms underlying host-pathogen interactions. A dual RNA-seq library is inherently a mixture of transcripts from at least two species (host and pathogen), so the reads must be computationally sorted into host and pathogen components. Sorting relies on aligning reads to the reference genome of each species. In non-model host-pathogen pairs, however, a reference genome may be unavailable for one or both species. This lack of genomic resources may hinder the application of dual RNA-seq to non-model systems.

2. We simulated datasets of mixed host and pathogen transcripts to assess how accurately dual RNA-seq reads align when the genomic resources of a species closely related to the species of interest are used in its place. Specifically, we compared the performance of different aligners across different proportions of pathogen to host transcripts and across different genetic distances between the pathogen genome and the reference genome. We performed extensive analyses for a plant host with a fungal pathogen. We then extended the plant-fungus results by repeating key analyses in vertebrate (human)-fungus and vertebrate-bacterium systems.

3. Aligners that could map pathogen transcripts to the reference genome of a species closely related to the pathogen (a “related reference genome”) also mismapped host-derived transcripts to that related reference genome. As a result, the regions where this occurred were quantified as overexpressed. When a host reference genome was available, we show that host transcript mismapping can be minimized, without losing the ability to map pathogen transcripts, by concatenating the host genome with the pathogen’s related reference genome and mapping transcripts to the concatenated genomes. When a host genome was unavailable, assembling reads de novo before alignment substantially decreased host read mismapping. This approach also retained the ability to map pathogen transcripts to a related reference genome.

4. The application of dual RNA-seq to organisms without reference genomes is currently limited. We propose an analytical workflow that leverages the genomic resources of species closely related to the species of interest. This workflow facilitates the use of dual RNA-seq to reveal the mechanisms of host-pathogen interactions across a wider range of systems.
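Point 3 describes a concatenation strategy. The minimal Python sketch below illustrates only the bookkeeping side of that strategy. It is not the authors' pipeline and does not build an aligner index.

Several assumptions apply:
- The FASTA files have already been read into lines.
- The `host|` and `pathogen|` header prefixes are hypothetical tags chosen here to record each contig's origin.
- The aligner reports the reference name of each read, as in the SAM RNAME field, where `*` marks an unmapped read.
- For simulated reads, the true origin of each read is known.

Under these assumptions, the same tags can be used to measure how often host reads land on the pathogen's related reference.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple


def prefix_fasta(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield FASTA lines, renaming each header to '>prefix|original_name'."""
    for line in lines:
        if line.startswith(">"):
            yield f">{prefix}|{line[1:].strip()}"
        else:
            yield line.rstrip("\n")


def concatenate_references(host_lines: Iterable[str],
                           pathogen_lines: Iterable[str]) -> List[str]:
    """Join the host genome and the pathogen's related reference into one FASTA."""
    return list(prefix_fasta(host_lines, "host")) + list(
        prefix_fasta(pathogen_lines, "pathogen"))


def species_of(reference_name: Optional[str]) -> str:
    """Recover the source genome from an aligned reference name ('*' = unmapped in SAM)."""
    if reference_name is None or reference_name == "*":
        return "unmapped"
    return reference_name.split("|", 1)[0]


def tally_by_species(reference_names: Iterable[Optional[str]]) -> Counter:
    """Count reads assigned to the host, the pathogen's related reference, or neither."""
    return Counter(species_of(name) for name in reference_names)


def host_mismapping_rate(alignments: Iterable[Tuple[str, Optional[str]]]) -> float:
    """Fraction of truly host-derived reads aligned to the pathogen's related reference.

    `alignments` yields (true_origin, reference_name) pairs; the true origin is
    only known for simulated reads.
    """
    host_total = 0
    host_to_pathogen = 0
    for true_origin, reference_name in alignments:
        if true_origin == "host":
            host_total += 1
            if species_of(reference_name) == "pathogen":
                host_to_pathogen += 1
    return host_to_pathogen / host_total if host_total else 0.0
```

For example, `concatenate_references([">chr1", "ACGT"], [">contig7", "GGCC"])` returns `[">host|chr1", "ACGT", ">pathogen|contig7", "GGCC"]`. Tagging headers in a single combined reference lets host and pathogen loci compete for each read in one alignment pass. The intuition is that host reads then preferentially align to the host genome instead of being forced onto the pathogen's related reference.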
76-bp single-end dual RNA-seq simulated sequencing dataset: 1% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 1% pathogen and 99% host.
76_SE_0.01.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 5% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 5% pathogen and 95% host.
76_SE_0.05.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 10% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 10% pathogen and 90% host.
76_SE_0.1.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 20% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 20% pathogen and 80% host.
76_SE_0.2.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 30% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 30% pathogen and 70% host.
76_SE_0.3.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 40% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 40% pathogen and 60% host.
76_SE_0.4.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 50% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 50% pathogen and 50% host.
76_SE_0.5.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 60% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 60% pathogen and 40% host.
76_SE_0.6.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 70% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 70% pathogen and 30% host.
76_SE_0.7.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 80% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 80% pathogen and 20% host.
76_SE_0.8.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 90% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 90% pathogen and 10% host.
76_SE_0.9.fasta.gz
76-bp single-end dual RNA-seq simulated sequencing dataset: 100% pathogen
Simulated dual RNA-seq dataset in which the host is Arabidopsis thaliana and the pathogen is Schizosaccharomyces octosporus. The dataset contains 10 million 76-bp single-end reads: 100% pathogen and 0% host.
76_SE_1.0.fasta.gz
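The file names listed above appear to encode the read length, the read layout, and the pathogen fraction (for example, `76_SE_0.01.fasta.gz`). The Python sketch below assumes that this naming pattern holds for every file. It derives the expected number of pathogen and host reads from the stated total of 10 million reads. It also counts the records in a downloaded file so that the two numbers can be compared. The helper names and the one-header-per-read assumption are illustrative, not part of the dataset documentation.

```python
import gzip
import re
from typing import NamedTuple

# Total read count stated in each dataset description.
TOTAL_READS = 10_000_000

# Assumed naming pattern: <read length>_<layout>_<pathogen fraction>.fasta.gz
DATASET_NAME = re.compile(
    r"^(?P<length>\d+)_(?P<layout>[A-Z]+)_(?P<fraction>\d+(?:\.\d+)?)\.fasta\.gz$"
)


class MixSpec(NamedTuple):
    read_length: int
    layout: str
    pathogen_fraction: float
    pathogen_reads: int
    host_reads: int


def parse_dataset_name(filename: str, total_reads: int = TOTAL_READS) -> MixSpec:
    """Derive expected pathogen and host read counts from a dataset file name."""
    match = DATASET_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized dataset file name: {filename}")
    fraction = float(match["fraction"])
    # Round to the nearest whole read to absorb floating-point error.
    pathogen_reads = round(total_reads * fraction)
    return MixSpec(
        read_length=int(match["length"]),
        layout=match["layout"],
        pathogen_fraction=fraction,
        pathogen_reads=pathogen_reads,
        host_reads=total_reads - pathogen_reads,
    )


def count_fasta_records(path: str) -> int:
    """Count reads in a gzipped FASTA file, assuming one '>' header line per read."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For example, `parse_dataset_name("76_SE_0.2.fasta.gz")` gives 2,000,000 pathogen reads and 8,000,000 host reads. Comparing `count_fasta_records` on a downloaded file against `pathogen_reads + host_reads` is a quick integrity check after download.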