The study of the koala transcriptome has the potential to advance our understanding of its immunome—immunological reaction of a given host to foreign antigens—and to help combat infectious diseases (e.g., chlamydiosis) that impede ongoing conservation efforts. We used Illumina sequencing of cDNA to characterize genes expressed in two different koala tissues of immunological importance, blood and spleen. We generated nearly 600 million raw sequence reads, and about 285 million of these were subsequently assembled and condensed into ~70,000 subcomponents that represent putative transcripts. We annotated ~16 % of these subcomponents and identified those related to infection and the immune response, including Toll-like receptors (TLRs), RIG-I-like receptors (RLRs), major histocompatibility complex (MHC) genes, and koala retrovirus (KoRV). Using phylogenetic analyses, we identified 29 koala genes in these target categories and report their concordance with currently accepted gene groups. By mapping multiple sequencing reads to transcripts, we identified 56 putative SNPs in genes of interest. The distribution of these SNPs indicates that MHC genes (34 SNPs) are more diverse than KoRV (12 SNPs), TLRs (8 SNPs), or RLRs (2 SNPs). Our sequence data also indicate that KoRV sequences are highly expressed in the transcriptome. Our efforts have produced full-length sequences for potentially important immune genes in koala, which should serve as targets for future investigations that aim to conserve koala populations.
DeconSeq vertebrate rRNA database
This FASTA file contains the custom vertebrate ribosomal RNA (rRNA) database used with DeconSeq to remove rRNA contaminants from sequencing data. It was created by extracting vertebrate rRNA sequences from NCBI using the query '(ribosomal RNA) AND "vertebrates"[porgn:_txid7742]', checking the sequence descriptions by eye (and removing undesired sequences), and then following the procedure for creating a custom database in the DeconSeq manual (http://deconseq.sourceforge.net/manual.html). Briefly, long stretches of ambiguous bases (Ns) were removed, and the sequences were then filtered with PRINSEQ to remove short sequences (< 200 bp), sequences with > 10 ambiguous bases (Ns), and duplicate sequences. Finally, the FASTA file was indexed using BWA and used as the database for all DeconSeq filtering in this project.
vetebrateRRNA_trimmed.split.filtered.fasta
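The filtering step described above can be illustrated with a minimal Python sketch. It is not the PRINSEQ implementation used to build this file; it only mirrors the three stated criteria (length of at least 200 bp, no more than 10 Ns, no duplicates). The function names are hypothetical, duplicates are assumed to mean exact sequence matches, and the earlier step that removes long stretches of Ns is not reproduced.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_records(
    records: Iterable[Record], min_len: int = 200, max_n: int = 10
) -> Iterator[Record]:
    """Drop short sequences, N-rich sequences, and exact duplicates."""
    seen = set()
    for header, seq in records:
        upper = seq.upper()
        if len(upper) < min_len:      # shorter than 200 bp
            continue
        if upper.count("N") > max_n:  # more than 10 ambiguous bases
            continue
        if upper in seen:             # exact duplicate (an assumption)
            continue
        seen.add(upper)
        yield header, seq
```

With a FASTA file opened as `handle`, `list(filter_records(parse_fasta(handle)))` would return the records that pass all three criteria.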
spleen/buffy coat combined trinity assembly
RNA was extracted from the spleen of one individual and from the buffy coat (the white blood cell fraction of a blood sample) of two individuals from the San Diego Zoo koala colony, using TRIzol (Life Technologies) and Direct-zol (Zymo Research) according to the manufacturers' instructions. Approximately 5 µg of RNA was used to create a cDNA library for each sample according to the Illumina TruSeq RNA sample preparation manual, using random hexamer priming. The buffy coat libraries were barcoded, pooled, and sequenced on one lane of the Illumina HiSeq 2000 platform, while the spleen library was sequenced on one lane of the Illumina HiSeq 2500 platform. The raw reads were then filtered for quality (Phred Q score > 20) and length (> 30 bp). Ribosomal RNA reads were reduced using DeconSeq (version 0.4.2) and a custom vertebrate rRNA database. The remaining reads were then assembled using Trinity (version r2013-02-25); the assembled sequences are contained in this file.
trinity.fasta.gz
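As an illustration of the read filtering thresholds mentioned above (Phred Q score > 20, length > 30 bp), the following Python sketch checks a single read. The description does not say which tool applied the filter or how the quality score was summarized, so two assumptions are made here: the threshold is applied to the mean quality of the read, and qualities use the Phred+33 ASCII encoding. The function names are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(
    seq: str, quality: str, min_q: float = 20.0, min_len: int = 30
) -> bool:
    """Keep a read only if it is longer than min_len and its mean
    quality is above min_q (mean-based filtering is an assumption)."""
    return len(seq) > min_len and mean_phred(quality) > min_q
```

Both comparisons are strict, matching the "> 20" and "> 30 bp" wording of the description.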
trinotate annotation report of spleen/buffy coat trinity transcripts
The Trinity transcripts (contained within the spleen/buffy coat combined trinity assembly) were annotated using the Trinotate pipeline (part of the Trinity package, version r2013-02-25). Briefly, this pipeline searches the transcripts for open reading frames, BLASTs the resulting protein sequences against the NCBI's SwissProt database, and assigns GO terms based on the best hit. It also searches the transcripts for conserved PFAM domains.
08-02-13 trinotate_annotation_report_copy.txt.gz
alignments of target protein sequences
The open reading frame protein sequences of the target genes (MHC, TLR, RLR, KoRV) from the Trinotate analysis were first aligned with eutherian, marsupial, and other vertebrate sequences using Jalview 2.8.0b1 and MUSCLE with default parameters. An initial neighbor-joining tree was created (using MEGA 5.2.2) and used to guide the alignment of each individual gene clade via Jalview and MUSCLE (with the exception of MHC sequences, which were aligned using HMMER 3.1b and PFAM seed alignments). These gene group alignments were then combined using COACH (version 11/21/2002 Linux), and additional sequences were added as needed. The ends of the alignments were trimmed when necessary, and both the full and trimmed alignments (.fas format) are included in this file package. For more detailed methods, please refer to the associated publication.
Target_gene_alignments.tar.gz
bootstrap trees (maximum likelihood and neighbor joining) for target protein sequences
The target protein alignments (MHC, TLR, RLR, KoRV) were used to build both maximum likelihood and neighbor-joining trees using MEGA 5.2.2. The alignments and ProtTest 3.2 were used to determine the most appropriate substitution model for building each tree. For more detailed methods, please refer to the associated publication. Each tree is presented here in Newick format. Bootstrapping was run with 1,000 replicates, and the branch lengths represent the bootstrap values. The accession numbers for the sequences used in these trees can be found in "Online Resource 1" of the associated publication.
Bootstrap_trees.tar.gz
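Because the bootstrap values are stored in the branch-length position of each Newick string, they can be pulled out with a short Python sketch such as the one below. It is an illustration only: the function name is hypothetical, the example tree is made up rather than taken from this package, and the pattern assumes that taxon labels contain no colons.

```python
import re
from typing import List

# A number that follows a colon, i.e. the Newick branch-length field.
_BRANCH_VALUE = re.compile(r":\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def branch_values(newick: str) -> List[float]:
    """Return every value stored in the branch-length position,
    in the order it appears in the Newick string."""
    return [float(v) for v in _BRANCH_VALUE.findall(newick)]


# Made-up example tree; real labels would be the accession numbers.
example = "((A:100,B:100):87,C:100);"
values = branch_values(example)
```

For trees with quoted labels or comments, a dedicated Newick parser would be a safer choice than a regular expression.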