The utility of reptile blood transcriptomes in molecular ecology
Cite this dataset
Schwartz, Tonia S et al. (2019). The utility of reptile blood transcriptomes in molecular ecology [Dataset]. Dryad. https://doi.org/10.5061/dryad.63xsj3tz8
Reptiles and other non-mammalian vertebrates have transcriptionally active nucleated red blood cells. If blood transcriptomes can provide quantitative data to address questions relevant to molecular ecology, this could circumvent the need to euthanize animals to assay tissues. This would allow longitudinal sampling of animals’ responses to treatments, as well as sampling of protected taxa. We developed and annotated blood transcriptomes from six reptile species. We found that on average 25,000 proteins are being transcribed in the blood, and that a CORE group of 9,282 orthogroups is found in at least four of the six species. In comparison to liver transcriptomes from the same taxa, approximately two-thirds of the orthogroups were found in both blood and liver, and a similar percentage of ecologically relevant gene groups (insulin and insulin-like signaling, electron transport chain, oxidative stress, glucocorticoid receptors) were found transcribed in both blood and liver. As a resource, we provide a user-friendly database of gene IDs identified in each blood transcriptome. Although on average 37% of reads mapped to hemoglobin, importantly, the majority of non-hemoglobin transcripts had sufficient depth (e.g., 97% at >10 reads) to be included in differential gene expression analysis. Thus, we demonstrate that RNAseq of blood transcriptomes from a very small blood sample (<10 µl) is a minimally invasive option in non-mammalian vertebrates for quantifying expression of a large number of ecologically relevant genes longitudinally and in protected populations.
Six reptile species (3 snakes, 2 lizards, and a turtle) were included in this study (Table 1) for development of blood transcriptomes. These were chosen based on reptile taxonomic diversity and interest to our research groups. Blood was taken using a heparinized needle (U-100 BD Micro-Fine™ IV Insulin Syringes 28 Gauge, 1 mL 12.7 mm (1/2")). For all animals, <1% of body weight of blood was taken from the caudal vein in the tail (20 µl to 300 µl for these animals). Blood was either (1) put immediately into RNAlater (Ambion) (<1:5 ratio of blood to RNAlater) in a 2 ml screw-cap tube and kept on ice, as would be typical in field settings, until stored at 4°C, or (2) centrifuged (1000xG for 5 min) to separate blood components, which were flash frozen in liquid nitrogen as plasma and red blood cells in 2 ml screw-cap tubes and stored at -80°C, as is common in laboratory settings. These two approaches (RNAlater versus snap-freezing in liquid nitrogen) are commonly used to preserve the RNA in tissue, and this study demonstrates that both produce high-quality RNA and RNAseq data when used on blood. Typically, the blood cell pellet (hematocrit) is approximately one half of the whole blood volume; thus 10 µl of blood cells is obtainable from 20 µl of whole blood. The blood cell pellet from reptiles is almost all red blood cells, with a fine layer of white blood cells at the interface between the red blood cells and the plasma, and is therefore referred to as the red blood cell (RBC) pellet. All procedures were approved by the IACUC at the respective universities or agency of the individual collecting the sample.
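The sampling rules in this paragraph (a draw limit of <1% of body weight, and an RBC pellet of roughly one half of the whole blood volume) can be written as a short calculation. This is a minimal sketch, not part of the original protocol: the function names are hypothetical, the 0.5 hematocrit is the approximate figure quoted above, and the volume function is unit-agnostic (it returns the same unit it is given).

```python
def max_blood_draw_mass(body_mass_g, max_fraction=0.01):
    """Upper bound on blood mass (g) under the <1% of body weight rule."""
    return body_mass_g * max_fraction


def rbc_pellet_volume(whole_blood_volume, hematocrit=0.5):
    """Expected RBC pellet volume, in the same unit as the input.

    hematocrit=0.5 is the approximate 'one half' figure from the text;
    real values vary by species and individual.
    """
    return whole_blood_volume * hematocrit
```

For example, `rbc_pellet_volume(20)` gives 10.0, matching the worked figure in the text (half of the whole blood volume is recoverable as pelleted cells).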
Blood that was in RNAlater was centrifuged at 1000xG for 5 minutes to pellet the RBC, and the RNAlater was pipetted off. From either the snap-frozen RBC pellet or the RNAlater RBC pellet, we used 10 µl of pelleted blood cells for RNA isolation using the Ambion RiboPure Kit, with DNAse digestion as described by the manufacturer. Purified RNA was analyzed on a Bioanalyzer (Agilent) to validate the quality and to quantify the RNA. From these 10 µl of blood cells we obtained between 4.5 µg and 8.9 µg of RNA, far more than was needed for RNAseq. All samples had a RIN >7.5.
RNA-seq Library Preparation and Sequencing
We sent 1 µg of total RNA to the Heflin Genomic Center at the University of Alabama at Birmingham. Barcoded libraries were prepared using the Agilent SureSelect Stranded library kit (Agilent Technologies, Santa Clara, CA) as described by the manufacturer. Briefly, 100 ng of total RNA was subjected to two rounds of poly A+ selection using oligo dT magnetic beads. The mRNA was randomly fragmented, and first strand cDNA synthesis was performed in the presence of random hexamers and 2.4 ng/µL (final concentration) of Actinomycin D using standard techniques. After second strand synthesis was complete, the cDNA was adenylated and used in a ligation reaction to add primary adaptors for flow cell attachment with bar code information. The sequencing libraries were mixed in equal molar amounts and run on the Illumina HiSeq2500 using a Rapid Run flow cell with paired-end 100 bp sequencing reads, aiming for 20 million reads/sample. Following completion of the run, the .bcl files were converted to FASTQ file format using BCL2FASTQ 1.8.4 from Illumina.
For comparison to our blood transcriptomes, we downloaded the liver transcriptomes from Dryad (McGaugh et al., 2015a) for two species that overlap with our blood transcriptomes (T. elegans, E. multicarinata), and a third species that shared a genus (Sceloporus undulatus) (Table 1).
Blood Transcriptome Assembly
The bioinformatic pipeline is represented in Figure 1. Read quality in the FASTQ files was assessed with FastQC (http://bioinformatics.babraham.ac.uk/projects/fastqc/) before cleaning. Using Trimmomatic (Bolger, Lohse, & Usadel, 2014), low-quality base pairs were removed from raw reads. To reduce biases, the first 10 base pairs of each read were removed, and any sequences shorter than 30 base pairs were discarded. Quality of the reads was assessed again using FastQC. Transcriptomic reads were de novo assembled using Trinity 2.2.0 (Haas et al., 2013) with the default parameters; we refer to this as the Raw Assembly.
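The head-cropping and minimum-length rules described above can be illustrated in a few lines of Python. This is a minimal sketch of the logic only, not the Trimmomatic command the authors ran: the function names and the `(name, sequence, quality)` record layout are hypothetical, quality trimming is omitted, and it is an assumption that the 30 bp length filter is applied after the first 10 bases are removed.

```python
def trim_read(seq, qual, head_crop=10, min_len=30):
    """Remove the first `head_crop` bases from a read.

    Returns (seq, qual) after cropping, or None when the cropped read
    is shorter than `min_len` and should be discarded.
    """
    seq, qual = seq[head_crop:], qual[head_crop:]
    if len(seq) < min_len:
        return None
    return seq, qual


def filter_reads(records, head_crop=10, min_len=30):
    """Apply trim_read to (name, seq, qual) tuples, keeping the survivors."""
    kept = []
    for name, seq, qual in records:
        trimmed = trim_read(seq, qual, head_crop, min_len)
        if trimmed is not None:
            kept.append((name, trimmed[0], trimmed[1]))
    return kept
```

Under these assumptions a 100 bp read is shortened to 90 bp and kept, while a 35 bp read is shortened to 25 bp and dropped.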
Metagenomic Contamination Screening
Contamination screening was performed on the contigs from each assembled blood transcriptome and the liver transcriptomes from McGaugh et al. (2015a). We performed DIAMOND (Buchfink, Xie, & Huson, 2015) blastp searches against NCBI’s non-redundant nucleotide database (e-value cutoff of 1E-10), sorted the resulting reports by bitscore, then e-value, then percent identity, and used a custom Perl script to isolate any sequences whose top hit matched a non-vertebrate. These contaminant non-vertebrate contigs were removed from the raw transcriptomes, and we refer to the results as the “cleaned transcriptomes”.
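The top-hit rule described above (rank hits by bitscore, then e-value, then percent identity, and flag contigs whose best hit is a non-vertebrate) can be sketched as follows. This is an illustrative reimplementation, not the authors' custom Perl script: the hit dictionaries, the `is_vertebrate` field, and the function names are hypothetical, and it is an assumption here that contigs with no hit passing the e-value cutoff are retained rather than removed.

```python
from collections import defaultdict


def top_hits(hits, max_evalue=1e-10):
    """Return the best hit per query contig.

    Hits are ranked by bitscore (descending), then e-value (ascending),
    then percent identity (descending).
    """
    by_query = defaultdict(list)
    for hit in hits:
        if hit["evalue"] <= max_evalue:
            by_query[hit["query"]].append(hit)
    best = {}
    for query, query_hits in by_query.items():
        query_hits.sort(key=lambda h: (-h["bitscore"], h["evalue"], -h["pident"]))
        best[query] = query_hits[0]
    return best


def contaminant_contigs(hits, max_evalue=1e-10):
    """Names of contigs whose top hit is to a non-vertebrate subject."""
    best = top_hits(hits, max_evalue)
    return {query for query, hit in best.items() if not hit["is_vertebrate"]}
```

The returned set of contig names would then be excluded from the raw assembly to produce a cleaned transcriptome.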
Reference Blood Transcriptomes
After removing the non-vertebrate sequences from the original blood transcriptome assemblies, we used TransDecoder (https://github.com/TransDecoder/) to generate longest open reading frames and peptide files. The longest open reading frames were passed to the UCLUST algorithm implemented in usearch7 (Edgar, 2010) to cluster the transcripts within each transcriptome using an identity threshold of 90%. Resulting centroids were kept as representative sequences for each cluster. We refer to these centroids from the cleaned-clustered assemblies as the Reference Blood Transcriptomes.
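The idea of keeping one centroid per cluster at a 90% identity threshold can be illustrated with a toy greedy clustering routine. This is a simplified sketch in the spirit of centroid-based clustering, not the UCLUST implementation in usearch: the function names are hypothetical, sequences are processed longest first as an assumption, and identity is computed as an ungapped position-by-position match over the shorter sequence, which stands in for the alignment-based identity a real tool would compute.

```python
def ungapped_identity(a, b):
    """Fraction of matching positions over the shorter sequence (toy metric)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs, threshold=0.90):
    """Greedy centroid clustering of a {name: sequence} mapping.

    Each sequence joins the first existing centroid it matches at or
    above `threshold`; otherwise it becomes a new centroid.
    Returns {centroid_name: [member_names]}.
    """
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters = {}
    for name in order:
        for centroid in clusters:
            if ungapped_identity(seqs[name], seqs[centroid]) >= threshold:
                clusters[centroid].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters
```

The keys of the returned dictionary play the role of the centroids that are kept as representative sequences.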
Both the raw and the reference transcriptome assemblies were annotated with the Trinotate annotation pipeline version 3.0 (Bryant et al., 2017), which uses TransDecoder to identify the longest open reading frame peptide candidates and compares them to the Swiss-Prot (Bateman et al., 2017), PFAM (Finn et al., 2016), SignalP (Petersen, Brunak, Heijne, & Nielsen, 2011), and TMHMM (Sonnhammer, von Heijne, & Krogh, 1998) databases. We also ran custom BLAST searches (blastx and blastp, e-value cutoff of 1E-10) (Altschul, Gish, Miller, Myers, & Lipman, 1990) against three genomes: Anolis carolinensis 2.0; Gallus gallus 5.0; and Homo sapiens GRCh38.p12 from ENSEMBL release 92 (Zerbino et al., 2018). These transcriptomes and annotation files are provided as a Dryad Repository. Additionally, translated transcripts were checked for completeness using the BUSCO tetrapoda database (Simao, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015).
TransDecoder peptide files from both cleaned blood transcriptomes and the cleaned liver transcriptomes were passed to OrthoFinder (Emms & Kelly, 2015) for orthology inference using all-vs-all blastp searches. To assign similarity-based protein identifications of resulting putative homologous proteins, we performed BLAST searches to the three genomes noted above.
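Once orthogroups are assigned, the summary statistics reported in the abstract (a CORE set found in at least four of six species, and the share of orthogroups found in both blood and liver) reduce to set operations. The following is a minimal sketch with hypothetical function names and a hypothetical `{orthogroup: set of species}` input; it does not parse OrthoFinder output, and using the union of blood and liver orthogroups as the denominator is an assumption, since the source does not state which denominator was used.

```python
def core_orthogroups(presence, min_species=4):
    """Orthogroups present in at least `min_species` species.

    `presence` maps an orthogroup ID to the set of species in which
    at least one transcript was assigned to that orthogroup.
    """
    return {og for og, species in presence.items() if len(species) >= min_species}


def shared_fraction(blood_ogs, liver_ogs):
    """Fraction of all observed orthogroups found in both tissues."""
    union = blood_ogs | liver_ogs
    if not union:
        return 0.0
    return len(blood_ogs & liver_ogs) / len(union)
```

With six species in the input, `core_orthogroups(presence)` applies the same "at least four of six" rule used to define the CORE group.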
This package contains the assembled transcriptomes from blood (.fastq), annotation files for the blood transcriptomes and for the liver transcriptomes from McGaugh et al. 2015 PNAS (.xls), and two supplemental .xls files.
Supplemental File 1 is an Excel file of the Candidate Functional Pathway Genes and the transcript IDs (from translated raw transcriptomes) from each species.
Supplemental File 2 is an Excel database listing the genes found in each transcriptome. It is intended as a resource for researchers considering blood transcriptomes, so they can check which genes and candidate gene groups are expressed in blood and may therefore be assayed in their reptile system.
Raw Blood RNAseq data has been deposited in the NCBI SRA database. Accession SRP135786: Runs SRR6841717 to SRR6841722
James S. McDonnell Foundation, Award: 220020353
National Science Foundation, Award: 1560115