Background: The reptiles, characterized by both diversity and unique evolutionary adaptations, provide a comprehensive system for comparative studies of metabolism, physiology, and development. However, molecular resources for ectothermic reptiles are severely limited, hampering our ability to study the genetic basis for many evolutionarily important traits such as metabolic plasticity, extreme longevity, limblessness, venom, and freeze tolerance. Here we use massively parallel sequencing (454 GS-FLX Titanium) to generate a transcriptome of the western terrestrial garter snake (Thamnophis elegans) with two goals in mind. First, we develop a molecular resource for an ectothermic reptile; and second, we use these sex-specific transcriptomes to identify differences in the presence of expressed transcripts and potential genes of evolutionary interest.
Results: Using sex-specific pools of RNA (one pool for females, one pool for males) representing 7 tissue types and 35 diverse individuals, we produced 1.24 million sequence reads, which averaged 366 bp in length after cleaning. Assembly of the cleaned reads from both sexes with NEWBLER and MIRA resulted in 96,379 contigs containing 87% of the cleaned reads. Over 34% of these contigs and 13% of the singletons were annotated based on homology to previously identified proteins. From these homology assignments, additional clustering, and ORF predictions, we estimate that this transcriptome contains ~13,000 unique genes that were previously identified in other species and over 66,000 transcripts from unidentified protein-coding genes. Furthermore, we use a graph-clustering method to identify contigs linked by NEWBLER-split reads that represent divergent alleles, gene duplications, and alternatively spliced transcripts. Beyond gene identification, we identified 95,295 SNPs and 31,651 INDELs. From these sex-specific transcriptomes, we identified 190 genes that were only present in the mRNA sequenced from one of the sexes (84 female-specific, 106 male-specific), and many highly variable genes of evolutionary interest.
Conclusions: This is the first large-scale, multi-organ transcriptome for an ectothermic reptile. This resource provides the most comprehensive set of EST sequences available for an individual ectothermic reptile species, increasing the number of snake ESTs 50-fold. We have identified genes that appear to be under evolutionary selection and those that are sex-specific. This resource will assist studies on gene expression and comparative genomics, and will facilitate the study of evolutionarily important traits at the molecular level.
GarterSnakeTranscriptome.ContigsOnly
This is the multi-organ garter snake transcriptome assembled from 454 data. This file contains only the contigs.
454AllContigs.fna
GarterSnakeTranscriptome.Contigs.and.Singletons
This garter snake transcriptome contains all the contigs and singletons. Sequence IDs in this file correspond to the SeqID column in the Annotation File.
454Contigs_Newb_Mira_sing.fna
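As a minimal sketch of how the sequence IDs in a `.fna` file could be extracted for matching against the SeqID column, the Python snippet below parses FASTA-formatted lines using only the standard library. It assumes that each ID is the first whitespace-delimited token of its header line, which has not been checked against the actual files; the function names and example records are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    seq_id = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited header token.
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence ID to the length of its sequence in bp."""
    return {seq_id: len(seq) for seq_id, seq in read_fasta(lines)}


# Made-up records for illustration only; not taken from the data set.
example = [">contig00001 length=12", "ACGTACGT", "ACGT", ">contig00002", "GGCC"]
lengths = sequence_lengths(example)
```

For a locally downloaded copy, the same function could be given an open file handle, for example `with open("454Contigs_Newb_Mira_sing.fna") as handle: lengths = sequence_lengths(handle)`.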
Annotation File
This is the annotation file that corresponds to the transcriptome contigs and singletons.
AnnoData_updated2010.08.16.xlsx
Add.file1_Table_Samples
This additional file is also available from the BMC Genomics website. Additional file 1 – Table containing details of the samples used for the sex-specific RNA pools.
Tissue samples of the same type were pooled across individuals (either laboratory-born or field-born animals) for total RNA extraction. Extracted pools of RNA were quantified and their quality was checked on the Bioanalyzer. Equal amounts of RNA from each tissue type were pooled by sex.
Add.file2_Dist.Reads
This additional file is also available from the BMC Genomics website. Additional file 2 – Graphs illustrating the size distribution of the reads for each sex. Length (bp) distribution of reads obtained with 454 GS-FLX Titanium sequencing. Read number (N) and length (L) in base pairs. A) Female run. B) Male runs.
Add.file3_Graph-cluster.description
This additional file is also available from the BMC Genomics website. Additional file 3 – Description of the NEWBLER assembly and graph-clustering procedure.
Add.file4_GOgraphs
This additional file is also available from the BMC Genomics website. Additional file 4 – Pie graphs of GO assignments. GO slim (level 1, Biological Processes) assignments for all the sequences with annotation, broken down by class of sequences: male singletons, male contigs, both contigs (containing male and female reads), female contigs, female singletons.
Add.file5_mapAnoCar1
This additional file is also available from the BMC Genomics website. Additional file 5 – Snake transcripts mapped to coding and non-coding regions of the Anolis lizard draft genome (AnoCar1.0). An Excel 2007 file (.xlsx) detailing where the snake transcripts mapped to the Anolis draft genome. See the ReadME tab for a description of the columns.
Add.file6_Clustering
This additional file is also available from the BMC Genomics website. Additional file 6 – Clustering based on homology and contig-graphs. A) Distribution of the number of contigs in a HomoloGene accession, and B) the number of HomoloGene accessions that a contig is assigned to, both at e-value = 1e-20. C) Distribution of the number of contigs belonging to a graph-cluster.
Add.file7_Variants
This additional file is also available from the BMC Genomics website. Additional file 7 – Details of variants. An Excel 2007 file (.xlsx) providing details for the variants (SNPs and INDELs). See the ReadME tab for a description of the columns.
Add.file8_ContigOfInterest
This additional file is also available from the BMC Genomics website. Additional file 8 – Contigs of interest. An Excel 2007 file (.xlsx) containing the sequences of interest, including those that are sex-specific, those that have homology to the chicken Z chromosome, those in the 1st percentile of TS/TV ratios, those in the top 99th percentile of Ka/Ks ratios, and those in the top 99th percentile of variability (number of variants per bp). See the ReadME tab for a description of the columns.
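For readers unfamiliar with the TS/TV measure, the Python sketch below shows one common way a transition/transversion ratio can be computed from a list of SNPs. It is not necessarily how the values in this file were derived. The function names and the example SNPs are hypothetical, and the sketch assumes each SNP is given as a (reference base, alternate base) pair.

```python
from typing import Iterable, Optional, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """Return True for purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Ratio of transitions to transversions; None if there are no transversions."""
    transitions = 0
    transversions = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return None  # ratio is undefined without transversions
    return transitions / transversions


# Hypothetical SNPs for illustration: three transitions and one transversion.
example_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
ratio = ts_tv_ratio(example_snps)
```

Returning `None` when a contig has no transversions is a design choice in this sketch; how such contigs were handled in the original analysis is not stated here.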
Add.file9_sex-specificGOgraph
This additional file is also available from the BMC Genomics website. Additional file 9 – Sex-specific enrichment of GO terms (level 2, Biological Processes) assigned to the 190 sex-specific sequences. The * indicates the significant over-enrichment of sequences involved in biosynthetic processes in the female-specific sequences (Fisher’s Exact Test, FDR < 0.006, p-value < 0.0002).
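To illustrate the kind of test named above, the Python sketch below computes a one-sided Fisher's exact test for over-enrichment in a 2x2 table using only the standard library. Whether the original analysis used a one-sided or two-sided test is not stated, so the one-sided form is an assumption, and the counts in the example are a toy table rather than values from the study. The FDR correction across GO terms is not shown.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of seeing `a` or more in the top-left cell
    when the row and column totals are held fixed (hypergeometric tail).
    """
    row1 = a + b           # e.g. sequences from one sex
    col1 = a + c           # e.g. sequences annotated with the GO term
    total = a + b + c + d
    max_a = min(row1, col1)
    denominator = comb(total, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, max_a + 1)
    )
    return tail / denominator


# Toy table for illustration only (not data from the study):
#                 with term   without term
# group 1             3            1
# group 2             1            3
p_value = fisher_exact_greater(3, 1, 1, 3)
```

In an enrichment setting, the first row would typically hold the counts for one class of sequences (with and without a given GO term) and the second row the counts for the comparison class.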