Data from: "White-tailed deer (Odocoileus virginianus) transcriptome assembly and SNP discovery" in Genomic Resources Notes accepted 1 June 2013-31 July 2013
Malenfant, Rene, University of Alberta
Davis, Corey, Nature Research Centre
Moore, Stephen, University of Alberta
Coltman, David, Nature Research Centre
Published Sep 19, 2013 on Dryad.
Cite this dataset
Malenfant, Rene; Davis, Corey; Moore, Stephen; Coltman, David (2013). Data from: "White-tailed deer (Odocoileus virginianus) transcriptome assembly and SNP discovery" in Genomic Resources Notes accepted 1 June 2013-31 July 2013 [Dataset]. Dryad. https://doi.org/10.5061/dryad.79kh0
White-tailed deer (Odocoileus virginianus) are among the most abundant and widespread large mammals in the Americas, comprising up to 38 subspecies ranging from northern Canada to Peru. Although the species is believed to have high genetic diversity and is of considerable ecological and economic importance, surprisingly few genomic resources are currently available for it. White-tailed deer and other cervids throughout central North America are currently afflicted by chronic wasting disease (CWD), one of the degenerative prion diseases collectively known as transmissible spongiform encephalopathies. Although CWD is of major importance to white-tailed deer management, little is known about innate resistance or susceptibility to CWD beyond polymorphisms in the prion protein gene, Prnp, though a recent study using microsatellites suggests that the disease may have additional underlying genetic components. Further association analysis is hindered by low marker density. In this study, we used high-throughput SOLiD sequencing of the pooled blood transcriptomes of six individuals to generate novel sequence data for white-tailed deer and to identify single-nucleotide polymorphisms (SNPs). In total, we generated 14,010 contigs of length ≥ 200 nt, representing 4,104,760 nt of unique sequence, and identified 66,596 SNPs. These data represent one of the largest genetic resources currently available for any cervid. We hope they will facilitate future population genomics research and assist with the identification of genetic factors that underlie disease resistance and other traits relevant to conservation and management.
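The contig totals above are reported for a minimum contig length of 200 nt. As a minimal sketch, assuming the consensus contigs have been exported to FASTA (the export step and the helper names parse_fasta and contig_summary are hypothetical, not part of this dataset or of CLC Genomics Workbench), the length filter and the summary figures could be reproduced like this:

```python
from __future__ import annotations


def parse_fasta(text: str) -> dict[str, str]:
    """Return {sequence_id: sequence} from FASTA-formatted text."""
    sequences: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = (line[1:].split() or [""])[0]
            sequences[current] = []
        elif current is not None:
            # Sequence lines may be wrapped; collect and join them later.
            sequences[current].append(line)
    return {name: "".join(parts) for name, parts in sequences.items()}


def contig_summary(fasta_text: str, min_length: int = 200) -> dict[str, float]:
    """Count contigs of at least min_length nt and summarise their lengths."""
    lengths = [len(seq) for seq in parse_fasta(fasta_text).values()
               if len(seq) >= min_length]
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_nt": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
    }
```

Applied to the totals reported above (4,104,760 nt over 14,010 contigs), the mean contig length works out to roughly 293 nt.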
Trimmed Un-mapped Data assembly
Resultant BAM file from a pool of rRNA-depleted, non-normalized cDNA of six white-tailed deer from the University of Saskatchewan's Specialized Livestock Research Facility that was assembled de novo in CLC Genomics Workbench 6.0.1.
Top BLAST hits for the consensus contigs extracted from the BAM file. BLASTX (BLAST+ 2.2.27) was run with an e-value threshold of 1e-6 (0.000001) against a custom database of RefSeq eutherian protein sequences.
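The rule used to pick the "top" hit for each contig is not stated here. The sketch below is one plausible approach, under stated assumptions: it expects BLAST+ tabular output with the standard 12 columns (-outfmt 6), for example from a run such as `blastx -query contigs.fa -db eutherian_refseq -evalue 1e-6 -outfmt 6`, where the file and database names are placeholders. It keeps, for each contig, the hit with the highest bit score among hits with an e-value of at most 1e-6; ranking by bit score and the function name top_hits are assumptions.

```python
from __future__ import annotations

import csv
import io

# Standard 12 columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(tabular_text: str, max_evalue: float = 1e-6) -> dict[str, dict]:
    """For each query contig, keep the hit with the highest bit score.

    Hits with an e-value above max_evalue are ignored; this re-applies the
    threshold in case the search itself was run with a looser cutoff.
    """
    best: dict[str, dict] = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

Contigs with no hit passing the threshold simply do not appear in the returned dictionary.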
dbSNP ss Accession Numbers
List of SNP ss accession numbers detected in the assembly using CLC Genomics Workbench 6.0.1. These SNPs will be available online in the next dbSNP update, currently scheduled for Oct. 2013.
We detected SNPs using CLC’s quality-based variant detection algorithm (min. average quality of the five flanking bases on either side of the SNP = 15; min. quality of central base = 20; min. coverage = 20; max. coverage = 1000; min. variant frequency = 15%; min. variant count = 4), discarding all reads with non-specific alignments. Multiallelic SNPs were discarded, as they may result from incorrect alignment of reads from paralogous genes in the transcriptome data. All SNPs have been submitted to dbSNP and will be available online in the October 2013 update. In the interim, they are available in this VCF file.
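CLC's quality-based variant detection is proprietary, so the following is not its implementation. It is a simplified sketch of how the thresholds listed above and the multiallelic filter could combine at a single candidate site. The input format (ReadBase) and the function names are hypothetical. The sketch assumes that reads with non-specific alignments were already removed, that low-quality bases are dropped before coverage and allele frequencies are computed, and that a site counts as multiallelic when more than one non-reference allele passes the count and frequency thresholds.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class ReadBase:
    """One read's base at a candidate site (hypothetical input format)."""
    base: str
    central_quality: int
    flanking_qualities: list[int]  # up to 5 bases on each side of the site


# Thresholds as listed in the dataset description.
MIN_FLANK_AVG_QUALITY = 15
MIN_CENTRAL_QUALITY = 20
MIN_COVERAGE = 20
MAX_COVERAGE = 1000
MIN_VARIANT_FREQUENCY = 0.15
MIN_VARIANT_COUNT = 4


def passes_quality(obs: ReadBase) -> bool:
    """Keep a read's base only if it and its neighbourhood are good enough."""
    if obs.central_quality < MIN_CENTRAL_QUALITY or not obs.flanking_qualities:
        return False
    avg = sum(obs.flanking_qualities) / len(obs.flanking_qualities)
    return avg >= MIN_FLANK_AVG_QUALITY


def call_site(ref_base: str, observations: list[ReadBase]) -> str | None:
    """Return the single alternate allele, or None if no biallelic SNP is called."""
    kept = [o.base for o in observations if passes_quality(o)]
    coverage = len(kept)
    if not MIN_COVERAGE <= coverage <= MAX_COVERAGE:
        return None
    counts = Counter(kept)
    variants = [
        allele for allele, n in counts.items()
        if allele != ref_base
        and n >= MIN_VARIANT_COUNT
        and n / coverage >= MIN_VARIANT_FREQUENCY
    ]
    # More than one qualifying alternate allele -> multiallelic -> discard.
    return variants[0] if len(variants) == 1 else None
```

For example, with 16 good-quality reads carrying the reference base A and 6 carrying G, call_site("A", ...) returns "G"; if a second alternate allele also passes the count and frequency thresholds, the site is discarded and the function returns None.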