Data from: De novo transcriptomic analyses for non-model organisms: an evaluation of methods across a multi-species data set

Singhal, Sonal

Published Jan 09, 2013 on Dryad. https://doi.org/10.5061/dryad.7c99f

Data files

Jan 09, 2013 version files (187.03 MB total)

assemblies.tar.gz: 186.99 MB
pipeline.tar.gz: 48.33 KB

Abstract

High-throughput sequencing (HTS) is revolutionizing biological research by enabling scientists to quickly and cheaply query variation at a genomic scale. Despite the increasing ease of obtaining such data, using these data effectively still poses notable challenges, especially for those working with organisms without a high-quality reference genome. For every stage of analysis – from assembly to annotation to variant discovery – researchers have to distinguish technical artefacts from the biological realities of their data before they can make inferences. In this work, I explore these challenges by generating a large de novo comparative transcriptomic data set for a clade of lizards and constructing a pipeline to analyse these data. Then, using a combination of novel metrics and an externally validated variant data set, I test the efficacy of my approach, identify areas of improvement, and propose ways to minimize these errors. I find that with careful data curation, HTS can be a powerful tool for generating genomic data for non-model organisms.
