Data from: SNP discovery in non-model organisms: strand-bias and base-substitution errors reduce conversion rates

Gonçalves da Silva, Anders; Barendse, William; Kijas, James W.; Barris, Wes C.; McWilliam, Sean; Bunch, Rowan J.; McCulloch, Russell; Harrison, Blair; Hoelzel, A. Rus; England, Phillip R.; McCullough, Russell

Published May 22, 2015 on Dryad. https://doi.org/10.5061/dryad.n3bb2

Data files

May 22, 2015 version files 1.66 GB

454Contigs.afg: 289.61 MB
orpg_scaffolds_velvetAssembly.fa: 576.61 MB
orpg_sevenLibrary_velvetAssembly.afg: 778.76 MB
orpg_snpdev_glm_analysis.zip: 17.81 MB
perl_scripts.zip: 146.85 KB

Abstract

Single nucleotide polymorphisms (SNPs) have become the marker of choice for genetic studies in organisms of conservation, commercial or biological interest. Most SNP discovery projects in nonmodel organisms identify putative SNPs using filtering rules that account for random sequencing errors. Here, we analyse data used to develop 4723 novel SNPs for the commercially important deep-sea fish, orange roughy (Hoplostethus atlanticus), to assess the impact of not accounting for systematic sequencing errors when filtering polymorphisms during SNP discovery. We used SAMtools to identify polymorphisms in a Velvet assembly of genomic DNA sequence data from seven individuals. The resulting set of polymorphisms was filtered to minimize ‘bycatch’ (polymorphisms caused by sequencing or assembly error). An Illumina Infinium SNP chip was used to genotype a final set of 7714 polymorphisms across 1734 individuals. Five predictors were examined for their effect on the probability of obtaining an assayable SNP: depth of coverage, number of reads that support a variant, polymorphism type (e.g. A/C), strand-bias and Illumina SNP probe design score. Our results indicate that filtering out systematic sequencing errors could substantially improve the efficiency of SNP discovery. We show that BLASTX can be used as an efficient tool to identify single-copy genomic regions in the absence of a reference genome. The results have implications for research aiming to identify assayable SNPs and build SNP genotyping assays for nonmodel organisms.
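
Strand-bias is one of the predictors named in the abstract, so a small illustration of how such a filter might be applied to a candidate polymorphism may be useful. The sketch below is not taken from the deposited Perl scripts or the GLM analysis. It assumes that per-strand read counts for the reference and variant alleles are available, and it uses a two-sided Fisher's exact test as one common way to score strand-bias. The function names, the 2x2 table layout and the 0.05 threshold are all hypothetical choices made for this example.

```python
from math import comb


def strand_bias_pvalue(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact test on a 2x2 allele-by-strand table.

    Assumption: the four arguments are read counts for the reference and
    variant alleles on the forward and reverse strands at one site.
    A small p-value suggests the variant is supported unevenly by strand.
    """
    ref_total = ref_fwd + ref_rev
    alt_total = alt_fwd + alt_rev
    fwd_total = ref_fwd + alt_fwd
    n = ref_total + alt_total
    if n == 0:
        return 1.0  # no reads, so no evidence of bias

    denom = comb(n, fwd_total)

    def prob(k):
        # Hypergeometric probability of k forward-strand reference reads.
        return comb(ref_total, k) * comb(alt_total, fwd_total - k) / denom

    observed = prob(ref_fwd)
    k_min = max(0, fwd_total - alt_total)
    k_max = min(ref_total, fwd_total)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )
    return min(1.0, p)


def passes_strand_filter(ref_fwd, ref_rev, alt_fwd, alt_rev, alpha=0.05):
    """Keep a candidate SNP only if strand-bias is not significant (hypothetical rule)."""
    return strand_bias_pvalue(ref_fwd, ref_rev, alt_fwd, alt_rev) >= alpha
```

Under these assumptions, a site whose variant reads all come from one strand, for example `passes_strand_filter(10, 10, 10, 0)`, would be rejected, while a site with balanced support on both strands would be kept.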

Usage notes

orpg_snpdev_glm_analysis

orpg_sevenLibrary_velvetAssembly

orpg_scaffolds_velvetAssembly

454Contigs

perl_scripts
