Dryad

Data from: Attack of the PCR clones: rates of clonality have little effect on RAD-seq genotype calls

Data files

Aug 06, 2019 version files 336.03 MB
Aug 06, 2019 version files 672.06 MB

Abstract

Interpretation of high-throughput sequence data requires an understanding of how decisions made during bioinformatic data processing can influence results. One source of bias that is often cited is PCR clones (or PCR duplicates). PCR clones are common in restriction site-associated sequencing (RAD-seq) datasets, which are increasingly being used for molecular ecology. To determine the influence that PCR clones and the bioinformatic handling of clones have on genotyping, we evaluate four RAD-seq datasets. Datasets were compared before and after clones were removed to estimate the number of clones present in RAD-seq data, quantify how often the presence of clones in a dataset causes genotype calls to change compared to when clones were removed, investigate the mechanisms that lead to genotype call changes, and test whether clones bias heterozygosity estimates. Our RAD-seq datasets contained 30–60% PCR clones, but 95% of RAD-tags had five or fewer clones. Relatively few genotypes changed once clones were removed (5–10%), and the vast majority of these changes (98%) were associated with genotypes switching from a called to a no-call state or vice versa. PCR clones had a larger influence on genotype calls in individuals with low read depth but appeared to influence genotype calls at all loci similarly. Removal of PCR clones reduced the number of called genotypes by 2% but had almost no influence on estimates of heterozygosity. As such, while steps should be taken to limit PCR clones during library preparation, PCR clones are likely not a substantial source of bias for most RAD-seq studies.
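The before-and-after comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function name `compare_calls`, the `(individual, locus)` keys, the genotype strings, and the use of `None` for a no-call are all assumptions made for illustration. It counts how many genotype calls differ between two call sets, and separates switches between called and no-call states from changes in the called alleles.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]      # (individual, locus); hypothetical identifiers
Genotype = Optional[str]   # e.g. "A/G"; None marks a no-call (assumed encoding)


def compare_calls(with_clones: Dict[Key, Genotype],
                  without_clones: Dict[Key, Genotype]) -> Dict[str, int]:
    """Tally genotype-call differences between two call sets.

    Only (individual, locus) pairs present in both inputs are compared.
    """
    summary = {
        "compared": 0,           # pairs present in both call sets
        "changed": 0,            # pairs whose call differs
        "call_state_switch": 0,  # called -> no-call, or no-call -> called
        "allele_change": 0,      # called in both, but with different genotypes
    }
    for key in with_clones.keys() & without_clones.keys():
        before, after = with_clones[key], without_clones[key]
        summary["compared"] += 1
        if before == after:
            continue
        summary["changed"] += 1
        if (before is None) != (after is None):
            summary["call_state_switch"] += 1
        else:
            summary["allele_change"] += 1
    return summary


# Toy input, invented for illustration only (not data from the study).
calls_with_clones = {
    ("ind1", "loc1"): "A/G",
    ("ind1", "loc2"): "A/A",
    ("ind2", "loc1"): None,
    ("ind2", "loc2"): "C/T",
}
calls_without_clones = {
    ("ind1", "loc1"): "A/G",
    ("ind1", "loc2"): None,
    ("ind2", "loc1"): "G/G",
    ("ind2", "loc2"): "C/C",
}
result = compare_calls(calls_with_clones, calls_without_clones)
```

Dividing `changed` by `compared` gives the proportion of genotypes that changed, and dividing `call_state_switch` by `changed` gives the share of changes that were switches between called and no-call states.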