Interpretation of high-throughput sequence data requires an understanding of how decisions made during bioinformatic data processing can influence results. One frequently cited source of bias is PCR clones (or PCR duplicates). PCR clones are common in restriction site-associated sequencing (RAD-seq) datasets, which are increasingly being used in molecular ecology. To determine the influence that PCR clones and the bioinformatic handling of clones have on genotyping, we evaluated four RAD-seq datasets. Datasets were compared before and after clones were removed to estimate the number of clones present in RAD-seq data, quantify how often the presence of clones in a dataset causes genotype calls to change compared to when clones are removed, investigate the mechanisms that lead to genotype call changes, and test whether clones bias heterozygosity estimates. Our RAD-seq datasets contained 30-60% PCR clones, but 95% of RAD-tags had five or fewer clones. Relatively few genotypes changed once clones were removed (5-10%), and the vast majority of these changes (98%) were associated with genotypes switching from a called to a no-call state or vice versa. PCR clones had a larger influence on genotype calls in individuals with low read depth but appeared to influence genotype calls at all loci similarly. Removal of PCR clones reduced the number of called genotypes by 2% but had almost no influence on estimates of heterozygosity. As such, while steps should be taken to limit PCR clones during library preparation, PCR clones are likely not a substantial source of bias for most RAD-seq studies.

Brook trout clone filtered

Clone filtered VCF file of brook trout genotype data. VCF files were generated using Stacks 2.46 with minimal filters (Stacks flags = -r 0.3, --min_maf 0.05). Data were generated using the SbfI enzyme following the methods outlined in Ali et al. (2016), prepared in the Genomic Variation Lab at the University of California-Davis, and sequenced on an Illumina NextSeq 500 (PE 75 bp reads, 96 samples/lane) at the Cornell Institute of Biotechnology.

bt_CF.vcf

Brook trout unfiltered

Non-clone filtered (unfiltered) VCF file of brook trout genotype data. VCF files were generated using Stacks 2.46 with minimal filters (Stacks flags = -r 0.3, --min_maf 0.05). Data were generated using the SbfI enzyme following the methods outlined in Ali et al. (2016), prepared in the Genomic Variation Lab at the University of California-Davis, and sequenced on an Illumina NextSeq 500 (PE 75 bp reads, 96 samples/lane) at the Cornell Institute of Biotechnology.

bt_noCF.vcf

Cisco clone filtered

Clone filtered (filtered) VCF file of cisco genotype data. VCF files were generated using Stacks 2.46 with minimal filters (Stacks flags = -r 0.3, --min_maf 0.05). Data were generated using the SbfI enzyme following the methods outlined in Ali et al. (2016), prepared in the Larson Laboratory at the University of Wisconsin-Stevens Point, and sequenced on a HiSeq 4000 (PE 150 bp reads, 96 samples/lane) at the Michigan State Genomics Core Facility.

cisco_CF.vcf

Cisco unfiltered

Non-clone filtered (unfiltered) VCF file of cisco genotype data. VCF files were generated using Stacks 2.46 with minimal filters (Stacks flags = -r 0.3, --min_maf 0.05). Data were generated using the SbfI enzyme following the methods outlined in Ali et al. (2016), prepared in the Larson Laboratory at the University of Wisconsin-Stevens Point, and sequenced on a HiSeq 4000 (PE 150 bp reads, 96 samples/lane) at the Michigan State Genomics Core Facility.

cisco_noCF.vcf

Walleye clone filtered

Clone filtered (filtered) VCF file of walleye genotype data. VCF files were generated using Stacks 2.46 with minimal filters (Stacks flags = -r 0.3, --min_maf 0.05). Data were generated using the SbfI enzyme following the methods outlined in Ali et al. (2016), prepared in the Larson Laboratory at the University of Wisconsin-Stevens Point, and sequenced on a HiSeq 4000 (PE 150 bp reads, 192 samples/lane) at the Michigan State Genomics Core Facility.

wal_CF.vcf

Walleye unfiltered

Non-clone filtered (unfiltered) VCF file of walleye genotype data. VCF files were generated using Stacks 2.46 with minimal filters (Stacks flags = -r 0.3, --min_maf 0.05). Data were generated using the SbfI enzyme following the methods outlined in Ali et al. (2016), prepared in the Larson Laboratory at the University of Wisconsin-Stevens Point, and sequenced on a HiSeq 4000 (PE 150 bp reads, 192 samples/lane) at the Michigan State Genomics Core Facility.

wal_noCF.vcf
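
Each species is provided as a clone filtered and an unfiltered VCF file, so genotype calls can be compared directly between the two. The following is a minimal Python sketch of one way to do that, not the analysis code used in the study. The function names (read_genotypes, compare_calls, compare_files) are hypothetical, and the sketch assumes that both files of a pair use the same CHROM/POS identifiers for the same loci and the same sample names. A genotype that is absent from one file is treated as a no-call in that file.

```python
def read_genotypes(handle):
    """Return {(chrom, pos, sample): GT string} from an open VCF text handle."""
    calls = {}
    samples = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos = fields[0], fields[1]
        gt_index = fields[8].split(":").index("GT")
        for sample, entry in zip(samples, fields[9:]):
            # Treat phased and unphased genotypes the same way
            gt = entry.split(":")[gt_index].replace("|", "/")
            calls[(chrom, pos, sample)] = gt
    return calls


def is_called(gt):
    """A genotype is called if it exists and has no missing allele ('.')."""
    return gt is not None and "." not in gt


def compare_calls(unfiltered, filtered):
    """Classify each genotype by how it differs between the two datasets."""
    counts = {
        "unchanged": 0,
        "call_to_nocall": 0,
        "nocall_to_call": 0,
        "genotype_switch": 0,
    }
    for key in set(unfiltered) | set(filtered):
        before = unfiltered.get(key)  # None if absent (assumed no-call)
        after = filtered.get(key)
        if is_called(before) and is_called(after):
            same = sorted(before.split("/")) == sorted(after.split("/"))
            counts["unchanged" if same else "genotype_switch"] += 1
        elif is_called(before):
            counts["call_to_nocall"] += 1
        elif is_called(after):
            counts["nocall_to_call"] += 1
        else:
            counts["unchanged"] += 1  # missing in both datasets
    return counts


def compare_files(unfiltered_path, filtered_path):
    """Convenience wrapper that reads two VCF files and compares them."""
    with open(unfiltered_path) as u, open(filtered_path) as f:
        return compare_calls(read_genotypes(u), read_genotypes(f))
```

For example, compare_files("bt_noCF.vcf", "bt_CF.vcf") would return the four counts for the brook trout pair. The sketch reads plain-text VCF only and holds all genotypes in memory, which may not suit very large files.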

Data from: Attack of the PCR clones: rates of clonality have little effect on RAD-seq genotype calls
