Data from: CHOPER filters enable rare mutation detection in complex mutagenesis populations by next-generation sequencing
Data files
Version of Dec 19, 2015 (6.72 GB total)
- ACAGTG1_part1.fq.bz2 (838.86 MB)
- ACAGTG1_part2.fq.bz2 (453.46 MB)
- ACAGTG2_part1.fq.bz2 (838.86 MB)
- ACAGTG2_part2.fq.bz2 (521.95 MB)
- CAGATC1.fq.bz2 (872.51 MB)
- CAGATC2.fq.bz2 (922.28 MB)
- choper.py (11.66 KB)
- codons.py (3.51 KB)
- m237i1.fa (845 B)
- PhredFilter.jar (174.79 KB)
- TGACCA1_part1.fq.bz2 (838.86 MB)
- TGACCA1_part2.fq.bz2 (268.39 MB)
- TGACCA2_part1.fq.bz2 (838.86 MB)
- TGACCA2_part2.fq.bz2 (322.15 MB)
Abstract
Next-generation sequencing (NGS) has revolutionized genetics and enabled the accurate identification of genetic variants across many genomes. However, detecting biologically important low-frequency variants within genetically heterogeneous populations remains challenging, because such variants are difficult to distinguish from errors arising at the intrinsic error rate of NGS. Approaches that overcome this limitation are essential for detecting rare mutations in large cohorts, viral or microbial populations, mitochondrial heteroplasmy, and other heterogeneous mixtures such as tumors. Modifications to library preparation can overcome some of these limitations, but they are experimentally demanding and require considerable laboratory expertise. This paper describes a novel quality filtering and base pruning pipeline, called Complex Heterogeneous Overlapped Paired-End Reads (CHOPER), designed to detect sequence variants in a complex population of highly similar sequences derived from All-Codon-Scanning (ACS) mutagenesis. A novel fast alignment algorithm, designed for this application, runs in O(n) time. CHOPER was applied to a p53 cancer mutant reactivation study derived from ACS mutagenesis. Relative to error filtering based on Phred quality scores, CHOPER improved accuracy by about 13% while discarding only half as many bases. These results are a step toward extending the power of NGS to the analysis of genetically heterogeneous populations.
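As a rough illustration of the two filtering ideas contrasted above, the sketch below sets a per-base Phred threshold filter beside an agreement rule for overlapping paired-end reads. It is a hypothetical simplification, not the method implemented in choper.py or PhredFilter.jar. It assumes Phred+33 quality encoding, assumes the reverse read has already been reverse-complemented, and assumes the overlap offset is already known (so it omits the O(n) alignment step entirely). The threshold of 30 and the function names `phred_scores`, `error_probability`, `phred_filter`, and `overlap_consensus` are illustrative choices.

```python
# Hypothetical sketch: NOT the CHOPER implementation in choper.py.
# Assumes Phred+33 quality encoding and a pre-computed overlap offset.
from __future__ import annotations


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_filter(seq: str, qual: str, min_q: int = 30) -> str:
    """Baseline filter: mask ('N') every base whose Phred score is below min_q."""
    return "".join(
        base if q >= min_q else "N"
        for base, q in zip(seq, phred_scores(qual))
    )


def overlap_consensus(fwd: str, rev_rc: str, offset: int) -> str:
    """Assumed overlap rule: where the forward read and the reverse-complemented
    reverse read cover the same positions, keep a base only if both reads agree;
    otherwise mask it as 'N'.

    offset: index in fwd at which rev_rc starts (assumed known here).
    Returns only the overlapped region; an empty string if there is no overlap.
    """
    overlap_len = max(0, min(len(fwd) - offset, len(rev_rc)))
    out = []
    for i in range(overlap_len):
        a, b = fwd[offset + i], rev_rc[i]
        out.append(a if a == b else "N")
    return "".join(out)


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33, so the third base is masked.
    print(phred_filter("ACGT", "II#I"))  # ACNT
    # Overlap "CGTA" vs "CGTT": the last position disagrees and is masked.
    print(overlap_consensus("AACGTA", "CGTT", 2))  # CGTN
```

The contrast is the point of the example: the threshold filter discards a base based only on its own quality score, while the overlap rule can keep a moderate-quality base when both reads independently report it and mask a high-quality base when they disagree.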