
CarcSeq measurement of lung cancer driver mutations predicts mouse strain- and sex-related incidence of spontaneous lung neoplasia

Cite this dataset

Parsons, Barbara et al. (2021). CarcSeq measurement of lung cancer driver mutations predicts mouse strain- and sex-related incidence of spontaneous lung neoplasia [Dataset]. Dryad.


The data contained herein are the fastq files collected for individual mouse lung DNA samples. Ten lung DNA samples from each sex (male and female) of each strain (B6C3F1 and CD-1) were analyzed using the CarcSeq method for error-corrected next-generation sequencing, as previously described [Harris, K.L., Walia, V., Gong, B., et al. Quantification of cancer driver mutations in human breast and lung DNA using targeted, error-corrected CarcSeq. Environ Mol Mutagen. 2020; 61: 872–889; and McKim, K.L., Myers, M.B., Harris, K.L., Gong, B., Xu, J., Parsons, B.L. CarcSeq Measurement of Rat Mammary Cancer Driver Mutations and Relation to Spontaneous Mammary Neoplasia. Toxicological Sciences. 2021; kfab040]. A readme text file provides the key to which sample numbers correspond to which mouse strain and sex. All the fastq files within a given folder were used to construct single-strand consensus sequences, which were then subjected to downstream analyses as reported in the associated publication.


For each of the 40 mouse samples (10 each from male and female B6C3F1 and CD-1 mice), first-round PCR was conducted to generate 14 amplicons of mouse homologs of human sequences encompassing hotspot cancer driver mutations (CDMs). The primers were constructed with 9 bases of degenerate sequence at their 5' ends, so each amplicon carried 18 base pairs of degenerate sequence (9 from each primer) that served as a unique molecular identifier during error correction.

The amplicons from the same mouse sample were combined, and the Illumina® TruSeq® ChIP Sample Preparation Kit (Illumina, San Diego, CA) was used to construct a library suitable for Illumina sequencing. The manufacturer's library preparation method was modified as follows: DNAs recovered from the gel purification step were diluted serially, and the numbers of PCR-amplifiable molecules in the dilutions were determined by droplet digital PCR (ddPCR), using the ddPCR Library Quantification Kit for Illumina (Bio-Rad, Hercules, CA) and the QX200 Droplet Digital PCR System (Bio-Rad). Based on this ddPCR quantification, 1.5 × 10^6 molecules were added to the final PCR described in the Illumina® TruSeq® ChIP kit protocol.

Mouse lung DNA libraries were pooled, denatured, and diluted according to the Illumina® TruSeq® Library Prep Pooling Guide. Specifically, libraries were diluted to 2 nM, combined with 2 nM of PhiX (Illumina), denatured and diluted, then loaded onto an Illumina NextSeq500 for cluster generation and 151-cycle, paired-end sequencing with NSQ 500/550 Mid or High Output KT (300 CYS) reagents and a 6-bp index read.

For error correction, we employed the methods reported by Kennedy et al. [Kennedy, S.R, Schmitt, M.W, Fox, E.J, Kohrn, B.F, Salk, J.J, Ahn, E.H, Prindle, M.J, Kuong, K.J, Shen, J.-C, Risques, R.-A & Loeb, L.A (2014). Detecting ultralow-frequency mutations by Duplex Sequencing. Nature Protocols, 9(11), 2586–2606], except that single-strand consensus sequences (SSCSs) were recovered and used in all downstream analyses. Library preparation and sequencing were repeated if fewer than 100,000 SSCSs were recovered, in which case fastq files from multiple library preparations were combined for SSCS construction.
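The idea behind SSCS construction can be illustrated with a minimal Python sketch. It is not the pipeline of Kennedy et al. or the one used for this dataset: the function names, the minimum family size of 3, and the 70% agreement cutoff are assumptions chosen for illustration, and only read 1 is collapsed to keep the example short. What it takes from the text is the 9-base degenerate sequence at the start of each read of a pair (18 bases in total per tag) and the 100,000-SSCS threshold for repeating library preparation.

```python
from collections import Counter, defaultdict

UMI_LEN = 9            # degenerate bases at the 5' end of each primer (from the text)
SSCS_TARGET = 100_000  # minimum SSCS count before a repeat is needed (from the text)
MIN_FAMILY_SIZE = 3    # assumption: reads required to call a consensus
MIN_AGREEMENT = 0.7    # assumption: fraction of reads that must agree at a position


def split_umi(read1, read2, umi_len=UMI_LEN):
    """Return (18-base tag, trimmed read 1, trimmed read 2) for one read pair."""
    tag = read1[:umi_len] + read2[:umi_len]
    return tag, read1[umi_len:], read2[umi_len:]


def consensus(seqs, min_agreement=MIN_AGREEMENT):
    """Column-wise majority vote; positions below the cutoff are reported as 'N'."""
    length = min(len(s) for s in seqs)
    bases = []
    for i in range(length):
        base, count = Counter(s[i] for s in seqs).most_common(1)[0]
        bases.append(base if count / len(seqs) >= min_agreement else "N")
    return "".join(bases)


def build_sscs(read_pairs, min_family_size=MIN_FAMILY_SIZE):
    """Group read-1 sequences by tag and collapse each large-enough family."""
    families = defaultdict(list)
    for read1, read2 in read_pairs:
        tag, trimmed1, _ = split_umi(read1, read2)
        families[tag].append(trimmed1)
    return {
        tag: consensus(seqs)
        for tag, seqs in families.items()
        if len(seqs) >= min_family_size
    }


def needs_repeat(n_sscs, target=SSCS_TARGET):
    """True when fewer SSCSs than the target were recovered."""
    return n_sscs < target


# Toy input: four reads share one tag (one carries a sequencing error),
# and a fifth read with a different tag forms a family too small to keep.
toy_pairs = [
    ("AAAAAAAAA" + "ACGTAC", "CCCCCCCCC" + "TTTT"),
    ("AAAAAAAAA" + "ACGTAC", "CCCCCCCCC" + "TTTT"),
    ("AAAAAAAAA" + "ACGTAC", "CCCCCCCCC" + "TTTT"),
    ("AAAAAAAAA" + "ACGAAC", "CCCCCCCCC" + "TTTT"),
    ("GGGGGGGGG" + "ACGTAC", "TTTTTTTTT" + "TTTT"),
]
toy_sscs = build_sscs(toy_pairs)
```

In the toy input, the single discordant base is outvoted by the three agreeing reads, and the one-read family is discarded. A real workflow would read the pairs from the fastq files in a sample folder and pool files from repeated library preparations before grouping.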

Usage notes

A ReadMe text file was uploaded that provides a sample key for the data in the folders.


United States Food and Drug Administration, Award: National Center for Toxicological Research (NCTR) and Center for Drug Evaluation and Research (CDER)

United States Department of Energy, Award: ORISE Research Participation Program at NCTR, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and FDA/NCTR