Data for: Lorcaserin-induced rat mammary mutants quantified by CarcSeq
Data files
Files in the May 30, 2024 version: 183.54 GB
M19_Rep_2.zip (11.06 GB)
M20_Rep_2.zip (10.83 GB)
M21_Rep_2.zip (17.43 GB)
M22_Rep_2.zip (8.62 GB)
M23_Rep_2.zip (9.03 GB)
M24_Rep_2.zip (10.30 GB)
M25_Rep_2.zip (9.29 GB)
M26_Rep_2.zip (9.01 GB)
M27_Rep_2.zip (10.03 GB)
M28_Rep_2.zip (11.03 GB)
M29_Rep_2.zip (7.88 GB)
M30_Rep_2.zip (12.89 GB)
M31_Rep_2.zip (8.57 GB)
M32_Rep_2.zip (10.37 GB)
M33_Rep_2.zip (8.91 GB)
M34_Rep_2.zip (10.47 GB)
M35_Rep_2.zip (7.45 GB)
M36_Rep_2.zip (10.37 GB)
README.md (3.75 KB)
Abstract
Lorcaserin, a drug for weight management, is a selective agonist of the serotonin (5-hydroxytryptamine) 2C receptor. Although lorcaserin is a non-genotoxic rat carcinogen, FDA approval was granted in part based on dose extrapolation considerations. A post-marketing study, CAMELLIA-TIMI, designed to detect potential cardiovascular effects of lorcaserin therapy, detected excess cancer risk in the lorcaserin treatment arm. Consequently, a study of lorcaserin-treated rats was conducted to elucidate the mechanism of lorcaserin-induced carcinogenesis and facilitate detection of other carcinogens operating through the same mechanism in the future. Another study goal was to characterize the utility of CarcSeq in detecting the neoplasia-related effects of a non-genotoxic carcinogen. CarcSeq is an error-corrected next-generation sequencing method for quantitation of panels of hotspot cancer driver mutations (CDMs) and can detect mutations with mutant fractions (MFs) ≥ 10⁻⁴. Female Sprague Dawley rats were treated by gavage daily with 0, 30, or 100 mg/kg lorcaserin, replicating the tumor bioassay doses but with shorter treatment durations of 12 or 24 weeks. Lorcaserin and N-nitroso-lorcaserin were quantified in dosing solutions, terminal plasma, and terminal liver samples using ultra-high-performance liquid chromatography-electrospray tandem mass spectrometry. N-nitroso-lorcaserin was not detected. Mammary DNAs (n = 6/group) were used to synthesize PCR products from genes containing known hotspot CDMs (Apc, Braf, Egfr, Hras, Kras, Nfe2l2, Pik3ca, Setbp1, Stk11, and Tp53), and variant MFs were quantified by CarcSeq. Considering MFs in all targets, no significant effects of lorcaserin treatment were observed. However, significant induction of Pik3ca H1047R mutation was observed after 12 and 24 weeks of treatment (ANOVA, P<0.05), with greater numbers of mutants and mutants with higher MFs observed in 24-week samples compared to 12-week samples.
Given that Pik3ca H1047R mutation can be detected in normal tissues, is the most prevalent mutation in human breast cancer, and occurs in several other cancers, these results suggest that lorcaserin-induced carcinogenesis involves promoting the outgrowth of spontaneously occurring Pik3ca H1047R mutant clones. The underlying mechanism(s) of promotion are under investigation. This study provides the first demonstration that CarcSeq can identify the carcinogenic impact of a non-genotoxic carcinogen, doing so within a shorter timeframe than is needed to measure a tumor response.
README: Analysis of lorcaserin-induced clonal expansion of cancer driver mutants by CarcSeq
Description of the data and file structure
Although three datasets were generated in this study, only one is provided, owing to dataset size, to enable independent re-analysis of CarcSeq output. Specifically, the data from the replicate CarcSeq analysis of the 24-week samples are provided (24-week replicate 2, Rep 2). The data are provided as one compressed folder per rat treated for 24 weeks. Each compressed folder contains all the fastq files produced by the CarcSeq analysis of genomic DNA from one rat; these files were run through the Kennedy error correction pipeline [8] and the post-error-correction filtering steps to obtain the final data. The folder names use M to signify mammary tissue as the source of genomic DNA, followed by the rat identifier number (ID#, between 19 and 36 for the rats treated for 24 weeks) and Rep_2 to denote the replicate 24-week dataset (i.e., M19_Rep_2.zip to M36_Rep_2.zip). Among the rats treated for 24 weeks, rats M19 to M24 received 0 mg/kg, rats M25 to M30 received 30 mg/kg, and rats M31 to M36 received 100 mg/kg lorcaserin.
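When scripting a re-analysis, the naming convention and dose assignments above can be expressed as a small lookup. The following is a minimal Python sketch that is not part of the deposited files; the function name `parse_archive_name` is hypothetical, and the `M<ID>_Rep_<n>.zip` pattern is an assumption based on the file list.

```python
import re

# Dose assignments for the 24-week rats, as stated in this README:
# M19-M24 -> 0 mg/kg, M25-M30 -> 30 mg/kg, M31-M36 -> 100 mg/kg.
DOSE_BY_ID_RANGE = [
    (range(19, 25), 0),
    (range(25, 31), 30),
    (range(31, 37), 100),
]

# Assumed pattern for archive names such as "M19_Rep_2.zip".
ARCHIVE_NAME = re.compile(r"^M(\d+)_Rep_(\d+)\.zip$")


def parse_archive_name(filename):
    """Return (rat_id, replicate, dose_mg_per_kg) for a CarcSeq archive name."""
    match = ARCHIVE_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename!r}")
    rat_id, replicate = int(match.group(1)), int(match.group(2))
    for id_range, dose in DOSE_BY_ID_RANGE:
        if rat_id in id_range:
            return rat_id, replicate, dose
    raise ValueError(f"rat ID {rat_id} is outside the 24-week range M19-M36")


print(parse_archive_name("M27_Rep_2.zip"))  # (27, 2, 30)
```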
Two additional items/files are provided. The Excel template file used to collect the mutpos data is provided and identified as: EC-NGS Template. Also, the custom R script used to filter the raw mutpos data to obtain the final data used in statistical analyses is provided as: CS-ecNGS-spreadsheet-processing-08-08-2023v2.Rmd.
Sharing/Access information
A manuscript describing the results of this study has been submitted to Toxicological Sciences. If accepted, a link to the publication will be added here.
Code/Software
Error correction was performed on each sample independently by running all the fastq files for that sample through the Kennedy pipeline [8]. The output of the pipeline was a mutpos file. An Excel workbook was assembled for each dataset, with one worksheet capturing the mutpos data from one sample. The assembled Excel workbooks were used, separately, as input for a custom R script (provided as: CS-ecNGS-spreadsheet-processing-08-08-2023v2.Rmd), written by Dr. Page B. McKinzie (US Food and Drug Administration, National Center for Toxicological Research, Jefferson, AR). The script performs several filtering steps and delivers the mutant data in several useful forms (MF values and mutant counts, reported and searchable by amplicon, chromosome, genomic location, and mutation type). A mutant is retained in the final data only if three conditions are met: 1) three or more mutant single-strand consensus sequences (SSCSs) were recovered, 2) the MF (mutant SSCSs/total SSCSs) was greater than or equal to 10⁻⁴, and 3) across the samples in a dataset/workbook, the coefficient of variation of MFs for that type of mutation at that position was greater than or equal to 60%. The R script also performs random downsampling of mutants to normalize the output relative to the sample with the lowest number of SSCSs recovered for each amplicon, and it reports MFs of mutants recovered before and after downsampling. The R script generates plots in which the panel position of each mutant is plotted on the X axis (arrayed from the lowest chromosome number/position to the highest) and the mutant's log10 MF is plotted on the Y axis. Further, the R script allows the user to designate which amplicons are considered tissue-specific drivers and can generate reports that parse the data accordingly.
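The three retention conditions can be illustrated with a minimal Python sketch. It is not a translation of the McKinzie R script: the record layout (dictionaries with `sample`, `position`, `mut_type`, `mutant_sscs`, and `total_sscs` keys) and the function name `filter_calls` are hypothetical. The sketch also assumes that a sample with no call at a site contributes an MF of 0 to the coefficient of variation, and that the CV uses the population standard deviation; the R script may handle both points differently.

```python
from collections import defaultdict
from statistics import mean, pstdev

MIN_MUTANT_SSCS = 3   # condition 1: three or more mutant SSCSs
MIN_MF = 1e-4         # condition 2: mutant SSCSs / total SSCSs >= 10^-4
MIN_CV = 0.60         # condition 3: CV of MFs across samples >= 60%


def coefficient_of_variation(values):
    """Population CV (pstdev / mean); 0.0 when the mean is 0."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


def filter_calls(calls, samples):
    """Return the mutant calls that pass all three conditions, with an added "mf" key.

    calls: list of dicts with keys sample, position, mut_type, mutant_sscs, total_sscs.
    samples: every sample ID in the dataset (one worksheet per sample).
    """
    # MF for each call: mutant SSCSs / total SSCSs at that position.
    mf = {(c["sample"], c["position"], c["mut_type"]): c["mutant_sscs"] / c["total_sscs"]
          for c in calls}

    # MFs per (position, mutation type) across all samples; samples without a
    # call at that site are assumed to contribute MF = 0.
    by_site = defaultdict(lambda: dict.fromkeys(samples, 0.0))
    for (sample, position, mut_type), value in mf.items():
        by_site[(position, mut_type)][sample] = value

    kept = []
    for c in calls:
        value = mf[(c["sample"], c["position"], c["mut_type"])]
        site_mfs = list(by_site[(c["position"], c["mut_type"])].values())
        if (c["mutant_sscs"] >= MIN_MUTANT_SSCS
                and value >= MIN_MF
                and coefficient_of_variation(site_mfs) >= MIN_CV):
            kept.append({**c, "mf": value})
    return kept


# Hypothetical toy input (not study data): three samples, three candidate sites.
samples = ["M19", "M20", "M21"]
calls = [
    # Site 100 A>G: present in one sample only, so its MF varies strongly.
    {"sample": "M19", "position": 100, "mut_type": "A>G", "mutant_sscs": 5, "total_sscs": 20000},
    # Site 200 C>T: identical MF in every sample (CV = 0), so it is removed.
    *({"sample": s, "position": 200, "mut_type": "C>T", "mutant_sscs": 4, "total_sscs": 20000}
      for s in samples),
    # Site 300 G>A: only two mutant SSCSs, so it fails condition 1.
    {"sample": "M20", "position": 300, "mut_type": "G>A", "mutant_sscs": 2, "total_sscs": 10000},
]
kept = filter_calls(calls, samples)
print([(c["sample"], c["position"]) for c in kept])  # [('M19', 100)]
```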
Methods
Study Rationale
There is a need to improve the assessment of carcinogenic risk associated with exposure to various test articles, including pharmaceuticals intended for chronic use. The need is particularly acute for detecting the carcinogenic impacts of non-genotoxic carcinogens. The CarcSeq error-corrected sequencing method was developed to address this need. CarcSeq was designed to quantify the mutant fraction (MF; mutant bases per total number of bases characterized at a given position) of mutants in segments of DNA encompassing hotspot cancer driver mutations (CDMs), including segments of the Apc, Braf, Hras, Kras, Nfe2l2, Pik3ca, Setbp1, Stk11, and Tp53 genes. Using CarcSeq, it was shown that CDMs with MFs greater than or equal to 10⁻⁴ are present in normal human tissues [1], as well as in untreated tissues (mammary and lung) of rats and mice [2,3]. Given that spontaneously occurring CDMs are obligatory for carcinogenesis induced by a non-genotoxic carcinogen, this study investigated whether CarcSeq could detect the early clonal expansion of CDMs induced by the non-genotoxic carcinogen lorcaserin. Lorcaserin is an agonist of the G protein-coupled 5-hydroxytryptamine (serotonin) receptor subtype 2c (5-HT2c) [4] and was intended to reduce appetite and body weight as the active pharmaceutical ingredient in the diet drug Belviq [5]. Significant tumor findings were observed in a rat bioassay of lorcaserin (mammary adenocarcinoma/fibroadenoma in male and female rats, along with additional tumors in male rats) [5]. Further, total malignancies, patients with malignancy, multiple primary tumors, metastatic malignancies, and cancer deaths were increased in the lorcaserin treatment arm of the CAMELLIA-TIMI trial relative to the placebo arm [6]. This resulted in Belviq being voluntarily withdrawn from the US market [7].
To investigate whether CarcSeq could detect early clonal expansion of CDMs, potentially indicative of future tumor development, a repeat-dose rat study was conducted in which female Sprague Dawley rats were treated by gavage daily with 0, 30, or 100 mg/kg of lorcaserin hydrochloride hemihydrate for 12 or 24 weeks. The 30 and 100 mg/kg doses were chosen because they were the mid and high bioassay doses that induced mammary tumors in female Sprague Dawley rats.
Study Procedures
Cancer driver mutation (CDM) levels were measured by CarcSeq in mammary DNA of lorcaserin-treated rats, using six rats per treatment group. Six treatment groups were examined (three dose groups each for the 12- and 24-week treatments), for a total of 36 rats. Genomic DNA was isolated from the flash-frozen mammary tissue of each rat, and high-fidelity PCR was performed to amplify 15 gene segments encompassing hotspot CDMs from each rat. These PCR reactions used primers synthesized with eight bases of degenerate sequence at their 5' ends. After PCR amplification, these degenerate sequences constitute a 16-basepair unique molecular identifier (UMI) sequence (eight basepairs from each end) that can be used for error correction by consensus. The DNA concentrations of the 15 individual PCR products generated from each rat were quantified, and appropriate amounts of the amplicons from each rat were then pooled and used in library preparation. Libraries were constructed using Illumina TruSeq ChIP kits (Illumina, San Diego, CA). One step in library preparation is the ligation of adapters that include an index sequence to discriminate reads from different samples sequenced on the same flow cell. The final PCR step in library preparation was carried out using a fraction of each library that contained 22.5 million copies, as determined using the QX200 Droplet Digital PCR (ddPCR) System and ddPCR Library Quantification Kits for Illumina TruSeq (Bio-Rad, Hercules, CA). Finally, libraries were sequenced with 151-cycle, paired-end sequencing on NSQ 500/550 Mid Output (300 CYS) flow cells using an Illumina NextSeq 500. Two independent CarcSeq analyses were performed using genomic DNA from rats treated for 24 weeks, allowing the reproducibility of the CarcSeq MF measurements to be assessed.
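The way eight degenerate bases from each primer combine into a 16-base identifier, and how reads sharing that identifier form a family for consensus, can be sketched in Python. This is an illustration only, not the Kennedy pipeline's tag handling: it assumes the degenerate bases are the first eight bases of each read in a pair, and the function names `split_umi` and `group_by_umi` are hypothetical.

```python
from collections import defaultdict

UMI_HALF_LENGTH = 8  # eight degenerate bases at the 5' end of each primer


def split_umi(read1, read2, half=UMI_HALF_LENGTH):
    """Return (umi, insert1, insert2) for one read pair.

    Assumes the degenerate bases are the first `half` bases of each read, so the
    UMI is 2 * half = 16 bases: half from each end of the amplicon.
    """
    if len(read1) < half or len(read2) < half:
        raise ValueError("read is shorter than the UMI half-length")
    return read1[:half] + read2[:half], read1[half:], read2[half:]


def group_by_umi(read_pairs):
    """Group read pairs into families keyed by their 16-base UMI."""
    families = defaultdict(list)
    for read1, read2 in read_pairs:
        umi, insert1, insert2 = split_umi(read1, read2)
        families[umi].append((insert1, insert2))
    return dict(families)


# Hypothetical read pairs (not study data): two share a UMI, one does not.
example_pairs = [
    ("ACGTACGTTTTGGG", "GGCCAATTCCCAAA"),
    ("ACGTACGTTTTGGG", "GGCCAATTCCCAAA"),
    ("TTTTCCCCAAAGGG", "GGCCAATTCCCAAA"),
]
families = group_by_umi(example_pairs)
print({umi: len(reads) for umi, reads in families.items()})
# {'ACGTACGTGGCCAATT': 2, 'TTTTCCCCGGCCAATT': 1}
```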
Error Correction and Post-Error Correction Filtering
The fastq files derived from sequencing the genomic DNA of each rat were run through the Kennedy error correction pipeline [8]. At each position, consensus among reads with the same UMI was used to distinguish correct base calls from PCR or sequencing errors. At least two reads with the same UMI and 90% identity among base calls were required to create one single-strand consensus sequence (SSCS), and between 300,000 and 600,000 SSCSs were analyzed across 1077 sequenced positions for each rat. The output of the Kennedy pipeline is a mutpos (mutant position) file, which reports, at each position, the number of mutant SSCSs, their genomic location, the mutation type, and the total number of SSCSs recovered. For each type of mutation, MF was quantified as mutant SSCSs/total SSCSs at a specific position. To reach a minimum of 300,000 SSCSs, some libraries were re-sequenced; the resulting fastq files were combined with those from previous sequencing runs, and the combined fastq files were re-run through the error correction pipeline. The error-corrected output (mutpos.txt file) for each rat was transferred to a Microsoft Excel template worksheet (containing preset functions) within an Excel workbook. The mutpos output was filtered in three ways to obtain the final data. First, for any mutant call to be considered further, three or more mutant SSCSs had to have been recovered. Second, only mutants with MFs greater than or equal to 10⁻⁴ were considered further. For each mutation, MF was calculated and these two filters were applied using Excel worksheet functions. Third, because variability across samples has been identified as a measure indicative of clonal expansion [2, 3, 9, 10], mutants at positions with invariant MFs were removed from the dataset. Specifically, mutants (of a particular type and at a particular position) with a coefficient of variation less than 60% across all samples in a dataset were filtered out.
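The single-strand consensus rule (at least two reads sharing a UMI, with at least 90% of base calls agreeing) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Kennedy pipeline itself: reads are assumed to be already aligned to equal length, positions that fail the 90% threshold are assumed to be masked as "N", and the names `single_strand_consensus` and `build_sscs` are hypothetical.

```python
from collections import Counter, defaultdict

MIN_FAMILY_SIZE = 2   # at least two reads must share a UMI
MIN_AGREEMENT = 0.90  # at least 90% of base calls must agree at a position


def single_strand_consensus(reads):
    """Collapse one UMI family into an SSCS, or return None if the family is too small.

    Assumes the reads are aligned to equal length. Positions where the most
    common base falls below MIN_AGREEMENT are masked as "N" (an assumption).
    """
    if len(reads) < MIN_FAMILY_SIZE:
        return None
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads in a family must have equal length")
    consensus = []
    for column in zip(*reads):  # one tuple of base calls per position
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= MIN_AGREEMENT else "N")
    return "".join(consensus)


def build_sscs(tagged_reads):
    """tagged_reads: iterable of (umi, sequence). Returns {umi: sscs} for passing families."""
    families = defaultdict(list)
    for umi, sequence in tagged_reads:
        families[umi].append(sequence)
    result = {}
    for umi, reads in families.items():
        sscs = single_strand_consensus(reads)
        if sscs is not None:
            result[umi] = sscs
    return result


# Hypothetical family U1: 9 of 10 reads agree at position 3 (exactly 90%), so the
# error is corrected; family U2 has a single read and yields no SSCS.
tagged = [("U1", "ACGT")] * 9 + [("U1", "ACTT"), ("U2", "ACGT")]
print(build_sscs(tagged))  # {'U1': 'ACGT'}
```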
A custom R script (CS-ecNGS-spreadsheet-processing-08-08-2023v2.Rmd) was used to filter the data as described. An Excel workbook containing one worksheet per rat in the dataset, each holding the mutpos output of that rat, was used as input for the R script. The R script reports the error-corrected and filtered mutant output in several useful forms (MF values and mutant counts, reported and searchable by amplicon, chromosome, genomic position, and mutation type). The R script also generates a report on SSCS recovery, identifies the minimum number of SSCSs recovered in a dataset, and calculates, for each amplicon of each sample, the ratio of the number of SSCSs recovered to the minimum number recovered for that amplicon across all samples in the dataset. This ratio is used to calculate the number of mutants to be randomly downsampled, and the R script performs the random downsampling of mutants to normalize the output relative to the rat sample with the lowest number of SSCSs recovered. The R script reports MFs of mutants recovered before and after downsampling. Downsampled data, normalized relative to the sample with the lowest SSCS recovery for each amplicon, were used in all statistical analyses.
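The text does not specify exactly how the ratio is turned into a random selection of mutants, so the following Python sketch swaps in one plausible technique, binomial thinning: each mutant SSCS is kept independently with probability 1/ratio, and the downsampled MF is taken relative to the minimum SSCS depth. Both choices, and the function names `thin_mutant_count` and `downsample_amplicon`, are assumptions for illustration, not the R script's implementation.

```python
import random


def thin_mutant_count(mutant_sscs, sample_sscs, min_sscs, rng):
    """Keep each mutant SSCS independently with probability min_sscs / sample_sscs.

    min_sscs / sample_sscs is the reciprocal of the ratio described above, so a
    sample that recovered twice the minimum keeps about half of its mutant SSCSs.
    """
    if min_sscs <= 0 or sample_sscs < min_sscs:
        raise ValueError("require 0 < min_sscs <= sample_sscs")
    keep_probability = min_sscs / sample_sscs
    return sum(1 for _ in range(mutant_sscs) if rng.random() < keep_probability)


def downsample_amplicon(mutants_by_sample, sscs_by_sample, seed=0):
    """Downsample one mutation in one amplicon across all samples.

    mutants_by_sample: {sample: mutant SSCS count}
    sscs_by_sample:    {sample: total SSCSs recovered for this amplicon}
    Returns {sample: (downsampled mutant count, MF at the minimum depth)}.
    """
    rng = random.Random(seed)  # seeded so the random draw is reproducible
    min_sscs = min(sscs_by_sample.values())
    result = {}
    for sample, count in mutants_by_sample.items():
        kept = thin_mutant_count(count, sscs_by_sample[sample], min_sscs, rng)
        result[sample] = (kept, kept / min_sscs)
    return result


# Hypothetical counts (not study data): M20 recovered twice as many SSCSs as M19.
print(downsample_amplicon({"M19": 10, "M20": 10}, {"M19": 300000, "M20": 600000}, seed=1))
```

Because the lowest-coverage sample has a keep probability of exactly 1, its counts pass through unchanged; only deeper samples are thinned.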
Study Results
Many mutants with MFs greater than or equal to 10⁻⁴ were recovered in this study (15 to 55 mutants per rat, with a total of 565 mutants recovered in the replicate analysis of the 24-week samples). Significantly greater MFs were observed in amplicons corresponding to known rat mammary-specific driver genes (Hras, Pik3ca, and Tp53) than in the remaining amplicons. Importantly, in both the 12- and 24-week datasets, significant dose-dependent increases in Pik3ca H1047R MFs and in the numbers of samples with Pik3ca H1047R mutants were observed. This was interpreted as evidence of clonal expansion. Also, a greater number of samples with measurable Pik3ca H1047R MFs and greater clonal expansion (identified by larger MF values) were observed in the 24-week samples relative to the 12-week samples. Replicate measurements of Pik3ca H1047R MFs in the 24-week samples were highly concordant. The median absolute deviation in MFs observed among the dose groups after 24 weeks of lorcaserin treatment correlated significantly with mammary tumor incidence (i.e., number of mammary tumors per number of female rats tested in each dose group). Lorcaserin-induced clonal expansion of Pik3ca H1047R mutants in rats was considered relevant to humans because Pik3ca H1047R is the most prevalent CDM reported in human breast cancer. In conclusion, CarcSeq detected the carcinogenic effect of a non-genotoxic carcinogen with a treatment duration as short as three months.
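The median absolute deviation (MAD) is the median of each value's absolute distance from the group median, which makes it robust to a few rats with very large MFs. The Python sketch below assumes the MAD is computed within each dose group over per-rat MFs and is left unscaled; the exact variant used in the study is not stated here, the names `median_absolute_deviation` and `mad_by_group` are hypothetical, and the example values are invented for illustration, not study data.

```python
from statistics import median


def median_absolute_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    if not values:
        raise ValueError("at least one value is required")
    center = median(values)
    return median(abs(v - center) for v in values)


def mad_by_group(mfs_by_group):
    """Map each dose-group label to the MAD of its per-rat MFs."""
    return {group: median_absolute_deviation(mfs) for group, mfs in mfs_by_group.items()}


# Hypothetical per-rat MFs (not study data) for two dose groups of six rats each.
hypothetical = {
    "0 mg/kg": [0.0, 0.0, 0.0, 1e-4, 0.0, 0.0],
    "100 mg/kg": [0.0, 2e-4, 5e-4, 1e-3, 0.0, 3e-4],
}
print(mad_by_group(hypothetical))
```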
[1] Harris, K. L., Walia, V., Gong, B., McKim, K. L., Myers, M. B., Xu, J. and Parsons, B. L. (2020). Quantification of cancer driver mutations in human breast and lung DNA using targeted, error-corrected CarcSeq. Environ Mol Mutagen 61, 872-889.
[2] McKim, K. L., Myers, M. B., Harris, K. L., Gong, B., Xu, J. and Parsons, B. L. (2021). CarcSeq measurement of rat mammary cancer driver mutations and relation to spontaneous mammary neoplasia. Toxicol Sci 182, 142-158.
[3] Harris, K. L., McKim, K. L., Myers, M. B., Gong, B., Xu, J. and Parsons, B. L. (2021). Assessment of clonal expansion using CarcSeq measurement of lung cancer driver mutations and correlation with mouse strain- and sex-related incidence of spontaneous lung neoplasia. Toxicol Sci 184, 1-14.
[4] Gustafson, A., King, C. and Rey, J. A. (2013). Lorcaserin (Belviq): A selective serotonin 5-HT2c agonist in the treatment of obesity. P T 38, 525-534.
[5] U.S. Food and Drug Administration (2012). Center for Drug Evaluation and Research, application number: 022529Orig1s000, pharmacology review(s). https://www.accessdata.fda.gov/drugsatfda_docs/nda/2012/022529Orig1s000PharmR.pdf [accessed May 15, 2023].
[6] Sharretts, J., Galescu, O., Gomatam, S., Andraca-Carrera, E., Hampp, C. and Yanoff, L. (2020). Cancer risk associated with lorcaserin — the FDA’s review of the CAMELLIA-TIMI 61 trial. N Engl J Med 383, 1000-1002.
[7] U.S. Food and Drug Administration (2021). Determination that Belviq (lorcaserin hydrochloride) tablets, 10 milligrams, and Belviq XR (lorcaserin hydrochloride) extended-release tablets, 20 milligrams, were withdrawn from sale for reasons of safety or effectiveness. Federal Register. College Park, MD: The National Archives and Records Administration. p. 12697-12698.
[8] Kennedy, S. R., Schmitt, M. W., Fox, E. J., Kohrn, B. F., Salk, J. J., Ahn, E. H., Prindle, M. J., Kuong, K. J., Shen, J.-C., Risques, R.-A. et al. (2014). Detecting ultralow-frequency mutations by duplex sequencing. Nat Protoc 9, 2586-2606.
[9] Parsons, B. L., McKim, K. L. and Myers, M. B. (2017). Variation in organ-specific Pik3ca and Kras mutant levels in normal human tissues correlates with mutation prevalence in corresponding carcinomas. Environ Mol Mutagen 58, 466-476.
[10] Parsons, B. L. (2018). Modern conception of carcinogenesis creates opportunities to advance cancer risk assessment. Curr Opin Toxicol 11-12, 1-9.