Data from: The specificity and structure of DNA crosslinking by the gut bacterial genotoxin colibactin
Data files
Oct 09, 2025 version (5.68 GB total)
Figure_1_-_AT-rich_Cleavage.zip (1.28 GB)
Figure_3_14mer_ICL_Mass_Determination.zip (16.05 MB)
Figure_S11_3-Deaza-dAdo_Cleavage.zip (1.54 GB)
Figure_S12_25mer_ICL_Mass_Determination.zip (6.21 MB)
Figure_S2_-_Cleavage_Assay_-_AAATT.zip (103.74 MB)
Figure_S7_GC-containing_Cleavage.zip (2.74 GB)
README.md (4.45 KB)
Abstract
We report here LC-MS data, used in conjunction with NMR analysis, to elucidate the specificity and structure of DNA interstrand crosslinks (ICLs) formed by the genotoxin colibactin. The measured masses of 14- and 25-mer double-stranded DNA oligomers exposed to colibactin allowed the chemical structure of the colibactin ICL to be determined. The structure revealed an α-ketoiminium in the central region of colibactin that likely serves as a key DNA recognition element, explaining colibactin’s sequence selectivity. A strand cleavage assay with high-resolution LC-MS detection was used to study the site-specific cleavage products of 25-mer double-stranded DNA oligomers of various base sequences containing colibactin ICLs. These data were used to understand how and when colibactin ICLs form.
Dataset DOI: 10.5061/dryad.vmcvdnd5g
Description of the data and file structure
Data were acquired on Thermo Scientific Orbitrap Fusion and Lumos mass spectrometers, so the files are in the vendor's .raw format. They can be processed directly with the vendor's Xcalibur FreeStyle or Qual Browser (older but still functional) software. A free alternative is OpenChrom, an open-source package that can handle various mass spectrometry data formats.
Samples with "clb-" in the file name are controls that were exposed to E. coli lacking the clb gene, which therefore do not produce colibactin. Samples with "clb+" in the file name were exposed to E. coli that do produce colibactin. For data collected in triplicate, each replicate is indicated in the file name: "Rep1" for the first replicate, "Rep2" for the second, and "Rep3" for the third. The varying part of the sequence is also indicated in the file name.
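The naming convention above can be read programmatically. The following is a minimal Python sketch, assuming only that a file name contains the literal tokens "clb+" or "clb-" and "Rep1" to "Rep3" as described; the helper name parse_sample_name and the example file names are hypothetical and are not taken from the archives.

```python
import re
from typing import NamedTuple, Optional


class SampleInfo(NamedTuple):
    # True for "clb+", False for "clb-", None if neither token is present.
    produces_colibactin: Optional[bool]
    # 1, 2, or 3 for "Rep1".."Rep3", None if the file is not a replicate.
    replicate: Optional[int]


def parse_sample_name(file_name: str) -> SampleInfo:
    """Extract the clb condition and replicate number from a file name."""
    condition = re.search(r"clb([+-])", file_name)
    replicate = re.search(r"Rep([1-3])", file_name)
    return SampleInfo(
        produces_colibactin=(condition.group(1) == "+") if condition else None,
        replicate=int(replicate.group(1)) if replicate else None,
    )


# Hypothetical file names, for illustration only.
print(parse_sample_name("AAATT_clb+_Rep2.raw"))
print(parse_sample_name("AAATT_clb-_Rep1.raw"))
```

Files that carry neither token, such as blanks, come back with None in both fields, so they can be filtered out before grouping replicates.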
The data used to determine the mass of the colibactin-interstrand-crosslinked double-stranded 14-mer oligomer (15-17 min retention time) is in the Figure_3_14mer_ICL_Mass_Determination.zip folder.
The data used to determine the mass of the colibactin-interstrand-crosslinked double-stranded 25-mer oligomer (15.3-15.7 min retention time) is in the Figure_S12_25mer_ICL_Mass_Determination.zip folder.
The data used to determine the masses of the 3-Deaza-dAdo cleavage samples (32.3-35.2 min retention time) is in the Figure_S11_3-Deaza-dAdo_Cleavage.zip folder.
The data used to determine the masses of the AT-rich cleavage samples (32-35.2 min retention time) is in the Figure_1_-_AT-rich_Cleavage.zip folder.
The data used to illustrate how the cleavage data analysis works (32-35.2 min retention time) is contained in the Figure_S2_-_Cleavage_Assay_-_AAATT.zip folder.
The data used to determine the masses of the GC-containing cleavage samples (32.5-35.7 min retention time) is in the Figure_S7_GC-containing_Cleavage.zip folder.
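For scripted processing, the retention-time windows listed above can be collected in one lookup table. This is a minimal Python sketch: the window values are copied from the descriptions above, while the names RT_WINDOWS_MIN and scans_in_window, the example scan times, and the choice of inclusive window bounds are assumptions made for illustration.

```python
# Retention-time windows in minutes, as stated in the dataset description.
RT_WINDOWS_MIN = {
    "Figure_3_14mer_ICL_Mass_Determination.zip": (15.0, 17.0),
    "Figure_S12_25mer_ICL_Mass_Determination.zip": (15.3, 15.7),
    "Figure_S11_3-Deaza-dAdo_Cleavage.zip": (32.3, 35.2),
    "Figure_1_-_AT-rich_Cleavage.zip": (32.0, 35.2),
    "Figure_S2_-_Cleavage_Assay_-_AAATT.zip": (32.0, 35.2),
    "Figure_S7_GC-containing_Cleavage.zip": (32.5, 35.7),
}


def scans_in_window(archive_name, scan_times_min):
    """Return the scan times (minutes) inside the archive's window.

    Both window bounds are treated as inclusive (an assumption).
    """
    start, end = RT_WINDOWS_MIN[archive_name]
    return [t for t in scan_times_min if start <= t <= end]


# Hypothetical scan times, for illustration only.
print(scans_in_window("Figure_S12_25mer_ICL_Mass_Determination.zip",
                      [15.0, 15.3, 15.5, 15.7, 16.0]))
```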
Oligonucleotide Data Analysis
Oligonucleotide data analysis was performed using Thermo Scientific’s Protein Deconvolution and FreeStyle software packages and the online Mongo Oligo Mass Calculator tool.
Cleavage Assay MS Data Analysis
Identifications and relative abundance measurements of the base treatment-induced cleavage sites of the colibactin-exposed double-stranded oligonucleotides were made using isotopically resolved charge-state mass deconvolution of their LC-HRAM-MS spectra. The molecular formulas of the base-treatment cleavage products are the same as those of the “w” and “d” product ions formed upon MS2 collision-induced dissociation (CID) of negatively charged, unmodified single-stranded oligonucleotides (64). The “w” and “d” product ions for each single-stranded oligonucleotide were calculated online using the Mongo Oligo Mass Calculator v2.06 (http://mass.rega.kuleuven.be/mass/mongo.htm) with the “CID fragments” feature and the “monoisotopic mass”, “negative mode”, “DNA”, and “5’-OH” and “3’-OH” terminal options selected. Mass identities were verified by comparison with the calculated masses of proposed structures in ChemDraw. The measured masses of the cleavage products were determined by deconvoluting the multiple charge states seen in full-scan spectra acquired during the cleavage product retention period using FreeStyle software (Thermo Scientific, Waltham, MA). The deconvoluted experimental masses from FreeStyle were compared to the cleavage product masses calculated using the Mongo software (accounting for the charge state) to assign the cleavage products. The assigned cleavage products were then used to identify the location of the colibactin adduct on the intact oligonucleotide. Signal intensities from both the [M+H] and [M+Na] ions of each cleavage product were summed and normalized to the amount of DNA injected. Values plotted are the difference between the average signal intensities observed in assays with pks+ and pks– E. coli.
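The arithmetic behind the assignment and quantitation steps can be sketched in a few lines of Python. This is an illustration only, not the FreeStyle or Mongo workflow used for the published analysis. It assumes negative-mode ions of the form [M - zH]z- when converting m/z to a neutral mass, and every function name, example value, and the unit of "amount of DNA injected" is hypothetical.

```python
from statistics import mean

PROTON_MASS = 1.007276  # Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M from the m/z of an [M - zH]z- ion.

    m/z = (M - z * proton) / z, so M = z * (m/z + proton).
    `charge` is the absolute charge z (a positive integer).
    """
    return charge * (mz + PROTON_MASS)


def ppm_error(measured: float, calculated: float) -> float:
    """Mass error of a measured mass relative to a calculated mass, in ppm."""
    return (measured - calculated) / calculated * 1e6


def normalized_signal(ion_intensities, dna_injected: float) -> float:
    """Sum the intensities of the ion species recorded for one cleavage
    product and normalize to the amount of DNA injected."""
    return sum(ion_intensities) / dna_injected


def plus_minus_difference(plus_replicates, minus_replicates) -> float:
    """Average over replicates, then subtract the control average."""
    return mean(plus_replicates) - mean(minus_replicates)


# Hypothetical numbers, for illustration only.
measured = neutral_mass(999.0, 3)
print(measured, ppm_error(measured, 3000.02))
signal = normalized_signal([100.0, 50.0], dna_injected=3.0)
print(signal, plus_minus_difference([50.0, 60.0, 70.0], [5.0, 10.0, 15.0]))
```

Subtracting the control average after normalization means that any cleavage signal also present without colibactin is removed from the plotted value.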
Code/software
Thermo Scientific Protein Deconvolution, Xcalibur FreeStyle, and Qual Browser software packages
