Fourier transform ion cyclotron resonance (FT ICR) mass spectrometry data
Data files
Nov 26, 2024 version files 13.47 MB
Jan 15, 2025 version files 13.47 MB
Jan 18, 2025 version files 13.47 MB
Abstract
Understanding dissolved organic matter (DOM) transformation is crucial for comprehending soil biogeochemical cycling. However, the extent to which soil microbes mediate DOM transformation at the molecular level, and whether this is regulated by soil management practices such as fertilization, remain largely unknown. We investigated soil DOM transformations under long-term fertilization using Fourier-transform ion cyclotron resonance mass spectrometry, high-throughput sequencing, and machine learning. Fertilization greatly promoted the transformation potential of DOM molecules. Notably, organic fertilization increased the mean transformation number of DOM molecules by 260% compared to no fertilization, while chemical fertilization increased it by 193.33%. Machine learning indicated that intrinsic DOM molecular characteristics (aromaticity index, oxygen/carbon, and hydrogen/carbon ratios) could predict transformation potential, especially for medium- or low-transformation-potential molecules. However, high-transformation-potential DOM molecules were more influenced by soil microorganisms that contributed to DOM transformation (e.g., Desulfobacterota). Our study provides a parameter to characterize the potential transformation capacity of DOM molecules, shows the effects of different fertilization treatments on this potential, and highlights microbial contributions to molecular transformation processes by identifying the key microbial groups.
https://doi.org/10.5061/dryad.pvmcvdnw4
Description of the data and file structure
The soil samples were collected from a long-term paddy field experiment site in Yingtan City, Jiangxi Province, China (28°15′30″N, 116°55′30″E). The DOM was extracted from soil samples (6 g) using a soil-to-water ratio of 1:5 and ultrapure water. The collected DOM eluates were stored at −20°C in darkness prior to electrospray ionization FT-ICR-MS analysis. The DOM data obtained were processed using the Data Analysis software (Bruker Daltonics version 4.2).
Files and variables
File: Fourier_transform_ion_cyclotron_resonance_(FT_ICR)_mass_spectrometry_data.zip
Description: This dataset contains high-resolution Fourier transform mass spectrometry data for DOM molecules under different fertilization treatments. Soil samples were collected in November 2021 from the 0–20 cm depth range, with three replicates for each of seven fertilization treatments. The 21 tables represent the 21 samples (7 treatments × 3 replicates), with table numbers given in parentheses:
- CK (1–3): plots with no fertilizer added and complete removal of crop straw.
- NP (4–6): plots with 115 kg ha⁻¹ N and 68 kg ha⁻¹ P2O5 per season.
- NK (7–9): plots with 115 kg ha⁻¹ urea and 41 kg ha⁻¹ K2O per season.
- NPK (10–12): plots with 115 kg ha⁻¹ N, 68 kg ha⁻¹ P2O5, and 41 kg ha⁻¹ K2O per season.
- NPKst (13–15): plots with NPK chemical fertilization plus 2500 kg ha⁻¹ rice straw per season.
- NPKpm (16–18): plots with NPK chemical fertilization plus 2500 kg ha⁻¹ pig manure per season.
- NPKgm (19–21): plots with NPK chemical fertilization plus 2500 kg ha⁻¹ green manure (*Astragalus sinicus* L.) per season.
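The table numbering above can be turned into a small lookup. The following is a minimal sketch, assuming tables are identified by an integer from 1 to 21 in the order listed; the function name `treatment_for_sample` is hypothetical and is not part of the dataset.

```python
from typing import Tuple

# Treatment labels in the order given in the description (3 replicates each).
TREATMENTS = ["CK", "NP", "NK", "NPK", "NPKst", "NPKpm", "NPKgm"]
REPLICATES_PER_TREATMENT = 3


def treatment_for_sample(sample_number: int) -> Tuple[str, int]:
    """Return (treatment label, replicate number) for a table number 1-21.

    Assumption: tables are numbered consecutively, three per treatment,
    as stated in the description (1-3 = CK, 4-6 = NP, ...).
    """
    total = len(TREATMENTS) * REPLICATES_PER_TREATMENT
    if not 1 <= sample_number <= total:
        raise ValueError(f"sample_number must be between 1 and {total}")
    index, replicate = divmod(sample_number - 1, REPLICATES_PER_TREATMENT)
    return TREATMENTS[index], replicate + 1
```

For example, `treatment_for_sample(17)` gives the NPKpm treatment, second replicate.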
The parameters in the table are described as follows:
category: Compound category based on elemental composition; for example, "CHO" denotes compounds containing Carbon (C), Hydrogen (H), and Oxygen (O).
isotope kind: Indicates the isotopic variant or type used for identification (empty here, so no isotope labeling is specified).
C, H, N, O, S, P: Counts of Carbon (C), Hydrogen (H), Nitrogen (N), Oxygen (O), Sulfur (S), and Phosphorus (P) atoms in the molecular formula.
theo: Theoretical mass of the compound, calculated based on its molecular formula.
m/z: Mass-to-charge ratio, the measured value in the mass spectrometer.
intensity: Signal intensity from the mass spectrum, indicating the abundance of the compound.
s/n: Signal-to-noise ratio, assessing the quality or detectability of the signal.
ppm: Error in parts per million (ppm) between the theoretical and observed m/z values, indicating mass accuracy.
class: Compound classification, such as the presence of certain functional groups (e.g., "O3" indicates three oxygen atoms as a key feature).
DBE: Double Bond Equivalent, which reflects the degree of unsaturation (number of double bonds and rings in the molecule).
KMD: Kendrick Mass Defect, a metric for analyzing patterns of homologous compounds in mass spectrometry.
mole: Molecular formula of the compound (e.g., C8H8O3 for the first row).
When the N (Nitrogen) and P (Phosphorus) columns contain "NULL" or are empty, the compound contains no nitrogen or phosphorus atoms. The molecular formulas confirm this: for example, C8H8O3 and C9H12O2 include neither N nor P. This is typical of compounds in categories such as "CHO," which consist of carbon, hydrogen, and oxygen.
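Several of the columns above are derived quantities, and the sketch below shows one common way to recompute them. It is not the processing code used for this dataset: the sign convention for ppm, the neutral-molecule DBE formula, and the CH2-based Kendrick definition with rounding to the nearest integer are all assumptions, so recomputed values may differ from the tabulated ones. All function names are hypothetical.

```python
# Exact and nominal mass of the CH2 repeat unit (Kendrick base).
CH2_EXACT = 14.01565
CH2_NOMINAL = 14.0


def atom_count(value) -> int:
    """Read an element-count cell; "NULL", empty, or None means zero atoms."""
    if value is None:
        return 0
    text = str(value).strip()
    if text == "" or text.upper() == "NULL":
        return 0
    return int(float(text))


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in ppm (assumed sign: observed minus theoretical)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def dbe(c: int, h: int, n: int = 0, p: int = 0) -> float:
    """Double bond equivalent of a neutral CHNOSP formula.

    Assumes divalent O and S (no contribution) and trivalent N and P.
    """
    return c - h / 2 + (n + p) / 2 + 1


def kendrick_mass_defect(mz: float) -> float:
    """Kendrick mass defect on the CH2 scale (nominal minus Kendrick mass)."""
    kendrick_mass = mz * CH2_NOMINAL / CH2_EXACT
    return round(kendrick_mass) - kendrick_mass
```

With these definitions, C8H8O3 has a DBE of 5 and C9H12O2 has a DBE of 4.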
Version changes
15-Jan-2025: Added Supplementary Table 2 (Zenodo): Maximal Transformation Number of Every Specific Molecule, which presents the maximal transformation number for each specific molecule, quantifying the maximum number of transformations each molecule can undergo. Added Supplementary Table 3 (Zenodo): The Key Molecular Formulas and Their Potential Transformation Number, which lists key molecular formulas along with their potential transformation numbers, based on an analysis of molecular characteristics and transformation reactions. Added Supplementary Table 4 (Zenodo): Functional Genes Associated with the Transformation Process, which contains genes identified through genomic analysis that are associated with the transformation process, providing insights into their potential roles.
18-Jan-2025: Renamed Supplementary Tables 2, 3, and 4 to Supplementary Data 1, 2, and 3.
Code/software
The assigned molecular formulas were classified into distinct categories based on the ratios of oxygen to carbon (O/C) and hydrogen to carbon (H/C) as follows: lipids for H/C = 1.5–2.0, O/C = 0–0.3; aliphatic/proteins for H/C = 1.5–2.2, O/C = 0.3–0.67; lignin/CRAM-like for H/C = 0.7–1.5, O/C = 0.1–0.67; carbohydrates for H/C = 1.5–2.4, O/C = 0.67–1.2; unsaturated hydrocarbons for H/C = 0.7–1.5, O/C = 0–0.1; aromatic structures for H/C = 0.2–0.7, O/C = 0–0.67; and tannin for H/C = 0.6–1.5, O/C = 0.67–1.0. For the PMD-based metabolomics analysis, we used the R package ‘pmd’ in R version 4.2.3: we constructed a peak list and applied the ‘globalstd’ function to identify independent peaks within the list. The ‘getrda’ function was used for targeted analysis of MS data. Subsequently, the ‘getstd’ function was employed to remove adducts, neutral losses, and common fragment ions [27]. All statistical analyses were performed using R (v4.2.3).
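The O/C and H/C classification rule described above can be sketched as a lookup over the stated ranges. Because adjacent ranges share boundary values, this sketch assumes inclusive bounds and returns the first matching category in the order listed in the text; that tie-breaking rule, the "unclassified" fallback, and the function name `classify` are assumptions, not the authors' implementation.

```python
# (label, H/C min, H/C max, O/C min, O/C max), copied from the text above.
CLASSES = [
    ("lipids", 1.5, 2.0, 0.0, 0.3),
    ("aliphatic/proteins", 1.5, 2.2, 0.3, 0.67),
    ("lignin/CRAM-like", 0.7, 1.5, 0.1, 0.67),
    ("carbohydrates", 1.5, 2.4, 0.67, 1.2),
    ("unsaturated hydrocarbons", 0.7, 1.5, 0.0, 0.1),
    ("aromatic structures", 0.2, 0.7, 0.0, 0.67),
    ("tannin", 0.6, 1.5, 0.67, 1.0),
]


def classify(c: int, h: int, o: int) -> str:
    """Assign a compound class from C, H, and O atom counts.

    Assumption: bounds are inclusive and the first match in CLASSES wins;
    formulas outside every range are labelled "unclassified".
    """
    if c <= 0:
        raise ValueError("carbon count must be positive")
    hc = h / c
    oc = o / c
    for label, hc_min, hc_max, oc_min, oc_max in CLASSES:
        if hc_min <= hc <= hc_max and oc_min <= oc <= oc_max:
            return label
    return "unclassified"
```

Under these assumptions, C8H8O3 (H/C = 1.0, O/C = 0.375) falls in the lignin/CRAM-like region.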
Access information
Other publicly accessible locations of the data:
- The raw sequence data reported in this paper are available in the NCBI Sequence Read Archive repository under the BioProject ID of PRJNA1134104 (https://www.ncbi.nlm.nih.gov/sra/PRJNA1134104).
Data was derived from the following sources:
- Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS)
This dataset contains high-resolution Fourier transform mass spectrometry data for dissolved organic matter (DOM) molecules under different fertilization treatments.
The DOM was extracted from soil samples (6 g) using a soil-to-water ratio of 1:5 and ultrapure water. Ultrapure water used for all experiments and solutions had a resistivity of 18.2 MΩ·cm at 25°C and a total organic carbon content lower than 5 ppb. The mixture was shaken for 12 h at room temperature on a horizontal shaker. Subsequently, the solutions were centrifuged at 1200 × g for 10 min and filtered through a 0.45 μm membrane filter. For clean-up, HPLC-grade methanol (10 mL) and acidified ultrapure water (10 mL, pH 2) were passed through PPL cartridges (Agilent Technologies, Santa Clara, CA, USA). The DOM solution was then loaded onto the PPL cartridges by gravity flow. Following that, DOM was eluted from the cartridges using 10 mL of methanol (HPLC grade; Merck, Darmstadt, Germany). The collected DOM eluates were stored at −20°C in darkness prior to electrospray ionization FT-ICR-MS analysis.
A deuterated octadecanoic acid compound was added to the samples as an internal standard at a dosage of 15 μL (5 × 10⁻⁷ mol L⁻¹) per milliliter of sample. The FT-ICR MS instrument (Bruker, Billerica, MA, USA) was equipped with a 9.4 T actively shielded superconducting magnet and was operated in negative-ion mode. Each sample was injected into the ESI source at a flow rate of 180 μL h⁻¹ using a syringe pump. The polarization voltage was set at 4.0 kV, and the capillary column introduction and outlet voltages were 4.5 kV and 320 V, respectively. Ions were accumulated in the hexapole for 0.001 s before being transferred to the ICR cell. The mass-to-charge ratio (m/z) range analyzed was 150–800 Da. A 4 M word size was selected for time-domain signal acquisition. Signal-to-noise ratio and dynamic range were enhanced by accumulating 128 time-domain FT-ICR transients.
The data obtained were processed using the Data Analysis software (Bruker Daltonics version 4.2). The raw spectra were converted into a list of mass-to-charge ratio (m/z) values using the FTMS peak picker algorithm, with a signal-to-noise (S/N) threshold of 6 and an absolute intensity threshold of 100. To minimize cumulative errors, all peaks from the entire dataset were aligned to each other to eliminate potential mass shifts. The molecular formulas of the identified mass peaks were determined using custom software designed for this purpose. The assigned molecular formulas were classified into distinct categories based on the ratios of oxygen to carbon (O/C) and hydrogen to carbon (H/C) as follows: lipids for H/C = 1.5–2.0, O/C = 0–0.3; aliphatic/proteins for H/C = 1.5–2.2, O/C = 0.3–0.67; lignin/CRAM-like for H/C = 0.7–1.5, O/C = 0.1–0.67; carbohydrates for H/C = 1.5–2.4, O/C = 0.67–1.2; unsaturated hydrocarbons for H/C = 0.7–1.5, O/C = 0–0.1; aromatic structures for H/C = 0.2–0.7, O/C = 0–0.67; and tannin for H/C = 0.6–1.5, O/C = 0.67–1.0.
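The peak-picking thresholds stated above (S/N of 6 and absolute intensity of 100) can be re-applied to a table as a simple filter. This is a minimal sketch, not the Bruker FTMS peak picker: it assumes each peak is a dictionary keyed by the column names "s/n" and "intensity" from the parameter list, and that both thresholds are inclusive. The function name `filter_peaks` is hypothetical.

```python
from typing import Dict, List

SN_THRESHOLD = 6.0          # signal-to-noise threshold stated in the text
INTENSITY_THRESHOLD = 100.0  # absolute intensity threshold stated in the text


def filter_peaks(peaks: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep peaks that meet both thresholds.

    Assumption: thresholds are inclusive (>=); the original software may
    use a strict comparison instead.
    """
    return [
        peak
        for peak in peaks
        if peak["s/n"] >= SN_THRESHOLD and peak["intensity"] >= INTENSITY_THRESHOLD
    ]
```

Rows in the published tables have presumably already passed these thresholds, so the filter mainly serves as a consistency check when re-reading the data.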