Data from: Emerging drivers of urban aerosol increase global change vulnerability in a US megacity
Data files (Jun 03, 2026 version, 1.64 MB total)
- Frog_NY_Data_Archive.xlsx (1.63 MB)
- README.md (4.64 KB)
Abstract
This dataset describes speciated organic compounds observed during the FROG-NY 2023 field campaign in Mineola, New York. Submicron aerosol samples were collected on quartz fiber filters and analyzed by TD-GCxGC-EI-HR-ToF-MS. Each detected compound is identified by a unique code. Variability over the measurement period is reported both as normalized instrument response and as estimated concentration in ng/m³. Descriptive information for each compound is also provided: its comparison to the NIST 2020 mass spectral database, its unit mass resolution mass spectrum, its identified or predicted properties, and its method of quantification.
https://doi.org/10.5061/dryad.xpnvx0krd
Description of the data and file structure
Speciated Organic Compounds from GCxGC Analysis of Submicron Aerosol at FROG-NY 2023
Measurement Description:
Site Coordinates: 40.750, -73.638
Analysis Duration: 14 July 2023 - 14 August 2023
Sample Collection: Submicron aerosol (supermicron excluded by cyclone) collected on quartz fiber filters
Sample Analysis: TD-GCxGC-EI-HR-ToF-MS, as described in Franklin et al., 2021
File: Frog_NY_Data_Archive.xlsx
Sheet: Signal_Variability
Description: Time series of compound signals across all analyzed samples in units of normalized instrument response
Variables:
- Sample_startdate: Sample on time in EDT (GMT - 4)
- Sample_enddate: Sample off time in EDT (GMT - 4)
- All Others: Unique compound ID codes as described in Compound_Info "Compound_ID_Code"; cell data indicates blank-subtracted and internal sample normalized compound signal. Normalization method described in detail in Franklin et al., 2022.
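Because both time columns are given in EDT (GMT - 4), it can help to convert them to UTC before merging with other datasets. The following is a minimal sketch, assuming the cells are read as naive (timezone-free) datetimes; the helper names and the example sample window are hypothetical and not part of the archive.

```python
from datetime import datetime, timedelta, timezone

# The README states that sample times are in EDT (GMT - 4),
# so a fixed -4 h offset is used rather than a time zone database lookup.
EDT = timezone(timedelta(hours=-4), name="EDT")


def edt_to_utc(naive_local: datetime) -> datetime:
    """Attach the fixed EDT offset to a naive timestamp and convert it to UTC."""
    return naive_local.replace(tzinfo=EDT).astimezone(timezone.utc)


def sample_midpoint_utc(start_local: datetime, end_local: datetime) -> datetime:
    """Return the midpoint of a sampling window, expressed in UTC."""
    start = edt_to_utc(start_local)
    end = edt_to_utc(end_local)
    return start + (end - start) / 2


# Hypothetical sampling window (illustrative values, not taken from the data).
start = datetime(2023, 7, 14, 8, 0)
end = datetime(2023, 7, 14, 12, 0)
print(sample_midpoint_utc(start, end).isoformat())  # 2023-07-14T14:00:00+00:00
```

The midpoint is often a convenient single timestamp for plotting filter samples against higher time resolution measurements.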
Sheet: Quant_Variability
Description: Time series of compound concentrations across all analyzed samples in units of estimated concentration (ng/m³)
Variables:
- Sample_startdate: Sample on time in EDT (GMT - 4)
- Sample_enddate: Sample off time in EDT (GMT - 4)
- All Others: Unique compound ID codes as described in Compound_Info "Compound_ID_Code"; cell data indicates estimated compound concentration in air (ng/m³). Quantification methods described in detail in Franklin et al., 2022.
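The time series sheets are laid out in wide form, with one row per sample and one column per compound code. A long form (one row per sample and compound) is often easier to join with Compound_Info. The sketch below assumes each sheet row has already been loaded as a dictionary keyed by column name (for example via `pandas.read_excel(path, sheet_name="Quant_Variability").to_dict("records")`). The compound codes, timestamps, and values shown are hypothetical, and treating empty cells as missing is an assumption.

```python
from typing import Any, Dict, List

TIME_COLUMNS = ("Sample_startdate", "Sample_enddate")


def _is_missing(value: Any) -> bool:
    """Treat None and NaN as missing (assumption about how blanks are read)."""
    return value is None or (isinstance(value, float) and value != value)


def wide_to_long(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reshape wide rows into one record per sample and compound."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in TIME_COLUMNS or _is_missing(value):
                continue
            long_rows.append({
                "Sample_startdate": row["Sample_startdate"],
                "Sample_enddate": row["Sample_enddate"],
                "Compound_ID_Code": column,
                "ng_per_m3": float(value),
            })
    return long_rows


def total_by_sample(long_rows: List[Dict[str, Any]]) -> Dict[Any, float]:
    """Sum estimated concentrations over all compounds for each sample start time."""
    totals: Dict[Any, float] = {}
    for record in long_rows:
        key = record["Sample_startdate"]
        totals[key] = totals.get(key, 0.0) + record["ng_per_m3"]
    return totals


# Hypothetical rows; "CMPD_A" and "CMPD_B" are placeholder compound codes.
rows = [
    {"Sample_startdate": "2023-07-14 08:00", "Sample_enddate": "2023-07-14 12:00",
     "CMPD_A": 1.5, "CMPD_B": 0.25},
    {"Sample_startdate": "2023-07-14 12:00", "Sample_enddate": "2023-07-14 16:00",
     "CMPD_A": 2.0, "CMPD_B": None},
]
long_rows = wide_to_long(rows)
print(len(long_rows), total_by_sample(long_rows))
```

The same reshaping applies to Signal_Variability, although summing normalized instrument response across compounds is not physically meaningful in the way a summed concentration is.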
Sheet: Compound_Info
Description: Metadata related to each compound described in Signal_Variability and Quant_Variability
Variables:
- Compound_ID_Code: Unique compound identifier for purposes of tracing across campaign and archiving
- MS: Unit resolution mass spectrum, structured as "mz, signal; mz, signal; mz, signal; ...". Note that samples were derivatized by MSTFA during analysis to enhance recovery of oxygenated organics, which is reflected in the mass spectra of compounds containing OH groups.
- Cluster: Compound cluster identity classes as identified by dynamic time warping hierarchical clustering and identity-based source assignment. "Biogenic OA" = biogenic secondary organic aerosol. "Biomass Burning" = primary or secondary biomass burning products. "Cooking" = cooking organic aerosol. "Hydrocarbons and PAHs" = hydrocarbons and polycyclic aromatic hydrocarbons. "VCPs" = volatile chemical products. "Ungrouped" = compounds that could not be assigned to any of these groupings. Note that groupings are primarily based on time series variability and cannot be considered definitive source assignments.
- Tracer_Identity: Compound tracer identity as defined by matches to the NIST library or by UCB-GLOBES matches to previously identified species
- Compound.Name_NIST: Name of the best-matching compound in the NIST 2020 mass spectral database. Note that hydrocarbons were not searched against the NIST database because the highly similar mass spectra of branched hydrocarbons make duplicate matches difficult to avoid.
- Library.Match.Factor_NIST: Compound match factor with identified best match compound in NIST 2020 database. Match factors range from 0 to 999.
- Property_Method: Property prediction method for "Property_Cnum", "Property_OSc", "Property_H_C", and "Property_O_C". "Identification" = compound properties were calculated from the underivatized form of identified molecular formulae. "Ch3MS-RF" = compound properties were predicted using the Ch3MS-RF model (Franklin et al., 2023). "Not Included" = compound properties were not predicted, typically due to heteroatoms incompatible with Ch3MS-RF methodologies.
- Property_Cnum: Estimated carbon number of underivatized compound formula. Non-integer carbon number estimations are produced by Ch3MS-RF.
- Property_OSc: Estimated average carbon oxidation state of underivatized compound formula
- Property_H_C: Estimated H:C for underivatized compound formula
- Property_O_C: Estimated O:C for underivatized compound formula
- Quantification_Method: Quantification method used to produce Quant_Variability estimates in ng/m³ from Signal_Variability. "Direct or Proxy External Standard" = compound quantification was directly inferred from the calibration curve of an identical or highly chemically similar external standard compound. "Ch3MS-RF" = compound quantification factor was predicted by the Ch3MS-RF model (Franklin et al., 2023).
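The MS column stores each spectrum as a single string of "mz, signal" pairs separated by semicolons. A minimal parsing sketch follows; the example string is hypothetical, and the exact spacing or any trailing separator in the real cells is an assumption that the parser tolerates.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, signal)


def parse_mass_spectrum(ms: str) -> List[Peak]:
    """Parse 'mz, signal; mz, signal; ...' into a list of (mz, signal) pairs."""
    peaks = []
    for pair in ms.split(";"):
        pair = pair.strip()
        if not pair:  # tolerate a trailing ';' or blank segments
            continue
        mz_text, signal_text = pair.split(",")
        peaks.append((float(mz_text), float(signal_text)))
    return peaks


def base_peak(peaks: List[Peak]) -> Peak:
    """Return the peak with the largest signal."""
    return max(peaks, key=lambda peak: peak[1])


# Hypothetical spectrum string, not copied from the archive.
example = "73, 999; 147, 410; 217, 85;"
peaks = parse_mass_spectrum(example)
print(peaks)             # [(73.0, 999.0), (147.0, 410.0), (217.0, 85.0)]
print(base_peak(peaks))  # (73.0, 999.0)
```

Because the samples were derivatized, spectra of compounds containing OH groups describe the derivatized form and should be compared against derivatized reference spectra.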
Submicron aerosol samples were collected on quartz fiber filters (a sharp cut cyclone was used to exclude supermicron particles). Filters were frozen between collection and analysis (within 2 weeks of campaign conclusion). Samples were analyzed by thermal desorption two-dimensional gas chromatography coupled with electron ionization time-of-flight mass spectrometry (TD-GCxGC-EI-ToF-MS). During thermal desorption, samples were derivatized with MSTFA to enhance recovery of polar organics. A 27-component internal standard was applied to each sample to enable correction for matrix effects.
Raw data were processed, and compounds were detected, using GC-Image. Compounds were assigned identity codes based on a subset of template samples, from which a custom mass spectral library of unique compound identities was developed. Compounds were traced across all samples using NIST mass spectral searching software, comparing every compound in every sample to the custom mass spectral library formed from the templates.
Instrument response for each compound was corrected for filter sampling time, abundance in field blanks, and matrix effects. Matrix effects were corrected by comparison to the signal deviation of the nearest 3 internal standard compounds. Compound quantifications were estimated based on calibration curves of known external compound species or were predicted using the Ch3MS-RF machine learning model. Compound mass spectra were compared to the NIST 2020 mass spectral database and searched against other ambient observations catalogued in the UCB-GLOBES mass spectral database. Atmospherically relevant compound properties were calculated based on identified compound formulae or predicted by the Ch3MS-RF machine learning model.
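The matrix effect correction using the nearest 3 internal standards can be illustrated with a rough sketch. This is not the authors' implementation; several details are assumptions made only for illustration: "nearest" is taken as unscaled Euclidean distance in two-dimensional retention time, the "signal deviation" of a standard is taken as the ratio of its observed to its expected signal in a sample, and the correction divides the compound signal by the mean of those ratios. All names and numbers are hypothetical.

```python
from math import hypot
from statistics import mean
from typing import Dict, Tuple

RetentionTime = Tuple[float, float]  # (first dimension, second dimension)


def matrix_correction_factor(
    compound_rt: RetentionTime,
    standard_rts: Dict[str, RetentionTime],
    standard_deviation: Dict[str, float],
    k: int = 3,
) -> float:
    """Mean deviation ratio of the k internal standards nearest in retention space.

    Assumption: distance is plain Euclidean distance on the two retention
    times; a real analysis would likely scale the two dimensions first.
    """
    def distance(name: str) -> float:
        rt1, rt2 = standard_rts[name]
        return hypot(rt1 - compound_rt[0], rt2 - compound_rt[1])

    nearest = sorted(standard_rts, key=distance)[:k]
    return mean(standard_deviation[name] for name in nearest)


def correct_signal(raw_signal: float, factor: float) -> float:
    """Assumption: a ratio of observed/expected is removed by division."""
    return raw_signal / factor


# Hypothetical internal standards and per-sample deviation ratios.
standard_rts = {"IS_1": (10.0, 1.0), "IS_2": (12.0, 1.2),
                "IS_3": (20.0, 1.5), "IS_4": (40.0, 2.0)}
deviation = {"IS_1": 0.5, "IS_2": 0.5, "IS_3": 0.5, "IS_4": 2.0}

factor = matrix_correction_factor((13.0, 1.1), standard_rts, deviation)
print(factor, correct_signal(100.0, factor))
```

Averaging over several nearby standards, rather than relying on a single one, reduces the influence of any one poorly recovered standard on the correction.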
