Technology- and facility-level energy, cost, and environmental performance in U.S. chemicals, cement, iron and steel, food, and non-manufacturing industries
Data files (Nov 04, 2025 version, 6.27 MB)
- README.md (16.32 KB)
- US_industrial_facilities_and_technology_dataset.json (6.26 MB)
Abstract
This U.S. industrial facilities and technology dataset is a technology- and facility-level collection of technological, cost, energy, and emissions attributes for six manufacturing and three non-manufacturing U.S. industries. The dataset is a JSON array organized by industry. Each industry entry (except for mining and agriculture) contains four sections: author list, assumptions, emerging technologies, and existing facilities. The non-manufacturing entry inventories 18 solutions in 6 categories across the agriculture, mining, and construction sectors, documenting qualitative benefits and quantitative energy/emissions reduction potentials with low/average/high estimates. The dataset draws exclusively on publicly available sources, including EPA's Greenhouse Gas Reporting Program, the U.S. Geological Survey, industry reports, and peer-reviewed research, to provide a unified resource for energy systems modeling and analysis.
The assumptions section standardizes units and conversions and provides fuel and feedstock prices for operating expenditure (OPEX) calculations, Chemical Engineering Plant Cost Index (CEPCI) values for capital cost (CAPEX) harmonization, and price/inflation indices to align values to a common base year. Emerging technologies with minimal to no market share in U.S. commercial-scale production facilities are characterized across six industries: ammonia (10 processes including autothermal reforming, renewable hydrogen-based, biomass gasification, methane pyrolysis, and ATR with CCS); cement (26 processes including conventional wet/dry kiln variants, low-SCM dry kiln with preheater+precalciner, full CCS, and full electrification options); ethanol (8 processes including dry mill and wet mill variants, electrified process heat via heat pumps, and dry mill BAT with CCS); ethylene and propylene (21 processes including electrified steam cracking with electricity-cost bands, ethanol-to-ethylene, MTO, and NGL-to-olefins); iron and steel (10 processes including H2DRI–EAF, NGDRI–EAF with CCS, and molten oxide electrolysis, with varied scrap utilization scenarios); and food (9 cross-cutting process-heat decarbonization options such as hot-water and steam heat pumps, electric boilers, RNG/biogas boilers, and solar thermal steam).
The existing facilities section covers: ammonia (36 facilities across 20 states; SMR 97.2%, coal gasification 2.8%); cement (97 facilities across 35 states; conventional dry kiln 90.7%, wet kiln 9.3%); ethanol (201 facilities; dry mill 95.5%, wet mill 4.5%); ethylene/propylene (35 facilities across 6 states; steam cracking 100%); iron and steel (102 facilities across 31 states; EAF 86.4%, BF–BOF 7.8%, DRI 2.9%, hybrid BF–BOF/EAF 0.9%); and food (production and energy intensity by state and five subsectors: animal slaughtering, dairy, fruit and vegetable, grain and oilseed milling, and sugar).
The dataset's standardized JSON structure and documentation support a broad range of use cases and should make it interoperable with common analytical tools. Primary uses envisioned for this dataset include energy systems optimization modeling, multisectoral and integrated assessment modeling of the industrial sector or the broader economy (with higher-fidelity technology characterization than such models typically carry), technology assessment comparing conventional and emerging production routes, spatially resolved production capacity planning analysis, and economic analysis of technology deployment costs. The dataset's facility-level granularity enables bottom-up modeling approaches while maintaining compatibility with top-down sectoral analyses. Technical features enhancing reusability include a standardized coordinate system (WGS84) for GIS integration, consistent economic units (2018 USD) for temporal comparisons, and a modular data structure supporting selective extraction.
This document consolidates the per-industry READMEs and augments them with field types and examples for US_industrial_facilities_and_technology_dataset.json.
Data description
This dataset was assembled to enable engineering analysis of energy and material flows and technology assessments of industrial production processes in the industries listed below. Publicly available facility-level data (EPA GHGRP, USGS, EIA, industry reports, peer-reviewed literature) were curated, standardized (geocoding, technology classification), and harmonized. Process-stage energy/feedstock intensities and direct CO2 factors were cross-validated against mass/energy balances. Capital and operating costs were normalized using CEPCI and consistent price assumptions. Emerging routes (e.g., hydrogen-based DRI, CCS-enabled, electrolytic) were parameterized from literature/pilot studies. The JSON supports bottom-up modeling, spatial planning, and benchmarking. No proprietary data were used. The industries covered in the dataset include:
- Iron and steel
- Cement
- Ammonia
- Ethanol
- Ethylene and propylene
- Food (state-level industry characteristics)
- Non-manufacturing (agriculture, mining, construction; strategies/metadata)
Data provenance
Sources were curated, standardized, and cross-validated:
- U.S. EPA Greenhouse Gas Reporting Program (GHGRP)
- U.S. Energy Information Administration (EIA)
- U.S. Geological Survey (USGS)
- Industry reports and peer-reviewed literature
- Mission-specific datasets (e.g., Argonne GREET, DOE IEDO studies, Global Energy Monitor trackers)
QA/QC included geocoding, technology classification, energy/feedstock mass-balance checks, and CEPCI/inflation normalization for costs.
File format and top-level structure
File: US_industrial_facilities_and_technology_dataset.json
Encoding: UTF-8
Organization: A list of industry blocks; each block is segmented into:
- metadata
- assumptions
- technology_characteristics
- existing_facilities
Notes:
- The food block uses existing_industry_by_state (not facility-level).
- The nonmanufacturing block uses emission_reduction_strategies (not facility-level).
- “Assumptions” provide datasets for CEPCI, fuel/feedstock prices, inflation, units, and notes to compute costs and normalize values.
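The notes above imply that the record-level section is named differently in the food and nonmanufacturing blocks. A minimal Python sketch of how a reader might detect which one a block uses is given below. The `sample_blocks` list is a hypothetical stand-in for the real file, and the assumption that each block is a dict keyed by these section names is not confirmed by this README.

```python
# Hypothetical stand-ins for three industry blocks; the real blocks hold data,
# and their exact nesting is an assumption made for illustration only.
sample_blocks = [
    {"metadata": {}, "assumptions": {}, "technology_characteristics": {}, "existing_facilities": {}},
    {"metadata": {}, "assumptions": {}, "technology_characteristics": {}, "existing_industry_by_state": {}},
    {"nonmanufacturing_metadata": {}, "emission_reduction_strategies": {}},
]

# Record-level section names mentioned in the notes above, in lookup order.
RECORD_SECTIONS = (
    "existing_facilities",
    "existing_industry_by_state",
    "emission_reduction_strategies",
)

def record_section(block):
    """Return the name of the record-level section a block uses, or None."""
    for name in RECORD_SECTIONS:
        if name in block:
            return name
    return None

for block in sample_blocks:
    print(record_section(block))
```

Returning `None` instead of raising lets a caller log and skip a block whose layout differs from what is described here.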
Units and boundaries
- Energy intensity: GJ per metric ton of product (GJ/t).
- Feedstock intensity: metric ton input per metric ton output (t/t).
- Direct/process emissions: metric ton CO2 per metric ton product (t CO2/t). “Process” excludes combustion unless noted; CCS capture rates may be included.
- Capacity: typically metric tons per year (t/y).
- Costs:
  - CAPEX: normalized, with CEPCI and inflation references.
  - Fixed O&M: ratio to CAPEX and/or fixed amounts per t product.
  - Variable O&M: fuel and feedstock costs computed using Assumptions’ price tables.
- Boundaries: Intensities are conversion-stage (onsite) unless explicitly noted. Upstream production and delivery of fuels/feedstocks are excluded unless stated.
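The cost conventions above can be illustrated with a short Python sketch. It uses the standard cost-index ratio for CEPCI scaling and the "ratio of CAPEX plus fixed amount" form described for fixed O&M. The function names, the per-tonne CAPEX basis, the $/GJ price units, and all numeric inputs are assumptions for illustration, not values or conventions taken from the dataset.

```python
def normalize_capex(capex, cepci_source, cepci_base):
    """Scale a capital cost from its source year to the base year via the CEPCI ratio."""
    return capex * cepci_base / cepci_source

def fixed_om_per_t(capex_per_t, ratio_of_capex, fixed_per_t=0.0):
    """Annual fixed O&M ($/t): a share of CAPEX plus an optional fixed $/t component.

    Assumes CAPEX is expressed per tonne of annual capacity.
    """
    return capex_per_t * ratio_of_capex + fixed_per_t

def fuel_cost_per_t(intensity_gj_per_t, price_per_gj):
    """Fuel cost ($/t): sum over fuels of intensity (GJ/t) times price ($/GJ)."""
    return sum(gj * price_per_gj[fuel] for fuel, gj in intensity_gj_per_t.items())

# Placeholder inputs, not dataset values.
capex_base = normalize_capex(100.0, cepci_source=500.0, cepci_base=600.0)
fixed_om = fixed_om_per_t(capex_base, ratio_of_capex=0.03)
fuel_cost = fuel_cost_per_t(
    {"natural_gas": 10.0, "electricity": 2.0},
    {"natural_gas": 4.0, "electricity": 20.0},
)
print(capex_base, fixed_om, fuel_cost)
```

Before reusing this, check the units stated in each block's assumptions and references, since the price tables may not be in $/GJ.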
Technology readiness level (TRL)
- TRL follows IEA scale 1–9 (9 = fully commercial).
- Reported per technology/process stage; references provided per block.
Core field dictionary (common across industries)
metadata
- Description of the industry block, scope, coverage, references, and any block-specific notes.
assumptions
- abbreviations: key abbreviations for fuels, feedstocks, and processes.
- cepci: Capital cost index values and notes (reference year/index).
- fuel_price: price tables for fuels (units and date/period noted).
- feedstock_price: price tables for feedstocks (when applicable).
- inflation: CPI/PPI or other indices used to normalize historical values.
- units: canonical units and conversions used across the dataset.
- notes: boundary, calculation, and normalization notes.
technology_characteristics
- capacity: Dict; capacity arrays and references by process stage.
  - Example keys: ironmaking, steelmaking; clinker_production; ammonia_production; ethylene_production; propylene_production.
  - capacity: List of capacities (t/y or stage-specific basis).
  - technology: string name of the technology (e.g., H2DRI, Electrified steam cracker).
  - reference: citations for capacity ranges or basis.
- capex: Dict; normalized capital cost arrays and references by stage.
  - capex: List of CAPEX values (units noted in references).
  - reference: list of citations and normalization notes.
  - notes: optional text for scope (brownfield vs greenfield, included systems).
- characteristics: Dict; descriptors of the technology.
  - technology: canonical label (e.g., “EAF-0% scrap-H2DRI”).
  - technology_id: unique identifier (string or integer).
  - technology_status: “emerging” or “commercial”.
  - Additional tech-specific attributes (e.g., clinker_to_cement_ratio).
- direct_emissions: Dict; process emissions and capture rates.
  - process_ghg_emissions: t CO2/t product; may be null if not applicable.
  - carbon_capture_rate (or carbon_capture_rate_from_process/combustion in cement): fraction of CO2 captured (0–1).
  - reference: citations.
- energy_intensity: Dict; fuel/electricity use by stage and fuel type.
  - Keys for stages (e.g., ironmaking, steelmaking; clinker_production; ammonia_production; ethylene_production).
  - For each stage:
    - electricity_intensity: GJ/t
    - fuel intensities (GJ/t): natural_gas, industrial_coal, metallurgical_coal, residual_fuel_oil, distillate_fuel_oil, propane, steam, other (or technology-specific fuels like naphtha).
  - alternative_energy_intensity and benchmarks (when present): bandwidth_study_2015, greet, usitc_2025; alternative/reference intensities for comparison (GJ/t).
  - notes and reference: boundary clarifications (e.g., whether feedstock energy is counted as fuel).
- feedstock_intensity: Dict; material input rates by stage.
  - Keys vary by stage and industry:
    - Ironmaking: bf_grade_pellets, sinter, dri_grade_pellets, electrolysis_grade_ore, limestone, coke, o2, etc. (t/t).
    - Steelmaking: iron_intensity, scrap, charge_carbon, injected_carbon, lime, dolime, o2, ng_reductant_or_feedstock, etc. (t/t).
    - Cement: limestone, clay, silica_sand, iron_ore (clinker stage); clinker, gypsum, fly_ash, blast_furnace_slag, other_additives (cement grinding) (t/t).
    - Ammonia: natural_gas (feedstock), oxygen (if relevant), etc. (t/t NH3).
    - Ethylene/propylene: feedstock rates (e.g., ethane, naphtha, methanol, ethanol) (t/t product basis noted).
  - reference: citations.
- fixed_om_cost: Dict; fixed O&M parameters.
  - ratio_of_capex: List of fractions per stage (e.g., 0.03 → 3% of CAPEX annually).
  - fixed_om_cost or fixed_om_cost_in_addition_to_ratio_of_capex: List of fixed $/t values in addition to ratio.
  - notes: modeling guidance for combining components.
  - reference: citations.
- variable_om_cost: Dict; variable O&M costs.
  - fuel_and_feedstock_cost: List of computed $/t using Assumptions’ prices and energy/feedstock_intensity.
  - maintenance_operating_and_labor: optional $/t values (if provided).
  - reference: Dict or String explaining calculation method and price sources.
- trl: Dict; TRL per stage, with references.
  - Keys: stage names (e.g., ironmaking, steelmaking; clinker_production; ammonia_production; ethylene_production).
  - Values: TRL integers (1–9).
  - reference: source (e.g., IEA ETP Clean Energy Technology Guide).
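To make the technology_characteristics fields above more concrete, the following Python sketch sums the per-fuel energy intensities within each stage and reports the lowest stage TRL of a route. The `tech` record is a hypothetical example built from the field names above with placeholder numbers, and it stores scalars where the file may store lists. Taking the minimum TRL across stages is one conservative convention, not something the dataset prescribes.

```python
# Hypothetical technology record; field names follow the dictionary above,
# but the numbers are placeholders and the scalar (non-list) values are an assumption.
tech = {
    "characteristics": {"technology": "EAF-0% scrap-H2DRI", "technology_status": "emerging"},
    "energy_intensity": {
        "ironmaking": {"electricity_intensity": 1.0, "natural_gas": 2.0, "steam": None},
        "steelmaking": {"electricity_intensity": 2.5, "natural_gas": 0.5},
        "reference": "placeholder citation",
    },
    "trl": {"ironmaking": 6, "steelmaking": 9, "reference": "placeholder citation"},
}

def is_number(value):
    """True for ints and floats, but not for booleans or None."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def stage_energy_totals(energy_intensity):
    """Sum the numeric fuel and electricity entries (GJ/t) of each stage dict."""
    totals = {}
    for stage, fuels in energy_intensity.items():
        if isinstance(fuels, dict):  # skips top-level 'reference' / 'notes' strings
            totals[stage] = sum(v for v in fuels.values() if is_number(v))
    return totals

def lowest_stage_trl(trl):
    """Return the minimum TRL over the stage entries, ignoring the reference string."""
    return min(v for v in trl.values() if is_number(v))

print(stage_energy_totals(tech["energy_intensity"]))
print(lowest_stage_trl(tech["trl"]))
```

Stage totals are kept separate on purpose: stages can be reported per tonne of different outputs (for example iron versus steel), so adding them without the feedstock ratios would mix bases.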
existing_facilities
- capacity: Dict; capacity by stage and references.
  - Example keys: ironmaking, steelmaking; clinker_production; cement_grinding; ammonia_production; ethylene_production; propylene_production.
  - capacity: numeric capacity per stage (t/y).
  - feedstock_type: for ethylene/propylene, notes like “Multi-feed (ethane, propane, butane)”.
  - reference: citations (e.g., trackers, USGS yearbooks).
  - technology: installed technology (e.g., BF, BOF, EAF, SMR, Steam cracking).
- capex: Dict; normalized capital values and references (when available).
  - capex: arrays of CAPEX values for relevant stages.
  - reference: citations and normalization notes.
- characteristics: Dict; facility descriptors.
  - facility_name, company, facility_id (internal), ghgrp_id (EPA), federal_register_id (EIS), global_steel_plant_monitor_id (industry tracker), eis_facility_id, naics, facility_status (operating, idle), vintage (years since online), address, city, county, state, zip_code, latitude, longitude.
  - technology: installed technology route (e.g., BF-BOF).
  - feedstock: for ethylene facilities, feedstock mix (text).
- production: Dict; production and utilization metrics.
  - production: annual production (t/y) when available.
  - capacity_utilization or capacity_utilization_rate: fraction (0–1).
  - reference: citations (e.g., USGS average utilization).
  - technology: stage technology (e.g., BOF).
- energy_intensity: Dict; stage energy intensities (GJ/t product) and references.
  - Stage keys (ironmaking, steelmaking; clinker_production; ammonia_production; ethylene_production).
  - Fuel keys (electricity, natural_gas, industrial_coal, residual_fuel_oil, distillate_fuel_oil, propane, steam, other).
  - alternative_energy_breakdown may include:
    - fuel_types: list of fuels used onsite.
    - fuel_values: fractions (0–1) of total fuel energy by type.
    - natural_gas_ratio: fraction (0–1).
    - reference: source (e.g., FIED).
  - notes: boundary clarifications (e.g., whether dilution steam is excluded from hydrocarbon feed).
- feedstock_intensity: Dict; stage material intensities (t/t), references.
  - Ironmaking examples: bf_grade_pellets, sinter, coke, limestone, o2.
  - Steelmaking examples: iron_intensity, scrap, charge_carbon, injected_carbon, lime, dolime, o2, ng_reductant_or_feedstock.
  - Cement examples: clinker/gypsum/additives for cement grinding; limestone/clay/sand/iron_ore for clinker.
  - Ethylene examples: feedstock_intensity {ethane: 1.2, naphtha: 0} depending on facility feedstock.
- fixed_om_cost and variable_om_cost: Dicts (when available); same structure and meaning as in technology_characteristics.
- direct_emissions: Dict; process emissions and capture.
  - process_ghg_emissions: t CO2/t stage output (e.g., BF ironmaking = 0.64 t CO2/t iron).
  - greenhouse_gasrp_reported_carbon_dioxide_emissions_non_biogenic_metric_tons: facility-level annual value (cement examples).
  - reference: citations.
- trl: Dict; TRL per stage (typically TRL 9 for commercial technologies), with references.
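As an illustration of how the facility fields above might be combined, the Python sketch below estimates annual production from capacity and utilization when no production value is reported, and converts a facility to a GeoJSON point for GIS use. The `facility` record, its name, and its numbers are hypothetical. The fallback rule (capacity times utilization) is an assumption about how a user might fill gaps, not a documented step of the dataset.

```python
import json

# Hypothetical facility record with placeholder values; keys follow the dictionary above.
facility = {
    "characteristics": {
        "facility_name": "Example Works",
        "state": "OH",
        "latitude": 41.0,
        "longitude": -81.5,
    },
    "capacity": {"steelmaking": {"capacity": 1_000_000, "technology": "EAF"}},
    "production": {"steelmaking": {"production": None, "capacity_utilization": 0.75}},
}

def estimated_production(facility, stage):
    """Use reported production when present, else capacity x utilization (t/y)."""
    prod = facility["production"][stage]
    if prod.get("production") is not None:
        return prod["production"]
    # The README lists two possible key names for the utilization fraction.
    utilization = prod.get("capacity_utilization", prod.get("capacity_utilization_rate"))
    return facility["capacity"][stage]["capacity"] * utilization

def to_geojson_feature(facility):
    """Build a GeoJSON Point; GeoJSON orders coordinates as [longitude, latitude]."""
    c = facility["characteristics"]
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [c["longitude"], c["latitude"]]},
        "properties": {"facility_name": c["facility_name"], "state": c["state"]},
    }

print(estimated_production(facility, "steelmaking"))
print(json.dumps(to_geojson_feature(facility)))
```

Because the coordinates are WGS84, the resulting features can be loaded into most GIS tools without reprojection.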
Industry-specific schema and content notes
Iron and steel
- Stages: ironmaking (BF, DRI/H2DRI) and steelmaking (BOF, EAF); combined routes may appear (ironmaking_steelmaking).
Cement
- Stages: raw_material_preparation, clinker_production, cement_grinding.
Ammonia
- Stage: ammonia_production (SMR, ATR, and variants).
Ethanol
- Technologies: dry mill and wet mill, with BAT variants.
Ethylene and propylene
- Technologies: steam cracking (NGL, naphtha, gas oils), electrified furnaces, hydrogen-fired furnaces, ethanol dehydration to ethylene, methanol-to-olefins (MTO).
Food
- Structure differs: existing_industry_by_state instead of facility-level records.
- characteristics: NAICS_code, industry_by_state_id, state, state_and_subsector_name, technology (e.g., “Conventional Grain and Oilseed Milling Processing”).
- production: production (t/y) for the given year and technology.
- energy_intensity: end_use_breakdown (e.g., conventional_boiler_use electricity/natural_gas shares), references (DOE IEDO).
- Emerging technologies: Hot Water Heat Pump entries with capacity_emerging_technology, capex_emerging_technology, characteristics (technology_id, status), direct_emissions (GHG reduction potential), TRL, references.
Nonmanufacturing
- Sections: nonmanufacturing_metadata, emission_reduction_strategies (for agriculture, mining, construction).
- Focus: strategies and metadata; no facility-level capacity/production entries.
Examples of variable names and content (by type)
- capacity (Dict)
  - Structure: {stage_name: {capacity: [values], technology: "name", reference: [citations]}}
  - Types: Integer/Float arrays; technology string; reference list of strings.
  - Content: Minimum/average/maximum capacities per stage; example ranges drawn from industry trackers.
- capex (Dict)
  - Structure: {stage_name: {capex: [values], reference: [citations], notes: "optional"}}
  - Types: Float arrays; reference list.
  - Content: Normalized capital costs in $M (units noted in references); adjusted via CEPCI/inflation.
- characteristics (Dict)
  - Emerging technologies: {technology, technology_id, technology_status, optional attributes (e.g., clinker_to_cement_ratio)}.
  - Existing facilities: {facility_name, company, facility_id, ghgrp_id, federal_register_id, global_steel_plant_monitor_id, eis_facility_id, technology, feedstock, address, city, county, state, zip_code, latitude, longitude, naics, vintage, facility_status}.
- direct_emissions (Dict)
  - Structure: {stage_name: {process_ghg_emissions: Float|Null, carbon_capture_rate: Float (0–1) or CCS split for cement, reference: String}}
  - Content: Process CO2 per t product, CCS fractions, and facility-level non-biogenic annual emissions (cement).
- energy_intensity (Dict)
  - Structure: {stage_name: {electricity: [GJ/t], natural_gas: [GJ/t], industrial_coal: [GJ/t], residual_fuel_oil: [GJ/t], distillate_fuel_oil: [GJ/t], propane: [GJ/t], steam: [GJ/t], other: [GJ/t]}, reference: String, notes: String}
  - Optional benchmarking keys: alternative_energy_intensity (bandwidth_study_2015, greet, usitc_2025).
  - Content: Conversion-stage energy use; per-fuel breakdown with references and boundary notes.
- feedstock_intensity (Dict)
  - Structure: {stage_name: {material_key: Float (t/t), ...}, technology: String, reference: String}
  - Content: Material consumption per t output; industry-specific keys (ore/pellets/scrap for steel; limestone/clinker/gypsum/additives for cement; NG/methanol/ethanol/naphtha for chemicals).
- fixed_om_cost (Dict)
  - Structure: {stage_name: {ratio_of_capex: [Float|Null], fixed_om_cost_in_addition_to_ratio_of_capex or fixed_om_cost: [Float], reference: [String], notes: String}}
  - Content: Annual fixed O&M modeled as % of CAPEX plus fixed $/t components.
- variable_om_cost (Dict)
  - Structure: {stage_name: {fuel_and_feedstock_cost: [Float], maintenance_operating_and_labor: [Float|Null], reference: Dict|String}}
  - Content: Computed $/t using Assumptions prices and energy/feedstock intensities; references note price basis.
- production (Dict)
  - Structure: {stage_name: {production: Float|Integer, capacity_utilization: Float, reference: String, technology: String}}
  - Content: Annual production, utilization rates, and references (e.g., USGS averages).
- trl (Dict)
  - Structure: {stage_name: Integer, reference: String}
  - Content: TRL by stage (IEA scale).
- alternative_energy_breakdown (Dict, existing facilities)
  - Structure: {fuel_types: [String], fuel_values: [Float fractions], natural_gas_ratio: Float, reference: String, notes: String}
  - Content: Plant fuel mix shares for cross-checking intensity breakdowns.
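The structures above lend themselves to simple consistency checks. As one hedged example, the Python sketch below validates an alternative_energy_breakdown entry: matching list lengths, fractions within 0–1, and shares summing to 1. The sum-to-one rule assumes that every onsite fuel is listed, which this README does not state explicitly. The function name and sample dicts are hypothetical.

```python
import math

def check_fuel_breakdown(breakdown, tol=1e-6):
    """Return a list of problems found in an alternative_energy_breakdown dict."""
    problems = []
    types = breakdown.get("fuel_types", [])
    values = breakdown.get("fuel_values", [])
    if len(types) != len(values):
        problems.append("fuel_types and fuel_values differ in length")
    if any(not 0.0 <= v <= 1.0 for v in values):
        problems.append("fuel_values outside the 0-1 range")
    # Assumes the listed fuels cover all onsite fuel energy.
    if values and not math.isclose(sum(values), 1.0, abs_tol=tol):
        problems.append("fuel_values do not sum to 1")
    return problems

# Placeholder entries, not dataset values.
good = {"fuel_types": ["natural_gas", "industrial_coal"], "fuel_values": [0.75, 0.25]}
bad = {"fuel_types": ["natural_gas", "industrial_coal"], "fuel_values": [0.75, 0.5]}
print(check_fuel_breakdown(good))
print(check_fuel_breakdown(bad))
```

Collecting problems in a list rather than raising on the first one makes it easier to audit many facilities in a single pass.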
Code/software
Any text editor or viewer can open the primary JSON file:
- Text editors: Notepad, Notepad++, Visual Studio Code, etc.
Recommendation: Use Python, Julia, or R to extract, validate, and analyze the data programmatically.
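Following the recommendation above, a minimal Python sketch for reading the file with the standard library is shown below. It assumes the JSON file sits in the working directory under its published name and that the top level is a list, as described under "File format and top-level structure". The helper names `load_blocks` and `summarize` are illustrative.

```python
import json
from pathlib import Path

DATA_FILE = Path("US_industrial_facilities_and_technology_dataset.json")

def load_blocks(path=DATA_FILE):
    """Read the dataset (UTF-8 JSON) and return the top-level list of industry blocks."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise TypeError(f"expected a top-level list, got {type(data).__name__}")
    return data

def summarize(blocks):
    """Return (index, sorted key names) per block; non-dict entries report their type name."""
    return [
        (i, sorted(b) if isinstance(b, dict) else type(b).__name__)
        for i, b in enumerate(blocks)
    ]

if DATA_FILE.exists():  # runs only when the file is present in the working directory
    for index, keys in summarize(load_blocks()):
        print(index, keys)
```

Printing the key names of each block first is a cheap way to confirm the actual layout before writing any extraction code against it.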
Access information
Data was derived from the following sources:
- U.S. EPA Greenhouse Gas Reporting Program (GHGRP)
  - Program page: https://www.epa.gov/ghgreporting
  - Facility-level access (FLIGHT): https://ghgdata.epa.gov/ghgp/main.do
- U.S. Energy Information Administration (EIA)
  - Fuel prices and industrial energy data: https://www.eia.gov/
- U.S. Geological Survey (USGS)
  - Mineral/industry statistics: https://www.usgs.gov/
- Industry reports and peer-reviewed literature
  - Technology specifications, costs, and performance parameters
