This dataset contains timeseries data from hydronic heating (or heating hot water (HHW)) systems in 216 buildings across 49 US organizations. The dataset comprises over 100 million measurements taken by building automation systems from 2014 to 2023. A typical building’s dataset contains measured supply and return water temperature, flow rate, output heating power (or heating load), system state, and outdoor temperature spanning 2.2 years at 15-minute intervals, though the types, span, and interval of data vary based on what was available for each building. Pump and boiler data are available for smaller subsets of buildings. The dataset also includes a broad range of metadata characteristics related to both the building and the HHW system, such as floor area, year of construction, building type, climate zone, heating system type, and heating design day temperature. Further heating system metadata such as equipment design capacity, nominal efficiency, and minimum turndown are available for a smaller subset of buildings. The dataset also includes a timeseries visualization for each building. Finally, the dataset includes the R code and markdown used to analyze and visualize the data in the associated journal paper. The README.txt file contains a detailed description of each of the included files and its contents.

Description of the data and file structure

Please note that the open-access journal paper associated with this dataset contains further information regarding this dataset, including the software code used to analyze the data, visualize it, and create the manuscript file submitted to the journal. The journal paper analyzes a total of 259 buildings. The public dataset stored on Dryad contains the data from 216 of those buildings, namely all the buildings for which the donors allowed the timeseries data to be shared. The supplementary material associated with the journal paper includes the metadata and high-level summary statistics for the full 259-building dataset, but not the underlying timeseries data.

metadata.csv

This is a csv (comma separated value) format file. This anonymized public dataset contains metadata, where each row of the table describes a unique building and its characteristics, where available.

Data contents:

tag: a number that uniquely identifies a specific building
org: a string that uniquely identifies an organization that provided data for one or more buildings
area: gross floor area (m2). For further anonymization, floor area is rounded to two significant digits. The full dataset used in the paper uses these values at originally available precision.
year: year of construction (or last major renovation). For further anonymization, year of construction is rounded to the nearest decade. The full dataset used in the paper uses these values at originally available precision.
bldg_type: string describing type of building
climate: ASHRAE climate zone
t_hdd: heating design day outdoor drybulb temperature
system: string representing type of HHW system, one of: “Condensing”: known to be a condensing gas boiler; “Non-condensing”: known to be a non-condensing gas boiler; “Boiler”: known to be a boiler, but of unknown type; “District HW”: a campus or district HHW system, with or without a heat exchanger at the building; “District Steam”: a district or campus steam system, with a HHW heat exchanger at the building
b_model: name of boiler model, where applicable and known
b_manufacturer: name of boiler manufacturer, where applicable and known
b_input: nominal input power of smallest boiler model, where applicable and known (W)
b_output: nominal output power of smallest boiler model, where applicable and known (W)
b_efficiency: boiler nominal efficiency, where applicable and known (fraction)
b_min_turndown: minimum turndown of smallest boiler model, where applicable and known (fraction of output)
b_min_flow: minimum flow requirement of smallest boiler model, where applicable and known (l/s)
b_redundancy: level of redundancy provided by one boiler (fraction). For example, a value of 0.75 for a system with two boilers indicates that each boiler was sized to provide 75% of the design heating load and that the total installed system capacity is 150% of the design heating load.
b_number: number of boilers, total, in system
design_supply: original design supply water temperature for the system (degC)
design_return: original design return water temperature for the system (degC)
system_hl: higher level HHW system type, either “Boiler”: any kind of boiler or “District”: any kind of campus or district system, either steam or hot water
bldg_type_hl: higher level building type, one of: “Office”, “Medical Office”, “Library”, or “Other”
decade: decade of construction
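
As a worked illustration of the b_redundancy field, the following Python sketch computes the total installed capacity as a fraction of the design heating load. It assumes all boilers in a system are the same size, which matches the example given for b_redundancy but may not hold for every building. The function name is hypothetical and is not part of the R code distributed with the dataset.

```python
def installed_capacity_fraction(b_redundancy, b_number):
    """Total installed capacity as a fraction of the design heating load.

    Assumes every boiler in the system is the same size (an assumption
    made for this illustration only).
    """
    if b_redundancy is None or b_number is None:
        return None  # this metadata is only known for a subset of buildings
    return b_redundancy * b_number


# Two boilers, each sized for 75% of the design heating load
print(installed_capacity_fraction(0.75, 2))  # 1.5
```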

initial_data.rds

This is an RDS format file. It contains data prior to processing for all of the publicly available buildings in the dataset, typically at 15-minute intervals, though in rare cases the original data was only available at hourly intervals. The only data cleaning step prior to writing this file was to convert the data from a wide variety of building automation system output formats into a standard data format, column naming structure, and unit system. Where data was available at a higher frequency than 15 minutes (e.g. 5 minutes), it was averaged to 15-minute intervals (with NA values ignored).
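
The averaging of higher-frequency data can be sketched in Python as follows. This is a minimal illustration, not the code used to build the file: the function names are hypothetical, and labelling each 15-minute bin by its start time is an assumption (the convention used in the dataset is not stated here).

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def floor_to_interval(ts, minutes=15):
    """Round a timestamp down to the start of its bin (assumed convention)."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def average_ignoring_na(samples, minutes=15):
    """Average (timestamp, value) pairs per bin, skipping None/NaN values.

    A bin that contains only missing values is reported as None.
    """
    bins = defaultdict(list)
    for ts, value in samples:
        bins[floor_to_interval(ts, minutes)].append(value)
    result = {}
    for start in sorted(bins):
        valid = [v for v in bins[start]
                 if v is not None and not math.isnan(v)]
        result[start] = sum(valid) / len(valid) if valid else None
    return result


# Made-up 5-minute supply temperatures with one missing reading
samples = [
    (datetime(2020, 1, 1, 0, 0), 60.0),
    (datetime(2020, 1, 1, 0, 5), None),
    (datetime(2020, 1, 1, 0, 10), 62.0),
    (datetime(2020, 1, 1, 0, 15), None),
]
averaged = average_ignoring_na(samples)
print(averaged[datetime(2020, 1, 1, 0, 0)])  # 61.0
```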

Data contents:

sup: supply water temperature entering the building (degC).
sup_stpt: setpoint for supply water temperature entering the building (degC).
ret: return water temperature leaving the building (degC).
flow: flow rate of water entering the building (l/s).
hw: heating power (or load) being supplied to the building (W).
enab: a clear indicator of system state as either operating or not operating (1/0).

hhw_system_data.rds

This is an RDS format file. It contains hourly average data for all of the publicly available buildings in the dataset, after the data processing steps described in the paper and summarized below:
a) remove infeasible high and low outlier data,
b) ensure a consistent time interval between datapoints for each building by filling in missing timesteps with NA values for all columns,
c) estimate the operating state of the system using the data available for that building,
d) average the data (with NA values ignored) at hourly intervals,
e) merge the data with the closest publicly available weather station data to obtain outdoor drybulb temperature. In rare cases where the building information was provided completely anonymously to the research team, this value is the outdoor temperature measurement from the building automation system data, because the location of the building was not known and a matching weather station could not be identified.
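
Steps b) and d) above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the processing code from the paper: the function names are hypothetical, timestamps are assumed to already fall exactly on a 15-minute grid, and hours are labelled by their start time.

```python
from datetime import datetime, timedelta


def fill_missing_timesteps(readings, step=timedelta(minutes=15)):
    """Step b): return (timestamp, value) pairs on a regular grid.

    `readings` maps timestamp -> value; absent timesteps get None (NA).
    """
    if not readings:
        return []
    t, end = min(readings), max(readings)
    grid = []
    while t <= end:
        grid.append((t, readings.get(t)))
        t += step
    return grid


def hourly_mean(grid):
    """Step d): average per hour, ignoring None; all-NA hours give None."""
    by_hour = {}
    for t, value in grid:
        hour = t.replace(minute=0, second=0, microsecond=0)
        by_hour.setdefault(hour, [])
        if value is not None:
            by_hour[hour].append(value)
    return {h: (sum(v) / len(v) if v else None) for h, v in by_hour.items()}


# Made-up enabled-state readings with the 08:30 timestep missing
enab = {
    datetime(2020, 1, 1, 8, 0): 1,
    datetime(2020, 1, 1, 8, 15): 1,
    datetime(2020, 1, 1, 8, 45): 0,
}
grid = fill_missing_timesteps(enab)
hourly = hourly_mean(grid)
print(hourly)
```

In this made-up example the hourly value of enab is fractional (two of the three valid readings were 1), which is how a fractional state value can arise from averaging.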

Data contents:

(local datetime features for convenience: hr - hour of day, wd - day of week, mnth - month of year, season - season, yr - year, dt - date)
sup: supply water temperature entering the building (degC).
sup_stpt: setpoint for supply water temperature entering the building (degC).
ret: return water temperature leaving the building (degC).
flow: flow rate of water entering the building (l/s).
hw: heating power (or load) being supplied to the building, as measured using the concurrent flow and temperature difference (sup-ret) (W). Where this information was available on the automation system, the value was used directly. Where it was instead calculated from the building automation system data, the calculation used the isobaric heat capacity of water at 70 degC, i.e. hw = 4190 x flow x (sup-ret).
enab: a clear indicator of system state as either operating or not operating (1/0; fractional values indicate the system was enabled for part of the hour).
oper: estimated operating state of the system (1/0, fractional values indicate the system was estimated to be operating for part of the hour)
t_out: outdoor drybulb temperature (degC)
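
The heating power formula given for hw can be written as a short Python function. This is a sketch for illustration: the function and constant names are hypothetical. Note that with flow in l/s and a heat capacity in J/(kg K), the formula as written implicitly treats one liter of water as one kilogram.

```python
CP_WATER = 4190.0  # J/(kg K), the value stated in the hw description


def heating_power_w(flow_l_s, sup_c, ret_c):
    """hw = 4190 x flow x (sup - ret), in watts.

    Returns None if any input is missing (NA).
    """
    if flow_l_s is None or sup_c is None or ret_c is None:
        return None
    return CP_WATER * flow_l_s * (sup_c - ret_c)


# 2 l/s with a 10 K supply-to-return temperature difference
print(heating_power_w(2.0, 70.0, 60.0))  # 83800.0
```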

ancilliary_data.rds

This is an RDS format file. It contains the less common timeseries data that was available only at some sites and that is not already included in the initial_data.rds or hhw_system_data.rds files. It is in ‘long’ format to reduce the storage space required, with the ‘variable’ column describing the type of data and the ‘value’ column holding the value at that point in time.

Data contents:

retp: common return water temperature on the primary circulation circuit (degC)
ret[N]: return water temperature entering a specific boiler (i.e. the boiler inlet water temperature) (degC)
supp: common supply water temperature on the primary circulation circuit (degC)
sup[N]: supply water temperature leaving a specific boiler (i.e. the boiler outlet water temperature) (degC)
fire[N]: the firing rate of a specific boiler (original units)
pmp[N]_spd: the pump speed of a specific pump (normalized from 0-1)
pmp[N]_freq: the output frequency of the variable frequency drive serving a specific pump (original units, can typically be assumed to be Hz)
pmp[N]_pwr: the power consumption of a specific pump (W)
dp: end-of-line differential pressure (Pa)
dp_stpt: end-of-line differential pressure setpoint (Pa)
gas: natural gas consumption measured at the supply to the boiler plant (W, converted without normalizing for gas contents - i.e. assuming a therm factor of 1)
gas_u: natural gas consumption measured at the utility meter (W, converted without normalizing for gas contents - i.e. assuming a therm factor of 1)

csv/[N]_initial_data.csv, csv/[N]_hhw_system_data.csv, csv/[N]_ancilliary_data.csv

These are csv files. The data contents are the same as the .rds files, though in a larger, uncompressed format and split into one file per building (i.e. per unique tag). Note that, to reduce disk space: a) each file excludes any columns that consist entirely of NA values for that building, and b) the ancilliary_data.csv files are in ‘wide’ rather than ‘long’ format (to remove duplicate timestamps).
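
The relationship between the ‘long’ and ‘wide’ layouts can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, the sample values are made up, and the use of ‘tag’ and ‘datetime_UTC’ as the identifying columns of the long table is an assumption based on the general data notes.

```python
def long_to_wide(long_rows):
    """Pivot long-format rows into one record per (tag, datetime_UTC).

    Each input row is a dict with 'tag', 'datetime_UTC', 'variable' and
    'value' keys (assumed column names).
    """
    wide = {}
    for row in long_rows:
        key = (row["tag"], row["datetime_UTC"])
        wide.setdefault(key, {})[row["variable"]] = row["value"]
    return wide


# Made-up long-format rows for one building and one timestamp
long_rows = [
    {"tag": 1, "datetime_UTC": "2020-01-01 08:00:00",
     "variable": "retp", "value": 55.0},
    {"tag": 1, "datetime_UTC": "2020-01-01 08:00:00",
     "variable": "supp", "value": 65.0},
]
wide = long_to_wide(long_rows)
print(wide[(1, "2020-01-01 08:00:00")])
```

In the wide layout each timestamp appears once per building, with one column per variable, which is why it avoids the duplicate timestamps of the long layout.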

vars_available_by_building.csv

This is a csv (comma separated value) format file. It notes which variables are available (i.e. have at least one value that is not NA) for each building in the dataset.

all_hours_summary_stats.csv

For convenience, this file contains statistical summary data for all of the buildings used in the paper, assessed at hourly intervals (i.e., the contents of hhw_system_data.rds plus the data from non-publicly available buildings). It first describes the start and end date and the number of days spanned by the data for each building (i.e., tag). For each variable, it states the mean, standard deviation, interquartile range, skewness, min, 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles, max, number of NA and non-NA values, and the fraction of the dataset that is NA.

operating_hours_summary_stats.csv

This file is the same as the previous file, except that the underlying dataset being summarized is first filtered to contain only time periods where the system was estimated to be operating (i.e. oper > 0).
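
The filtering and summarizing described above can be sketched in Python as follows. This is a minimal illustration with made-up values and hypothetical function names, and it covers only a few of the listed statistics. Treating rows with a missing oper value as not operating is an assumption.

```python
import statistics


def operating_only(rows):
    """Keep rows where the system was estimated to be operating (oper > 0)."""
    return [r for r in rows if r["oper"] is not None and r["oper"] > 0]


def summarize(values):
    """A few summary statistics, with None treated as NA."""
    valid = [v for v in values if v is not None]
    n_na = len(values) - len(valid)
    return {
        "mean": statistics.fmean(valid) if valid else None,
        "sd": statistics.stdev(valid) if len(valid) > 1 else None,
        "min": min(valid) if valid else None,
        "max": max(valid) if valid else None,
        "n": len(valid),
        "n_na": n_na,
        "frac_na": n_na / len(values) if values else None,
    }


# Made-up hourly rows for one building
rows = [
    {"oper": 0, "sup": 20.0},
    {"oper": 0.5, "sup": 60.0},
    {"oper": 1, "sup": 70.0},
    {"oper": 1, "sup": None},
    {"oper": None, "sup": 65.0},
]
operating = operating_only(rows)
stats = summarize([r["sup"] for r in operating])
print(stats)
```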

timeseries_figs/[N]_timeseries.jpeg

These are jpeg image files, where N signifies the building tag. Each file contains a timeseries visualization of the primary data for one of the publicly available buildings, showing the full range of operating state, supply and return temperature, supply to return temperature difference, flow, load, and enabled state.

General data notes

The datetime_UTC value represents the datetime in UTC timezone. Any other datetime related features (e.g. ‘hr’), or datetimes displayed on visualizations, are in local time at the building. Buildings that were provided totally anonymously use a US/Pacific timezone.
All datasets use SI units of degrees Celsius, liters per second, watts, or pascals. Some visualizations also include dual units, as noted on the figure, for convenience.
The ‘tag’ value represents a unique building identifier which can be used to join datasets with each other (e.g. joining the metadata to the timeseries data).
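
Joining the metadata to the timeseries data on ‘tag’ can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, the two in-memory tables are made up, and the datetime format shown is an assumption. In practice the rows would be read from metadata.csv and from the timeseries files.

```python
import csv
import io

# Made-up stand-ins for metadata.csv and a timeseries table
metadata_csv = io.StringIO(
    "tag,bldg_type_hl,system_hl\n"
    "1,Office,Boiler\n"
    "2,Library,District\n"
)
timeseries_csv = io.StringIO(
    "tag,datetime_UTC,sup,ret\n"
    "1,2020-01-01 08:00:00,70.0,60.0\n"
    "2,2020-01-01 08:00:00,65.0,55.0\n"
)


def join_on_tag(metadata_rows, timeseries_rows):
    """Attach each building's metadata to its timeseries rows via 'tag'."""
    meta_by_tag = {row["tag"]: row for row in metadata_rows}
    joined = []
    for row in timeseries_rows:
        meta = meta_by_tag.get(row["tag"], {})  # empty if tag is unknown
        joined.append({**meta, **row})
    return joined


joined = join_on_tag(csv.DictReader(metadata_csv),
                     csv.DictReader(timeseries_csv))
for row in joined:
    print(row["tag"], row["bldg_type_hl"], row["system_hl"], row["sup"])
```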

Data from hydronic heating systems in 216 commercial buildings
