Data from hydronic heating systems in 216 commercial buildings
Data files
Jul 27, 2024 version, 1.34 GB total
all_hours_summary_stats.csv (396.30 KB)
ancilliary_data.rds (45.85 MB)
csv.zip (505.30 MB)
hhw_system_data.rds (96.07 MB)
initial_data.rds (295.55 MB)
metadata.csv (25.85 KB)
operating_hours_summary_stats.csv (396.70 KB)
README.md (10.91 KB)
timeseries_figs.zip (398.93 MB)
vars_available_by_building.csv (22.51 KB)
Abstract
This dataset contains timeseries data from hydronic heating (or heating hot water (HHW)) systems in 216 buildings across 49 US organizations. The dataset comprises over 100 million measurements taken by building automation systems from 2014-2023. A typical building’s dataset contains measured supply and return water temperature, flow rate, output heating power (or heating load), system state and outdoor temperature spanning 2.2 years (15-minute interval), though the types, span, and interval of data vary based on what was available for each building. Pump and boiler data are available for smaller subsets of buildings. The dataset also includes a broad range of metadata characteristics related to both the building and the HHW system, such as floor area, year of construction, building type, climate zone, heating system type, and heating design day temperature. Further heating system metadata such as equipment design capacity, nominal efficiency, and minimum turndown are available for a smaller subset of buildings. This dataset also includes a timeseries visualization for each building in the dataset. Finally, the dataset includes the R code and markdown used to analyse and visualize the data in the associated journal paper. The README.md file contains a detailed description of each of the files included and its contents.
https://doi.org/10.5061/dryad.t4b8gtj8n
Description of the data and file structure
Please note that the open-access journal paper associated with this dataset contains further information regarding this dataset, including software code used to analyze the data, visualize it, and create the manuscript file submitted to the journal. The journal paper analyzes a total of 259 buildings. The public dataset stored on Dryad contains data from 216 of these buildings, namely all those for which the data donors allowed the timeseries data to be shared. The supplementary material associated with the journal paper includes the metadata and high-level summary statistics for the full 259-building dataset, but not the underlying timeseries data.
metadata.csv
This is a csv (comma separated value) format file. This anonymized public dataset contains metadata where each row of the table describes a unique building and its characteristics, where available.
Data contents:
tag: a number that uniquely identifies a specific building
org: a string that uniquely identifies an organization that provided data for one or more buildings
area: gross floor area (m²). For further anonymization, floor area is rounded to two significant digits. The full dataset used in the paper uses these values at the originally available precision.
year: year of construction (or last major renovation). For further anonymization, year of construction is rounded to the nearest decade. The full dataset used in the paper uses these values at originally available precision.
bldg_type: string describing type of building
climate: ASHRAE climate zone
t_hdd: heating design day outdoor drybulb temperature
system: string representing type of HHW system, one of: “Condensing”: known to be a condensing gas boiler; “Non-condensing”: known to be a non-condensing gas boiler; “Boiler”: known to be a boiler, but of unknown type; “District HW”: a campus or district HHW system, with or without a heat exchanger at the building; “District Steam”: a district or campus steam system, with a HHW heat exchanger at the building
b_model: name of boiler model, where applicable and known
b_manufacturer: name of boiler manufacturer, where applicable and known
b_input: nominal input power of smallest boiler model, where applicable and known (W)
b_output: nominal output power of smallest boiler model, where applicable and known (W)
b_efficiency: boiler nominal efficiency, where applicable and known (fraction)
b_min_turndown: minimum turndown of smallest boiler model, where applicable and known (fraction of output)
b_min_flow: minimum flow requirement of smallest boiler model, where applicable and known (l/s)
b_redundancy: level of redundancy provided by one boiler (fraction). For example, a value of 0.75 for a system with two boilers indicates that each boiler was sized to provide 75% of the design heating load and that the total installed system capacity is 150% of the design heating load.
b_number: number of boilers, total, in system
design_supply: original design supply water temperature for the system (degC)
design_return: original design return water temperature for the system (degC)
system_hl: higher level HHW system type, either “Boiler”: any kind of boiler or “District”: any kind of campus or district system, either steam or hot water
bldg_type_hl: higher level building type, either: “Office”, “Medical Office”, “Library”, or “Other”
decade: decade of construction
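As a worked example of the b_redundancy definition above, the total installed capacity relative to the design heating load can be estimated as b_redundancy multiplied by b_number. This is a minimal sketch, assuming all boilers in a system are the same size (the metadata does not state this); the function name is hypothetical and not part of the dataset's code.

```python
def installed_capacity_fraction(b_redundancy, b_number):
    """Total installed capacity as a fraction of the design heating load.

    Assumes every boiler in the system is the same size (an assumption
    made for illustration; the metadata does not guarantee it).
    """
    if b_redundancy is None or b_number is None:
        return None  # these fields are only filled in where applicable and known
    return b_redundancy * b_number


# The example from the b_redundancy description: two boilers at 0.75 each.
print(installed_capacity_fraction(0.75, 2))  # 1.5
```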
initial_data.rds
This is an RDS format file. It contains data prior to processing for all of the publicly available buildings in the dataset, typically at 15-minute intervals, though in rare cases the original data was only available at hourly intervals. The only data cleaning step prior to writing this file was to convert the data from a wide variety of building automation system output formats into a standard data format, column naming structure, and unit system. Where data was available at a higher frequency than 15 minutes (e.g. 5 minutes), it was averaged to 15-minute intervals (with NA values ignored).
Data contents:
sup: supply water temperature entering the building (degC).
sup_stpt: setpoint for supply water temperature entering the building (degC).
ret: return water temperature leaving the building (degC).
flow: flow rate of water entering the building (l/s).
hw: heating power (or load) being supplied to the building (W).
enab: a clear indicator of system state as either operating or not operating (1/0).
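The averaging of higher-frequency data to 15-minute intervals with NA values ignored, as described for this file, can be illustrated as follows. This is only a sketch in Python using None for NA; the function name is hypothetical, and labelling each bin by its start time is an assumption, since the description does not say how the averaged intervals are labelled.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def average_to_interval(samples, minutes=15):
    """Average (timestamp, value) samples into fixed bins, ignoring None.

    Each bin is labelled by its start time (an assumption). `minutes`
    should divide 60 evenly. A bin whose samples are all None stays None.
    """
    bins = defaultdict(list)
    for ts, value in samples:
        # Floor the timestamp to the start of its bin.
        start = ts - timedelta(minutes=ts.minute % minutes,
                               seconds=ts.second,
                               microseconds=ts.microsecond)
        bins[start].append(value)
    result = {}
    for start in sorted(bins):
        valid = [v for v in bins[start] if v is not None]
        result[start] = sum(valid) / len(valid) if valid else None
    return result


# Six made-up 5-minute supply temperature samples, some missing.
t0 = datetime(2020, 1, 1, 10, 0)
values = [60.0, None, 62.0, None, None, None]
samples = [(t0 + timedelta(minutes=5 * i), v) for i, v in enumerate(values)]
averaged = average_to_interval(samples)
```

Here the 10:00 bin averages only its two valid readings, and the 10:15 bin remains None because every sample in it is missing.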
hhw_system_data.rds
This is an RDS format file. It contains hourly average data for all of the publicly available buildings in the dataset, after the data processing steps described in the paper and summarized below:
a) remove infeasible high and low outlier data,
b) ensure a consistent time interval between datapoints for each building by filling missing timesteps with NA values for all columns,
c) estimate the operating state of the system using the data available for that building,
d) average the data (with NA values ignored) at hourly intervals,
e) merge the data with the closest publicly available weather station data to obtain outdoor drybulb temperature. In rare cases where the building information was provided completely anonymously to the research team, the location of the building was not known and a matching weather station could not be identified, so this value is instead the outdoor temperature measurement from the building automation system data.
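Steps a) and b) above can be sketched as follows. This is an illustration only, not the processing code from the paper: the function names are hypothetical, None stands in for NA, and the feasibility limits are passed in by the caller because the thresholds actually used are described in the paper rather than here.

```python
from datetime import datetime, timedelta


def drop_infeasible(value, low, high):
    """Step a): replace a value outside [low, high] with None."""
    if value is None or not (low <= value <= high):
        return None
    return value


def regularize(rows, step=timedelta(minutes=15)):
    """Step b): return rows on a gap-free time grid.

    rows: dict mapping timestamp -> value, with timestamps already
    aligned to multiples of `step`. Missing timesteps are added as None.
    """
    if not rows:
        return {}
    start, end = min(rows), max(rows)
    grid = {}
    ts = start
    while ts <= end:
        grid[ts] = rows.get(ts)  # None marks a timestep with no data
        ts += step
    return grid


# Made-up example: two readings 45 minutes apart, one of them infeasible.
t0 = datetime(2020, 1, 1, 0, 0)
raw = {t0: 65.0, t0 + timedelta(minutes=45): 900.0}
cleaned = {ts: drop_infeasible(v, low=0.0, high=120.0) for ts, v in raw.items()}
grid = regularize(cleaned)
```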
Data contents:
(local datetime features for convenience: hr - hour of day, wd - day of week, mnth - month of year, season - season, yr - year, dt - date)
sup: supply water temperature entering the building (degC).
sup_stpt: setpoint for supply water temperature entering the building (degC).
ret: return water temperature leaving the building (degC).
flow: flow rate of water entering the building (l/s).
hw: heating power (or load) being supplied to the building (W). Where this value was available directly on the building automation system, it was used as provided. Otherwise, it was calculated from the concurrent flow and temperature difference (sup-ret) using the isobaric heat capacity of water at 70 degC, i.e. hw = 4190 x flow x (sup-ret).
enab: a clear indicator of system state as either operating, or not operating (1/0, fractional values indicate system was enabled for part of the hour).
oper: estimated operating state of the system (1/0, fractional values indicate the system was estimated to be operating for part of the hour)
t_out: outdoor drybulb temperature (degC)
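The formula quoted in the hw description, hw = 4190 x flow x (sup-ret), can be written out as a short function. This is a sketch with a hypothetical function name; it treats 1 liter of water as 1 kg, which is what the quoted formula implies when flow is in l/s and the result is in W.

```python
CP_WATER = 4190.0  # J/(kg K), the constant quoted in the hw description


def heating_power_w(flow_l_s, sup_c, ret_c):
    """Heating power in W from flow (l/s) and supply/return temperature (degC).

    Treats 1 l of water as 1 kg, as the quoted formula implies.
    Returns None if any input is missing.
    """
    if flow_l_s is None or sup_c is None or ret_c is None:
        return None
    return CP_WATER * flow_l_s * (sup_c - ret_c)


# Made-up example: 2 l/s with a 10 K supply-to-return temperature difference.
print(heating_power_w(2.0, 70.0, 60.0))  # 83800.0
```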
ancilliary_data.rds
This is an RDS format file. It contains the less common timeseries data that was available only at some sites and that is not already included in the initial_data.rds or hhw_system_data.rds files. It is in ‘long’ format to reduce the storage space required, with the ‘variable’ column describing the type of data and the ‘value’ column giving the value at that point in time.
Data contents:
retp: common return water temperature on the primary circulation circuit (degC)
ret[N]: return water temperature entering a specific boiler (i.e. the boiler inlet water temperature) (degC)
supp: common supply water temperature on the primary circulation circuit (degC)
sup[N]: supply water temperature leaving a specific boiler (i.e. the boiler outlet water temperature) (degC)
fire[N]: the firing rate of a specific boiler (original units)
pmp[N]_spd: the pump speed of a specific pump (normalized from 0-1)
pmp[N]_freq: the output frequency of the variable frequency drive serving a specific pump (original units, can typically be assumed to be Hz)
pmp[N]_pwr: the power consumption of a specific pump (W)
dp: end-of-line differential pressure (Pa)
dp_stpt: end-of-line differential pressure setpoint (Pa)
gas: natural gas consumption measured at the supply to the boiler plant (W, converted without normalizing for gas contents - i.e. assuming a therm factor of 1)
gas_u: natural gas consumption measured at the utility meter (W, converted without normalizing for gas contents - i.e. assuming a therm factor of 1)
csv/[N]_initial_data.csv, csv/[N]_hhw_system_data.csv, csv/[N]_ancilliary_data.csv
These are csv files. The data contents are the same as the .rds files, though in a larger, uncompressed format and split into one file per building (i.e. per unique tag). Note that, to reduce disk space, a) each file excludes any columns that consist entirely of NA values for that building and b) the ancilliary_data.csv files are in ‘wide’ rather than ‘long’ format (to remove duplicate timestamps).
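The difference between the ‘long’ and ‘wide’ layouts mentioned above can be illustrated with a small pivot. This is a sketch only: the function name is hypothetical, the rows are made up, and using datetime_UTC as the timestamp column name in the output is an assumption.

```python
from collections import defaultdict


def long_to_wide(long_rows):
    """Pivot (timestamp, variable, value) rows into one dict per timestamp.

    long_rows must be a list, since it is iterated twice. Variables with
    no value at a given timestamp are filled with None (NA).
    """
    wide = defaultdict(dict)
    for ts, variable, value in long_rows:
        wide[ts][variable] = value
    variables = sorted({variable for _, variable, _ in long_rows})
    return [
        {"datetime_UTC": ts, **{v: wide[ts].get(v) for v in variables}}
        for ts in sorted(wide)
    ]


# Made-up long-format rows: differential pressure (Pa) and gas consumption (W).
long_rows = [
    ("2020-01-01 00:00", "dp", 50000.0),
    ("2020-01-01 00:00", "gas", 1200.0),
    ("2020-01-01 01:00", "dp", 48000.0),
]
wide_rows = long_to_wide(long_rows)
```

In the wide result each timestamp appears once, which is why that layout avoids repeating timestamps in the per-building csv files.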
vars_available_by_building.csv
This is a csv (comma separated value) format file. This notes the data available (i.e. at least one value that is not NA) for each building in the dataset.
all_hours_summary_stats.csv
For convenience, this file contains statistical summary data for all of the buildings used in the paper, assessed at hourly intervals (i.e., the contents of hhw_system_data.rds plus the data from non-publicly available buildings). It first describes the start and end date and the number of days spanned by the data for each building (i.e., tag). For each variable, it then states the mean, standard deviation, interquartile range, skewness, minimum, 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles, maximum, number of NA and non-NA values, and the fraction of the dataset that is NA.
operating_hours_summary_stats.csv
This file is the same as the previous file except that the underlying dataset being summarized is first filtered to contain only time periods where the system was estimated to be operating (i.e. oper > 0).
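The relationship between the two summary files can be sketched as a filter on oper > 0 followed by the same summary calculation. This illustration covers only a few of the statistics listed above; the function names and the output key names are hypothetical, and the rows are made up.

```python
import statistics


def summarize(values):
    """Mean, standard deviation and NA counts for one variable."""
    valid = [v for v in values if v is not None]
    n_na = len(values) - len(valid)
    return {
        "mean": statistics.mean(valid) if valid else None,
        "sd": statistics.stdev(valid) if len(valid) > 1 else None,
        "n_na": n_na,
        "n_not_na": len(valid),
        "frac_na": n_na / len(values) if values else None,
    }


def operating_only(rows):
    """Keep hourly rows where the system was estimated to be operating."""
    return [r for r in rows if r["oper"] is not None and r["oper"] > 0]


# Made-up hourly rows; fractional oper means operating for part of the hour.
rows = [
    {"oper": 1.0, "sup": 70.0},
    {"oper": 0.5, "sup": 60.0},
    {"oper": 0.0, "sup": 20.0},
    {"oper": None, "sup": 65.0},
]
all_hours = summarize([r["sup"] for r in rows])
operating_hours = summarize([r["sup"] for r in operating_only(rows)])
```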
timeseries_figs/[N]_timeseries.jpeg
These are jpeg image files. The N signifies the building tag. Each file contains a timeseries visualization of the primary data for one of the publicly available buildings, showing the full range of operating state, supply and return temperature, supply to return temperature difference, flow, load, and enabled state.
General data notes
The datetime_UTC value represents the datetime in UTC timezone. Any other datetime related features (e.g. ‘hr’), or datetimes displayed on visualizations, are in local time at the building. Buildings that were provided totally anonymously use a US/Pacific timezone.
All datasets use SI units of degrees Celsius, liters per second, watts, or pascals. Some visualizations also include dual units, as noted on the figure, for convenience. The ‘tag’ value represents a unique building identifier which can be used to join datasets with each other (e.g. joining the metadata to the timeseries data).
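Joining on ‘tag’ can be sketched with the Python standard library as below. The in-memory tables and their values are made up for illustration and only stand in for metadata.csv and a timeseries file; note that csv readers return every field as a string, so the tag must have the same type on both sides of the join.

```python
import csv
import io

# Tiny in-memory stand-ins for metadata.csv and a timeseries table
# (values are invented for illustration).
metadata_csv = io.StringIO("tag,bldg_type,climate\n1,Office,3C\n2,Library,4A\n")
timeseries = [
    {"tag": "1", "hr": 8, "sup": 65.0},
    {"tag": "2", "hr": 8, "sup": 71.0},
    {"tag": "3", "hr": 8, "sup": 60.0},  # no metadata row for this tag
]

# Index the metadata by tag; csv.DictReader yields tags as strings.
meta_by_tag = {row["tag"]: row for row in csv.DictReader(metadata_csv)}

# Left join: every timeseries row is kept, metadata is added when found.
joined = [{**meta_by_tag.get(r["tag"], {}), **r} for r in timeseries]
```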
See the linked open-access journal publication for details on how this dataset was collected and processed.