A real-world energy management data set from a smart company building for optimization and machine learning
Data files
Nov 26, 2024 version files (102.93 GB)
- data.zip: 102.61 GB
- issues.zip: 2.67 MB
- plots.zip: 1.75 MB
- README.md: 15.04 KB
- reduced_data.zip: 317.26 MB
Feb 26, 2025 version files (103.42 GB)
- data.zip: 103.09 GB
- issues.zip: 2.83 MB
- plots.zip: 1.43 MB
- README.md: 15.22 KB
- reduced_data.zip: 320.16 MB
Abstract
We present a real-world data set obtained from monitoring a smart company building over the course of six years. The data set describes the energy consumption of various sites within the building, energy production via a photovoltaic system and a combined-heat-and-power plant, and the detailed operation of the heating and cooling system. It further contains measurements from an on-site weather station for the same time period. The data set covers periods of normal operation before the onset of the COVID-19 pandemic as well as periods of reduced operation during and after the pandemic. We describe the recording, processing, and curation strategy used to generate the data set. The data set enables the application of a wide range of methods in the domain of energy management, including optimization, modelling, and machine learning, to optimize building operations and reduce costs and carbon emissions.
https://doi.org/10.5061/dryad.73n5tb363
Description of the data and file structure
The presented data set contains measurements from electricity meters, heating and cooling meters, and the weather station of a medium-sized company, the Honda R&D Europe facility located in Offenbach am Main, Germany.
The data set contains measurements from January 1, 2018, 0:00 GMT+1 until January 1, 2024, 0:00 GMT+1. Because the facility is located in Offenbach, Germany, the local time zone is Europe/Berlin, which corresponds to GMT+2 during European daylight saving time and GMT+1 in winter.
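The relation between UTC and local facility time can be illustrated with a minimal sketch based on Python's standard zoneinfo module. The two sample timestamps are made up for illustration and are not taken from the data set, and the helper name to_local is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

BERLIN = ZoneInfo("Europe/Berlin")  # local time zone of the facility


def to_local(ts_utc: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to Europe/Berlin time."""
    dt = datetime.fromisoformat(ts_utc)
    if dt.tzinfo is None:
        # Assumption: a timestamp without offset is meant as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(BERLIN)


# Illustrative timestamps: one in winter (GMT+1), one in summer (GMT+2).
winter = to_local("2018-01-01T00:00:00+00:00")
summer = to_local("2018-07-01T00:00:00+00:00")
print(winter.isoformat())
print(summer.isoformat())
```

Converting to local time is mainly useful for analyses tied to working hours; for joining series across meters, staying in UTC avoids the repeated and skipped hours at the daylight saving transitions.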
The electricity meters installed in the facility are ABB-B24, Janitza UMG 96 RM-E, Janitza UMG 96 PA MID+, and Socomec DIRIS I35, I45, and S135 devices. Heating and cooling are metered using SensorStar 2/2U meters. Weather measurements are collected from a Lufft WS501-UMB weather station.
During the recording time span, a multitude of issues occurred that affected the collected data, such as measurement outages, maintenance, and device replacements. In order to produce a consistent and research-grade data set, these issues need to be addressed and corrected. We apply a cleaning and post-processing pipeline to the data, which consists of seven steps:
- Specification and detection of issues with a rule-based detection mechanism
- Data harmonization to ensure consistency in naming and sign convention
- Application of issue correction
- Time alignment of all measurements
- Resampling into equidistantly sampled time series (1 min, 15 min, 1 h)
- Calculation of missing dependent measurements
- Export of the time series as gzip-compressed CSV files
Furthermore, based on the corrected and resampled time series, we provide a reduced dataset. It consists of a less complex representation of the building's energy consumption, the production of electricity, heating, and cooling, as well as weather measurements.
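To make the resampling step of the pipeline more concrete, the following is a minimal sketch of downsampling an equidistant 1 min series to a coarser grid. It is not the authors' implementation: the choice of averaging non-cumulative values (such as power) and taking the last value for cumulative counters (such as energy), as well as labelling each window by its start, are assumptions made for illustration. The function name downsample is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import fmean


def downsample(samples, minutes, cumulative=False):
    """Aggregate time-sorted (timestamp, value) pairs into windows of `minutes`.

    Assumptions (not taken from the data set description):
    - `minutes` divides 60, so windows can be found by flooring the minute;
    - windows are labelled by their start time;
    - non-cumulative values are averaged, cumulative values keep the last sample.
    """
    buckets = {}
    for ts, value in samples:
        # Floor the timestamp to the start of its window.
        start = ts - timedelta(
            minutes=ts.minute % minutes,
            seconds=ts.second,
            microseconds=ts.microsecond,
        )
        buckets.setdefault(start, []).append(value)
    return [
        (start, values[-1] if cumulative else fmean(values))
        for start, values in sorted(buckets.items())
    ]


# Synthetic 1 min data covering 30 minutes (illustrative values only).
t0 = datetime(2018, 1, 1, tzinfo=timezone.utc)
power = [(t0 + timedelta(minutes=i), float(i)) for i in range(30)]
energy = [(t0 + timedelta(minutes=i), 100.0 + i) for i in range(30)]

power_15min = downsample(power, 15)
energy_15min = downsample(energy, 15, cumulative=True)
print(power_15min)
print(energy_15min)
```

Treating the two kinds of measurement differently matters because averaging a cumulative counter, or taking only the last sample of a fluctuating power signal, would both distort the coarser series.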
Files and variables
File: data.zip
Description: Contains one directory for each meter, named by its Uniform Resource Name (URN); there are 81 URNs in total. Each directory contains multiple time series, one set per measurement; the available measurements depend on the type of measurement (thermal or electrical) and on the meter type. Each measurement is provided at several processing steps as gzip-compressed CSV files, namely:
- URN_MEASUREMENT_raw.csv.gz: raw, unprocessed time series (not present for measurements that are to be renamed during harmonization).
- URN_MEASUREMENT_harmonized.csv.gz: time series with applied harmonization step.
- URN_MEASUREMENT_corrected.csv.gz: time series with applied harmonization, issue correction, and time alignment.
- URN_MEASUREMENT_corrected_resampled_1min/15min/1h.csv.gz: fully processed time series, resampled to 1 min, 15 min, and 1 h sample frequency, respectively.
For example, the corrected total power (P) of the URN H1.Z20, resampled to 1 min, is stored in the file H1.Z20_P_corrected_resampled_1min.csv.gz.
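The naming scheme above can be captured in a small helper. This is only a sketch: the function name series_filename is hypothetical, and how measurements with subscripts (for example the per-phase quantities) are spelled in file names is not stated here, so only the documented example with measurement P is used.

```python
STAGES = ("raw", "harmonized", "corrected", "corrected_resampled")
RESOLUTIONS = ("1min", "15min", "1h")


def series_filename(urn, measurement, stage, resolution=None):
    """Build a file name following the URN_MEASUREMENT_<stage> scheme."""
    if stage not in STAGES:
        raise ValueError(f"unknown processing stage: {stage!r}")
    if stage == "corrected_resampled":
        # Only the fully processed series carry a resolution suffix.
        if resolution not in RESOLUTIONS:
            raise ValueError(f"unknown resolution: {resolution!r}")
        return f"{urn}_{measurement}_{stage}_{resolution}.csv.gz"
    if resolution is not None:
        raise ValueError("resolution only applies to resampled series")
    return f"{urn}_{measurement}_{stage}.csv.gz"


example = series_filename("H1.Z20", "P", "corrected_resampled", "1min")
print(example)
```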
Each file contains two columns of data. The first is datetime_utc, an ISO 8601 string with time zone information that reports the measurement timestamp in UTC. The second column is the actual measurement. The measurement units are reported in the following table:
Meter type | Measurement | Unit | Description |
---|---|---|---|
Electrical | f | Hz | Grid frequency |
” | I₁ | A | Electric current phase L1 |
” | I₂ | A | Electric current phase L2 |
” | I₃ | A | Electric current phase L3 |
” | U₁ | V | Voltage of phase L1 |
” | U₂ | V | Voltage of phase L2 |
” | U₃ | V | Voltage of phase L3 |
” | P₁ | W | Electric power phase L1 |
” | P₂ | W | Electric power phase L2 |
” | P₃ | W | Electric power phase L3 |
” | W₁ | kWh | Energy phase L1 |
” | W₂ | kWh | Energy phase L2 |
” | W₃ | kWh | Energy phase L3 |
” | PF₁ | - | Power factor phase L1 |
” | PF₂ | - | Power factor phase L2 |
” | PF₃ | - | Power factor phase L3 |
” | P | W | Total electric power |
” | Q | var | Total reactive power |
” | PF | - | Total power factor |
” | Wᵢₙ | kWh | Electric energy consumed |
” | Wₒᵤₜ | kWh | Electric energy delivered |
” | WQᵢₙ | kvarh | Reactive energy consumed |
” | WQₒᵤₜ | kvarh | Reactive energy delivered |
” | W | kWh | Total electrical energy |
Thermal | P | W | Heating/cooling power |
” | W | kWh | Total energy |
” | Tᵥₗ | °C | Flow temperature |
” | Tᵣₗ | °C | Return temperature |
” | Tdiff | K | Temperature difference between flow and return |
” | qᵥ | L/h | Volume flow |
” | V | L | Cumulated volume |
Weather | AH | g/m³ | Absolute humidity |
” | Dc | ° (from North) | Current wind direction |
” | Dp | °C | Dew point |
” | H | kJ/kg | Specific enthalpy |
” | Iga | W/m² | Current global normal irradiance |
” | Igm | W/m² | Mean global normal irradiance (10 min moving average) |
” | Pₐ | hPa | Ambient air pressure |
” | ρ | g/cm³ | Actual air density |
” | Sₐᶜ | m/s | Current wind speed |
” | Tₐ | °C | Ambient air temperature |
” | Uₐ | % | Relative humidity of ambient air |
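As a minimal sketch of how such a two-column file could be read with Python's standard library alone, consider the block below. It assumes that each file starts with a header row, that columns are comma-separated, and that missing values appear as empty cells; none of these details is confirmed by the description above. The name of the value column and the sample rows are made up, and read_series is a hypothetical helper.

```python
import csv
import gzip
import os
import tempfile
from datetime import datetime


def read_series(path):
    """Read a gzip-compressed two-column CSV into (header, rows).

    Each row is a (timezone-aware datetime, float) pair; empty cells become NaN.
    """
    rows = []
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)  # assumed header row, e.g. datetime_utc,<measurement>
        for ts, value in reader:
            rows.append(
                (datetime.fromisoformat(ts), float(value) if value else float("nan"))
            )
    return header, rows


# Write a tiny synthetic file so that the sketch runs without the data set.
demo_path = os.path.join(
    tempfile.mkdtemp(), "H1.Z20_P_corrected_resampled_1min.csv.gz"
)
with gzip.open(demo_path, "wt", encoding="utf-8", newline="") as fh:
    fh.write("datetime_utc,P\n")
    fh.write("2018-01-01T00:00:00+00:00,1500.0\n")
    fh.write("2018-01-01T00:01:00+00:00,\n")

header, rows = read_series(demo_path)
print(header, rows)
```

With pandas, `pandas.read_csv` infers gzip compression from the `.gz` extension, so the same file can be loaded in a single call.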
File: issues.zip
Description: Contains all manually and automatically detected issues for each of the meters present in the data set. The issues are stored in two directories, manual and automatic. The file naming scheme is URN_issues_manual.yaml and URN_issues_automatic.yaml, respectively.
File: reduced_data.zip
Description: Contains a reduced dataset with a less complex representation of the building's energy consumption, the production of electricity, heating, and cooling, as well as weather measurements. For electricity, heating, and cooling, both power (P, in W) and energy (W, in kWh) aggregations are provided. The aggregations are generated by summing the fully processed power (P) and energy (W) measurements, respectively.
For weather data, mean global normal irradiance (Igm, in W/m²) and ambient temperature (Ta, in °C) are provided.
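The summation behind such an aggregation can be sketched as follows. Which meters contribute to which aggregate is not specified here, so the meter names and values below are purely hypothetical, and restricting the sum to timestamps present in all series is an assumption. The helper name sum_aligned is also hypothetical.

```python
def sum_aligned(series_by_meter):
    """Sum several {timestamp: value} series on the timestamps they all share."""
    if not series_by_meter:
        return {}
    common = set.intersection(*(set(s) for s in series_by_meter.values()))
    return {
        ts: sum(series[ts] for series in series_by_meter.values())
        for ts in sorted(common)
    }


# Hypothetical meters with illustrative power values in W.
meters = {
    "meter_a": {"2018-01-01T00:00:00+00:00": 1200.0, "2018-01-01T00:15:00+00:00": 1000.0},
    "meter_b": {"2018-01-01T00:00:00+00:00": 300.0, "2018-01-01T00:15:00+00:00": 500.0},
}
total_power = sum_aligned(meters)
print(total_power)
```

Because the fully processed series are time-aligned and equidistant, a plain element-wise sum is sufficient; unaligned series would first need interpolation or resampling.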
The aggregated dataset is provided at 1 min, 15 min, and 1 h sample resolution. For each resolution, one sub-directory is provided, containing 7 gzip-compressed CSV files. The file names and the columns in each file are as follows:
- filename: electricity_P.csv.gz
Column | Unit | Description |
---|---|---|
datetime_utc | ISO 8601 string with time-zone information | Measurement timestamp in UTC |
total | W | Electrical power drawn from the main grid |
PV | W | Photovoltaic (PV) electrical power generated |
CHP | W | Combined heat and power (CHP) electrical power generated |
- filename: electricity_W.csv.gz
Column | Unit | Description |
---|---|---|
datetime_utc | ISO 8601 string with time-zone information | Measurement timestamp in UTC |
total | kWh | Electrical energy drawn from the main grid |
PV | kWh | Photovoltaic (PV) electrical energy generated |
CHP | kWh | Combined heat and power (CHP) electrical energy generated |
- filename: heating_P.csv.gz
Column | Unit | Description |
---|---|---|
datetime_utc | ISO 8601 string with time-zone information | Measurement timestamp in UTC |
total | W | Total heat power production |
CHP_heat | W | CHP heat power production |
CHP_elec | W | CHP electrical power production |
- filename: heating_W.csv.gz
Column | Unit | Description |
---|---|---|
datetime_utc | ISO 8601 string with time-zone information | Measurement timestamp in UTC |
total | kWh | Total heat energy production |
CHP_heat | kWh | CHP heat energy production |
CHP_elec | kWh | CHP electrical energy production |
- filename: cooling_P.csv.gz
Column | Unit | Description |
---|---|---|
datetime_utc | ISO 8601 string with time-zone information | Measurement timestamp in UTC |
total | W | Total cooling power production of the cooling machines |
cool_elec | W | Electrical power consumption of the cooling machines |
- filename: cooling_W.csv.gz
Column | Unit | Description |
---|---|---|
datetime_utc | ISO 8601 string with time-zone information | Measurement timestamp in UTC |
total | kWh | Total cooling energy production of the cooling machines |
cool_elec | kWh | Electrical energy consumption of the cooling machines |
- filename: weather.csv.gz
Column | Unit | Description |
---|---|---|
datetime_utc | ISO 8601 string with time-zone information | Measurement timestamp in UTC |
Igm | W/m² | Mean global normal irradiance (10 min moving average) |
Ta | °C | Ambient air temperature |
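Since the reduced files report power in W and energy in kWh, a short worked example of the unit relation may help. The sketch below approximates the energy of a window from equidistant power samples and computes the ratio of cooling output to electrical input from the total and cool_elec columns. It assumes that each power sample represents the whole sampling interval and that missing values have already been handled; the numbers are illustrative only, and both function names are hypothetical.

```python
# Length of one sampling interval in hours for each provided resolution.
RESOLUTION_HOURS = {"1min": 1 / 60, "15min": 0.25, "1h": 1.0}


def energy_kwh(power_w, resolution):
    """Approximate energy in kWh from equidistant power samples in W.

    Uses the rectangle rule: E = sum(P) * dt / 1000, with dt in hours.
    """
    return sum(power_w) * RESOLUTION_HOURS[resolution] / 1000.0


def cooling_ratio(cooling_w, electrical_w):
    """Ratio of cooling output to electrical input, or None if no power is drawn."""
    return cooling_w / electrical_w if electrical_w > 0 else None


# One hour of constant 2000 W at 15 min resolution corresponds to 2 kWh.
hour_energy = energy_kwh([2000.0, 2000.0, 2000.0, 2000.0], "15min")
# Illustrative cooling sample: 9000 W of cooling for 3000 W of electricity.
ratio = cooling_ratio(9000.0, 3000.0)
print(hour_energy, ratio)
```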
File: plots.zip
Description: Contains plots of the reduced_data time series (cooling, electricity, heating, and weather) as PNG files. Each figure depicts the full six years of operation data for the provided variables.
Code/software
All time series data are provided in gzip-compressed CSV format, with a total size of 102.61 GB (317.26 MB for the reduced data).
Each time series file has a size that ranges from approx. 300 KB to 25 MB, depending on the sampling frequency (1 min to 1 h).
In principle, any software that can handle gzip-compressed CSV files can be used to access the data. We recommend the open-source pandas library for the Python programming language.
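Given the size of the archive, it can be convenient to read a single time series directly from the ZIP file without unpacking everything. The following sketch uses only the Python standard library. The member path inside the archive, the presence of a header row, and the sample content are assumptions for illustration, and iter_series_from_zip is a hypothetical helper.

```python
import csv
import gzip
import os
import tempfile
import zipfile


def iter_series_from_zip(zip_path, member):
    """Yield [timestamp, value] string rows of one .csv.gz member of a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # Decompress the gzip stream on the fly while reading from the archive.
            with gzip.open(raw, "rt", encoding="utf-8", newline="") as fh:
                reader = csv.reader(fh)
                next(reader, None)  # skip the assumed header row
                yield from reader


# Build a tiny synthetic archive so that the sketch runs without the data set.
member_name = "H1.Z20/H1.Z20_P_corrected_resampled_1h.csv.gz"  # assumed layout
payload = gzip.compress(b"datetime_utc,P\n2018-01-01T00:00:00+00:00,1500.0\n")
demo_zip = os.path.join(tempfile.mkdtemp(), "demo_data.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr(member_name, payload)

zip_rows = list(iter_series_from_zip(demo_zip, member_name))
print(zip_rows)
```

Streaming rows this way keeps memory use low, at the cost of leaving type conversion (timestamps, floats) to the caller.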
Version Changes
- 26 November 2024: first data version
- 26 February 2025: corrected the downsampling strategy to 15 min and 1 h resolution for non-cumulative measurements