Dryad

MERRA-2 subset for evaluation of renewables with merra2ools R-package: 1980-2020 hourly, 0.5° lat x 0.625° lon global grid

Cite this dataset

Lugovoy, Oleg; Gao, Shuo (2021). MERRA-2 subset for evaluation of renewables with merra2ools R-package: 1980-2020 hourly, 0.5° lat x 0.625° lon global grid [Dataset]. Dryad. https://doi.org/10.5061/dryad.v41ns1rtt

Abstract

Renewable variable energy resources (VER), namely solar and wind energy, are becoming increasingly important sources of electricity worldwide. Assessing the potential and reliability of these resources requires long-term historical data. Directly measured solar radiation and wind speed are limited to the locations of weather stations, and even when available, the observations are not directly suitable for evaluating VER potential (for example, wind speed is rarely measured at wind turbine heights). Reanalysis data based on satellite imagery and Earth system models, such as MERRA-2, offer a broad set of long-term time series on a global grid.
`merra2ools` is a preprocessed subset of MERRA-2 variables together with software (an R package) designed for quick estimation of the hourly output of solar photovoltaics and wind turbines. The MERRA-2 grid has a step of 0.625° along longitude (-180° to 180°) and 0.5° along latitude (-90° to 90°), making a 576 x 361 grid, or 207936 locations. The subset of hourly data covers the period from 1980-Jan-01 00:30 UTC to 2020-Dec-31 23:30 UTC. It includes eight variables: wind speed at 10- and 50-meter heights (W10M and W50M), wind direction (WDIR), air temperature at 10-meter height (T10M), surface incoming shortwave flux (SWGDN), surface albedo (ALBEDO), bias-corrected total precipitation (PRECTOTCORR), and air density at the surface (RHOA). The dataset’s key variables are date-time in the Coordinated Universal Time (UTC) timezone and location identifiers (locid). In total, the subset has 290,357,084,160 data points (362,946,355,200 including the key variables). To reduce the dataset’s memory footprint (~3TB uncompressed), the original MERRA-2 variables have been rounded, scaled, and stored as integers in a highly compressed data format with fast full random access (the `fst` package for R). The resulting dataset is saved in separate files by month (41 years x 12 months, 492 data files in total). Additionally, summary statistics have been added to the dataset: mean values of each variable by month and location ID, and annual spatial correlations with the nearest neighbors for wind speed and solar irradiance.
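The grid dimensions quoted above can be checked with a few lines of code. The following is a minimal Python sketch for illustration only (the dataset itself is meant to be used from R). It assumes that the 576 longitudes start at -180° and advance in 0.625° steps, so that the last grid longitude is 179.375° and the 180° meridian coincides with -180°.

```python
# Sketch of the horizontal grid described in the abstract.
# Assumption: longitudes start at -180 and step by 0.625 (576 points),
# latitudes start at -90 and step by 0.5 (361 points, both poles included).
LON_STEP, LAT_STEP = 0.625, 0.5
N_LON, N_LAT = 576, 361

lons = [-180.0 + i * LON_STEP for i in range(N_LON)]
lats = [-90.0 + j * LAT_STEP for j in range(N_LAT)]

n_locations = len(lons) * len(lats)

print(n_locations)        # 207936
print(lons[0], lons[-1])  # -180.0 179.375
print(lats[0], lats[-1])  # -90.0 90.0
```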

Methods

The `merra2ools` dataset has been assembled through the following steps:

  1. The MERRA-2 collections tavg1_2d_flx_Nx (Surface Flux Diagnostics), tavg1_2d_rad_Nx (Radiation Diagnostics), and tavg1_2d_slv_Nx (Single-level atmospheric state variables) were downloaded from the NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) (https://disc.gsfc.nasa.gov/datasets?project=MERRA-2) using the GNU Wget network utility (https://disc.gsfc.nasa.gov/data-access). Each of the three collections consists of daily netCDF-4 files with 3-dimensional variables (lon x lat x hour).
  2. The following variables were extracted from the netCDF-4 files and merged into long-term time series:
  • Northward (V) and Eastward (U) wind at 10 and 50 meters (V10M, V50M, U10M, and U50M, respectively), and 10-meter air temperature (T10M) from the tavg1_2d_slv_Nx collection;
  • Surface incoming shortwave flux (SWGDN) and Surface albedo (ALBEDO) from the tavg1_2d_rad_Nx collection;
  • Bias corrected total precipitation (PRECTOTCORR) and Air density at the surface (RHOA) from the tavg1_2d_flx_Nx collection.

3. The representation of wind has been changed from the Northward (V) and Eastward (U) components to wind speed (W) and direction (WDIR). Wind speed has been calculated for both the 10- and 50-meter heights (W10M and W50M). The wind direction is reported for the 50-meter height only, because it differs only marginally between the two heights. This conversion also reduces the dataset’s size and the computational burden of calculating wind power capacity factors (which require wind speed rather than direction).

4. The resulting time series have been merged, scaled, grouped by year and month, and stored in the compressed “fst” file format (file name mask: “merra2_YYYYMM.fst”, where YYYY is the year and MM is the two-digit month number).
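The change of representation in step 3 can be sketched as follows. This is an illustrative Python example, not the authors' code. The function names are hypothetical, and the direction convention is an assumption: the angle is taken as atan2(V, U), measured counter-clockwise from east and mapped to [0, 360). The convention actually used in the dataset (for example, the meteorological "direction the wind blows from") should be checked before the values are relied upon.

```python
import math


def wind_speed(u: float, v: float) -> float:
    """Wind speed (m/s) from eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def wind_direction_deg(u: float, v: float) -> float:
    """Angle of the wind vector in degrees, mapped to [0, 360).

    Assumption: atan2(v, u), i.e. the mathematical angle measured
    counter-clockwise from east. The dataset may use another convention.
    """
    return math.degrees(math.atan2(v, u)) % 360.0


print(wind_speed(3.0, 4.0))           # 5.0
print(wind_direction_deg(0.0, -1.0))  # 270.0
```

Storing one speed per height and a single direction, instead of two components per height, is what reduces the number of stored wind columns from four to three.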

The resulting merra2ools dataset is stored in 492 data files (12 months x 41 years) and has the following hourly time-series:

  • locid - location IDs (key variable), an index of locations in MERRA-2 dataset, from 1 to 207936, integer;
  • UTC - date and time (key variable) in Coordinated Universal Time (UTC) timezone (“POSIXct/POSIXt” format);
  • W10M - wind speed at 10-meter height (m/s, calculated as sqrt(V10M^2 + U10M^2), where V10M and U10M are the northward and eastward wind at 10 meters; rounded to the first decimal place and stored as an integer variable named “W10M.e1”, where the suffix “.e1” indicates scale factor = 10);
  • W50M - wind speed at 50-meter height (m/s, calculated as sqrt(V50M^2 + U50M^2), where V50M and U50M are the northward and eastward wind at 50 meters; rounded to the first decimal place and stored as an integer variable named “W50M.e1”, where “.e1” indicates scale factor = 10);
  • WDIR - direction of wind at 50-meter height (calculated as atan2(V50M, U50M), converted to degrees, rounded to the nearest ten, and stored as an integer variable named “WDIR.e_1”, where “.e_1” indicates scale factor = 0.1);
  • T10M - 10-meter air temperature (Celsius, converted from original Kelvin units (C = K - 273.15), rounded to the nearest integer, stored as an integer variable “T10M.C”, where “.C” indicates Celsius units);
  • SWGDN - surface incoming shortwave flux (W/m2, rounded to the nearest integer, stored as an integer variable with the same name);
  • ALBEDO - Surface albedo (index [0, 1], rounded to the second decimal place, stored as an integer variable named “ALBEDO.e2”, where “.e2” indicates scale factor = 100);
  • PRECTOTCORR - Bias corrected total precipitation (kg/m2/hour, rounded to the first decimal place, stored as an integer variable named “PRECTOTCORR.kg_m2_h.e1”, where “.kg_m2_h” indicates the change in units from the original kg/m2/s in MERRA-2 to kg/m2/hour, and “.e1” indicates scale factor = 10);
  • RHOA - Air density at the surface (kg/m3, rounded to the second decimal place, stored as an integer variable named “RHOA.e2”, where “.e2” indicates scale factor = 100).
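The rounding and scaling rules in the list above can be summarized in a short sketch. This is an illustrative Python example with a hypothetical function name, not the code used to build the dataset. It assumes that each stored integer equals the physical value multiplied by its scale factor and rounded with the language's default rounding (Python's `round`, which rounds halves to even). The exact rounding rule applied by the authors is not stated here.

```python
def encode_record(w10m, w50m, wdir_deg, t10m_kelvin, swgdn, albedo,
                  prectotcorr_kg_m2_s, rhoa):
    """Return integer-coded values keyed by the stored column names.

    Assumption: stored value = round(physical value * scale factor).
    """
    return {
        "W10M.e1": round(w10m * 10),               # m/s, 1 decimal
        "W50M.e1": round(w50m * 10),               # m/s, 1 decimal
        "WDIR.e_1": round(wdir_deg / 10),          # degrees, tens
        "T10M.C": round(t10m_kelvin - 273.15),     # Kelvin to Celsius
        "SWGDN": round(swgdn),                     # W/m2
        "ALBEDO.e2": round(albedo * 100),          # index, 2 decimals
        # kg/m2/s to kg/m2/hour (x 3600), then 1 decimal (x 10)
        "PRECTOTCORR.kg_m2_h.e1": round(prectotcorr_kg_m2_s * 3600 * 10),
        "RHOA.e2": round(rhoa * 100),              # kg/m3, 2 decimals
    }


encoded = encode_record(7.34, 9.87, 268, 288.4, 512.6, 0.234, 0.0001, 1.204)
print(encoded["W10M.e1"], encoded["WDIR.e_1"], encoded["T10M.C"])  # 73 27 15
```

Every value fits in a small integer after this step, which is what makes the compressed storage effective.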

All variables are hourly averages. UTC-time is given for the middle of every hour.
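As an illustration of the monthly file layout and the mid-hour time stamps, the sketch below lists the file names implied by the “merra2_YYYYMM.fst” mask and generates the UTC stamps for one month. It is a Python example with hypothetical function names. It assumes that every month from January 1980 to December 2020 has exactly one file and that the first stamp in each file is 00:30 UTC on the first day of the month.

```python
import calendar
from datetime import datetime, timedelta, timezone


def monthly_file_names(first_year=1980, last_year=2020):
    """File names following the merra2_YYYYMM.fst mask, one per month."""
    return [f"merra2_{year}{month:02d}.fst"
            for year in range(first_year, last_year + 1)
            for month in range(1, 13)]


def mid_hour_stamps(year, month):
    """UTC time stamps at the middle of every hour of the given month."""
    n_days = calendar.monthrange(year, month)[1]
    start = datetime(year, month, 1, 0, 30, tzinfo=timezone.utc)
    return [start + timedelta(hours=h) for h in range(24 * n_days)]


files = monthly_file_names()
print(len(files), files[0], files[-1])
# 492 merra2_198001.fst merra2_202012.fst

stamps = mid_hour_stamps(1980, 2)  # February of a leap year: 29 days
print(len(stamps), stamps[0].isoformat(), stamps[-1].isoformat())
# 696 1980-02-01T00:30:00+00:00 1980-02-29T23:30:00+00:00
```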

Additional files:

locid.RData - an RData file with a `data.frame` that identifies the MERRA-2 grid locations and contains the following fields:

  • lon - longitude of the location;
  • lat - latitude of the location;
  • locid - location ID, a variable created based on longitude and latitude grid indexes (V1 and V2), from 1 to 207936;
  • V1 - longitude location index, from the MERRA-2 database netCDF-4 files;
  • V2 - latitude location index, from the MERRA-2 database netCDF-4 files.

locid_neighbr.RData - an RData file with a `data.frame` that lists the location IDs of the first neighbors of every location on the grid:

  • locid - location ID of a cell of the grid for which neighbors are reported in the following columns;
  • N - locid of the closest Northern neighbor;
  • NE - locid of the closest North-Eastern neighbor;
  • E - locid of the closest Eastern neighbor;
  • SE - locid of the closest South-Eastern neighbor;
  • S - locid of the closest Southern neighbor;
  • SW - locid of the closest South-Western neighbor;
  • W - locid of the closest Western neighbor;
  • NW - locid of the closest North-Western neighbor.
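The neighbor table above can be pictured with a small sketch that works on grid indexes rather than on locid values, since the mapping from indexes to locid is not described here. Everything in this Python example is an assumption made for illustration: zero-based indexes, a longitude index that increases eastward, a latitude index that increases northward, wrap-around across the antimeridian, and no neighbor beyond the poles. The treatment of edge cells in locid_neighbr.RData may differ.

```python
N_LON, N_LAT = 576, 361

# (longitude step, latitude step) for each of the eight directions.
OFFSETS = {
    "N": (0, 1), "NE": (1, 1), "E": (1, 0), "SE": (1, -1),
    "S": (0, -1), "SW": (-1, -1), "W": (-1, 0), "NW": (-1, 1),
}


def neighbors(i_lon, i_lat):
    """Map each direction to the (i_lon, i_lat) of the first neighbor.

    Longitude wraps around the globe. A neighbor that would lie beyond
    a pole is reported as None (an assumption of this sketch).
    """
    result = {}
    for name, (d_lon, d_lat) in OFFSETS.items():
        j = i_lat + d_lat
        if 0 <= j < N_LAT:
            result[name] = ((i_lon + d_lon) % N_LON, j)
        else:
            result[name] = None
    return result


print(neighbors(0, 0)["W"])     # (575, 0)
print(neighbors(0, 0)["S"])     # None
print(neighbors(10, 10)["NE"])  # (11, 11)
```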

merra2_stat_ym_mean.fst - monthly means by location for each of the eight variables.

merra2_stat_ym_sd.fst - monthly standard deviations by location for each of the eight variables.

merra2_stat_y_W50M_nbcor.fst - annual spatial autocorrelation of W50M at each locid with each of its first neighbors (N, NE, E, SE, S, SW, W, NW) and the average across all neighbors (Z).

merra2_stat_y_SWGDN_nbcor.fst - annual spatial autocorrelation of SWGDN at each locid with each of its first neighbors (N, NE, E, SE, S, SW, W, NW) and the average across all neighbors (Z).
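One way to read the neighbor-correlation files is sketched below. The example assumes that the statistic is the Pearson correlation between the hourly series of a cell and the series of each neighbor within a year, and that Z is the plain mean of the available neighbor correlations. The type of correlation is not specified in this description, so both points are assumptions, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def neighbor_correlations(center, neighbor_series):
    """Correlate one cell with each neighbor and add the mean as "Z"."""
    cors = {name: pearson(center, series)
            for name, series in neighbor_series.items()}
    cors["Z"] = sum(cors.values()) / len(cors)
    return cors


# Toy series standing in for one year of hourly W50M values.
center = [1.0, 2.0, 3.0, 4.0]
toy = {"N": [2.0, 4.0, 6.0, 8.0], "S": [4.0, 3.0, 2.0, 1.0]}
print(neighbor_correlations(center, toy))
# {'N': 1.0, 'S': -1.0, 'Z': 0.0}
```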

Usage notes

The most straightforward way to work with the dataset is through the `merra2ools` package, which offers functions to read, automatically rescale, merge, and subset the data for particular locations and time periods. Alternatively, the data can be read directly with the `read_fst()` function of the `fst` package and resaved in other formats. To obtain meaningful units, the data must be rescaled by dividing each scaled variable by its scale factor (see “Methods”).
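For readers who export the data to another environment, the rescaling rule can be sketched from the column-name suffixes listed in “Methods”. The Python example below is an illustration with hypothetical helper names, not part of `merra2ools`. It assumes that a trailing “.eN” means a scale factor of 10^N, that “.e_N” means 10^-N, and that columns without such a suffix (for example “T10M.C” or “SWGDN”) are already in physical units.

```python
import re

# Trailing ".e1", ".e2", ".e_1", ... at the end of a column name.
_SCALE_SUFFIX = re.compile(r"\.e(_?)(\d+)$")


def scale_factor(column: str) -> float:
    """Scale factor implied by the column name (1.0 if no suffix)."""
    match = _SCALE_SUFFIX.search(column)
    if match is None:
        return 1.0
    exponent = int(match.group(2))
    return 10.0 ** (-exponent if match.group(1) else exponent)


def base_name(column: str) -> str:
    """Variable name without unit and scale suffixes."""
    return column.split(".", 1)[0]


def rescale(record: dict) -> dict:
    """Divide every stored integer by its scale factor."""
    return {base_name(col): value / scale_factor(col)
            for col, value in record.items()}


stored = {"W10M.e1": 73, "WDIR.e_1": 27, "T10M.C": 15,
          "PRECTOTCORR.kg_m2_h.e1": 4, "RHOA.e2": 120}
print(rescale(stored))
```

The key columns (locid and UTC) are not scaled and would be passed through unchanged in a fuller implementation.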

Funding

National Aeronautics and Space Administration