GHCN-daily: Western United States precipitation dataset
Data files
May 08, 2024 version files (161.62 MB)
- GHCN_daily_precip_WestUS_WilliamsEtAl.zip (161.62 MB)
- README.md (3.03 KB)
Abstract
This is a dataset of daily precipitation totals in the western United States (US) from January 1, 1950 to February 29, 2024. The dataset is based on daily precipitation measurements from 642 gauges across the western US, with periodic data gaps filled using estimates based on nearby gauges. From the daily records at the 642 gauge sites we produced a gridded record of daily precipitation totals at 0.25-degree resolution across the western US. The 642 gauges used as the basis for this dataset were selected for their uniquely long records and continuous coverage from the early 1950s through 2023. The purpose of producing this dataset was to evaluate trends in sub-monthly cool-season (November–March) precipitation characteristics.
README: GHCN daily western United States precipitation
https://doi.org/10.5061/dryad.c866t1gfp
This dataset contains daily precipitation totals for the western United States (US) as represented by gauge data from the Global Historical Climatology Network (GHCN) daily dataset.
Description of the data and file structure
Components of the dataset are:
prec_daily.nc: This is a netCDF file with gridded 1/4-degree estimates of daily precipitation total from Jan 1 1950 to Feb 29 2024. There are 100 latitudes, from 28.125N to 52.875N, and 108 longitudes, from 126.875W to 100.125W. There are 27,088 daily time steps, corresponding to a time vector with units of days since Jan 1 1800. Grid cells lying outside the western US do not have values. Here, the western US is defined as all parts of the continental US that are west of the continental divide.
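The grid and time dimensions stated above can be cross-checked with a few lines of standard-library Python. This is a minimal sketch, not a reader for the file itself: the variable names are hypothetical, west longitudes are written as negative degrees east, and the time values are assumed to be whole numbers of days. The README does not state how the file stores either.

```python
from datetime import date, timedelta

# README: the time vector has units of "days since Jan 1 1800".
EPOCH = date(1800, 1, 1)

def grid_axis(start, stop, step=0.25):
    """Evenly spaced cell-center coordinates from start to stop, inclusive."""
    n = round((stop - start) / step) + 1
    return [start + k * step for k in range(n)]

# Cell centers stated in the README (west longitudes as negative values: an assumption).
lats = grid_axis(28.125, 52.875)
lons = grid_axis(-126.875, -100.125)

first_day, last_day = date(1950, 1, 1), date(2024, 2, 29)
n_days = (last_day - first_day).days + 1

# Assumption: one whole-number time value per calendar day.
time_values = [(first_day - EPOCH).days + k for k in range(n_days)]

def to_date(days_since_1800):
    """Convert a time value back to a calendar date."""
    return EPOCH + timedelta(days=days_since_1800)

print(len(lats), len(lons), n_days)  # 100 108 27088
```

The three counts agree with the 100 latitudes, 108 longitudes, and 27,088 time steps listed in the README.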
stninfo.txt: This is a text file with information about the 642 GHCN stations that serve as the basis of the precipitation dataset. Each row represents a station and the information provided in each row comes directly from the ghcnd-stations.txt file available from https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily. The column labels are:
Station ID, latitude, longitude, elevation, State (if applicable), and Station name
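If the rows of stninfo.txt keep the fixed-width layout of ghcnd-stations.txt (an assumption; the README says only that the information comes directly from that file), a row could be parsed roughly as follows. The `Station` type and `parse_station_line` are hypothetical names, and the example row is made up to exercise the parser rather than taken from the dataset.

```python
from typing import NamedTuple

class Station(NamedTuple):
    station_id: str
    latitude: float
    longitude: float
    elevation_m: float
    state: str
    name: str

def parse_station_line(line):
    """Slice one row using the ghcnd-stations.txt column positions (assumed layout)."""
    return Station(
        station_id=line[0:11].strip(),   # columns 1-11
        latitude=float(line[12:20]),     # columns 13-20
        longitude=float(line[21:30]),    # columns 22-30
        elevation_m=float(line[31:37]),  # columns 32-37
        state=line[38:40].strip(),       # columns 39-40
        name=line[41:71].strip(),        # columns 42-71
    )

# Made-up row built to the assumed layout; it does not describe a real station.
example = (f"{'XX000000000':<11} {40.0:8.4f} {-120.0:9.4f} "
           f"{1500.0:6.1f} {'CA':<2} {'EXAMPLE STATION':<30}")
station = parse_station_line(example)
print(station.name, station.latitude, station.longitude)
```

If the file turns out to be whitespace- or comma-delimited instead, splitting on the delimiter would be the simpler choice, with care taken for station names that contain spaces.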
station_data/: This is a directory containing 642 text files, each with daily precipitation totals in units of tenths of a millimeter for one of the 642 gauges. Precipitation values of NaN indicate missing data.
Columns are:
1. Year
2. Month
3. Day
4. Cool-season year (days in Nov-Mar are assigned to the year containing Jan-Mar)
5. Original GHCN precipitation total with no gap filling
6. Precipitation total after filling with zeros when all nearby stations had zero
7. Precipitation total after gap filling from stations ≤50 km away
8. Precipitation total after gap filling from stations 50-100 km away
9. Precipitation total after a further round of gap filling from stations ≤50 km away, now allowing gap-filled values from those stations to be used
10. Precipitation total after filling remaining gaps with the median of all days within 10 days (by day of year) of the gap day
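The ten columns above could be read into named fields as sketched below. The field names are hypothetical labels for the listed columns, whitespace-delimited rows are an assumption (the README does not specify the delimiter), and the two sample rows are invented. Precipitation values are converted from tenths of a millimeter to millimeters.

```python
import math

# Hypothetical field names for the ten columns listed above.
COLUMNS = ("year", "month", "day", "cool_season_year", "raw", "zero_filled",
           "filled_50km", "filled_100km", "filled_50km_second_pass", "filled_final")
PRECIP_COLUMNS = COLUMNS[4:]

def parse_station_row(line):
    """Parse one whitespace-delimited row (delimiter is an assumption)."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = {}
    for name, text in zip(COLUMNS, fields):
        value = float(text)  # float() accepts "NaN"
        if name in PRECIP_COLUMNS:
            row[name] = value / 10.0  # tenths of mm -> mm; NaN stays NaN
        else:
            row[name] = None if math.isnan(value) else int(value)
    return row

def cool_season_year(year, month):
    """Nov-Dec belong to the following year's cool season; Jan-Mar to the same year."""
    if month >= 11:
        return year + 1
    if month <= 3:
        return year
    return None  # Apr-Oct: outside the cool season (how the files mark these days is not stated)

# Invented sample rows for illustration only.
missing = parse_station_row("1950 11 3 1951 NaN 0 0 0 0 0")
wet = parse_station_row("1950 11 4 1951 127 127 127 127 127 127")
print(math.isnan(missing["raw"]), wet["filled_final"])
```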
Sharing/Access information
This dataset was derived from the Global Historical Climatology Network – daily (GHCN-daily) dataset (Menne et al., 2012). The dataset is available from:
https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily
Menne, M. J., I. Durre, R. S. Vose, B. E. Gleason, and T. G. Houston. “An Overview of the Global Historical Climatology Network-Daily Database.” Journal of Atmospheric and Oceanic Technology 29, no. 7 (2012): 897–910. https://doi.org/10.1175/JTECH-D-11-00103.1.
Methods
Considering all daily gauge records available from the Global Historical Climatology Network-Daily (Menne et al., 2012), we first compile a master database of daily precipitation totals for the 3,528 gauges within a 1° buffer of our western US study area with valid data for ≥50% of days in each of ≥20 years during calendar years 1950–2023. We then perform an initial zero-filling procedure in which missing values are replaced with zero on days when all (and ≥2) other gauges with valid data within 50 km report no precipitation. When there are <2 gauges within 50 km with valid data, but at least two gauges within 100 km with valid data, the zero-filling is repeated if all available (and ≥2) gauges within 100 km report zero precipitation.
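The zero-filling rule can be sketched for a single gauge-day as below. This is an illustration of the rule as worded above, not the authors' code: `zero_fill` is a hypothetical name, and the neighbor lists are assumed to have been assembled beforehand (with the 100 km list including the gauges within 50 km).

```python
import math

def zero_fill(value, neighbors_50km, neighbors_100km):
    """Return 0.0 for a missing value when nearby gauges all report zero.

    value: the gauge's own total for the day (NaN if missing).
    neighbors_50km / neighbors_100km: same-day totals at other gauges within
    50 km and within 100 km (NaN where a neighbor is missing).
    """
    if not math.isnan(value):
        return value  # only missing values are candidates for filling

    valid_50 = [v for v in neighbors_50km if not math.isnan(v)]
    if len(valid_50) >= 2:
        # Enough valid gauges within 50 km: fill only if all of them are dry.
        return 0.0 if all(v == 0 for v in valid_50) else value

    # Fewer than 2 valid gauges within 50 km: fall back to the 100 km radius.
    valid_100 = [v for v in neighbors_100km if not math.isnan(v)]
    if len(valid_100) >= 2 and all(v == 0 for v in valid_100):
        return 0.0
    return value

nan = math.nan
print(zero_fill(nan, [0.0, 0.0], [0.0, 0.0, 5.0]))  # 0.0 (50 km rule applies)
print(zero_fill(nan, [0.0, nan], [0.0, nan, 0.0]))  # 0.0 (100 km fallback)
```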
From this master database we identify 645 “primary gauges” within the boundaries of our western US study area with particularly thorough coverage over the study period of cool seasons (November–March) 1951–2023. Long data records are critical to our assessment of trends so we set a relatively high bar for consideration as a primary gauge: valid values (including zero-filling) are necessary on ≥75% of cool-season days in each of the 8 decades of the study period (cool seasons 1951–1959, 1960–1969, 1970–1979, 1980–1989, 1990–1999, 2000–2009, 2010–2019, 2020–2023). Among these primary gauges, the mean percentage of cool-season days with valid data in the study period is 96.7% and all gauges have >84.8% daily cool-season coverage.
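The ≥75% per-decade coverage requirement can be expressed as a small check. The sketch assumes each gauge's record is supplied as (cool-season year, value) pairs covering every cool-season day, with NaN marking missing days; `meets_primary_threshold` is a hypothetical name.

```python
import math

# Decade bins as listed in the text (cool-season years, inclusive).
DECADES = [(1951, 1959), (1960, 1969), (1970, 1979), (1980, 1989),
           (1990, 1999), (2000, 2009), (2010, 2019), (2020, 2023)]

def decade_index(cool_season_year):
    """Index of the decade bin containing this cool-season year, or None."""
    for i, (start, end) in enumerate(DECADES):
        if start <= cool_season_year <= end:
            return i
    return None

def meets_primary_threshold(records, min_fraction=0.75):
    """True if valid data cover >= min_fraction of cool-season days in every decade.

    records: iterable of (cool_season_year, value), one entry per cool-season day.
    """
    valid = [0] * len(DECADES)
    total = [0] * len(DECADES)
    for csy, value in records:
        i = decade_index(csy)
        if i is None:
            continue  # outside the 1951-2023 study period
        total[i] += 1
        if not math.isnan(value):
            valid[i] += 1
    return all(t > 0 and v / t >= min_fraction for v, t in zip(valid, total))
```

Checking each decade separately, rather than the record as a whole, prevents a gauge with one long outage from qualifying on the strength of its other decades.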
These primary gauges are then subjected to gap-filling in which nearby gauges from the master database are used to infill missing daily precipitation totals using quantile mapping. To fill a missing value on day i at primary gauge j, a master gauge within 50 km of gauge j with a valid value on day i is considered as a potential predictor gauge as long as a number of criteria are fulfilled, based on agreement between the daily precipitation measurements at the two gauges during a seasonally constrained calibration period. Because spatial relationships between gauges may vary seasonally, the calibration period for day i is all days from all years 1950–2023 that are within a 3-month window centered on the month of day i. A potential predictor gauge is only considered if (1) it has ≥100 days with valid values overlapping those of gauge j, (2) ≥10 of these days have co-occurring non-zero precipitation at both gauges, and (3) there is statistically significant (p<0.05) correlation between the two gauges in terms of occurrences of non-zero precipitation as well as precipitation totals on days when both gauges experienced precipitation.
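The calibration window and the two counting criteria, (1) and (2), can be sketched as follows. The significance tests in criterion (3) are omitted here, the function names are hypothetical, and the two input series are assumed to be already restricted to calibration-period days and aligned day by day.

```python
import math

def calibration_months(month):
    """Months in the 3-month window centered on the given month (1-12)."""
    previous_month = (month - 2) % 12 + 1
    next_month = month % 12 + 1
    return {previous_month, month, next_month}

def passes_overlap_criteria(target, predictor, min_overlap=100, min_wet=10):
    """Criteria (1) and (2): enough overlapping valid days and enough co-wet days.

    target, predictor: equally long lists of calibration-period daily totals,
    aligned by day, with NaN for missing values.
    """
    pairs = [(t, p) for t, p in zip(target, predictor)
             if not (math.isnan(t) or math.isnan(p))]
    co_wet = sum(1 for t, p in pairs if t > 0 and p > 0)
    return len(pairs) >= min_overlap and co_wet >= min_wet

print(sorted(calibration_months(1)))  # [1, 2, 12]
```

For a gap in January, for example, the calibration period is every December, January, and February day in 1950–2023.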
To test precipitation-occurrence correlation, we convert the potential predictor gauge’s calibration-period precipitation totals into quantiles based on the empirical distribution function, use these values as predictors of the probability of non-zero precipitation at gauge j, and assess the significance of the Matthews correlation coefficient between observed and estimated occurrence of non-zero precipitation. To test precipitation-intensity correlation, we evaluate the significance of the Pearson correlation coefficient between precipitation quantile values on co-occurring non-zero precipitation days at both gauges. Only potential predictor gauges that produce p-values below the 0.05 significance threshold for both tests are considered further, and in cases when more than one potential predictor gauge is available to fill a given missing value, the gauge with the highest Pearson correlation for non-zero precipitation quantiles is selected. Finally, quantile mapping is used to replace the missing value on day i at gauge j with the gauge j precipitation total that corresponds to the quantile value observed at the predictor gauge.
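The final quantile-mapping step can be illustrated with empirical distributions built from sorted calibration samples. This is a sketch under assumptions: the text does not specify the exact quantile definition or any interpolation, so the version below uses the fraction of calibration values at or below the observed value and returns the matching order statistic of the target sample. `quantile_map` and the sample values are hypothetical.

```python
from bisect import bisect_right

def quantile_map(predictor_value, predictor_calib, target_calib):
    """Map a predictor-gauge total to the target-gauge total at the same quantile.

    predictor_calib, target_calib: calibration-period totals (no missing values).
    """
    pred = sorted(predictor_calib)
    targ = sorted(target_calib)
    # Empirical quantile q = rank / len(pred), where rank counts values <= x.
    rank = bisect_right(pred, predictor_value)
    # Target order statistic at the same quantile: index ceil(q * n_target) - 1,
    # computed with integer arithmetic to avoid floating-point rounding.
    idx = -(-rank * len(targ) // len(pred)) - 1
    idx = min(max(idx, 0), len(targ) - 1)
    return targ[idx]

# Invented calibration samples (tenths of mm) for illustration.
predictor_calib = [0, 0, 0, 1, 2, 5, 8, 10, 20, 40]
target_calib = [0, 0, 0, 0, 0, 2, 4, 6, 10, 30]
print(quantile_map(5, predictor_calib, target_calib))  # 2
```

In this example a predictor total of 5 sits at the 0.6 quantile of its own distribution, and the 0.6 quantile of the target distribution is 2, so the gap is filled with 2.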
This quantile mapping approach risks imposing a wet bias in cases when the predictor gauge has more frequent calibration-period zero-precipitation days than does the target gauge. For example, if zero precipitation corresponds to a quantile value of 0.7 at the predictor gauge but of 0.4 at the target gauge, then quantile mapping would cause the target gauge to be assigned a non-zero precipitation value each time the predictor gauge experienced zero precipitation. In such cases, we gap fill with a randomly drawn target-gauge precipitation value (including zeros) from calibration-period days when the predictor gauge experienced zero precipitation. Caveats to this approach are that it can introduce precipitation totals in cases where precipitation was not physically plausible and randomly drawing precipitation totals causes repetitions of our gap-filling exercise to yield non-identical results. Strengths are that, for gauges where missing values are clustered in time, our approach avoids imposing a wet bias for extended periods, which could induce artificial trends in precipitation frequency and intensity. In any case, randomly assigned non-zero precipitation values are rare, accounting for only 2.9% of all gap-filled values.
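The dry-predictor fallback can be sketched as below, using invented day-aligned calibration samples that reproduce the 0.7 versus 0.4 example from the text. The function names are hypothetical, and the seeded random generator is used only so the illustration is repeatable.

```python
import math
import random

def zero_fraction(values):
    """Fraction of valid (non-NaN) values that are exactly zero."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(1 for v in valid if v == 0) / len(valid)

def needs_random_draw(predictor_calib, target_calib):
    """True when the predictor is dry more often than the target (wet-bias risk)."""
    return zero_fraction(predictor_calib) > zero_fraction(target_calib)

def draw_for_dry_predictor(predictor_calib, target_calib, rng):
    """Draw a target total (zeros included) from days when the predictor was dry.

    Both lists are calibration-period totals aligned by day.
    """
    pool = [t for p, t in zip(predictor_calib, target_calib)
            if p == 0 and not math.isnan(t)]
    return rng.choice(pool)

# Invented samples: zero is the 0.7 quantile at the predictor, 0.4 at the target.
predictor_calib = [0, 0, 0, 0, 0, 0, 0, 3, 5, 9]
target_calib = [0, 0, 0, 0, 2, 1, 4, 6, 8, 12]
rng = random.Random(0)  # seeded for repeatability in this sketch
if needs_random_draw(predictor_calib, target_calib):
    filled = draw_for_dry_predictor(predictor_calib, target_calib, rng)
```

Because the pool holds the target totals actually observed on dry-predictor days, repeated draws reproduce the target gauge's observed mix of dry and wet days rather than always assigning precipitation.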
After the above-described gap-filling, the average daily coverage among the 645 primary gauges during the study period is 99.8%, and 82.2% of gauges have 100% coverage. To fill additional gaps, we repeat the above procedure considering predictor gauges that are 50–100 km from target gauges, which increases average daily coverage to >99.9%, with 98.6% of gauges having 100% coverage. At this point we dismiss 3 gauges that still have <99.5% of daily coverage in any of the 8 decades of the study period, leaving 642 primary gauges. To finish the gap-filling process and finalize the network of primary gauges we carry out one final gap-filling round, now allowing previously gap-filled values to inform additional gap fillings. To limit the influence of gap-fillings from distant gauges we only consider predictor gauges within 50 km. At this point, 640 gauges have 100% coverage on cool-season days. The remaining 2 gauges are in Death Valley, California, and the northwestern Great Basin in southern Oregon and are missing data for only 1 and 10 cool-season days, respectively. Given that these gauges are in sparsely instrumented areas and both have valid data for ~85% of cool-season days over 1951–2023 prior to any gap filling, we retain these gauges in our final gauge network. For each of the small number of remaining days with missing data, we assign the gauge’s climatological median daily precipitation total among all available days from 1950–2024 that are within 10 days (in terms of day of year).
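The last step, filling with a climatological median from days within 10 days of the gap's day of year, might look like the sketch below. Handling of leap years and the year boundary is not described in the text; the circular distance on a 366-day year used here is an assumption, as are the function names and sample values.

```python
import math
from datetime import date
from statistics import median

def doy_gap(a, b, year_length=366):
    """Circular distance in days between two day-of-year numbers."""
    diff = abs(a - b) % year_length
    return min(diff, year_length - diff)

def climatological_median(records, gap_day, window=10):
    """Median of valid totals on days within `window` days (by day of year) of gap_day.

    records: iterable of (date, value) pairs from all years, NaN for missing.
    """
    target = gap_day.timetuple().tm_yday
    pool = [v for d, v in records
            if not math.isnan(v) and doy_gap(d.timetuple().tm_yday, target) <= window]
    return median(pool) if pool else math.nan

# Invented records for illustration.
records = [(date(1990, 1, 5), 0.0), (date(1991, 1, 10), 4.0),
           (date(1992, 12, 28), 2.0), (date(1993, 3, 1), 50.0),
           (date(1994, 1, 1), math.nan)]
print(climatological_median(records, date(2000, 1, 3)))  # 2.0
```

The December 28 record is included for a January 3 gap because the window wraps across the year boundary.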
We performed sensitivity tests by varying a number of the parameters involved in our gap-filling approach described above, including the search radius considered to identify gap-filling locations, the seasonal window length used for calibration, the number of overlapping days required, and the number of co-occurring days with precipitation. We found that applying large variations (doubling or halving) to these parameters had negligible effects on the regional precipitation metrics we evaluated.
Finally, we grid these gauge records of daily precipitation total to 0.25° geographic resolution across the western US by calculating the average daily precipitation record across all gauges within each 0.25° grid cell. When no gauges are within a 0.25° grid cell, we iteratively expand the grid-cell boundary used to search for gauges by 0.025° in all directions until at least one gauge lies within the expanded boundary. We then bias correct the 0.25° totals by applying multiplicative scaling factors such that the 1951–2023 monthly climatologies of mean daily precipitation total match those of the monthly version of the National Oceanic and Atmospheric Administration (NOAA) nClimGrid dataset (Vose et al., 2014), which we upscaled to 0.25° from its native 1/24°. Our scaling approach improves our representation of the mean seasonal cycle as well as spatial heterogeneity in mean climate by taking advantage of the denser gauge network Vose et al. (2014) used to produce the monthly nClimGrid. Because we only scale to match the climatology but not the variability of nClimGrid, this approach does not introduce artificial agreement with nClimGrid in terms of the interannual variations or trends in cool-season precipitation totals, frequencies, or intensities that our study is concerned with. We also confirm that the time series of regional precipitation frequency and intensity anomalies are nearly identical if the scaling is not applied. Notably, a similar scaling approach is also used in production of the daily nClimGrid (Durre et al., 2022) and daily PRISM (Daly et al., 2021) datasets, though rather than scaling to match a monthly climatology, these datasets are scaled such that their monthly sums equal those of their corresponding monthly products each month. Thus our approach of scaling to simply match climatology is relatively conservative.
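The gridding and scaling steps can be sketched as three small functions. This is an illustration under assumptions, not the authors' implementation: the names are hypothetical, cell edges are treated as inclusive, and a scaling factor of 1 is used when a cell's own climatology is zero (the text does not say how that case is handled).

```python
def gauges_for_cell(lat_c, lon_c, gauges, half_width=0.125, step=0.025):
    """Find gauges inside a 0.25-degree cell, expanding the boundary if it is empty.

    gauges: list of (lat, lon) tuples. Returns (gauges found, number of expansions).
    """
    if not gauges:
        raise ValueError("at least one gauge is required")
    n = 0
    while True:
        reach = half_width + n * step  # boundary grows by `step` in all directions
        inside = [g for g in gauges
                  if abs(g[0] - lat_c) <= reach and abs(g[1] - lon_c) <= reach]
        if inside:
            return inside, n
        n += 1

def cell_mean(series_list):
    """Average several equally long daily series day by day."""
    return [sum(day) / len(day) for day in zip(*series_list)]

def monthly_scale_factors(own_clim, reference_clim):
    """Multiplicative factor per month so own climatology matches the reference.

    own_clim, reference_clim: mean daily totals for each calendar month.
    """
    return [r / o if o > 0 else 1.0 for o, r in zip(own_clim, reference_clim)]

# A gauge just north of an otherwise empty cell centered at 40.125N, 120.125W.
found, expansions = gauges_for_cell(40.125, -120.125, [(40.31, -120.0)])
print(expansions)  # 3
```

Each daily value in a cell would then be multiplied by the factor for its calendar month, which changes the mean seasonal cycle but leaves the relative day-to-day and year-to-year variations within each month unchanged.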
References
Daly, C., Doggett, M. K., Smith, J. I., Olson, K. V., Halbleib, M. D., Dimcovic, Z., et al. (2021). Challenges in observation-based mapping of daily precipitation across the conterminous United States. Journal of Atmospheric and Oceanic Technology, 38(11), 1979–1992. https://doi.org/10.1175/JTECH-D-21-0054.1
Durre, I., Arguez, A., Schreck III, C. J., Squires, M. F., & Vose, R. S. (2022). Daily high-resolution temperature and precipitation fields for the contiguous United States from 1951 to present. Journal of Atmospheric and Oceanic Technology, 39(12), 1837–1855. https://doi.org/10.1175/JTECH-D-22-0024.1
Menne, M. J., Durre, I., Vose, R. S., Gleason, B. E., & Houston, T. G. (2012). An overview of the global historical climatology network-daily database. Journal of Atmospheric and Oceanic Technology, 29(7), 897–910. https://doi.org/10.1175/JTECH-D-11-00103.1
Vose, R. S., Applequist, S., Squires, M., Durre, I., Menne, M. J., Williams Jr, C. N., et al. (2014). Improved historical temperature and precipitation time series for US climate divisions. Journal of Applied Meteorology and Climatology, 53(5), 1232–1251. https://doi.org/10.1175/JAMC-D-13-0248.1