
Simulated glacier runoff to 75 global major river basins (2000-2100, 3 glacier models)

Cite this dataset

Wimberly, Finn; Ultee, Lizz (2024). Simulated glacier runoff to 75 global major river basins (2000-2100, 3 glacier models) [Dataset]. Dryad.


This dataset accompanies our study examining century-scale glacier runoff projected by three global glacier evolution models (GloGEM, OGGM, and PyGEM). It includes glacier model projections for all 75 major river basins of interest under four Shared Socioeconomic Pathways (SSPs) and twelve Global Climate Models (GCMs). The projections, originally provided as single-glacier simulations for the RGI (Randolph Glacier Inventory) regions of interest, were aggregated to the basin scale using Jupyter notebooks. For each combination of GCM, SSP, and basin, the dataset gives monthly glacier runoff projections from all three models.

README: Processed Runoff Files

Files contain century-scale runoff projections for all 75 GRDC major river basins that are larger than 3000 km^2 and had at least 30 km^2 of glacier cover in the year 2000. Each basin has one time series for every combination of the 4 implemented emission scenarios (SSPs) and the 12 implemented Global Climate Models (GCMs). The combination is identified by the index column, formatted as GCM_SSP_Basin. There are two versions of the dataset, which differ in how the time series (the Date variable) is indexed (see methods).
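
Because GCM, SSP, and basin share a single label, it can be convenient to split them into separate columns before filtering. The sketch below assumes every label follows the GCM_SSP_Basin pattern and that GCM and SSP names contain no underscores; the helper `split_index` and the example labels are hypothetical, not taken from the files.

```python
import pandas as pd


def split_index(labels: pd.Series) -> pd.DataFrame:
    """Split GCM_SSP_Basin labels into separate GCM, SSP, and Basin columns."""
    # n=2 stops after the first two underscores, so an underscore inside a
    # basin name (if any exist; this is an assumption) stays in the Basin column.
    parts = labels.str.split("_", n=2, expand=True)
    parts.columns = ["GCM", "SSP", "Basin"]
    return parts


# Made-up labels for illustration only; real GCM, SSP, and basin names may differ.
labels = pd.Series(["GCM-A_ssp126_BASIN-X", "GCM-B_ssp585_BASIN_Y"])
split = split_index(labels)
print(split)
```

With the full table loaded, something like `df.join(split_index(df['source_file']))` should add the three columns, assuming the label column is named `source_file` as in the loading code.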

Description of the data and file structure

Data are stored as .csv files and were formatted using pandas DataFrames. In addition to the GCM_SSP_Basin index column, each DataFrame contains the variables: Date, a string of the form YEAR-MO giving the month to which the runoff projections correspond; GloGEM, the runoff projection in km^3 from the Global Glacier Evolution Model; PyGEM, the runoff projection in km^3 from the Python Glacier Evolution Model; and OGGM, the runoff projection in km^3 from the Open Global Glacier Model.
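
If mean discharge is more useful than monthly volume, a value V in km^3 can be converted to m^3/s as V * 1e9 / (days in month * 86400). The sketch below assumes each value is the total runoff over the month named in Date and that Date is the DataFrame index, as in the RF_dict entries. Because all_rf_data.csv relabels GloGEM's months (see methods), all_rf_aligned_data.csv is the safer input here, since its Date matches the actual month for all three models. `to_mean_discharge` is a hypothetical helper and the input is a toy table.

```python
import pandas as pd

MODELS = ["GloGEM", "PyGEM", "OGGM"]
M3_PER_KM3 = 1e9          # 1 km^3 = 10^9 m^3
SECONDS_PER_DAY = 86_400


def to_mean_discharge(basin_df: pd.DataFrame) -> pd.DataFrame:
    """Convert monthly runoff volumes (km^3) to mean discharge (m^3/s).

    Assumes the index holds Date strings such as "2000-01" and that each
    value is the total runoff over that calendar month.
    """
    days = pd.to_datetime(basin_df.index).days_in_month.to_numpy()
    seconds = days * SECONDS_PER_DAY
    # Divide each row by the number of seconds in its own month
    return basin_df[MODELS].mul(M3_PER_KM3).div(seconds, axis=0)


# Toy input: 1 km^3 of runoff per model in January and February 2000
toy = pd.DataFrame(
    {"GloGEM": [1.0, 1.0], "PyGEM": [1.0, 1.0], "OGGM": [1.0, 1.0]},
    index=pd.Index(["2000-01", "2000-02"], name="Date"),
)
print(to_mean_discharge(toy))
```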

Note to users: each dataset is a single table with more than 4 million rows (used as an alternative to uploading separate files for each combination of SSP, GCM, and basin). The code below reads one of the datasets into a nested dictionary of the form RF_dict[gcm][ssp][basin], where each entry is a DataFrame holding the century-scale projections for a single basin under a single SSP and GCM.

import pandas as pd

# Paths to the two CSV files (replace with your own locations)
all_rf_file_path = '/Your/File/Path/all_rf_data.csv'
all_rf_aligned_file_path = '/Your/File/Path/all_rf_aligned_data.csv'

# Read one of the CSV files (use all_rf_aligned_file_path for the aligned version)
df = pd.read_csv(all_rf_file_path)

# Dictionary to store DataFrames
RF_dict = {}

# Iterate over the rows and populate the dictionary
# (iterrows() is slow on a table of this size; expect this loop to take a while)
for index, row in df.iterrows():
    # Extract GCM, SSP, and Basin from the source_file column (GCM_SSP_Basin);
    # maxsplit=2 keeps any underscores inside the basin name together
    gcm_ssp_basin = row['source_file'].split('_', 2)
    gcm = gcm_ssp_basin[0]
    ssp = gcm_ssp_basin[1]
    basin = gcm_ssp_basin[2]

    # Initialize nested dictionary structure if not already done
    if gcm not in RF_dict:
        RF_dict[gcm] = {}
    if ssp not in RF_dict[gcm]:
        RF_dict[gcm][ssp] = {}
    if basin not in RF_dict[gcm][ssp]:
        RF_dict[gcm][ssp][basin] = []

    # Append the row to the list for this GCM/SSP/basin combination
    RF_dict[gcm][ssp][basin].append(row)

# Convert lists of rows to DataFrames
for gcm in RF_dict:
    for ssp in RF_dict[gcm]:
        for basin in RF_dict[gcm][ssp]:
            # Convert list of rows to DataFrame
            temp_df = pd.DataFrame(RF_dict[gcm][ssp][basin])
            # Drop the 'source_file' column
            temp_df.drop(columns=['source_file'], inplace=True)
            # Set the 'Date' column as the index
            temp_df.set_index('Date', inplace=True)
            # Store the DataFrame back in the dictionary
            RF_dict[gcm][ssp][basin] = temp_df
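
Iterating row by row over more than 4 million rows is slow. A faster alternative, sketched below, groups the table by its label column and builds the same nested structure one group at a time. It assumes the label column is named `source_file` and the month column `Date`, as in the loading code; `build_rf_dict` is a hypothetical helper and the table is toy data.

```python
import pandas as pd


def build_rf_dict(df: pd.DataFrame, key_col: str = "source_file") -> dict:
    """Build rf_dict[gcm][ssp][basin] -> DataFrame indexed by Date, via groupby."""
    rf_dict: dict = {}
    # One group per GCM_SSP_Basin label; sort=False keeps the file order
    for key, group in df.groupby(key_col, sort=False):
        gcm, ssp, basin = key.split("_", 2)
        rf_dict.setdefault(gcm, {}).setdefault(ssp, {})[basin] = (
            group.drop(columns=[key_col]).set_index("Date")
        )
    return rf_dict


# Tiny made-up table with the column layout the loading code expects
toy = pd.DataFrame({
    "source_file": ["G1_ssp126_B1"] * 2 + ["G1_ssp585_B1"] * 2,
    "Date": ["2000-01", "2000-02"] * 2,
    "GloGEM": [1.0, 2.0, 3.0, 4.0],
    "PyGEM": [1.5, 2.5, 3.5, 4.5],
    "OGGM": [0.5, 1.0, 1.5, 2.0],
})
rf = build_rf_dict(toy)
print(rf["G1"]["ssp585"]["B1"])
```

On the real file, `RF_dict = build_rf_dict(pd.read_csv(all_rf_file_path))` should yield the nested structure the loop above is meant to build, usually much faster because each group is sliced in one operation rather than row by row.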


All files were generated using Jupyter notebooks and can be found within the repository.


Glacier model projections (generated by each of the three implemented glacier evolution models) for all RGI regions of interest, all 4 SSPs, and all 12 GCMs were uploaded to Drive separately. A Jupyter notebook was created for each RGI region and each glacier model. Within these notebooks, regional datasets containing single-glacier projections were read in from Drive and aggregated to the basin scale. The single-basin DataFrames were then combined into one DataFrame whose index column gives the GCM, SSP, and basin. The projections have monthly resolution (given by the "Date" column).

There are two versions of the data because one of the glacier models (GloGEM) provided projections for hydrologic years (so its first value is for October 1999), while the other two (OGGM and PyGEM) used calendar years (so their first value is for January 2000). The first dataset, "all_rf_data.csv", begins its time indexing in January 2000 for all models (i.e., the October 1999 value for GloGEM is indexed as January 2000). Use this dataset if you are summing annual totals. The second dataset, "all_rf_aligned_data.csv", begins in October 1999. This aligns the months across models so that seasonal cycles can be compared directly, although it introduces zero values for the first three months of the OGGM and PyGEM projections.

Note also that while GloGEM and PyGEM provided projections through 2100, OGGM only provided projections through 2099. The final year is therefore filled with zero values for OGGM.
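
The two versions suit different tasks, and OGGM's zero-filled final year should be kept out of both. The sketch below shows one way to do this, assuming each input is a single RF_dict entry (Date strings as index, columns GloGEM, PyGEM, and OGGM). The helper names are hypothetical, the 2000-2099 window for seasonal cycles is our own choice rather than something the dataset prescribes, and the toy table is made up.

```python
import numpy as np
import pandas as pd

MODELS = ["GloGEM", "PyGEM", "OGGM"]


def _with_datetime_index(basin_df: pd.DataFrame) -> pd.DataFrame:
    """Copy the model columns and parse the Date index into timestamps."""
    out = basin_df[MODELS].copy()
    out.index = pd.to_datetime(out.index)
    return out


def annual_totals(basin_df: pd.DataFrame) -> pd.DataFrame:
    """Annual runoff totals (km^3) for one GCM/SSP/basin entry of all_rf_data.csv.

    In that file each labelled year holds 12 GloGEM hydrologic-year months and
    12 calendar months for OGGM and PyGEM. OGGM's zero-filled 2100 is set to
    NaN so it is not mistaken for zero runoff.
    """
    monthly = _with_datetime_index(basin_df)
    totals = monthly.groupby(monthly.index.year).sum()
    if 2100 in totals.index:
        totals.loc[2100, "OGGM"] = np.nan
    return totals


def mean_seasonal_cycle(basin_df: pd.DataFrame, start: int = 2000, end: int = 2099) -> pd.DataFrame:
    """Mean runoff per calendar month for one entry of all_rf_aligned_data.csv.

    The default 2000-2099 window skips the zero-filled Oct-Dec 1999 months
    (OGGM, PyGEM) and OGGM's zero-filled 2100.
    """
    monthly = _with_datetime_index(basin_df)
    years = monthly.index.year
    window = monthly[(years >= start) & (years <= end)]
    return window.groupby(window.index.month).mean()


# Toy entry covering 2099-2100, with OGGM zero-filled in 2100
dates = pd.date_range("2099-01-01", periods=24, freq="MS").strftime("%Y-%m")
toy = pd.DataFrame(
    {"GloGEM": 1.0, "PyGEM": 2.0, "OGGM": [3.0] * 12 + [0.0] * 12},
    index=pd.Index(dates, name="Date"),
)
print(annual_totals(toy))
print(mean_seasonal_cycle(toy))
```

Masking by date rather than by value is deliberate: a genuine zero in the projections cannot be told apart from a padding zero, but the padded periods are known from the description above.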


Middlebury College