Dryad

National impacts of e-commerce growth: Development of a spatial demand based tool

Cite this dataset

Xiao, Ivan (2022). National impacts of e-commerce growth: Development of a spatial demand based tool [Dataset]. Dryad. https://doi.org/10.25338/B89H0F

Abstract

This project studies the impacts of e-commerce on shopping behaviors and related externalities. Its objectives are divided into five major tasks. Methods used include Weighted Multinomial Logit (WMNL) models, time series forecasting, and Monte Carlo (MC) simulations. The American Time Use Survey (ATUS) and the National Household Travel Survey (NHTS) databases are used to identify the independent and dependent variables for behavioral modeling. We also collected population data for all Metropolitan Statistical Areas (MSAs) from the U.S. Census Bureau and combined them with the shares of each variable from ATUS to generate a synthesized population, which serves, together with the behavioral model, as input to the MC simulation framework. This framework generates shopping travel parameters and calculates negative externalities, which allows us to estimate e-commerce demand and impacts for each decade through 2050. The analyses supply the inputs needed to generate shopping travel and to estimate a series of negative externalities by MC simulation: shopping travel parameters, last-mile delivery parameters, and per-person emission rates. For each parameter, a probability distribution or a regression relation is obtained for each MSA and fed into the subsequent MC simulation. Finally, we simulated shopping behaviors for the synthesized populations (through 2050) and estimated the expected negative externalities. The MC simulation produces aggregate average vehicle miles traveled (VMT) and emissions (negative externalities) for different shopping activities across planning years and MSAs.
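One way to picture the synthesized population is to sample individual attributes from ATUS-derived shares for an MSA. The sketch below is a minimal illustration under stated assumptions: the attribute names and share values are hypothetical, and each attribute is sampled independently from its marginal share, which ignores correlations between attributes that the project's actual procedure may capture.

```python
import random

# Hypothetical ATUS-derived attribute shares for one MSA (illustrative values, not project data).
ATTRIBUTE_SHARES = {
    "sex": {"female": 0.51, "male": 0.49},
    "education": {"high": 0.35, "low": 0.65},
    "in_labor_force": {"yes": 0.62, "no": 0.38},
}


def synthesize_population(n_people, shares, seed=0):
    """Draw n_people synthetic individuals, sampling each attribute
    independently from its marginal share (a simplifying assumption)."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_people):
        person = {}
        for attr, dist in shares.items():
            categories = list(dist)
            weights = [dist[c] for c in categories]
            person[attr] = rng.choices(categories, weights=weights, k=1)[0]
        population.append(person)
    return population


# A scaled-down sample; a full MSA population could be represented by
# weighting each synthetic person by (MSA population / n_people).
pop = synthesize_population(10_000, ATTRIBUTE_SHARES)
female_share = sum(p["sex"] == "female" for p in pop) / len(pop)
print(f"female share in sample: {female_share:.3f}")
```

Sampling a scaled-down population and reweighting keeps the simulation tractable for large MSAs; drawing from a joint distribution instead of independent marginals would preserve attribute correlations at the cost of needing cross-tabulated shares.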

Methods

The tasks of this project combine different methods to predict e-commerce shopping behavior at the individual level for each MSA of interest and to quantify the resulting externalities. The methods used across Tasks 1 to 5 are Weighted Multinomial Logit (WMNL) models, time series forecasting, and Monte Carlo (MC) simulations.

In Task 1, we build and validate WMNL behavior models for different MSAs, each with its own set of model coefficients, which can be used to predict shopping behavior for a synthesized population. In the WMNL models, the dependent variable has four categories: “No shopping”, “In-store shopping”, “Online shopping”, and “Both shopping”. The estimated models vary across MSAs; some variables have positive coefficients in some MSAs and negative coefficients in others. In general, however, being female, having a high level of education, belonging to a low-to-moderate age group, and not being in the labor force are positively associated with choosing online and/or both shopping. Four population growth scenarios are specified by combining high or moderate time series predictions of independent-variable (IV) market shares with population projections. The models are also validated against the synthesized populations for the planning years, with errors of around 2% in the predicted market shares of the dependent variable.
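To make the four-category model concrete, here is a minimal sketch of multinomial logit choice probabilities, a survey-weighted log-likelihood, and a market-share comparison of the kind used in validation. It assumes that "weighted" means each respondent's log-likelihood contribution is multiplied by a survey weight, that utilities are linear in the covariates, and that “No shopping” is the base alternative. The covariates, coefficient values, and toy records are hypothetical, not the project's estimates, and the sketch does not estimate coefficients (that would require maximizing the weighted log-likelihood with an optimizer).

```python
import math

ALTERNATIVES = ["No shopping", "In-store shopping", "Online shopping", "Both shopping"]

# Hypothetical coefficients on [intercept, female, high_education].
# "No shopping" is the base alternative, so its coefficients are fixed at zero.
COEFS = {
    "No shopping": [0.0, 0.0, 0.0],
    "In-store shopping": [-0.5, 0.1, 0.0],
    "Online shopping": [-2.0, 0.4, 0.6],
    "Both shopping": [-2.5, 0.5, 0.7],
}


def choice_probabilities(x, coefs=COEFS):
    """Multinomial logit probabilities for one person; x starts with 1 for the intercept."""
    utilities = {alt: sum(b * xi for b, xi in zip(coefs[alt], x)) for alt in ALTERNATIVES}
    v_max = max(utilities.values())  # subtract the max utility for numerical stability
    exp_u = {alt: math.exp(u - v_max) for alt, u in utilities.items()}
    denom = sum(exp_u.values())
    return {alt: e / denom for alt, e in exp_u.items()}


def weighted_log_likelihood(data, coefs=COEFS):
    """Sum of w_i * log P(y_i | x_i) over records (x, chosen_alternative, weight)."""
    return sum(w * math.log(choice_probabilities(x, coefs)[y]) for x, y, w in data)


def market_share_errors(data, coefs=COEFS):
    """Absolute gap between weighted predicted and observed share of each alternative."""
    total_w = sum(w for _, _, w in data)
    predicted = dict.fromkeys(ALTERNATIVES, 0.0)
    observed = dict.fromkeys(ALTERNATIVES, 0.0)
    for x, y, w in data:
        for alt, p in choice_probabilities(x, coefs).items():
            predicted[alt] += w * p / total_w
        observed[y] += w / total_w
    return {alt: abs(predicted[alt] - observed[alt]) for alt in ALTERNATIVES}


# Toy records: (covariates, observed choice, survey weight); values are made up.
records = [
    ([1, 1, 1], "Online shopping", 1.2),
    ([1, 0, 0], "No shopping", 0.8),
    ([1, 1, 0], "In-store shopping", 1.0),
    ([1, 0, 1], "Both shopping", 0.9),
]
print(weighted_log_likelihood(records))
print(market_share_errors(records))
```

Fixing the base alternative's coefficients at zero is what makes the other coefficients identifiable; a positive coefficient on a covariate then means that covariate raises the odds of that alternative relative to “No shopping”.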

Tasks 2, 3, and 4 supply the inputs that Task 5 uses to generate shopping travel and to calculate a series of negative externalities by Monte Carlo simulation: shopping travel parameters, last-mile delivery parameters, and per-person emission rates, respectively. For each parameter, a probability distribution or a regression relation is obtained for each MSA and fed into the subsequent MC simulation.
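The two kinds of per-MSA inputs described above can be sketched as follows: a fitted probability distribution that the simulation samples from, and a regression relation that it evaluates. The lognormal form for trip distance, the square-root form for delivery miles per package, the MSA labels, and all parameter values are hypothetical choices for illustration; the project's fitted distributions and regressions may take different forms.

```python
import random

# Hypothetical per-MSA fitted distributions: (mu, sigma) of log one-way miles
# for an in-store shopping trip. Values are illustrative, not project estimates.
TRIP_MILES_LOGNORMAL = {"MSA_A": (1.2, 0.8), "MSA_B": (1.5, 0.9)}

# Hypothetical per-MSA regression relation for last-mile delivery:
# miles per package = a + b / sqrt(packages delivered on the route).
DELIVERY_REGRESSION = {"MSA_A": (0.3, 6.0), "MSA_B": (0.4, 8.0)}


def sample_trip_miles(msa, rng):
    """Draw one shopping-trip distance from the MSA's fitted lognormal distribution."""
    mu, sigma = TRIP_MILES_LOGNORMAL[msa]
    return rng.lognormvariate(mu, sigma)


def delivery_miles_per_package(msa, packages_on_route):
    """Evaluate the MSA's regression relation for delivery miles per package."""
    a, b = DELIVERY_REGRESSION[msa]
    return a + b / packages_on_route ** 0.5


rng = random.Random(7)
draws = [sample_trip_miles("MSA_A", rng) for _ in range(5)]
print([round(d, 2) for d in draws])
print(delivery_miles_per_package("MSA_B", 100))  # 0.4 + 8.0 / 10
```

Keeping each parameter in a per-MSA lookup lets the same simulation code run for every metropolitan area by swapping only the fitted inputs.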

Finally, Task 5 serves the overall goal of the project: to simulate shopping behaviors for a synthesized population and to calculate the related negative externalities. The MC simulation combines the results of Tasks 1 to 4, and its outputs are the aggregate average VMT and emissions (negative externalities) for different shopping activities across planning years and MSAs. These aggregate results are derived mainly from the VMT and emissions calculated for the synthesized-population datasets under each planning year and population growth scenario.
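A minimal sketch of the aggregation step: each replication draws a shopping choice for every synthetic person, accumulates personal-vehicle and delivery VMT, converts VMT to emissions with per-mile rates, and the results are averaged over replications. The choice probabilities, distance distributions, and CO2 rates below are hypothetical stand-ins (not values from MOVES or the project), and attributing a fixed range of delivery miles to each online purchase is a simplifying assumption.

```python
import random
import statistics

# Hypothetical inputs for one MSA, planning year, and scenario (illustrative only).
CHOICE_PROBS = {
    "No shopping": 0.50,
    "In-store shopping": 0.35,
    "Online shopping": 0.10,
    "Both shopping": 0.05,
}
# Stand-ins for MOVES-derived CO2 rates in grams per mile.
CO2_G_PER_MILE = {"passenger_car": 350.0, "delivery_van": 600.0}


def simulate_once(n_people, rng):
    """One MC replication: returns (total VMT, total CO2 in grams)."""
    alts = list(CHOICE_PROBS)
    weights = [CHOICE_PROBS[a] for a in alts]
    car_vmt = van_vmt = 0.0
    for _ in range(n_people):
        choice = rng.choices(alts, weights=weights, k=1)[0]
        if choice in ("In-store shopping", "Both shopping"):
            car_vmt += 2 * rng.lognormvariate(1.2, 0.8)  # round trip to the store
        if choice in ("Online shopping", "Both shopping"):
            van_vmt += rng.uniform(0.5, 2.0)  # last-mile miles attributed to one delivery
    co2 = car_vmt * CO2_G_PER_MILE["passenger_car"] + van_vmt * CO2_G_PER_MILE["delivery_van"]
    return car_vmt + van_vmt, co2


def monte_carlo(n_people, n_runs=200, seed=42):
    """Average VMT and CO2 over n_runs independent replications."""
    rng = random.Random(seed)
    runs = [simulate_once(n_people, rng) for _ in range(n_runs)]
    return statistics.mean(r[0] for r in runs), statistics.mean(r[1] for r in runs)


mean_vmt, mean_co2 = monte_carlo(n_people=1_000)
print(f"mean VMT: {mean_vmt:.0f} miles, mean CO2: {mean_co2 / 1e6:.2f} t")
```

Seeding a single generator per call makes each scenario's results reproducible; running the same function once per planning year and growth scenario yields the grid of aggregate averages described above.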

Usage notes

These data come from multiple sources and support the project titled “National Impacts of E-commerce Growth: Development of a Spatial Demand Based Tool”, funded by the National Center for Sustainable Transportation (NCST). The project studies the impacts of e-commerce on consumers’ shopping behaviors and the related externalities. Methods used include Weighted Multinomial Logit (WMNL) models, time series forecasting, and Monte Carlo (MC) simulations.

This project makes use of the following datasets:

1. American Time Use Survey (ATUS)

The project uses ATUS data from 2004 to 2020 to analyze shopping behaviors, mainly to specify the shopping behavior models and to extract variables for the six chosen metropolitan areas.

The ATUS data can be accessed at: https://timeuse.ipums.org/ 

2. National Household Travel Survey (NHTS)

The project uses the 2009 and 2017 NHTS data, which come from trip-based surveys, to extract shopping travel parameters and last-mile delivery parameters for the six chosen metropolitan areas. Because extracting shopping tours from NHTS requires identifying trip chains, we developed scripts to convert the raw trip-based data into tour-based data (refer to the code related to this project).
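The trip-to-tour conversion can be illustrated with a minimal sketch that orders each person's trips and closes a tour whenever a trip ends at home. This is not the project's script: the `Trip` fields and the purpose labels "home" and "shop" are simplified, hypothetical stand-ins, and the actual NHTS variable names and purpose codes should be taken from the survey codebook.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Trip:
    person_id: str       # hypothetical combined household + person key
    trip_num: int        # order of the trip within the travel day
    origin_purpose: str  # simplified purpose labels, e.g. "home", "shop", "work"
    dest_purpose: str
    miles: float


def trips_to_tours(trips):
    """Group each person's ordered trips into tours; a tour closes when a trip ends at home."""
    tours = []
    ordered = sorted(trips, key=lambda t: (t.person_id, t.trip_num))
    for person, person_trips in groupby(ordered, key=lambda t: t.person_id):
        current = []
        for trip in person_trips:
            current.append(trip)
            if trip.dest_purpose == "home":
                tours.append((person, current))
                current = []
        if current:  # travel day ended away from home: keep it as an open tour
            tours.append((person, current))
    return tours


def shopping_tours(tours):
    """Keep tours with at least one shopping stop and report their total miles."""
    return [
        (person, sum(t.miles for t in tour))
        for person, tour in tours
        if any(t.dest_purpose == "shop" for t in tour)
    ]


trips = [
    Trip("p1", 1, "home", "shop", 3.0),
    Trip("p1", 2, "shop", "home", 3.0),
    Trip("p1", 3, "home", "work", 10.0),
    Trip("p1", 4, "work", "home", 10.0),
    Trip("p2", 1, "home", "shop", 2.0),
    Trip("p2", 2, "shop", "eat", 1.0),
]
tours = trips_to_tours(trips)
print(shopping_tours(tours))
```

Measuring shopping travel at the tour level rather than per trip matters because a store stop chained into a commute adds fewer miles than a dedicated home-to-store round trip.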

The NHTS data can be accessed at: https://nhts.ornl.gov/ 

3. Population projections

The population projection data are produced in five-year increments from 2020 through 2100. These files are provided in .csv format. This project uses the data from 2020 through 2050. 

The projections are provided as totals or segmented by age category (in four-year increments), race/ethnicity, and sex.

Note that the population projections are publicly available at the following DOI: https://doi.org/10.17605/OSF.IO/9YNFC 
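Because the projections come in five-year steps from 2020 to 2100 while the analysis estimates impacts for each decade through 2050, a preprocessing step needs to keep only the relevant years. The sketch below shows one way to do this with Python's `csv` module; the column names `year` and `total` and the sample values are hypothetical and do not reflect the layout of the published files.

```python
import csv
import io

# Illustrative projection table in five-year steps; the column names "year" and
# "total" and the values are hypothetical, not taken from the OSF files.
SAMPLE_CSV = """year,total
2020,100
2025,104
2030,108
2035,111
2040,114
2045,116
2050,118
2055,120
"""


def decade_totals(csv_text, start=2020, end=2050):
    """Keep only decade years within [start, end] from a five-year projection table."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["year"])
        if start <= year <= end and year % 10 == 0:
            totals[year] = float(row["total"])
    return totals


print(decade_totals(SAMPLE_CSV))
```

For a real file, the text can be read with `open(path, newline="").read()`; segmented projections would need the same filter applied per age, race/ethnicity, and sex group.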

4. Individual Income Tax ZIP Code Data

The Individual Income Tax ZIP Code data show selected income and tax items classified by State, ZIP Code, and size of adjusted gross income. Data are based on individual income tax returns filed with the IRS. 

The Tax ZIP Code data can be accessed at: https://www.irs.gov/statistics/soi-tax-stats-data-by-geographic-area 

5. MOVES Emission Rates

The emission rates are compiled from EPA’s Motor Vehicle Emission Simulator (MOVES) model for the six chosen metropolitan areas and planning years (2020-2050). 

The emission estimates are stored in MOVES_Emission_Rates.xlsx. 

6. ZIP code level geographic data

These data contain geographic boundaries for the six chosen metropolitan areas, as well as socio-demographic data for each ZIP code.

These data are stored in multiple files in different formats (specified below) in the folder geographic_data/. 

Funding

National Center for Sustainable Transportation, Award: USDOT Grant 69A3551747114