Employee airline travel preferences survey data, UC Davis GreenFLY project
Data files
Dec 07, 2019 version files 2.57 MB
Abstract
This survey data is from a study exploring the potential to promote lower-emissions air travel by providing consumers with information about the carbon emissions of possible flight choices in the context of online flight search and booking. We surveyed over 450 faculty, researchers, and staff at the University of California, Davis, and asked them to choose among hypothetical flight options for a domestic and an international university-related business trip. Emissions estimates for different flight alternatives were displayed as prominently as price; this simple intervention has been promoted in several demonstration projects, including GreenFLY, a demo we created at UC Davis.
Methods
The flight choice experiment was an online survey in which UC Davis employees made a series of binary discrete choices between roundtrip flight alternatives for two hypothetical UC Davis-related business trips, one to Washington, DC and the other to London. The alternatives varied in cost, carbon emissions, layovers (0 or 2, i.e. one layover each way), and departure airport (SMF or SFO). We based these hypothetical scenarios (trip destinations and attribute levels of flight alternatives) on data about actual UC Davis employee air travel.
For the layover flight alternatives, we created eight possible cost-carbon combinations, using each cost level and each carbon level twice, and not repeating any pairing. There are many ways to do this, and we chose one which tended to pair high cost with low carbon, to create trade-offs. Our eight layover flights to DC appear in Table 2. The same cost-carbon pairings were used for layover flight alternatives from SFO.
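One way such a pairing could be built is sketched below. It assumes four cost levels and four carbon levels (which is what eight combinations with each level used twice implies), and the dollar and pound values are hypothetical placeholders, not the levels used in the survey. The mirrored-rank rule is only one of the many valid constructions and is not necessarily the one we used.

```python
from collections import Counter


def pair_cost_with_carbon(costs, carbons):
    """Return cost-carbon pairs using each level twice, with no repeated pair.

    Each cost is paired with the carbon level at the mirrored rank and with
    the next lower rank (wrapping around), so expensive flights tend to be
    low-carbon. The wrap-around produces one high-cost, high-carbon pair.
    Levels are assumed to be distinct.
    """
    costs = sorted(costs)
    carbons = sorted(carbons)
    n = len(costs)
    if n != len(carbons) or n < 2:
        raise ValueError("need equally many cost and carbon levels (at least 2)")
    pairs = []
    for i, cost in enumerate(costs):
        for shift in (0, 1):
            pairs.append((cost, carbons[(n - 1 - i - shift) % n]))
    return pairs


# Hypothetical levels (dollars, pounds of CO2); not the survey's values.
COSTS = [300, 400, 500, 600]
CARBONS = [1000, 1200, 1400, 1600]

pairs = pair_cost_with_carbon(COSTS, CARBONS)
cost_counts = Counter(cost for cost, _ in pairs)
carbon_counts = Counter(carbon for _, carbon in pairs)
```

With these placeholder levels, the cheapest flight is paired with the two highest carbon levels, and only the final wrap-around pair combines the highest cost with the highest carbon.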
We organized the flight alternatives into sets of two for the choice experiment questions. Criteria for pairing flight alternatives were as follows:
1. Every flight alternative should appear roughly the same number of times in the survey,
2. The distribution of kinds of flights in the questions (e.g. layover out of SFO) should match the distribution in the entire set,
3. Avoid questions in which the two flights have the same cost, or the same carbon, and
4. Focus on pairs that might have competitive utility (e.g. an alternative that is lower cost, lower carbon, nonstop and out of SMF is likely to be selected in most cases, so it is not useful for understanding potential trade-offs).
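Criteria 3 and 4 can be illustrated with a small filter over candidate pairs. This is a sketch under stated assumptions: the flight records and their values are made up, and criterion 4 is approximated by a strict dominance check (one flight at least as good on every attribute and better on at least one), which is a simplification of "competitive utility". The `home_airport` parameter is hypothetical and treats SMF as the preferred airport for everyone.

```python
from itertools import combinations

# Hypothetical flight alternatives; every value below is made up.
FLIGHTS = [
    {"code": "A", "airport": "SMF", "layovers": 0, "cost": 500, "carbon": 1200},
    {"code": "B", "airport": "SMF", "layovers": 2, "cost": 400, "carbon": 1500},
    {"code": "C", "airport": "SFO", "layovers": 0, "cost": 450, "carbon": 1100},
    {"code": "D", "airport": "SFO", "layovers": 2, "cost": 600, "carbon": 1600},
    {"code": "E", "airport": "SMF", "layovers": 2, "cost": 500, "carbon": 1000},
]


def attribute_key(flight, home_airport):
    """Tuple of attributes where a smaller value is assumed to be better."""
    return (
        flight["cost"],
        flight["carbon"],
        flight["layovers"],
        flight["airport"] != home_airport,  # False (home airport) sorts first
    )


def dominates(a, b, home_airport="SMF"):
    """True if a is no worse than b on every attribute and differs on one."""
    a_key = attribute_key(a, home_airport)
    b_key = attribute_key(b, home_airport)
    return all(x <= y for x, y in zip(a_key, b_key)) and a_key != b_key


def candidate_pairs(flights):
    """Keep pairs with different cost and carbon and no dominant flight."""
    keep = []
    for a, b in combinations(flights, 2):
        if a["cost"] == b["cost"] or a["carbon"] == b["carbon"]:
            continue  # criterion 3: same cost or same carbon
        if dominates(a, b) or dominates(b, a):
            continue  # criterion 4 (approximation): outcome is predictable
        keep.append((a["code"], b["code"]))
    return keep


kept = candidate_pairs(FLIGHTS)
```

Criteria 1 and 2 concern how often each flight appears across the whole survey, so they would be checked on the final set of questions rather than pair by pair.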
From this, we created seven "buckets" of questions for Washington and seven for London, and asked each participant one randomly chosen question from each bucket. Because of an error we made in the online questionnaire-design software, a random bucket was skipped for London. Nonetheless, each flight option appears frequently (between 40 and 120 times) in the questions we asked.
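The intended per-participant draw can be sketched as follows. The bucket contents are hypothetical question labels, and the bucket size of three is an assumption made only for illustration; the survey itself was administered through the questionnaire software, not through this code.

```python
import random


def draw_questions(buckets, rng):
    """Pick one question at random from each bucket, in bucket order."""
    return [rng.choice(bucket) for bucket in buckets]


# Hypothetical labels: seven buckets of three questions each for one trip.
BUCKETS = [[f"DC_b{b}_q{q}" for q in range(1, 4)] for b in range(1, 8)]

rng = random.Random(2019)  # fixed seed so the sketch is repeatable
questions = draw_questions(BUCKETS, rng)
```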
The original data exported from Qualtrics was converted into a format suitable for analysis with the mlogit package in R.
Usage notes
The R file we used for analysis is included. It is called ModelsFromSurveyData.R.
The data file is called Organized_Survey_Data.csv.
There are two rows in the file for each question presented to a user, one for each of the two alternative flights the question asked the user to compare.
The columns in the file are:
rowIndex - The user's ID from the Qualtrics data. In the original data, each user's entire questionnaire was one long row.
columnIndex - The question column from the Qualtrics data. Each question was a column in the original data.
QID - The question ID in the Qualtrics file for each question. These should be in 1:1 correspondence with the columnIndex values.
Code - The descriptive name of the specific flight corresponding to this row.
L.N - Layover or non-stop
Carbon - CO2 emissions for the flight, in pounds
Cost - cost in dollars
Airport - SMF or SFO
Choice - the flight the user chose has Choice=1, the other has Choice=0
perferred_airport - SFO or SMF. About 10% of participants live in the Bay Area and would prefer to fly out of SFO.
The remaining columns contain demographic data and are intended to be self-explanatory.
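A quick structural check of the two-rows-per-question layout can be sketched as below. The sample rows are made up and use only the column names listed above; the `N`/`L` codes for L.N and the `QID10`-style IDs are assumptions about how those values are encoded, not taken from the data file.

```python
import csv
import io
from collections import defaultdict

# Made-up rows using the documented column names; not real survey data.
SAMPLE = """rowIndex,columnIndex,QID,Code,L.N,Carbon,Cost,Airport,Choice
1,10,QID10,flight_a,N,1200,500,SMF,1
1,10,QID10,flight_b,L,1000,450,SMF,0
1,11,QID11,flight_c,L,1500,400,SFO,0
1,11,QID11,flight_d,N,1100,550,SMF,1
2,10,QID10,flight_a,N,1200,500,SMF,0
2,10,QID10,flight_b,L,1000,450,SMF,1
"""


def group_questions(rows):
    """Group rows by (rowIndex, columnIndex): one user answering one question."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["rowIndex"], row["columnIndex"])].append(row)
    return dict(grouped)


def malformed(grouped):
    """Return keys of questions lacking two rows or exactly one chosen flight."""
    bad = []
    for key, rows in grouped.items():
        chosen = sum(int(row["Choice"]) for row in rows)
        if len(rows) != 2 or chosen != 1:
            bad.append(key)
    return bad


grouped = group_questions(csv.DictReader(io.StringIO(SAMPLE)))
bad = malformed(grouped)
```

To run the same check on the real file, `io.StringIO(SAMPLE)` could be replaced with a file handle from `open("Organized_Survey_Data.csv", newline="")`, assuming every row has a numeric Choice value.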