The Internet has not only changed the dynamics of our collective attention but also, through the transactional logs of online activities, provides us with the opportunity to study attention dynamics at scale. In this paper, we study attention to aircraft incidents and accidents using Wikipedia transactional data in two language editions, English and Spanish. We study both the editorial activity on and the viewership of the articles about airline crashes. We analyse how the level of attention is influenced by parameters such as the number of deaths, the airline region, and the event locale and date. We find evidence that the attention given by Wikipedia editors to pre-Wikipedia aircraft incidents and accidents depends on the region of the airline for both the English and Spanish editions. North American airline companies receive more prompt coverage in English Wikipedia. We also observe that the attention given by Wikipedia visitors is influenced by the airline region, but only for events with a high number of deaths. Finally, we show that the rate and time span of the decay of attention are independent of the number of deaths, and that a fast decay within about a week seems to be universal. We discuss the implications of these findings in the context of attention bias.
Dynamics and Biases of Online Attention: The Case of Aircraft Crashes
This is a text file named "Readme.txt" in the dataset folder submitted to Royal Society Open Science in May 2016.
It describes the dataset used for the article "Dynamics of Online Attention: The case of Airline Crashes" by Ruth García-Gavilanes, Milena Tsvetkova and Taha Yasseri.
Version of Paper 2.0 Release
The dataset covers a set of articles classified as aircraft incidents or accidents in English and Spanish Wikipedia. The articles belong to the categories “Aviation accidents and incidents by country” and “Aviation accidents and incidents by year”, which theoretically cover all airline accidents and incidents, in all countries and throughout history, available in Wikipedia as of December 2015.
In total we obtained 1496 articles in English Wikipedia and 488 articles in Spanish Wikipedia.
flights_en.txt and flights_es.txt
These files contain information about the articles and the events associated with them.
The files have the following columns:
flight: Name of the article in Wikipedia (be aware that Wikipedia can redirect these articles to other names)
langs: The number of languages containing a version of the article
flight.en/flight.es: The translated name in English (for flights_es.txt) or Spanish (for flights_en.txt)
date: The date the “event” occurred
start.date: The date the Wikipedia page was first edited
max.date: The date of maximum views in the timeline of the article
deaths: Number of deaths derived from the accident or incident
longitude/latitude: The longitude and latitude of the accident or incident (when not available we made an approximation)
company_long/company_lat: The coordinates of the country where the airline headquarters is located
aircraft_company_continent: The continent where the airline headquarters is located
aircraft_company_country: The country where the airline headquarters is located
region_event: The continent where the event/incident occurred
country_event: The country where the event/incident occurred
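As an illustration of how these columns can be used, the following Python sketch reads one of the files and computes the number of days between each event (date) and the first edit of its Wikipedia article (start.date). The tab delimiter, the presence of a header row, and the yyyy-mm-dd date format are assumptions, since this Readme does not specify them; the function name coverage_delays is hypothetical.

```python
import csv
import io
from datetime import date


def coverage_delays(handle, delimiter="\t"):
    """Return {article title: days from the event to the first edit}.

    Assumes a header row containing the columns `flight`, `date` and
    `start.date`, with ISO (yyyy-mm-dd) dates. Rows whose dates are
    missing or cannot be parsed are skipped.
    """
    delays = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        try:
            title = row["flight"]
            event = date.fromisoformat(row["date"])
            created = date.fromisoformat(row["start.date"])
        except (KeyError, TypeError, ValueError):
            continue
        delays[title] = (created - event).days
    return delays


# Tiny made-up sample in the assumed layout (not real rows from the dataset).
sample = io.StringIO(
    "flight\tdate\tstart.date\tdeaths\n"
    "Example_Flight_1\t2010-01-01\t2010-01-03\t5\n"
    "Example_Flight_2\t1950-06-15\tunknown\t0\n"
)
print(coverage_delays(sample))
```

To run it on the real data, pass an open file instead of the sample, for example open("flights_en.txt", newline="", encoding="utf-8"), and change the delimiter argument if the files turn out not to be tab-separated. The UTF-8 encoding is likewise an assumption.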
The pageviews folder has two subfolders: enwiki and eswiki.
Each subfolder contains one folder per article in the dataset, named after the article title, for English (1496) and Spanish (488). Each article folder contains files named title_2008_2015.txt, where *title* is either the current name of the article or the name of one of its redirects. For example, the directory pageviews/enwiki/1912_Brooklands_Flanders_Monoplane_crash contains two files: Monoplane_Committee_2008_2015.txt and 1912_Brooklands_Flanders_Monoplane_crash_2008_2015.txt.
The file Monoplane_Committee_2008_2015.txt contains the views of the Wikipedia page Monoplane_Committee from 2008 to 2015.
Each file has the following columns:
rd.views: Number of views
date: Date (yyyy-mm-dd)
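Because the views of an article are split across one file per redirect, a per-date total requires summing over all files in the article's folder. The following is a minimal Python sketch of that step, assuming each file is tab-separated with a header row naming the two columns above (the Readme does not state the delimiter). The helper name total_daily_views is hypothetical and the numbers in the demonstration are made up.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def total_daily_views(article_dir, delimiter="\t"):
    """Sum rd.views per date over every *_2008_2015.txt file in article_dir."""
    totals = Counter()
    for path in sorted(Path(article_dir).glob("*_2008_2015.txt")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle, delimiter=delimiter):
                totals[row["date"]] += int(row["rd.views"])
    return dict(totals)


# Demonstration on made-up numbers written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    article = Path(tmp) / "Some_article"
    article.mkdir()
    (article / "Some_article_2008_2015.txt").write_text(
        "rd.views\tdate\n10\t2008-01-01\n7\t2008-01-02\n", encoding="utf-8")
    (article / "Some_redirect_2008_2015.txt").write_text(
        "rd.views\tdate\n3\t2008-01-01\n", encoding="utf-8")
    totals = total_daily_views(article)
    # Date with the highest combined number of views.
    peak_day = max(totals, key=totals.get)
    print(totals, peak_day)
```

The date with the highest combined total is similar in spirit to the max.date column of the flights files, although this Readme does not say whether max.date was computed with or without redirects.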
Contact and scripts
Source code in R to extract page views of articles and redirects: https://github.com/Ruthygg/WikiRedirects
The pageviews were extracted from http://stats.grok.se, where data availability starts in December 2007. For data from 2015 onward, there is an R package devoted to extracting pageviews: https://github.com/ironholds/pageviews.