Catalan Referendum Twitter corpus

Cite this dataset

Jiménez-Zafra, Salud María; Martín-Valdivia, María Teresa; Sáez-Castillo, Antonio José; Conde-Sánchez, Antonio (2021). Catalan Referendum Twitter corpus [Dataset]. Dryad. https://doi.org/10.5061/dryad.stqjq2c24

Abstract

This corpus consists of 46,962 tweets related to the Catalan referendum, a highly controversial topic in Spain because it was an independence referendum called by the Catalan regional government and suspended by the Constitutional Court of Spain at the request of the Spanish government. All the tweets were downloaded on October 1, 2017, using the hashtags #CatalanReferendum or #ReferendumCatalan. On October 31, 2017, we collected the features of these tweets in order to analyze their virality. Each item in this collection consists of the features we used from each tweet to perform the virality analysis:

  • lang: Tweet language.
  • retweet_count: Total number of retweets recorded for a given tweet.
  • favourite_count: Total number of favourites recorded for a given tweet.
  • is_quote_status: Whether a tweet includes a quote of another tweet.
  • num_hashtags: Total number of hashtags in the tweet.
  • num_urls: Total number of URLs in the tweet.
  • num_mentions: Total number of users mentioned in the tweet.
  • interval_time: Period of the day in which the tweet was published: morning (06:00-12:00), afternoon (12:00-18:00), evening (18:00-00:00) or night (00:00-06:00).
  • positive_words_iSOL: Total number of positive words found in the tweet using the iSOL lexicon.
  • negative_words_iSOL: Total number of negative words found in the tweet using the iSOL lexicon.
  • positive_words_NRC: Total number of positive words found in the tweet using the NRC lexicon.
  • negative_words_NRC: Total number of negative words found in the tweet using the NRC lexicon.
  • positive_words_mlSenticon: Total number of positive words found in the tweet using the ML-SentiCon lexicon.
  • negative_words_mlSenticon: Total number of negative words found in the tweet using the ML-SentiCon lexicon.
  • verified_user: Whether the tweet is from a verified user.
  • followers_count_user: Total number of users who follow the author of a tweet.
  • friends_count_user: Total number of users (friends) that the author of a tweet follows.
  • listed_count_user: Total number of lists that include the author of a tweet.
  • favourites_count_user: Total number of tweets that the author has favourited.
  • statuses_count_user: Total number of tweets made by the author since the creation of the account.
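
As an illustration of how some of the features listed above could be derived from a raw tweet, here is a minimal Python sketch. It is not the authors' extraction code. The function names, the regular expressions, the simple word tokenization, and the treatment of interval boundaries as half-open (so that 12:00 counts as afternoon) are all assumptions made for this example, and the lexicon passed in is a toy stand-in for iSOL, NRC or ML-SentiCon.

```python
import re


def interval_time(hour: int) -> str:
    """Map an hour of the day (0-23) to an interval_time label.

    Assumption: intervals are half-open, e.g. morning is [06:00, 12:00).
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour < 6:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def count_lexicon_words(text: str, lexicon: set) -> int:
    """Count tokens of the tweet that appear in a polarity lexicon.

    Assumption: lowercase word tokens are matched exactly against the
    lexicon entries; the real analysis may tokenize differently.
    """
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for token in tokens if token in lexicon)


def surface_counts(text: str) -> dict:
    """Count hashtags, mentions and URLs with simple (hypothetical) patterns."""
    return {
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "num_urls": len(re.findall(r"https?://\S+", text)),
    }
```

For example, `interval_time(9)` gives the label "morning", and `count_lexicon_words` can be called once per lexicon and polarity to fill the six positive_words_* and negative_words_* features.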

Funding

LIVING-LANG project from the Spanish Government, Award: RTI2018-094653-B-C21

Regional Government of Andalusia, Award: DOC_01073

European Commission

Ministerio de Educación Cultura y Deporte, Award: FPU014/00983