
Data from: Understanding sentiment of national park visitors from social media data

Cite this dataset

Hausmann, Anna et al. (2020). Data from: Understanding sentiment of national park visitors from social media data [Dataset]. Dryad.


National parks are key for conserving biodiversity and supporting people’s well-being. However, anthropogenic pressures challenge the existence of national parks and their conservation effectiveness. Therefore, it is crucial to assess how people perceive national parks in order to enhance socio-political support for conservation. User-generated data shared by visitors on social media provide opportunities to understand how people perceive (e.g. preferences, feelings, opinions) national parks during nature-based recreational experiences. In this study, we applied methods from automated natural language processing to assess visitors’ sentiment when describing experiences in Instagram posts geolocated inside four national parks in South Africa. We found that visitors’ sentiment was positive, and mostly included emotions such as joy, anticipation, trust and surprise, with only a small occurrence of posts with negative feelings. Appreciation of nature, in association with a diverse set of other aspects, such as activities, geographical features and tourist attractions, was used to describe experiences related to nature, wilderness, traveling, holidays and adventures. The type of nature-based experience described by visitors was park specific, revealing different profiles of parks providing wildlife or scenery experiences. Findings support and highlight the societal role of national parks in providing visitors with opportunities to develop positive connections with nature. Social media data may be used to understand visitors’ perceptions, and how the image of national parks is constructed by users in the virtual social environment. This may help inform management for promoting a high-quality tourism experience, as well as conservation marketing aimed at fostering socio-political support for national parks and their long-term conservation effectiveness.


Data collection

The dataset contains 33,213 Instagram posts obtained through the Instagram Application Programming Interface (API). Data were collected during one week per month between June 2013 and February 2016, for posts geolocated within the borders of four national parks in South Africa, namely Addo Elephant (AENP), Garden Route (GRNP), Kruger (KNP) and Table Mountain (TMNP) National Parks. Only publicly available posts were accessed and users were de-identified. The dataset contains only posts written in English.


Data processing

Posts have been de-identified to protect users’ privacy. For each post, the text content was converted to counts of words and hashtags. Normalized frequencies of single words/hashtags used across all posts per park are provided. The text content of each post was assigned a sentiment value, a sentiment polarity and emotion values using the NRC Word-Emotion Lexicon (Mohammad & Turney 2013, Crowdsourcing a Word-Emotion Association Lexicon. Computational Intelligence, 29, 436–465). Please refer to the manuscript for further details on the methods used for classification.
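As a rough illustration of the kind of lexicon-based processing described above, the following Python sketch scores a post against a word-emotion lexicon and computes relative word/hashtag frequencies. It is a minimal sketch under stated assumptions, not the procedure used to produce this dataset: the lexicon `TOY_LEXICON` is a hypothetical five-word stand-in for the NRC Word-Emotion Lexicon, the function names are illustrative, sentiment is assumed to be the count of positive matches minus negative matches, and "normalized frequency" is assumed to mean a token's count divided by the total token count. The manuscript remains the reference for the actual method.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon for illustration only. The real NRC
# Word-Emotion Lexicon is far larger and must be obtained separately.
TOY_LEXICON = {
    "beautiful": {"positive", "joy"},
    "amazing": {"positive", "surprise"},
    "safari": {"anticipation"},
    "crowded": {"negative"},
    "lost": {"negative", "sadness"},
}

# Words and hashtags: an optional leading '#', then letters, digits or '_'.
TOKEN_RE = re.compile(r"#?[a-z0-9_]+")


def tokenize(text):
    """Lower-case the text and split it into words and hashtags."""
    return TOKEN_RE.findall(text.lower())


def score_post(text, lexicon=TOY_LEXICON):
    """Count lexicon categories matched by the tokens of one post.

    Assumption: sentiment = positive matches - negative matches, and the
    polarity is the sign of that difference.
    """
    counts = Counter()
    for token in tokenize(text):
        word = token.lstrip("#")  # look hashtags up as plain words
        counts.update(lexicon.get(word, ()))
    sentiment = counts["positive"] - counts["negative"]
    if sentiment > 0:
        polarity = "positive"
    elif sentiment < 0:
        polarity = "negative"
    else:
        polarity = "neutral"
    emotions = {
        name: n for name, n in counts.items()
        if name not in ("positive", "negative")
    }
    return {"sentiment": sentiment, "polarity": polarity, "emotions": emotions}


def normalized_frequencies(posts):
    """Relative frequency of each token across a collection of posts.

    Assumption: frequency = token count / total number of tokens.
    """
    counts = Counter(tok for post in posts for tok in tokenize(post))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: n / total for tok, n in counts.items()}
```

For example, `score_post("Amazing safari, beautiful #elephants")` matches two positive words and no negative ones, so it returns a sentiment of 2 with positive polarity, and `normalized_frequencies` can be applied to the posts of a single park to obtain a per-park frequency table.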


Usage notes

The dataset is in .xlsx format and contains three sheets:

1- A description of the variables in the other sheets. 

2- Social media posts and classification of sentiment values, polarity and emotion values  

3- Normalized frequencies of single words/hashtags used across all posts per park.