Twitter data reveal six distinct environmental personas
Cite this dataset
Chang, Charlotte; Armsworth, Paul; Masuda, Yuta (2023). Twitter data reveal six distinct environmental personas [Dataset]. Dryad. https://doi.org/10.5061/dryad.79cnp5ht0
Abstract
Effective digital environmental communication is integral to galvanizing public support for conservation in the age of social media. Environmental advocates require messaging strategies suited to social media platforms, including ways to identify, target, and mobilize distinct audiences. Here, we provide – to the best of our knowledge – the first systematic characterization of environmental personas on social media. Beginning with 1 million environmental nongovernmental organization (NGO) followers on Twitter, of which 500,000 users met data quality criteria, we identified six personas that differ in their expression of 21 environmental issues. General consistency in the proportional composition of personas was detected across 14 countries with sufficiently large samples. Within the US, although the six personas varied in their mean political ideology, we did not observe that the personas split along political party lines. Our results pave the way for environmental advocates – including NGOs, public agencies, and researchers – to use audience segmentation methods like the one discussed here to target and tailor messages to distinct constituencies at speed and scale. This repository contains several tabular files that can be used to query user data from Twitter or reproduce the main results in the main text of the article.
README
Replication materials documentation for "Twitter data reveal six distinct environmental personas"
This replication code and dataset accompany the manuscript linked at: https://doi.org/10.1002/fee.2510. I provide a description of the replication datasets below and include a SHA256 checksum for each file that you can use to verify the integrity of your download (run shasum -a 256 FILENAME on the command line, or use another utility that computes SHA256 checksums).
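For readers without shasum available, the same check can be sketched in Python using only the standard library. The function names sha256_of_file and matches_checksum are illustrative and are not part of the replication code.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with a checksum copied from this README."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling matches_checksum("MeanViewpoints.tsv", expected) with the checksum listed for that file should return True for an intact download.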
Datasets
MeanViewpoints.tsv
contains the mean and standard error of the mean (SEM) for the issue viewpoints shown in Figure 1 for the six personas in a "long data" format.
- Columns:
- variable: name of the environmental issue
- mean: persona-level mean viewpoint value for that issue
- SEM: standard error of the mean
- Persona: abbreviated name for the six personas (SMA: Smart alecks, GEN: Generalists, STE: Stewards, CLC: Climate concerned, TEC: Technocrats, RES: Reserved)
- SHA256:
5d540edcb39c8d7a14db315b5eaeed83689021ec43cbe16a1c7eb4467c943098
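Because MeanViewpoints.tsv is in long format (one row per persona and issue), it can help to regroup the rows by persona before comparing issues. The sketch below is a minimal example that assumes the file has a header row with exactly the column names listed above; load_mean_viewpoints is a hypothetical helper, not part of Plotting.R or Scraper.py.

```python
import csv
from collections import defaultdict


def load_mean_viewpoints(handle):
    """Regroup long-format rows into {persona: {issue: (mean, sem)}}."""
    table = defaultdict(dict)
    # Assumption: tab-separated values with a header row naming the columns.
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row["Persona"]][row["variable"]] = (
            float(row["mean"]),
            float(row["SEM"]),
        )
    return dict(table)
```

For example, opening the file with open("MeanViewpoints.tsv", newline="") and passing the handle to load_mean_viewpoints gives a dictionary keyed by the persona abbreviations (SMA, GEN, STE, CLC, TEC, RES).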
UserTweetIDs.txt
contains one tweet ID per user for the 1+ million users in our sample. These tweet IDs can be "hydrated" and used to find the users sampled in our study.
- TweetID: single column listing one tweet ID per user
- SHA256:
fbe1da240a5ab9d9aebac0aabbde247e6eeebfa77c5471cdb6136f45110b1111
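Hydration is normally done in batches rather than one ID at a time. The sketch below only prepares those batches and makes no network calls; it is not the Scraper.py included in this repository. The default batch size of 100 is an assumption based on the per-request limit of the Twitter API v2 tweet lookup endpoint and should be checked against current API terms. Whether the file begins with a header line is also an assumption, so non-numeric lines are skipped.

```python
def read_tweet_ids(handle):
    """Collect tweet IDs from a one-ID-per-line file, kept as strings."""
    ids = []
    for line in handle:
        token = line.strip()
        # Skip blank lines and any non-numeric header such as "TweetID".
        if token.isdigit():
            ids.append(token)
    return ids


def batched(ids, batch_size=100):
    """Yield consecutive slices of at most `batch_size` IDs."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]
```

Each batch can then be passed to whichever hydration client you use, for example as a comma-separated string built with ",".join(batch).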
EnvironmentalPundits.tsv
contains the user names and IDs for the environmental pundits whose timelines were used as the data source to train the probabilistic latent Dirichlet allocation topic model.
- Columns:
- Screenname: User name (e.g. GretaThunberg, which you can use to navigate to twitter.com/GretaThunberg)
- ID: User ID
- SHA256:
a6a987d934dea75e8ba2329820d6cfe354af0991f2bdbd4746b0f83ad6dafaa3
Persona_PoliticalIdeology.tsv
provides the mean political ideology score for the six personas.
- Columns:
- mean: mean political ideology score
- SEM: standard error of the mean
- Persona: abbreviated name for the six personas
- SHA256:
e568d9737cbd7c0b1b1ce61a6c9c8294f14a62d934446cc0d618ebf091bf1a13
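The mean and SEM columns can be turned into approximate intervals when comparing personas. The sketch below uses mean ± 1.96 × SEM, which is a normal-approximation 95% interval; this is an assumption made for illustration and may differ from how uncertainty is reported in the article. It also assumes a header row with the column names listed above, and ideology_intervals is a hypothetical helper.

```python
import csv


def ideology_intervals(handle, z=1.96):
    """Return {persona: (lower, upper)} using mean +/- z * SEM."""
    intervals = {}
    # Assumption: tab-separated values with columns mean, SEM, Persona.
    for row in csv.DictReader(handle, delimiter="\t"):
        mean = float(row["mean"])
        sem = float(row["SEM"])
        intervals[row["Persona"]] = (mean - z * sem, mean + z * sem)
    return intervals
```

Overlapping intervals for two personas suggest that their mean ideology scores are not clearly separated, although this is not a formal test.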
US_geography.tsv
shows the state-level ranks for each persona.
- Columns:
- name: State name
- Persona: abbreviated name for the six personas
- Rank: Rank for the 50 states (+ Washington DC)
- SHA256:
8ba7e0ca437639656e25a473c4aec281e828e59941af847d8865bf4eddf1371d
Code
Scraper.py
provides code that can be used to obtain user information from the UserTweetIDs.txt data file above to reproduce the user set in our analysis.
Plotting.R
provides code to reproduce the plots in the main text.
Methods
Please see the main text of the article and Supplementary Information for more details on how the data were gathered and processed.
Usage notes
These data may only be used for publicly accessible research and may not be used for private, for-profit purposes. This replication code and dataset accompany the manuscript linked at: https://doi.org/10.1002/fee.2510.
Funding
David H. Smith Conservation Research Fellowship Program