Twitter data reveal six distinct environmental personas

Cite this dataset

Chang, Charlotte; Armsworth, Paul; Masuda, Yuta (2023). Twitter data reveal six distinct environmental personas [Dataset]. Dryad.


Effective digital environmental communication is integral to galvanizing public support for conservation in the age of social media. Environmental advocates require messaging strategies suited to social media platforms, including ways to identify, target, and mobilize distinct audiences. Here, we provide – to the best of our knowledge – the first systematic characterization of environmental personas on social media. Beginning with 1 million environmental nongovernmental organization (NGO) followers on Twitter, of which 500,000 users met data quality criteria, we identified six personas that differ in their expression of 21 environmental issues. General consistency in the proportional composition of personas was detected across 14 countries with sufficiently large samples. Within the US, although the six personas varied in their mean political ideology, we did not observe that the personas split along political party lines. Our results pave the way for environmental advocates – including NGOs, public agencies, and researchers – to use audience segmentation methods like the one discussed here to target and tailor messages to distinct constituencies at speed and scale. This repository contains several tabular files that can be used to query user data from Twitter or reproduce the main results in the main text of the article.


Replication materials documentation for "Twitter data reveal six distinct environmental personas"

This replication code and dataset accompany the manuscript linked at:

The replication datasets are described below. Each entry includes a SHA256 checksum that you can use to verify the integrity of the downloaded file: run shasum -a 256 FILENAME on the command line, or use another utility that computes SHA256 checksums.
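For readers without shasum available, the same check can be done in Python. The following is a minimal sketch, not part of the replication materials: the function names sha256_of_file and verify_downloads are hypothetical, and the expected checksums are copied from the file list in this documentation.

```python
import hashlib
from pathlib import Path

# Expected checksums, copied from the file list in this documentation.
EXPECTED_SHA256 = {
    "MeanViewpoints.tsv": "5d540edcb39c8d7a14db315b5eaeed83689021ec43cbe16a1c7eb4467c943098",
    "UserTweetIDs.txt": "fbe1da240a5ab9d9aebac0aabbde247e6eeebfa77c5471cdb6136f45110b1111",
    "EnvironmentalPundits.tsv": "a6a987d934dea75e8ba2329820d6cfe354af0991f2bdbd4746b0f83ad6dafaa3",
    "Persona_PoliticalIdeology.tsv": "e568d9737cbd7c0b1b1ce61a6c9c8294f14a62d934446cc0d618ebf091bf1a13",
    "US_geography.tsv": "8ba7e0ca437639656e25a473c4aec281e828e59941af847d8865bf4eddf1371d",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each documented file name to True (match), False (mismatch), or None (not found)."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        results[name] = (sha256_of_file(path) == expected) if path.exists() else None
    return results
```

Calling verify_downloads("path/to/downloads") returns one entry per file; any False value suggests the download is corrupted or differs from the version documented here.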


  • MeanViewpoints.tsv contains the mean and standard error of the mean (SEM) for the issue viewpoints shown in Figure 1 for the six personas in a "long data" format
    • Columns:
      • variable: name of the environmental issue
      • mean: persona-level mean viewpoint value for that issue
      • SEM: standard error of the mean
      • Persona: abbreviated name for the six personas (SMA: Smart alecks, GEN: Generalists, STE: Stewards, CLC: Climate concerned, TEC: Technocrats, RES: Reserved)
    • SHA256: 5d540edcb39c8d7a14db315b5eaeed83689021ec43cbe16a1c7eb4467c943098
  • UserTweetIDs.txt contains one tweet ID for each of the 1+ million users in our sample. These tweet IDs can be "hydrated" and used to find the users sampled in our study.
    • Columns:
      • TweetID: one tweet ID per user (the file's only column)
    • SHA256: fbe1da240a5ab9d9aebac0aabbde247e6eeebfa77c5471cdb6136f45110b1111
  • EnvironmentalPundits.tsv contains the user names and IDs for the environmental pundits whose timelines were used as the data source to train the probabilistic latent Dirichlet allocation topic model.
    • Columns:
      • Screenname: User name (e.g. GretaThunberg, which you can use to navigate to
      • ID: User ID
    • SHA256: a6a987d934dea75e8ba2329820d6cfe354af0991f2bdbd4746b0f83ad6dafaa3
  • Persona_PoliticalIdeology.tsv provides the mean political ideology score for the six personas
    • Columns:
      • mean: mean political ideology score
      • SEM: standard error of the mean
      • Persona: abbreviated name for the six personas
    • SHA256: e568d9737cbd7c0b1b1ce61a6c9c8294f14a62d934446cc0d618ebf091bf1a13
  • US_geography.tsv shows the state-level ranks for each persona
    • Columns:
      • name: State name
      • Persona: abbreviated name for the six personas
      • Rank: rank of the state for that persona, among the 50 states plus Washington, DC
    • SHA256: 8ba7e0ca437639656e25a473c4aec281e828e59941af847d8865bf4eddf1371d
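Because MeanViewpoints.tsv is in a "long data" format (one row per persona and issue), it can be convenient to pivot it into a persona-by-issue lookup. The following is a minimal sketch, assuming the file is tab-delimited with a header row whose names match the columns documented above (variable, mean, SEM, Persona). The function name long_to_wide is hypothetical and is not part of the replication code.

```python
import csv
from collections import defaultdict

# Persona abbreviations as documented for the Persona column.
PERSONA_NAMES = {
    "SMA": "Smart alecks",
    "GEN": "Generalists",
    "STE": "Stewards",
    "CLC": "Climate concerned",
    "TEC": "Technocrats",
    "RES": "Reserved",
}


def long_to_wide(handle):
    """Pivot long-format rows into {persona: {issue: (mean, sem)}}.

    Assumes a tab-delimited file with header columns
    variable, mean, SEM, and Persona.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    table = defaultdict(dict)
    for row in reader:
        table[row["Persona"]][row["variable"]] = (float(row["mean"]), float(row["SEM"]))
    return dict(table)


# Example use (requires the downloaded file):
# with open("MeanViewpoints.tsv", newline="") as fh:
#     wide = long_to_wide(fh)
# for persona, issues in wide.items():
#     print(PERSONA_NAMES.get(persona, persona), len(issues))
```

The same function should also work for Persona_PoliticalIdeology.tsv only if a variable column is added, since that file is documented with mean, SEM, and Persona columns alone.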


  • provides code that can be used to obtain user information from the UserTweetIDs.txt data file above to reproduce the user set in our analysis.
  • Plotting.R provides code to reproduce the plots in the main text.
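Before hydrating the IDs in UserTweetIDs.txt, it usually helps to read them and group them into batches, since lookup services typically accept a limited number of IDs per request. The following is a minimal sketch of that preparation step only. It is not the authors' script and makes no network calls; the function names and the default batch size of 100 are assumptions, so check the limits of whichever hydration tool or API you use.

```python
from itertools import islice


def read_tweet_ids(handle):
    """Collect tweet IDs from a one-ID-per-line file.

    Blank lines and any non-numeric line (such as a header) are skipped.
    IDs are kept as strings so that large values are never rounded.
    """
    ids = []
    for line in handle:
        token = line.strip()
        if token.isdigit():
            ids.append(token)
    return ids


def batched(items, size=100):
    """Yield consecutive lists of at most `size` items (size is an assumed default)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Example use (requires the downloaded file):
# with open("UserTweetIDs.txt") as fh:
#     for batch in batched(read_tweet_ids(fh)):
#         ...  # pass each batch to your hydration tool of choice
```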


Please see the main text of the article and Supplementary Information for more details on how the data were gathered and processed.

Usage notes

These data may be used only for publicly accessible research and may not be used for private, for-profit purposes. This replication code and dataset accompany the manuscript linked at:


David H. Smith Conservation Research Fellowship Program