US EPA Twitter posts from 2013-2014 and 2017-2018
Data files
May 07, 2025 version files (282.54 KB)
- README.md (2.55 KB)
- us_epa_coded_data_whole_sample_notext.csv (280 KB)
Abstract
This dataset describes posts made by the US EPA on Twitter (https://twitter.com/epa) during the years 2013-2014 and 2017-2018. Twitter (now X) was selected because it is one of the major social media platforms that government agencies use to communicate with the public. The posts were coded for several discourse categories (e.g., speech acts, sentiment, expressives, actors, etc.) in a content analysis described in publications associated with this dataset. As of March 29, 2019, the EPA Twitter account had produced over 15,000 tweets and had over 595,000 followers. For reference, as of March 14, 2024, the account had about 623,000 followers. The data covers a sample of all posts (excluding retweets and replies) made during the first 16 months under Administrator Gina McCarthy and the first 16 months under Administrator Scott Pruitt.
https://doi.org/10.5061/dryad.573n5tbhx
Description of the data and file structure
This data was collected for a content analysis of message characteristics, including several discourse categories such as "speech acts", "expressive" clauses, positive information, negative information, actors, scientific content, and others. The aim was to understand the language of government social media posts and its relationship to governmental and political priorities. Post text has been removed from this file due to copyright restrictions.
Files and variables
File: us_epa_coded_data_whole_sample_notext.csv
Description:
Variables
- index: post identifier
- date: date the post was created
- text: full text of post (removed)
- directive: post has at least one directive clause
- rhetorical: post has at least one rhetorical clause (aka "question prompt")
- participatory: post has at least one participatory clause (e.g., asking for citizen feedback or input)
- commitment: post has at least one commitment clause
- expressive: post has at least one expressive clause
- representative: post has at least one representative clause that was not coded in any other category
- agency: post refers to the EPA or one of its programs
- govt: post refers to another federal government agency
- external: post refers to an actor external to the federal government (e.g., state government, news agency, etc.)
- topic_terms: topic terms that define the message
- statistics: any reference to statistical information
- scientific: any reference to scientific or causal information
- negative_sentiment: 1 if any negative sentiment is associated with the whole message
- positive_sentiment: 1 if any positive sentiment is associated with the whole message
- self-evaluation: positive (1), negative (-1), or neutral (0) sentiment associated with the agency
- list_order: sub-index
- list_order_unique: additional index for sorting
- date_original: unclear
- index_original: unclear
- retweets: retweets the post received
- favorites: likes the post received
- text_original: original text of the post (removed)
- geo: geolocation of the post
- mentions: Twitter users mentioned in the post
- hashtags: hashtags in the post
- permalink: permalink for the post
- admin: administrator of the agency during the period of the post
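As a rough illustration of how the variables above might be used, the following sketch computes, for each value of admin, the share of posts coded 1 in each speech-act column. It uses only the Python standard library. The helper names (load_rows, share_by_admin) are hypothetical, and the sketch assumes that the binary columns store presence as 1; the actual encoding and the labels used in the admin column should be checked against the file.

```python
import csv
from collections import defaultdict

# Speech-act columns taken from the variable list above.
# Assumption: presence is stored as "1" (or "1.0"); anything else counts as absent.
CODE_COLUMNS = [
    "directive", "rhetorical", "participatory",
    "commitment", "expressive", "representative",
]


def load_rows(path):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def share_by_admin(rows, columns=CODE_COLUMNS):
    """Return {admin: {column: share of that admin's posts coded 1}}."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for row in rows:
        admin = row["admin"]
        totals[admin] += 1
        for col in columns:
            value = (row.get(col) or "").strip()
            if value in ("1", "1.0"):
                hits[admin][col] += 1
    return {
        admin: {col: hits[admin][col] / totals[admin] for col in columns}
        for admin in totals
    }
```

A call such as share_by_admin(load_rows("us_epa_coded_data_whole_sample_notext.csv")) would then return one dictionary of shares per administrator label found in the file.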
Access information
Data was derived from the following sources:
- Twitter API
This data was collected in 2019 via the Twitter API. It contains references to posts (e.g., date post was made, link to post, etc.). It also contains coding of discourse categories (e.g., speech acts, sentiment, expressives, actors, etc.) performed in a content analysis of the messages as described in publications associated with this dataset. However, no text is provided here due to copyright restrictions.
For research purposes, this data refers to a sample of all posts (excluding retweets and replies) collected from the EPA account. The population of posts covers the first 16 months under Administrator Gina McCarthy, a total of 4,765 posts from July 18, 2013, to December 14, 2014 (504 days), and the first 16 months under Administrator Scott Pruitt, a total of 951 posts from February 17, 2017, to July 6, 2018 (504 days).
From these populations, we selected a random sample of 25% of the McCarthy posts and 50% of the Pruitt posts for a multi-annotator content analysis. The dataset provided here contains 950 posts from the McCarthy administration and 471 posts from the Pruitt administration, coded according to instructions developed for the content analysis, as described in publications associated with this dataset.
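The idea of drawing a different random fraction from each administrator's population can be sketched as follows. This is only an illustration of stratified random sampling, not the authors' actual procedure: the function name, the group labels, the seed, and the rounding rule are all assumptions, and the sketch does not reproduce the original draw or the final coded counts.

```python
import random


def stratified_sample(ids_by_group, fractions, seed=0):
    """Draw a simple random sample without replacement from each group.

    ids_by_group: {group label: iterable of post identifiers}
    fractions:    {group label: fraction of that group to sample}
    seed:         hypothetical seed, used only to make the sketch repeatable
    """
    rng = random.Random(seed)
    sample = {}
    for group, ids in ids_by_group.items():
        ids = list(ids)
        k = round(len(ids) * fractions[group])  # assumed rounding rule
        sample[group] = sorted(rng.sample(ids, k))
    return sample
```

For example, stratified_sample({"mccarthy": range(100), "pruitt": range(40)}, {"mccarthy": 0.25, "pruitt": 0.50}) draws 25 and 20 identifiers from the two hypothetical groups.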
