Incentivizing news consumption on social media platforms using large language models and realistic bot accounts
Data files
Jun 13, 2024 version (7.06 MB total):
- Final_Coef_Estimates_Class.csv (5.49 KB)
- Final_Coef_Estimates_PolInterest.csv (3.79 KB)
- Final_Coef_Estimates.csv (1.68 KB)
- Final_topic_classification.csv (438.68 KB)
- README.md (4.22 KB)
- UserAnalysis_Data.csv (2.46 MB)
- UserFollowingChange.csv (530.41 KB)
- UserInfo.csv (3.62 MB)
Abstract
This project examines how to enhance users' exposure to and engagement with verified and ideologically balanced news in an ecologically valid setting. We rely on a large-scale, two-week-long field experiment on 28,457 Twitter users. We created 28 bots using GPT-2 that replied to users tweeting about sports, entertainment, or lifestyle with a contextual reply containing two hardcoded elements: a URL to the topic-relevant section of a quality news organization and an encouragement to follow its Twitter account. Treated users were randomly assigned to receive replies from bots presented as female or male. We examine whether our intervention increases users' following of news media organizations, their sharing and liking of news content, and their tweeting and liking of political content. We find that treated users followed more news accounts, and that users in the female-bot treatment were more likely than users in the control group to like news content.
https://doi.org/10.5061/dryad.7sqv9s50w
Description of the data and file structure
The dataset contains the following CSV files:
UserInfo.csv - user account-level information
UserAnalysis_Data.csv - user behavioral data, with corresponding treatment labels
UserFollowingChange.csv - per-user change in the number of accounts followed before and after the experiment, used for filtering
Final_Coef_Estimates.csv - exported main model coefficients, for plotting Figure 4
Final_Coef_Estimates_PolInterest.csv - exported political interest model coefficients, for plotting Figure 5
Final_Coef_Estimates_Class.csv - exported user topic class model coefficients, for plotting the appendix models
Final_topic_classification.csv - user topic classifications, for merging into the topic class models
The individual files contain the following columns:
UserInfo.csv:
user_id - Twitter ID of user
created_at - date of account creation
listed_count - number of lists the account appears on
favourites_count - number of likes by the account
statuses_count - number of posts from the account
friends_count - number of accounts followed by the user
followers_count - number of accounts following the user
UserFollowingChange.csv:
original_user_id - Twitter ID of user
following - number of accounts followed pre-treatment
followingpost - number of accounts followed post-treatment
postdiff - difference in the number of accounts followed between pre- and post-treatment
postdiffpct - percentage difference in the number of accounts followed between pre- and post-treatment
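The README does not state the sign convention or the base of the percentage for `postdiff` and `postdiffpct`. The sketch below assumes `postdiff = followingpost - following` and `postdiffpct = 100 * postdiff / following`; both formulas are assumptions, and the sample rows are invented, so compare the output against the file's own columns before relying on it.

```python
import csv
import io

# Hypothetical sample rows in the UserFollowingChange.csv layout (not real data).
SAMPLE = """original_user_id,following,followingpost
101,200,210
102,50,45
103,0,3
"""

def following_change(following, followingpost):
    """Return (postdiff, postdiffpct), ASSUMING postdiff = post - pre and
    postdiffpct = 100 * postdiff / pre. The percentage is None when pre is 0."""
    postdiff = followingpost - following
    postdiffpct = 100.0 * postdiff / following if following else None
    return postdiff, postdiffpct

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
changes = {
    row["original_user_id"]: following_change(int(row["following"]), int(row["followingpost"]))
    for row in rows
}
for uid, (diff, pct) in changes.items():
    print(uid, diff, pct)
```

Users who followed no accounts before treatment have no defined percentage change, which is one reason the file may contain "N/A" cells.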
UserAnalysis_Data.csv:
UserIDs - Twitter ID of user
treatment - treatment group
treated - whether the user was treated (i.e., not in the control group)
followees_diff - post-treatment difference in number of media accounts followed
tweets_media_pct_diff - post-treatment difference in pct. of retweets of media accounts
likes_media_pct_diff - post-treatment difference in pct. of likes of media accounts
tweets_pol_pct_diff - post-treatment difference in pct. of political tweets
likes_pol_pct_diff - post-treatment difference in pct. of political likes
followees_pre - number of pre-treatment media accounts followed
tweets_media_pct_pre - pct. of pre-treatment retweets that are of media accounts
likes_media_pct_pre - pct. of pre-treatment likes that are of media accounts
tweets_pol_pct_pre - pct. of pre-treatment tweets that are political
likes_pol_pct_pre - pct. of pre-treatment likes that are political
followees_difftrim - post-treatment difference in number of media accounts followed (suggested accounts only)
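As a quick sanity check on the `*_diff` outcomes, one can compare treated and control users directly. The sketch below computes a naive treated-minus-control difference in means with a Welch standard error; it is not the authors' regression model, it assumes the `treated` column is coded `1`/`0`, and the `treatment` labels and all values shown are hypothetical.

```python
import csv
import io
import math
from statistics import mean, variance

# Hypothetical rows in the UserAnalysis_Data.csv layout; values are invented.
SAMPLE = """UserIDs,treatment,treated,followees_diff
1,control,0,0
2,control,0,1
3,control,0,0
4,female,1,2
5,male,1,1
6,female,1,3
"""

def diff_in_means(rows, outcome, flag="treated"):
    """Naive treated-minus-control difference in means with a Welch standard error.
    ASSUMES `flag` is coded "1"/"0"; this is not the authors' regression model."""
    treated = [float(r[outcome]) for r in rows if r[flag] == "1"]
    control = [float(r[outcome]) for r in rows if r[flag] == "0"]
    est = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return est, se

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
est, se = diff_in_means(rows, "followees_diff")
print(f"followees_diff: {est:.3f} (SE {se:.3f})")
```

Rows with "N/A" in the outcome would need to be dropped or parsed before calling `float`.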
Final_topic_classification.csv:
UserIDs - Twitter ID of user
Class - topical classification of the user (the class of tweets most often sent by the user)
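Final_topic_classification.csv shares the `UserIDs` column with UserAnalysis_Data.csv, so the two can be joined per user. The sketch below performs a left join with the standard library and groups users by `Class`; the rows are invented. IDs are kept as strings because Twitter IDs are 64-bit integers that can lose precision if a tool reads them as floating-point numbers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpts of the two files; IDs and values are invented.
ANALYSIS = """UserIDs,treated,followees_diff
1,0,0
2,1,2
3,1,1
4,0,1
"""
TOPICS = """UserIDs,Class
1,Sports
2,Sports
3,Lifestyle
"""

def merge_topic_class(analysis_rows, topic_rows):
    """Left-join each analysis row with its topic Class on UserIDs; unmatched users get None."""
    class_by_id = {r["UserIDs"]: r["Class"] for r in topic_rows}
    return [{**r, "Class": class_by_id.get(r["UserIDs"])} for r in analysis_rows]

merged = merge_topic_class(
    list(csv.DictReader(io.StringIO(ANALYSIS))),
    list(csv.DictReader(io.StringIO(TOPICS))),
)

# Group user IDs by topic class, e.g. to fit one model per class.
by_class = defaultdict(list)
for row in merged:
    by_class[row["Class"]].append(row["UserIDs"])
print(dict(by_class))
```

A left join keeps every analysis row, so users without a topic classification remain visible (with `Class` set to `None`) rather than silently disappearing.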
All coefficient CSV files contain these variables:
Treatment - Treatment Group
Variable - Variable modelled as dependent variable
coef - coefficient estimate
se - standard error
upper - upper bound (95% CI)
lower - lower bound (95% CI)
Model - ITT (intention-to-treat) or Treated models
UserType - one of Sports, Entertainment, or Lifestyle in the ‘Final_Coef_Estimates_Class.csv’ file, and either ‘High Interest’ (political) or ‘Low Interest’ (political) in the ‘Final_Coef_Estimates_PolInterest.csv’ file.
Function - refers to the “Score Test” performed in the regression models.
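The README says `upper` and `lower` are 95% confidence bounds but does not say how they were computed. The sketch below checks whether each row matches a normal-approximation interval, `coef ± 1.96 * se`; that formula is an assumption, the sample uses only a subset of the columns, and the `Treatment` labels and numbers are invented. A mismatch would simply indicate a different interval method (for example, a t-based or otherwise adjusted interval), not an error in the file.

```python
import csv
import io

# Hypothetical rows with a subset of the coefficient-file columns; numbers are invented.
SAMPLE = """Treatment,Variable,coef,se,upper,lower,Model,Function
Female,followees_diff,0.50,0.10,0.696,0.304,ITT,
Male,followees_diff,0.20,0.10,0.500,0.000,ITT,
"""

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal

def ci_matches(row, tol=1e-2):
    """Check whether upper/lower equal coef +/- Z_95 * se, ASSUMING a normal-approximation CI."""
    coef, se = float(row["coef"]), float(row["se"])
    return (abs(float(row["upper"]) - (coef + Z_95 * se)) <= tol
            and abs(float(row["lower"]) - (coef - Z_95 * se)) <= tol)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["Treatment"], row["Variable"], ci_matches(row))
```

The tolerance absorbs rounding in the exported values; tighten it if the file stores more decimal places.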
Some cells in the files contain “N/A”, which indicates that the cell was not applicable to that user or that the information was missing for that user.
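When reading the files with the standard library, the "N/A" marker has to be converted explicitly before any arithmetic. The sketch below maps "N/A" (and empty cells) to `None` and keeps only complete rows; the rows are invented, and dropping incomplete rows is just one possible choice that depends on the analysis. For reference, pandas `read_csv` already treats the string "N/A" as missing by default.

```python
import csv
import io

# Hypothetical rows; "N/A" marks not-applicable or missing cells, as described above.
SAMPLE = """UserIDs,followees_pre,likes_media_pct_pre
1,4,0.25
2,N/A,0.10
3,2,N/A
"""

def parse_cell(value):
    """Convert a cell to float, mapping the "N/A" marker (and empty cells) to None."""
    value = value.strip()
    if value in ("N/A", ""):
        return None
    return float(value)

# Keep UserIDs as strings; parse every other column numerically.
rows = [
    {key: (val if key == "UserIDs" else parse_cell(val)) for key, val in row.items()}
    for row in csv.DictReader(io.StringIO(SAMPLE))
]
complete = [r["UserIDs"] for r in rows if all(v is not None for v in r.values())]
print(complete)  # → ['1']
```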
Sharing/Access information
Data was derived from the following sources:
- Twitter API and Tweepy
Code/Software
- Tweepy docs: https://docs.tweepy.org/en/stable/
- Our bot’s github: https://github.com/ercexpo/EXPO-TwitterBot
- Other important resources used for our experiment: https://github.com/HadiAskari/TwitterBot-Resources
The data were collected via the Twitter API using the Python Tweepy library. The dataset contains raw files from our pre- and post-treatment metrics, as well as our final metrics after all classifications (politics and news).