Dryad

Why cannot long-term cascade be predicted? Exploring temporal dynamics in information diffusion processes

Cite this dataset

Cao, Ren-Meng; Liu, Xiao-Fan; Xu, Xiao-Ke (2021). Why cannot long-term cascade be predicted? Exploring temporal dynamics in information diffusion processes [Dataset]. Dryad. https://doi.org/10.5061/dryad.44j0zpccv

Abstract

Predicting information cascades plays a crucial role in applications such as advertising campaigns, emergency management, and infodemic control. However, predicting the scale of an information cascade over the long term can be difficult. In this study, we take Weibo, a Twitter-like online social platform, as an example, exhaustively extract predictive features from the data, and use a conventional machine learning algorithm to predict information cascade scales. Specifically, we compare the predictive power (and the loss of it) of different categories of features in short-term and long-term prediction tasks. Among the features that describe the follower-followee network, retweet network, tweet content, and early diffusion dynamics, we find that early diffusion dynamics are the most predictive in short-term prediction tasks but lose most of their predictive power in long-term tasks. In-depth analyses reveal two possible causes of this failure: the bursty nature of information diffusion and the temporal drift of features. Our findings deepen the understanding of the information diffusion process and may assist in controlling it.

Methods

This dataset was provided by DataCastle, a data science company in China. 

Funding

National Natural Science Foundation of China, Award: 61773091, 61603073

Liaoning Revitalization Talents Program, Award: XLYC1807106

Outstanding Innovative Talents of Higher Learning Institutions of Liaoning, Award: LR2016070
