Why cannot long-term cascade be predicted? Exploring temporal dynamics in information diffusion processes
Data files
Sep 09, 2021 version files 14.56 MB
- Code.ipynb
- data.zip
- README.txt
Abstract
Predicting information cascades plays a crucial role in applications such as advertising campaigns, emergency management, and infodemic control. However, predicting the scale of an information cascade over the long term can be difficult. In this study, we take Weibo, a Twitter-like online social platform, as an example, exhaustively extract predictive features from the data, and use a conventional machine learning algorithm to predict information cascade scales. Specifically, we compare the predictive power (and the loss of it) of different categories of features in short-term and long-term prediction tasks. Among the features that describe the follower-followee network, retweet network, tweet content, and early diffusion dynamics, we find that early diffusion dynamics are the most predictive in short-term prediction tasks but lose most of their predictive power in long-term tasks. In-depth analyses reveal two possible causes of this failure: the bursty nature of information diffusion and the temporal drift of features. Our findings deepen the understanding of the information diffusion process and may assist in controlling it.
Methods
This dataset was provided by DataCastle, a data science company in China.
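The short-term versus long-term comparison described in the abstract can be illustrated with a minimal sketch. It does not read data.zip and is not the analysis in Code.ipynb: the cascades are synthetic, the feature names (`followers`, `early_retweets`) and the functions `make_toy_cascades` and `compare_horizons` are hypothetical, and a one-variable least-squares fit stands in for the unspecified "conventional machine learning algorithm". The toy generator adds an occasional late burst by construction, so the output only shows how such a burst would erode the predictive power of an early-dynamics feature. It is not evidence about Weibo.

```python
import random
import statistics


def r_squared(xs, ys):
    """R^2 of a one-variable least-squares fit (squared Pearson correlation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def make_toy_cascades(n, seed=0):
    """Generate synthetic cascades. All numbers here are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        followers = rng.uniform(1, 100)       # stand-in for a follower-network feature
        early_retweets = rng.uniform(1, 50)   # stand-in for an early-dynamics feature
        # Short-term size tracks early retweets closely.
        short_size = 2 * early_retweets + rng.gauss(0, 5)
        # Roughly one cascade in four receives a large late burst.
        burst = rng.choice([0, 0, 0, 1]) * rng.uniform(200, 1000)
        long_size = short_size + followers + burst
        rows.append({
            "followers": followers,
            "early_retweets": early_retweets,
            "short_size": short_size,
            "long_size": long_size,
        })
    return rows


def compare_horizons(rows, features=("followers", "early_retweets")):
    """Return {feature: (R^2 on short-term size, R^2 on long-term size)}."""
    short = [r["short_size"] for r in rows]
    long_ = [r["long_size"] for r in rows]
    scores = {}
    for name in features:
        xs = [r[name] for r in rows]
        scores[name] = (r_squared(xs, short), r_squared(xs, long_))
    return scores


if __name__ == "__main__":
    for name, (s, l) in compare_horizons(make_toy_cascades(500)).items():
        print(f"{name}: short-term R^2 = {s:.3f}, long-term R^2 = {l:.3f}")
```

Scoring each feature separately at both horizons keeps the loss of predictive power visible per feature. A fuller version would fit one model per feature category on held-out data.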