Data from: Social media and Bitcoin Metrics: which words matter

Cite this dataset

Burnie, Andrew; Yilmaz, Emine (2019). Data from: Social media and Bitcoin Metrics: which words matter [Dataset]. Dryad.


We develop a new Data-Driven Phasic Word Identification (DDPWI) methodology to determine which words matter as the bitcoin pricing dynamic changes from one phase to another. With Google search volumes as a baseline, we find, using Spearman’s rho, that Reddit submissions are both correlated with Google and have a comparable relationship with a variety of bitcoin metrics. Reddit provides complete access to the text of submissions. Rather than associating sentiment with market activity, we describe the DDPWI method for finding specific ‘price dynamic’ words associated with changes in the bitcoin pricing pattern through 2017 and 2018. We assess the significance of these changes using Wilcoxon Rank-Sum Tests with Bonferroni corrections. These price dynamic words are used to pull out associated words in the submissions, thereby providing the context for their use. For example, the price dynamic word ‘ban’, which became significantly higher in frequency as prices fell, occurred in the context of both government regulation and internet companies banning cryptocurrency adverts. This approach could be used more generally to look at social media and discussion forums at a granular level, identifying specific words that impact the metric under investigation rather than overall sentiment.
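
The statistical tools named in the abstract (Spearman's rho, the Wilcoxon rank-sum test, and the Bonferroni correction) can be illustrated with a minimal sketch. This is not the authors' code or the DDPWI method itself: the function names, the sample word frequencies, and the 0.05 significance level are all assumptions made for illustration, and the rank-sum p-value uses a normal approximation without a tie correction.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation)."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each word whose p-value beats alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {word: p < threshold for word, p in p_values.items()}


# Hypothetical daily word frequencies in two pricing phases (made-up numbers).
phase_rising = {"ban": [1, 2, 3, 4, 5], "wallet": [1, 3, 5, 7, 9]}
phase_falling = {"ban": [6, 7, 8, 9, 10], "wallet": [2, 4, 6, 8, 10]}

p_values = {
    word: rank_sum_p(phase_rising[word], phase_falling[word])
    for word in phase_rising
}
print(bonferroni_significant(p_values))

# Hypothetical weekly Google search volume vs. Reddit submission counts.
print(spearman_rho([10, 20, 30, 40], [3, 8, 9, 15]))
```

In this sketch, a word counts as a candidate 'price dynamic' word only if its frequency differs between the two phases at the Bonferroni-adjusted threshold, which guards against false positives when many words are tested at once.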

Usage notes