
Simulated cannabis days-of-use data

Cite this dataset

Chambers, Mark; Drovandi, Christopher (2022). Simulated cannabis days-of-use data [Dataset]. Dryad.



  • The numbers of days that people consume alcohol and other drugs over a fixed time interval, such as 28 days, are often collected in surveys for research in the addictions field.
  • The presence of an upper bound on these variables can result in response distributions with "ceiling effects".
  • Also, if some people's substance use follows regular weekly patterns, days-of-use summed over longer periods can have multiple modes. Multiple modes can also result from "heaping", where respondents who are unsure of the precise value give a rounded or approximate answer.
  • These characteristics of substance days-of-use data mean that models assuming common parametric response distributions will not always provide a good fit.
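To illustrate how weekly habits and the 28-day upper bound shape the response distribution, here is a minimal Python toy simulation. It is not part of the dataset or the companion code. The mix of habits (days per week) and the plus-or-minus one day noise are hypothetical choices made only for illustration.

```python
import random
from collections import Counter

DAYS = 28  # length of the recall interval, which is also the ceiling


def simulate_days_of_use(n: int, seed: int = 1) -> Counter:
    """Toy simulation of 28-day totals for people with regular weekly habits."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n):
        # Hypothetical mix of habits (days per week); daily use is over-represented.
        k = rng.choice([0, 1, 2, 3, 5, 7, 7, 7])
        days = 4 * k  # four weeks of a k-days-per-week habit
        # Occasionally one day fewer or one day more than the habit implies.
        days += rng.choice([-1, 0, 0, 0, 1])
        # Responses cannot fall outside 0..28, so overflow piles up at the bounds.
        counts[min(max(days, 0), DAYS)] += 1
    return counts


def local_modes(counts: Counter) -> list:
    """Days-of-use values whose frequency exceeds both neighbours."""
    return [d for d in range(DAYS + 1)
            if counts[d] > counts[d - 1] and counts[d] > counts[d + 1]]


counts = simulate_days_of_use(10_000)
print(local_modes(counts))
```

With this toy mix the local modes fall at multiples of four (0, 4, 8, 12, 20 and 28 days). Clipping at 28 adds the "one day more" responses of daily users to the 28-day category, which mimics a ceiling effect.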

Repository contents:

Simulated longitudinal cannabis days-of-use over 28-day intervals, intended to reproduce characteristics of data reported by respondents to an Australian survey of illicit drug users run over 4 waves during the COVID-19 pandemic in 2020–21. The dataset includes the generated subject_id and the explanatory variables survey_wave and iso, where iso is a dummy variable indicating subjects who were in quarantine or isolation at the time of the 28-day interval.

R code to fit proportional-odds and continuation-ratio ordinal models, as well as binomial, beta-binomial, negative binomial and hurdle negative binomial models, to these data is available at a linked companion website.
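One reason a beta-binomial can fit days-of-use better than a binomial is overdispersion. The following Python sketch is not the companion R code. It compares the two probability mass functions on 0..28 days at the same mean. The shape parameters a = 0.4 and b = 0.6 are arbitrary illustrative values and were not estimated from these data.

```python
from math import comb, exp, lgamma

N = 28  # days in the interval, i.e. the number of binomial "trials"


def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed with log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    # P(k) = C(n, k) * B(k + a, n - k + b) / B(a, b)
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def mean_var(pmf):
    m = sum(k * q for k, q in enumerate(pmf))
    v = sum((k - m) ** 2 * q for k, q in enumerate(pmf))
    return m, v


a, b = 0.4, 0.6   # illustrative shape parameters; mean proportion a / (a + b) = 0.4
p = a / (a + b)
binom = [binomial_pmf(k, N, p) for k in range(N + 1)]
betabin = [beta_binomial_pmf(k, N, a, b) for k in range(N + 1)]

print(mean_var(binom))    # approx. (11.2, 6.72): variance N * p * (1 - p)
print(mean_var(betabin))  # approx. (11.2, 97.44): variance inflated by (a + b + N) / (a + b + 1)
print(binom[0], binom[N], betabin[0], betabin[N])
```

Both distributions have mean 11.2 days. The beta-binomial, however, has a variance about 14.5 times larger and places appreciable probability on 0 and 28 days, where the binomial is essentially zero.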


We fitted a Bayesian multinomial model to reported cannabis days-of-use over four 28-day intervals (four survey waves) during the COVID-19 pandemic in Australia. Cannabis days-of-use was modeled as a nominal categorical variable with 29 levels, one for each possible response (0 days, 1 day, ..., 28 days).

The model, fitted to responses from 443 illicit drug users across four survey waves, included only survey wave and isolation status (in isolation or quarantine: yes/no) as explanatory variables, with a random intercept for subject_id.
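A categorical (multinomial) model turns linear predictors into probabilities for the 29 response levels. The Python sketch below assumes a baseline-category softmax parameterization with 0 days as the reference category, wave and isolation effects for each category, and a random intercept for each subject and category. This is one common way to specify such a model. It is not necessarily the authors' exact specification, and every parameter value is made up for illustration.

```python
import math
import random

K = 29          # response categories: 0, 1, ..., 28 days
N_WAVES = 4


def category_probs(alpha, beta_wave, gamma_iso, u_subject, wave, iso):
    """Softmax probabilities of the K categories for one subject at one wave.

    alpha[k]        intercept for category k
    beta_wave[w][k] effect of survey wave w + 1 (wave 1 is the baseline: all zeros)
    gamma_iso[k]    effect of isolation/quarantine (iso = 1) on category k
    u_subject[k]    this subject's random intercept for category k
    Category 0 (0 days) is the reference: all of its coefficients are 0.
    """
    eta = [alpha[k] + beta_wave[wave - 1][k] + gamma_iso[k] * iso + u_subject[k]
           for k in range(K)]
    m = max(eta)                       # subtract the max for numerical stability
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [x / total for x in w]


def ref0(values):
    """Fix the reference category's coefficient at 0."""
    return [0.0] + list(values[1:])


# Hypothetical parameter values, drawn at random purely for illustration.
rng = random.Random(0)
alpha = ref0([rng.gauss(-2.0, 1.0) for _ in range(K)])
beta_wave = [[0.0] * K] + [ref0([rng.gauss(0.0, 0.3) for _ in range(K)])
                           for _ in range(N_WAVES - 1)]
gamma_iso = ref0([rng.gauss(0.0, 0.3) for _ in range(K)])
sigma_u = 0.5                          # hypothetical random-intercept SD
u_subject = ref0([rng.gauss(0.0, sigma_u) for _ in range(K)])

p = category_probs(alpha, beta_wave, gamma_iso, u_subject, wave=2, iso=1)
print(len(p), sum(p))
```

Treating the 29 levels as nominal categories means the model places no order or distributional shape on days-of-use. Ceiling effects and heaped modes can therefore be captured directly, at the cost of many more parameters.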

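A simulated sample of 600 participants was generated by twice subsampling 300 subject_ids without replacement from the full set of 443. Since 300 + 300 - 443 = 157, at least 157 of the 300 subject_ids in each subsample also appear in the other, so most subject_ids were selected in both subsamples.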
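The overlap between the two subsamples follows a hypergeometric distribution. This Python sketch checks its guaranteed minimum and its expected value by simulation. The subject_id labels 1..443 are hypothetical placeholders.

```python
import random

N_SUBJECTS = 443      # subject_ids in the fitted data
SUBSAMPLE_SIZE = 300  # subject_ids drawn (without replacement) per subsample


def overlap_of_two_subsamples(rng: random.Random) -> int:
    """Number of subject_ids drawn in both of two independent subsamples."""
    ids = range(1, N_SUBJECTS + 1)  # hypothetical subject_id labels 1..443
    first = set(rng.sample(ids, SUBSAMPLE_SIZE))
    second = set(rng.sample(ids, SUBSAMPLE_SIZE))
    return len(first & second)


min_overlap = 2 * SUBSAMPLE_SIZE - N_SUBJECTS      # 157, by pigeonhole
expected_overlap = SUBSAMPLE_SIZE**2 / N_SUBJECTS  # hypergeometric mean, about 203.2

rng = random.Random(2022)
sims = [overlap_of_two_subsamples(rng) for _ in range(2000)]
print(min_overlap, round(expected_overlap, 1), sum(sims) / len(sims))
```

On average about 203 of the 300 subject_ids in each subsample, roughly two-thirds, also appear in the other subsample.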

A single cannabis days-of-use value was simulated for each of 2 subsamples x 300 subject_ids x 4 survey waves = 2400 28-day intervals. The simulated responses for each subsample were generated by a single draw from the posterior predictive distribution.
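The following Python sketch shows one reading of the simulation step: for each subsample, pick one posterior draw, then sample one days-of-use value for every subject_id and wave from that draw's category probabilities. The toy "posterior" (three fixed probability vectors), the `subsample` label and the helper names are hypothetical stand-ins for the fitted Stan model, not the authors' code.

```python
import math
import random

K = 29                 # categories 0..28 days
WAVES = (1, 2, 3, 4)


def make_toy_draw(tilt: float):
    """Hypothetical stand-in for one posterior draw of a fitted model:
    returns a function (subject_id, wave) -> K category probabilities."""
    def probs(subject_id: int, wave: int) -> list:
        w = [math.exp(tilt * k / (K - 1)) for k in range(K)]
        total = sum(w)
        return [x / total for x in w]
    return probs


def one_posterior_predictive_dataset(draws, subject_ids, rng):
    """Pick one posterior draw at random, then sample one days-of-use value
    per subject and wave from that draw's category probabilities."""
    probs_fn = rng.choice(draws)
    rows = []
    for sid in subject_ids:
        for wave in WAVES:
            days = rng.choices(range(K), weights=probs_fn(sid, wave), k=1)[0]
            rows.append((sid, wave, days))
    return rows


rng = random.Random(42)
draws = [make_toy_draw(t) for t in (-1.0, 0.0, 1.0)]  # toy "posterior"
all_ids = list(range(1, 444))                         # hypothetical subject_ids
data = []
for subsample in (1, 2):
    chosen = rng.sample(all_ids, 300)
    for sid, wave, days in one_posterior_predictive_dataset(draws, chosen, rng):
        data.append((subsample, sid, wave, days))
print(len(data))  # 2 * 300 * 4 = 2400
```

Using one draw for a whole subsample keeps each simulated dataset internally consistent with a single set of parameter values, so it carries posterior uncertainty the way a replicated dataset would.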

The survey wave and isolation explanatory variables and subject_id are included in the supplied dataset. Survey participants are not identifiable.

Usage notes

The data are provided in an R dataset, synthetic_cannabis_use.RData.

To run the R code accompanying the dataset, the rstan package must also be installed.


Department of Health