A vigiPoint characterisation of female versus male reports in VigiBase, the WHO global database of individual case safety reports

Citation

Watson, Sarah; Caster, Ola (2019), A vigiPoint characterisation of female versus male reports in VigiBase, the WHO global database of individual case safety reports, Dryad, Dataset, https://doi.org/10.5061/dryad.8cz8w9gk1

Abstract

General information

This data is supplementary material to the paper by Watson et al. on sex differences in global reporting of adverse drug reactions [1]. Readers are referred to this paper for a detailed description of the context in which the data was generated. Anyone intending to use this data for any purpose should read the publicly available information on the VigiBase source data [2, 3]. The conditions specified in the caveat document [3] must be adhered to.

Source dataset

The dataset published here is based on analyses performed in VigiBase, the WHO global database of individual case safety reports [4]. All reports entered into VigiBase from its inception in 1967 up to 2 January 2018 with patient sex coded as either female or male have been included, except suspected duplicate reports [5]. In total, the source dataset contained 9,056,566 female and 6,012,804 male reports.

Statistical analysis

The characteristics of the female reports were compared to those of the male reports using a method called vigiPoint [6]. This is a method for comparing two or more sets of reports (here female and male reports) on a large set of reporting variables, and highlighting any feature in which the sets differ in a statistically and clinically relevant manner. For example, patient age group is a reporting variable, and the different age groups 0 - 27 days, 28 days - 23 months et cetera are features within this variable. The statistical analysis is based on shrinkage log odds ratios computed as a comparison between the two sets of reports for each feature, including all reports without missing information for the variable under consideration. The specific output from vigiPoint is defined precisely below. Here, the results for 18 different variables with a total of 44,486 features are presented. 74 of these features were highlighted as so-called vigiPoint key features, suggesting a statistically and clinically significant difference between female and male reports in VigiBase.

Description of published dataset

The dataset is provided in the form of a MS Excel spreadsheet (.xlsx file) with nine columns and 44,486 rows (excluding the header), each corresponding to a specific feature. Below follows a detailed description of the data included in the different columns.

Variable: This column indicates the reporting variable to which the specific feature belongs. Eight of these variables are described in the original publication by Watson et al.: country of origin, geographical region of origin, type of reporter, patient age group, MedDRA SOC, ATC level 2 of reported drugs, seriousness, and fatality [1]. The remaining 10 are described here:

  •  MedDRA HLGT (high-level group term), MedDRA HLT (high-level term) and MedDRA PT (preferred term) are defined analogously to the MedDRA SOC (system organ class) [1], only at lower levels of the MedDRA (Medical Dictionary for Regulatory Activities) hierarchy. Here, MedDRA version 20.1 has been used.
  • ATC level 3 of reported drugs is defined analogously to the variable ATC level 2 of reported drugs [1], only one step further down in the ATC (Anatomical Therapeutic Chemical) classification hierarchy.
  • The vigiGrade completeness score is a measure of how complete each report is with respect to certain report fields useful for causality assessment [7]. The completeness score has been dichotomised into two features, 'Above or equal to 0.8' and 'Below 0.8'. The maximum possible score for an individual report is 1.0.
  • The date of VigiBase entry is simply the time when a report was entered into VigiBase. This variable is divided into 14 features that are either individual years or ranges of years.
  •  The number of reported drugs is the number of unique drugs that are coded on a report as either suspected, interacting, or concomitant. A drug is here defined as an entry at the preferred base (i.e. substance) level of the WHODRUG terminology. The variable is divided into four features: 'One drug', 'Two drugs', '3-5 drugs', and 'More than 5 drugs'.
  •  The number of reported MedDRA PTs is the number of unique MedDRA preferred terms that are coded as events on a report. This variable is divided into four features in exactly the same way as the number of reported drugs.
  •  A reported drug is a drug coded on a report as either suspected, interacting, or concomitant. As above, a drug is defined as an entry at the preferred base (i.e. substance) level of the WHODRUG terminology. This variable has almost 23,000 features, one for each drug that occurs in at least one female or one male report.
  •  The type of report indicates the type of individual case report. The vast majority of reports belong to the feature 'Spontaneous', but there are four other possible features for this variable.
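
The binned variables above can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of the published dataset or of vigiPoint; the feature labels are those quoted in the list above. Rejecting counts below one is an assumption of this sketch, since the four features do not include a category for zero drugs.

```python
def drug_count_feature(n_drugs: int) -> str:
    """Map a number of unique reported drugs to one of the four features."""
    if n_drugs < 1:
        # Assumption: the published features have no category for zero drugs.
        raise ValueError("expected at least one reported drug")
    if n_drugs == 1:
        return "One drug"
    if n_drugs == 2:
        return "Two drugs"
    if n_drugs <= 5:
        return "3-5 drugs"
    return "More than 5 drugs"


def completeness_feature(score: float) -> str:
    """Dichotomise a vigiGrade completeness score (maximum 1.0) at 0.8."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("completeness score must lie between 0 and 1")
    return "Above or equal to 0.8" if score >= 0.8 else "Below 0.8"
```

The same binning would apply to the number of reported MedDRA PTs, with the feature labels adapted accordingly.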

The Variable column can be useful for filtering the data, for example if one is interested in one or a few specific variables.
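
As a rough illustration of such filtering, the sketch below assumes the spreadsheet rows have already been loaded into Python dictionaries keyed by column name. The helper name and the example rows are hypothetical, and the exact spelling of the values in the Variable column should be checked against the file itself.

```python
from typing import Dict, Iterable, List


def filter_by_variable(rows: Iterable[Dict[str, str]],
                       wanted: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only the rows whose 'Variable' value is one of those wanted."""
    wanted_set = set(wanted)
    return [row for row in rows if row["Variable"] in wanted_set]


# Illustrative rows only; the variable names here are assumed spellings.
rows = [
    {"Variable": "Number of reported drugs", "Feature": "One drug"},
    {"Variable": "Number of reported drugs", "Feature": "3-5 drugs"},
    {"Variable": "Type of report", "Feature": "Spontaneous"},
]

selected = filter_by_variable(rows, ["Type of report"])
```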

Feature: This column contains each of the 44,486 included features. The vast majority should be self-explanatory, or else they have been explained above, or in the original paper [1].

Female reports and Male reports: These columns show the number of female and male reports, respectively, for which the specific feature is present.

Proportion among female reports and Proportion among male reports: These columns show the proportions within the female and male reports, respectively, for which the specific feature is present. Comparing these crude proportions is the simplest and most intuitive way to contrast the female and male reports, and a useful complement to the specific vigiPoint output.

Odds ratio: The odds ratio is a basic measure of association between the classification of reports into female and male reports and a given reporting feature, and hence can be used to compare female and male reports with respect to this feature. It is formally defined as a / (bc / d), where

  •  a is the number of female reports with the feature
  •  b is the number of female reports without the feature (excluding reports where the variable is missing)
  •  c is the number of male reports with the feature
  •  d is the number of male reports without the feature (excluding reports where the variable is missing).

This crude odds ratio can also be computed as (p_female / (1 - p_female)) / (p_male / (1 - p_male)), where p_female and p_male are the proportions described earlier. If the odds ratio is above 1, the feature is more common among the female than the male reports; if below 1, the feature is less common among the female than the male reports. Note that the odds ratio can be mathematically undefined, in which case it is missing in the published data.
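
The two equivalent formulas can be sketched in Python as follows. The function names are hypothetical, the counts in the usage note are made up for illustration, and returning None for the undefined case is a choice of this sketch that mirrors the missing values in the published data.

```python
from typing import Optional


def crude_odds_ratio(a: int, b: int, c: int, d: int) -> Optional[float]:
    """Compute a / (b*c / d) from the four counts, or None if undefined."""
    if d == 0 or b * c == 0:
        return None  # a division by zero would occur
    return a / (b * c / d)


def odds_ratio_from_proportions(p_female: float,
                                p_male: float) -> Optional[float]:
    """Compute the same odds ratio from the two crude proportions."""
    if p_female >= 1 or p_male <= 0 or p_male >= 1:
        return None  # one of the odds is undefined, or the divisor is zero
    return (p_female / (1 - p_female)) / (p_male / (1 - p_male))
```

For instance, with the made-up counts a = 30, b = 70, c = 10 and d = 90, the proportions are 0.3 and 0.1, and both functions give 27/7, or roughly 3.86.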

vigiPoint score: This score is defined based on an odds ratio with added statistical shrinkage, defined as (a + k) / ((bc / d) + k), where k is 1% of the total number of female reports, or about 9,000. While the shrinkage adds robustness to the measure of association, it makes interpretation more difficult, which is why the crude proportions and unshrunk odds ratios are also presented. Further, 99% credibility intervals are computed for the shrinkage odds ratios, and these intervals are transformed onto a log2 scale [6]. The vigiPoint score is then defined as the lower endpoint of the interval, if that endpoint is above 0; as the upper endpoint of the interval, if that endpoint is below 0; and otherwise as 0. The vigiPoint score is useful for sorting the features from strongest positive to strongest negative associations, and/or for filtering the features according to some user-defined criteria.
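
A minimal Python sketch of the two steps that are fully specified above, namely the shrinkage odds ratio and the rule that turns an interval into a score. The function names are hypothetical. The shrinkage constant k is passed in explicitly rather than derived, and the computation of the 99% credibility interval is not reproduced here (see [6]); its log2-scale endpoints are simply taken as inputs.

```python
def shrunk_odds_ratio(a: float, b: float, c: float, d: float,
                      k: float) -> float:
    """Compute (a + k) / ((b*c / d) + k) for a shrinkage constant k > 0."""
    if d == 0:
        raise ValueError("d must be non-zero")
    if k <= 0:
        raise ValueError("k must be positive")
    return (a + k) / ((b * c / d) + k)


def vigipoint_score(lower_log2: float, upper_log2: float) -> float:
    """Turn the log2-scale interval endpoints into a vigiPoint score."""
    if lower_log2 > upper_log2:
        raise ValueError("lower endpoint exceeds upper endpoint")
    if lower_log2 > 0:
        return lower_log2   # whole interval above 0
    if upper_log2 < 0:
        return upper_log2   # whole interval below 0
    return 0.0              # interval covers 0
```

The score is thus the interval endpoint closest to 0 whenever the interval excludes 0, which makes it a conservative measure of how far the two sets of reports differ.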

vigiPoint key feature: Features are classified as vigiPoint key features if their vigiPoint score is either above 0.5 or below -0.5. The specific threshold of 0.5 is arbitrary, but chosen to identify features where the two sets of reports (here female and male reports) differ in a clinically significant way.
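
The classification rule, together with the sorting use mentioned for the vigiPoint score, can be sketched as below. The names and the example scores are hypothetical and serve only to illustrate the rule.

```python
from typing import Dict, List

KEY_FEATURE_THRESHOLD = 0.5  # threshold stated in the description above


def is_key_feature(score: float,
                   threshold: float = KEY_FEATURE_THRESHOLD) -> bool:
    """True if the score is above the threshold or below its negative."""
    return abs(score) > threshold


def sort_by_score(scores: Dict[str, float]) -> List[str]:
    """Order feature names from strongest positive to strongest negative."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Made-up scores for three placeholder features.
example_scores = {"feature A": 0.72, "feature B": 0.0, "feature C": -0.55}
ordered = sort_by_score(example_scores)
key_features = [name for name in ordered
                if is_key_feature(example_scores[name])]
```

Since the score lives on a log2 scale, a threshold of 0.5 corresponds to an interval endpoint for the shrinkage odds ratio of about 1.41 (or, for negative scores, about 0.71).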

References

  1. Watson S, Caster O, Rochon PA, den Ruijter H. Reported adverse drug reactions in women and men: Aggregated evidence from globally collected individual case reports during half a century. EClinicalMedicine 2019.
  2. Uppsala Monitoring Centre. Guideline for using VigiBase data in studies.
  3. Uppsala Monitoring Centre. Caveat document: Statement of reservations, limitations, and conditions relating to data released from VigiBase, the WHO global database of individual case safety reports (ICSRs).
  4. Lindquist M. VigiBase, the WHO Global ICSR Database System: Basic Facts. The Drug Information Journal 2008; 42(5): 409-19.
  5. Norén GN, Orre R, Bate A, Edwards IR. Duplicate detection in adverse drug reaction surveillance. Data Mining and Knowledge Discovery 2007; 14(3): 305-28.
  6. Juhlin K, Star K, Norén GN. A method for data-driven exploration to pinpoint key features in medical data and facilitate expert review. Pharmacoepidemiology and Drug Safety 2017; 26(10): 1256-65.
  7. Bergvall T, Norén GN, Lindquist M. vigiGrade: A tool to identify well-documented individual case reports and highlight systematic data quality issues. Drug Safety 2014; 37(1): 65-77.