Accuracy of identifying incident stroke cases from linked healthcare data in UK Biobank

Cite this dataset

Rannikmae, Kristiina (2021). Accuracy of identifying incident stroke cases from linked healthcare data in UK Biobank [Dataset]. Dryad.


Objective: In UK Biobank (UKB), a large population-based prospective study, cases of many diseases are ascertained through linkage to routinely collected, coded national health datasets. We assessed the accuracy of these datasets for identifying incident strokes.

Methods: In a regional UKB sub-population (n=17,249), we identified all participants with ≥1 code signifying a first stroke after recruitment (incident stroke-coded cases) in linked hospital admission, primary care or death record data. Stroke physicians reviewed these participants' full electronic patient records (EPRs) and generated reference standard diagnoses. We evaluated the number and proportion of stroke-coded cases that were true positives (i.e. the positive predictive value, PPV) for all codes combined and by code source and type.

Results: Of 232 incident stroke-coded cases, 97% had EPR information available. Data sources were: 30% hospital admission only; 39% primary care only; 28% hospital and primary care; 3% death records only. While 42% of cases were coded as unspecified stroke type, review of EPRs enabled a pathological type to be assigned in >99%. PPVs (95% confidence intervals) were: 79% (73%-84%) for any stroke (89% for hospital admission codes, 80% for primary care codes) and 83% (74%-90%) for ischemic stroke. PPVs for small numbers of death record and hemorrhagic stroke codes were low but imprecise. 

Conclusions: Stroke and ischemic stroke cases in UKB can be ascertained through linked health datasets with sufficient accuracy for many research studies. Further work is needed to understand the accuracy of death record and hemorrhagic stroke codes and to develop scalable approaches for better identifying stroke types.
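
For readers who want to reproduce this kind of estimate, the PPV described in the Methods (the proportion of stroke-coded cases confirmed as true positives against the reference standard) can be sketched as below. This is a minimal illustration, not the authors' analysis code: the function name `ppv_with_wilson_ci` is hypothetical, the Wilson score interval is an assumed choice because the record does not state which confidence interval method was used, and the counts in the usage example are made up for illustration rather than taken from the study.

```python
import math


def ppv_with_wilson_ci(true_positives, coded_cases, z=1.959964):
    """Return (ppv, lower, upper) for a positive predictive value.

    true_positives: coded cases confirmed by the reference standard.
    coded_cases:    all coded cases that could be adjudicated.
    z:              normal quantile (default gives a 95% interval).

    Assumption: a Wilson score interval; the source does not say
    which interval method the authors used.
    """
    if coded_cases <= 0:
        raise ValueError("coded_cases must be positive")
    if not 0 <= true_positives <= coded_cases:
        raise ValueError("true_positives must lie between 0 and coded_cases")

    p = true_positives / coded_cases
    z2 = z * z
    denom = 1 + z2 / coded_cases
    centre = (p + z2 / (2 * coded_cases)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / coded_cases + z2 / (4 * coded_cases**2))
    ) / denom

    # Clamp to [0, 1] to guard against floating-point overshoot at the edges.
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not study data.
    ppv, low, high = ppv_with_wilson_ci(true_positives=80, coded_cases=100)
    print(f"PPV {ppv:.0%} (95% CI {low:.0%}-{high:.0%})")
```

The same function can be applied separately to each code source (hospital admission, primary care, death records) or stroke type, given the corresponding counts of adjudicated and confirmed cases.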

Usage notes

Supplementary Figure S1: Pathway for code assignment and linkages in UK Biobank 

Supplementary Figure S2: Code sources (a) and proportion of cases with codes for unspecified versus specified stroke type (b) 

Supplementary Figure S3: Stroke type and subtype distributions 

Supplementary Table S1: ICD10 and Read v2 stroke codes 

Appendix 1: Questionnaire for stroke outcome adjudication