
Clinical trial generalizability assessment in the big data era: a review


He, Zhe et al. (2020), Clinical trial generalizability assessment in the big data era: a review, Dryad, Dataset,


Clinical studies, especially randomized controlled trials, are essential for generating evidence for clinical practice. However, generalizability is a long-standing concern when applying trial results to real-world patients. Generalizability assessment is thus important but not consistently practiced. We performed a systematic scoping review to understand the practice of generalizability assessment. We identified 187 relevant papers and systematically organized these studies in a taxonomy with three dimensions: (1) data availability (i.e., before or after trial [a priori vs a posteriori generalizability]), (2) result outputs (i.e., score vs non-score), and (3) populations of interest. We further reported disease areas, underrepresented subgroups, and types of data used to profile target populations. We observed an increasing trend of generalizability assessments, but fewer than 30% of studies reported positive generalizability results. Because a priori generalizability can be assessed using only study design information (primarily eligibility criteria), it gives investigators a golden opportunity to adjust the study design before the trial starts. Nevertheless, fewer than 40% of the studies in our review assessed a priori generalizability. With the wide adoption of electronic health record systems, rich real-world patient databases are increasingly available for generalizability assessment; however, informatics tools to support the adoption of generalizability assessment practice are lacking.


We performed the literature search over the following four databases: MEDLINE, Cochrane, PsycINFO, and CINAHL. Following the Institute of Medicine’s standards for systematic review and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), we conducted the scoping review in the following six steps: 1) gaining an initial understanding of clinical trial generalizability assessment, population representativeness, internal validity, and external validity, 2) identifying relevant keywords, 3) formulating four search queries to identify relevant articles in the four databases, 4) screening the articles by reviewing titles and abstracts, 5) reviewing the articles’ full texts to further filter out irrelevant ones based on inclusion and exclusion criteria, and 6) coding the articles for data extraction.

Study selection and screening process

We used an iterative process to identify and refine the search keywords and search strategies. We identified 5,352 articles as of February 2019 from MEDLINE, CINAHL, PsycINFO, and Cochrane. After removing duplicates, 3,569 records were assessed for relevancy by two researchers (ZH and XT), who reviewed the titles and abstracts against the inclusion and exclusion criteria. Conflicts were resolved with a third reviewer (JB). During the screening process, we also iteratively refined the inclusion and exclusion criteria. Out of the 3,569 articles, 3,275 were excluded through the title and abstract screening process. Subsequently, we reviewed the full texts of 294 articles, among which 106 articles were further excluded based on the exclusion criteria. The inter-rater reliability of the full-text review between the two annotators was 0.901 (Cohen’s kappa, p < .001). In total, 187 articles were included in the final scoping review.
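For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa can be computed from two reviewers' include/exclude decisions. It is not the authors' code; the function name `cohens_kappa` and the ten example decisions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: share of items on which the raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label proportions,
    # summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical full-text decisions for ten articles (not data from the review).
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 7
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters in screening tasks where most articles are excluded and two reviewers would often agree even when deciding at random.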

Data extraction and reporting

We coded and extracted data from the 187 eligible articles according to the following aspects: (1) whether the study performed an a priori generalizability assessment, an a posteriori generalizability assessment, or both; (2) the compared populations and the conclusions of the assessment; (3) the outputs of the results (e.g., generalizability scores, descriptive comparison); (4) whether the study focused on a specific disease and, if so, the disease and disease category; (5) whether the study focused on a particular population subgroup (e.g., elderly) and, if so, the specific subgroup; and (6) the type(s) of real-world patient data used to profile the target population (i.e., trial data, hospital data, regional data, national data, and international data). Note that trial data can also be regional, national, or even international, depending on the scale of the trial. Regardless, we placed them in the category of “trial data” because the study population of a trial is typically small compared to observational cohorts or real-world data. For observational cohorts or real-world data (e.g., EHRs), we extracted the specific scale of the database (i.e., regional, national, or international). For the studies that compared the characteristics of different populations to indicate generalizability issues, we further coded the populations that were compared (e.g., enrolled patients, eligible patients, general population, ineligible patients) and the types of characteristics that were compared (i.e., demographic information, clinical attributes and comorbidities, treatment outcomes, and adverse events). We then used Fisher’s exact test to assess whether the types of characteristics compared differed between a priori and a posteriori generalizability assessment studies.
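The comparison described above can be illustrated with a small two-sided Fisher's exact test on a 2x2 table. This is a sketch under stated assumptions, not the authors' analysis: the description does not say how the contingency tables were laid out, so the example assumes one 2x2 table per characteristic type (study type by whether that characteristic was compared). The function name `fisher_exact_two_sided` and all counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)

    # Sum the probabilities of all tables at least as extreme as the observed
    # one; the small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts (not data from the review):
# rows    = a priori studies, a posteriori studies
# columns = compared demographic information, did not compare it
p = fisher_exact_two_sided(12, 8, 30, 10)
print(f"two-sided p = {p:.4f}")
```

An exact test is a natural choice when some cells hold few studies, since it does not rely on the large-sample approximation behind the chi-squared test.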


National Institute on Aging, Award: R21AG061431