Quantifying representativeness in RCTs using ML fairness metrics - Data and codes
Data files
Sep 02, 2021 version files 61.42 KB
Abstract
The "Quantifying representativeness in RCTs using ML fairness metrics - Data and codes" dataset supports quantifying representativeness in randomized clinical trials (RCTs) and provides insights for improving clinical trial equity and health equity. We developed RCT representativeness metrics based on machine learning (ML) fairness research. Visualizations and statistical tests based on the proposed metrics enable researchers and physicians to rapidly visualize and assess subgroup representation in RCTs. The approach lets users identify underrepresentation, absence, or other misrepresentation of subgroups, which indicates potential limitations of an RCT. The method could help support generalizability evaluation of existing RCT cohorts, enrollment target decisions for new RCTs (if eligibility criteria are included), and monitoring of RCT enrollment, ultimately contributing to more equitable public health outcomes. We apply the proposed RCT representativeness metrics to three landmark clinical trials released in the last decade: Action to Control Cardiovascular Risk in Diabetes (ACCORD), Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT), and Systolic Blood Pressure Intervention Trial (SPRINT). This dataset contains the processed data, the experiment results, and the visualization code for the paper titled "Quantifying representativeness in randomized clinical trials using machine learning fairness metrics."
Methods
The raw NHANES and RCT datasets were downloaded directly from their source websites.
The target population summary data were processed following the "NHANES Survey Methods and Analytic Guidelines" using the R "haven" and "survey" packages.
The RCT sample summary data were calculated with the R "count()" function.
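The two summary steps above can be illustrated with a minimal sketch. The authors worked in R; the Python below is only an illustration of the idea, not their code. All records, column names ("sex", "age_group", "weight"), and the helper functions sample_counts and weighted_shares are hypothetical. It covers point estimates only, whereas the R "survey" package also accounts for strata and clusters when estimating variances.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; the real RCT and NHANES files use other columns.
rct_rows = [
    {"sex": "Female", "age_group": "45-64"},
    {"sex": "Female", "age_group": "45-64"},
    {"sex": "Male", "age_group": "45-64"},
    {"sex": "Male", "age_group": "65+"},
]
population_rows = [
    {"sex": "Female", "age_group": "45-64", "weight": 1000.0},
    {"sex": "Female", "age_group": "65+", "weight": 500.0},
    {"sex": "Male", "age_group": "45-64", "weight": 1500.0},
    {"sex": "Male", "age_group": "65+", "weight": 1000.0},
]


def sample_counts(rows, keys):
    """Count RCT participants per subgroup (a grouped count)."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def weighted_shares(rows, keys, weight_key="weight"):
    """Estimate population subgroup shares by summing survey weights."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[weight_key]
    grand_total = sum(totals.values())
    return {group: total / grand_total for group, total in totals.items()}


keys = ("sex", "age_group")
counts = sample_counts(rct_rows, keys)
shares = weighted_shares(population_rows, keys)

# Print each population subgroup with its RCT count and population share.
for group in sorted(shares):
    print(group, counts[group], shares[group])
```

In this toy data the ("Female", "65+") subgroup has a nonzero population share but a count of zero in the trial sample, which is the kind of absence the abstract describes.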
Usage notes
All information is provided in MIAOQI_QuantifyingRepresentativenessInRCTsUsingMLFairnessMetricsDataAndCodes_Readme.txt.