Detection and initial management of gestational diabetes through primary health care services in Morocco: a cluster randomized controlled trial

Corresponding author: Bettina Utz (utzb@rki.de)

The included data and program code suffice to reproduce all statistical analyses in the paper.

The dataset in its initial state is in wide format, with one row per patient. Some unused, potentially identifying variables have been removed:

- Height
- Education level
- Occupation
- Method of transport
- Possession of health insurance
- History of hypertension
- History of abortion, premature delivery, and intrauterine fetal death
- Parity (number of previous deliveries), although gestity (number of pregnancies, including the current one) is provided

All calendar dates (of screenings, antenatal consultations, changes in treatment regimen, delivery, and postpartum tests) have been replaced by the number of days since the patient was diagnosed with GDM. As a result, the column Date_diag is now a constant 0.

Finally, the two remaining identifying variables (maternal weight and age) have been masked. A single, randomly chosen integer between -5 and 5 was added to all weight measurements for all subjects. A separate integer was drawn in the same way and added to maternal age.
Below is the R script that performed the changes listed above:

diabetes <- haven::read_dta("~/Diabetes/GDM_Version2.2a_base_data_repository.dta")

# Remove unused variables
diabetes <- diabetes[, !names(diabetes) %in%
  c("Height", "Med_ACDT", "DM_ACDT", "HT_ACDT", "OBSTET_ACDT", "AVT_ACDT",
    "PREMAT_ACDT", "IUFD_ATCD", "Par", "Edu", "Property", "Transport",
    "Insur", "Occup", "LMP", "EDD")]

# Anonymize patient code
diabetes$Pat_code <- as.character(1:nrow(diabetes))

# Anonymize dates
diabetes$Date_form <- as.integer(diabetes$Date_form - as.Date("1960-01-01"))
for (x in grep("Date_", names(diabetes), value = TRUE)) {
  if (x != "Date_diag") {
    diabetes[[x]] <- as.integer(diabetes[[x]] - diabetes$Date_diag)
  }
}
diabetes$Date_diag <- 0L

## Mask indirect identifying variables
# Draw one random integer per variable to add to (or subtract from) its values
mask_weight <- sample(-5:5, 1)
mask_age <- sample(-5:5, 1)

# Apply mask: the same offset is added to every weight column and to age
weightvars <- grep("Weight", names(diabetes), value = TRUE)
diabetes[weightvars] <- diabetes[weightvars] + mask_weight
diabetes$Age <- diabetes$Age + mask_age

haven::write_dta(diabetes, "~/Diabetes/data_for_dryad.dta")
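For users of the deposited file, it may help to see what the two transformations preserve. The sketch below is only an illustration on a toy data frame: the column names Date_visit2, Weight_1 and Weight_2, all values, and the fixed offset of 3 are hypothetical and are not taken from the dataset. It applies the same two steps (dates relative to diagnosis, one constant offset for all weight columns) and checks that within-patient weight change is unaffected by the mask.

```r
# Toy data: column names and values are hypothetical, for illustration only.
toy <- data.frame(
  Date_diag   = as.Date(c("2016-03-01", "2016-04-15")),
  Date_visit2 = as.Date(c("2016-03-15", "2016-05-01")),
  Weight_1    = c(70, 82),
  Weight_2    = c(72, 81)
)

# Weight change between the two measurements, before masking
change_before <- toy$Weight_2 - toy$Weight_1

# Express the visit date as days since diagnosis, then zero the diagnosis date
toy$Date_visit2 <- as.integer(toy$Date_visit2 - toy$Date_diag)
toy$Date_diag   <- 0L

# Add one constant offset to every weight column.
# Assumption: fixed at 3 here so the example is deterministic;
# the script above draws it at random with sample(-5:5, 1).
offset <- 3L
weightvars <- grep("Weight", names(toy), value = TRUE)
toy[weightvars] <- toy[weightvars] + offset

# Differences between weight columns are unchanged by a constant offset
change_after <- toy$Weight_2 - toy$Weight_1
stopifnot(identical(change_before, change_after))

print(toy)
```

Because the offset is the same for all subjects and all weight columns, differences between weight measurements and intervals between dates remain exact, whereas absolute weights and ages may each be off by up to 5 units.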