Data from: Screening familial risk for hereditary breast and ovarian cancer
Data files (Oct 17, 2024 version; 48.29 MB total)
- datashare1.csv (47.61 MB)
- datashare2.csv (673.33 KB)
- README.md (4.92 KB)
Abstract
Background: Most patients with pathogenic or likely pathogenic (P/LP) variants associated with breast cancer have not undergone genetic testing. Our primary objective was to use the electronic health record (EHR) to identify patients meeting family history criteria for genetic testing.
Methods: This study included a cross-sectional analysis with an observation date of February 1, 2024. Participants included patients aged 18 to 79 years enrolled in Renown Health, a large health system in Northern Nevada. Genotype was known for 38,003 patients enrolled in the Healthy Nevada Project (HNP), a population genomics study. The primary exposure was an EHR indicating that a patient met the criteria of the Seven-Question Family History Questionnaire (hereafter, FHS7 positive), which assesses familial risk for hereditary breast and ovarian cancer (HBOC). The primary outcomes were the presence of P/LP variants in the ATM, BRCA1, BRCA2, CHEK2, or PALB2 genes.
Results: Among 835,727 patients, 29,913 (3.6%) were FHS7 positive. Among those who were FHS7 positive, 24,535 (82.0%) had no evidence of prior genetic testing for HBOC in their EHR. Being FHS7 positive was associated with increased prevalence of P/LP variants in BRCA1/BRCA2 (odds ratio [OR], 3.34; 95% CI, 2.48-4.47), CHEK2 (OR, 1.62; 95% CI, 1.05-2.43), and PALB2 (OR, 2.84; 95% CI, 1.23-6.16) among HNP female individuals, and in BRCA1/BRCA2 (OR, 3.35; 95% CI, 1.93-5.56) among HNP male individuals. Among 1,527 HNP survey respondents, 352 of 383 EHR-FHS7 positive patients (91.9%) were survey-FHS7 positive, but only 352 of 883 survey-FHS7 positive patients (39.9%) were EHR-FHS7 positive.
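The percentages in the Results follow directly from the reported counts. The short Python check below recomputes them, rounding to one decimal place as the abstract does. It also makes explicit that the two survey-concordance figures share the numerator 352 but use different denominators (EHR-positive versus survey-positive respondents), which is why they differ so much. The helper name pct is illustrative only.

```python
# Recompute the proportions reported in the abstract from the raw counts.
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


counts = {
    "FHS7 positive among all patients": (29_913, 835_727),
    "no prior HBOC testing among FHS7 positive": (24_535, 29_913),
    "survey-positive among EHR-positive respondents": (352, 383),
    "EHR-positive among survey-positive respondents": (352, 883),
}

for label, (num, den) in counts.items():
    print(f"{label}: {num}/{den} = {pct(num, den)}%")
```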
Conclusions and Relevance: In this cross-sectional study, EHR-derived FHS7 identified thousands of patients with familial risk for breast cancer, indicating a substantial gap in genetic testing.
Data description: Data used to create Tables 1, 3, and 4 of the manuscript are provided as comma-separated values (CSV) files. Data used to create Table 2 and to train the cause-specific hazard models are subject to HIPAA and other privacy and compliance restrictions. Requests for these and other data may be addressed to Joe Grzymski (at jgrzymski@med.unr.edu) or Craig Kugler (at ckugler@med.unr.edu). These data (and other sensitive data) are available to qualified researchers upon reasonable request and with permission from the Center for Genomic Medicine. The HNP encourages collaboration with scientific researchers on an individual basis. Factors that will be considered in data access requests include, but are not limited to:
1. Whether the request comes from an academic institution in good standing that will collaborate with our team to protect the privacy of the participants and the security of the data requested.
2. Type and amount of data requested.
3. Feasibility of the research suggested.
4. Amount of resource allocation to support the collaboration.
https://doi.org/10.5061/dryad.gf1vhhmxr
Description of the data and file structure
These data were collected for a study examining the association of family history criteria (as defined by the questions in the Seven-Question Family History Questionnaire, or FHS7) with the presence of pathogenic or likely pathogenic variants associated with hereditary breast and ovarian cancer. The association of criteria in FHS7 with cancer outcomes was also examined.
Files and variables
File: datashare1.csv
Description: Contains records for 835,727 patients indicating their FHS7 status according to their EHRs. For 1,527 survey respondents, the records also indicate their FHS7 status according to their survey responses.
Variables
- id: unique ID for each patient, assigned randomly. Does not correspond to id in datashare2.csv.
- sex: recorded sex of the patient, anonymized as numeric codes (0 = male, 1 = female).
- fhs7pos_ehr: binary indicator (0/1) for whether a patient met FHS7 criteria according to their electronic health record (EHR; 1 indicates meeting criteria).
- Q1ehr: binary indicator (Yes/No) indicating whether a patient met the criteria in FHS7 question 1 in their EHR
- Q2ehr: binary indicator (Yes/No) indicating whether a patient met the criteria in FHS7 question 2 in their EHR
- Q3ehr: binary indicator (Yes/No) indicating whether a patient met the criteria in FHS7 question 3 in their EHR
- Q4ehr: binary indicator (Yes/No) indicating whether a patient met the criteria in FHS7 question 4 in their EHR
- Q5ehr: binary indicator (Yes/No) indicating whether a patient met the criteria in FHS7 question 5 in their EHR
- Q6ehr: binary indicator (Yes/No) indicating whether a patient met the criteria in FHS7 question 6 in their EHR
- Q7ehr: binary indicator (Yes/No) indicating whether a patient met the criteria in FHS7 question 7 in their EHR
- fhs7pos_survey: binary indicator (0/1) for whether a patient met FHS7 criteria according to their survey responses (1 indicates meeting criteria). NA values (in this column and in columns Q1survey-Q7survey) indicate that the patient did not respond to any of the FHS7 questions in the survey.
- Q1survey: categorical variable (Yes/No/I don’t know/no response) indicating how a patient answered FHS7 question 1 in the survey.
- Q2survey: categorical variable (Yes/No/I don’t know/no response) indicating how a patient answered FHS7 question 2 in the survey.
- Q3survey: categorical variable (Yes/No/I don’t know/no response) indicating how a patient answered FHS7 question 3 in the survey.
- Q4survey: categorical variable (Yes/No/I don’t know/no response) indicating how a patient answered FHS7 question 4 in the survey.
- Q5survey: categorical variable (Yes/No/I don’t know/no response) indicating how a patient answered FHS7 question 5 in the survey.
- Q6survey: categorical variable (Yes/No/I don’t know/no response) indicating how a patient answered FHS7 question 6 in the survey.
- Q7survey: categorical variable (Yes/No/I don’t know/no response) indicating how a patient answered FHS7 question 7 in the survey.
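A minimal sketch of two checks one might run on datashare1.csv, using only the Python standard library. It assumes the column names above, that indicators are stored as the strings "0"/"1" and "Yes"/"No", and that missing survey values are written as "NA" (the default for R's CSV writers) or left empty. The first function tests the assumption, not stated in this README, that an EHR-derived FHS7-positive patient has at least one Q1ehr to Q7ehr value of "Yes"; the second cross-tabulates EHR against survey status among survey respondents. All function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable

Q_EHR = [f"Q{i}ehr" for i in range(1, 8)]


def read_rows(path: str) -> list[dict[str, str]]:
    """Read datashare1.csv into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def ehr_inconsistencies(rows: Iterable[dict[str, str]]) -> list[str]:
    """IDs where fhs7pos_ehr disagrees with 'at least one Q*ehr is Yes'.

    ASSUMPTION: FHS7 positivity means at least one question met; this is
    a consistency check to run, not a documented rule of the dataset.
    """
    bad = []
    for row in rows:
        any_yes = any(row[q] == "Yes" for q in Q_EHR)
        if (row["fhs7pos_ehr"] == "1") != any_yes:
            bad.append(row["id"])
    return bad


def ehr_vs_survey(rows: Iterable[dict[str, str]]) -> Counter:
    """Count (EHR status, survey status) pairs among survey respondents.

    Rows with a missing fhs7pos_survey (assumed "NA" or empty) are skipped,
    since those patients answered none of the FHS7 survey questions.
    """
    table = Counter()
    for row in rows:
        survey = row["fhs7pos_survey"]
        if survey in ("NA", ""):
            continue
        table[(row["fhs7pos_ehr"], survey)] += 1
    return table
```

For example, ehr_vs_survey(read_rows("datashare1.csv"))[("1", "1")] counts patients positive by both sources, which can be compared with the concordance figures in the abstract.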
File: datashare2.csv
Description: Contains records for each of 37,996 HNP participants indicating FHS7 status, sex, and genes with P/LP variants. Due to privacy concerns, a small number of patients with unknown sex or sex recorded as non-binary were excluded.
Variables
- id: unique ID for each patient, assigned randomly. Does not correspond to id in datashare1.csv.
- fhs7_positive: binary indicator (0/1) for whether a patient met FHS7 criteria according to their electronic health record (1 indicates meeting criteria)
- fhs7_or_cancer: binary indicator (0/1) for whether a patient met FHS7 criteria or had a personal history of breast, ovarian, fallopian tube, peritoneal, pancreatic, or prostate cancer (1 indicates meeting criteria)
- sex: recorded sex of the patient, anonymized as numeric codes (0 = male, 1 = female)
- gene: categorical variable indicating which breast cancer-related genes (ATM, BRCA1, BRCA2, CHEK2, or PALB2) carry a pathogenic or likely pathogenic (P/LP) variant in the patient, or "none" if no P/LP variant was found. Patients with P/LP variants in more than one gene have all such genes listed, separated by a backslash (\).
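A minimal sketch of how the gene column might be parsed and turned into a crude (unadjusted) odds ratio, using only the Python standard library. It assumes the column names above, that sex and fhs7_positive are stored as the strings "0"/"1", and that the no-variant value is literally "none". Because the separator is described as a backslash while the abstract writes gene pairs with a forward slash, the parser splits on either. This is a plain 2x2-table estimate, not the method used in the manuscript, so its values need not match the abstract's odds ratios or confidence intervals. All function names are hypothetical.

```python
import csv
import re
from typing import Iterable


def read_rows(path: str) -> list[dict[str, str]]:
    """Read datashare2.csv into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def parse_genes(value: str) -> set[str]:
    """Split the gene field into a set of gene names ("none" -> empty set).

    Splits on a backslash (as the README describes) and, as a precaution,
    on a forward slash.
    """
    parts = (p.strip() for p in re.split(r"[\\/]", value))
    return {p for p in parts if p and p.lower() != "none"}


def crude_odds_ratio(rows: Iterable[dict[str, str]], genes: set[str],
                     sex: str = "1", exposure: str = "fhs7_positive") -> float:
    """Unadjusted odds ratio for carrying a P/LP variant in any of `genes`.

    Builds a 2x2 table (exposure column == "1" vs. not, carrier vs. not)
    within one sex (default "1" = female) and returns (a*d) / (b*c).
    """
    a = b = c = d = 0  # a: exposed carrier, b: exposed non-carrier
    for row in rows:   # c: unexposed carrier, d: unexposed non-carrier
        if row["sex"] != sex:
            continue
        carrier = bool(parse_genes(row["gene"]) & genes)
        exposed = row[exposure] == "1"
        if exposed and carrier:
            a += 1
        elif exposed:
            b += 1
        elif carrier:
            c += 1
        else:
            d += 1
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: zero cell in denominator")
    return (a * d) / (b * c)
```

Passing genes={"BRCA1", "BRCA2"} pools the two genes, as the abstract does; exposure="fhs7_or_cancer" switches to the broader criterion.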
Code/software
Code for reproducing Tables 1, 3, and 4 in the main manuscript using the data provided in this repository is provided as the R script data_sharing_code240823.R. For our work, we used R version 4.4.0 and R packages dplyr, tidyr, tibble, htmlTable, flextable, and foreach. Executing the code requires the script and the data files (datashare1.csv and datashare2.csv) to be stored in the same directory.
If desired, code to output tables to Microsoft Word can be uncommented in the script.
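Because the R script expects itself and both CSV files in one directory, a quick pre-flight check can save a failed run. The Python sketch below is a hypothetical convenience, not part of the released code; the helper name missing_files is illustrative.

```python
from pathlib import Path

# Files the README says must sit together in one directory.
REQUIRED = ["data_sharing_code240823.R", "datashare1.csv", "datashare2.csv"]


def missing_files(directory: str = ".") -> list[str]:
    """Return the required files absent from `directory` (empty list = ready)."""
    folder = Path(directory)
    return [name for name in REQUIRED if not (folder / name).is_file()]
```

If missing_files() returns an empty list, the script can be run from that directory, for example with Rscript data_sharing_code240823.R (Rscript is the command-line front end that ships with R).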
These data were extracted from deidentified electronic health records provided by Renown Health in Reno, Nevada, USA. Methods for extracting family history data and other patient information are described in the Methods section of the main manuscript, as well as in the Supplemental Online Content.