The FAIR database: facilitating access to public health research literature
Data files
Nov 28, 2024 version files (1.84 MB total):
- ProgressTrainingCombined_(1).zip (1.84 MB)
- README.md (1.38 KB)
Abstract
Objective: In public health, access to research literature is critical to informing decision making and identifying knowledge gaps. However, identifying relevant research is not straightforward, since public health interventions are often complex, can have positive and negative impacts on health inequalities, and are applied in diverse and rapidly evolving settings. We developed a ‘living’ database of public health research literature to facilitate access to this information using Natural Language Processing tools.
Materials and Methods: Classifiers were developed to identify the study design (e.g. cohort study or clinical trial) and the relationship to factors that may be relevant to inequalities, using the PROGRESS-Plus classification scheme. Training data was obtained from existing MEDLINE labels and from a set of systematic reviews in which studies were annotated with PROGRESS-Plus categories.
Results: Evaluation of the classifiers showed that the study type classifier achieved average precision and recall of 0.803 and 0.930 respectively. The PROGRESS-Plus classification proved more challenging, with average precision and recall of 0.608 and 0.534 respectively. The FAIR database uses information provided by these classifiers to facilitate access to inequality-related public health literature.
Discussion: Previous work on automation of evidence synthesis has focussed on clinical areas rather than public health, despite the need being arguably greater.
Conclusion: The development of the FAIR database demonstrates that it is possible to create a publicly accessible and regularly updated database of public health research literature focused on inequalities. The database is freely available (https://eppi.ioe.ac.uk/eppi-vis/Fair).
https://doi.org/10.5061/dryad.wdbrv15zn
Description of the data and file structure
The file is in .tsv format with the following headers:
PaperId: the Microsoft Academic number for the paper
PaperTitle: the title of the paper
Citations: a list of Microsoft Academic numbers for the papers in the paper's bibliography
coFoS: Microsoft Academic 'fields of study' / Topics
Authors: the authors of the paper
Abstract: the abstract of the paper
PublicationDate: the date of publication
DocType: the type of document
FamilyId: not used
RecordId: not used
CN: not used
Incl: not used
Lang: not used
V: volume
I: issue
DOI: the paper's Digital Object Identifier (DOI)
JN: journal title
PG: pages
Y: year of publication
URLs: URLs where the full text paper may be downloaded
umls: not used
Place: PROGRESS-Plus category
Race: PROGRESS-Plus category
Occupation: PROGRESS-Plus category
Gender: PROGRESS-Plus category
Religion: PROGRESS-Plus category
Education: PROGRESS-Plus category
Socioeconomic: PROGRESS-Plus category
Social Plus: PROGRESS-Plus category
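The header list above can be turned into a small loading sketch. The following is a minimal example, not the authors' own code. It assumes the extracted file is UTF-8 text, tab-separated, and unquoted. The function names `read_records` and `label_value_counts` are hypothetical, and the encoding of the PROGRESS-Plus columns is not documented here, so the sketch only tallies the raw values it finds.

```python
import csv
from collections import Counter

# Column names copied from the header list above.
PROGRESS_PLUS_COLUMNS = [
    "Place", "Race", "Occupation", "Gender",
    "Religion", "Education", "Socioeconomic", "Social Plus",
]


def read_records(lines):
    """Yield one dict per row from an iterable of tab-separated text lines.

    `lines` can be an open text file or a list of strings.
    Assumption: fields are not quoted, hence QUOTE_NONE.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = reader.fieldnames or []
    missing = [c for c in PROGRESS_PLUS_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    yield from reader


def label_value_counts(records):
    """Tally the raw values seen in each PROGRESS-Plus column.

    No assumption is made about how labels are encoded; the counts
    simply show which values occur and how often.
    """
    counts = {column: Counter() for column in PROGRESS_PLUS_COLUMNS}
    for record in records:
        for column in PROGRESS_PLUS_COLUMNS:
            counts[column][record[column]] += 1
    return counts
```

To use it, open the extracted .tsv file with `open(path, encoding="utf-8", newline="")`, pass the handle to `read_records`, and pass the result to `label_value_counts`. The name of the file inside the archive is not stated here, so `path` has to be taken from the unzipped contents.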
Access information
Data was derived from the following sources:
- Microsoft Academic (now OpenAlex)
A total of 1978 papers that had previously been included in systematic reviews were identified for training and testing the machine learning model. Please see the paper and website for further information.