Code from: The retinal age gap: An affordable and highly accessible biomarker for population-wide disease screening across the globe
Data files (Mar 25, 2025 version, 123.97 KB total)
- README.md (6.61 KB)
- Software.zip (117.36 KB)
Abstract
Traditional biomarkers, such as those obtained from blood tests, are essential for early disease detection, improving health outcomes, and reducing healthcare costs. However, they often involve invasive procedures, specialized laboratory equipment, or special handling of biospecimens. The retinal age gap (RAG) has emerged as a promising new biomarker that can overcome these limitations, making it particularly suitable for disease screening in low- and middle-income countries. This study aimed to evaluate the potential of the RAG as a biomarker for broad disease screening across a vast spectrum of diseases. Fundus images were collected from 86,522 UK Biobank participants, aged 40 to 83 (mean age: 56.2±8.3 years). A deep learning model was trained to predict retinal age using 17,791 images from healthy participants. The remaining images were categorized into disease/injury groups based on clinical codes. Additionally, 8,524 participants from the Brazilian Multilabel Ophthalmological Dataset (BRSET) were used for external validation. Among the 159 disease/injury groups from the 2019 Global Burden of Disease Study, 56 groups (35.2%) exhibited RAG distributions significantly different from healthy controls. Notable examples included chronic kidney disease, cardiovascular disease, blindness, vision loss, and diabetes. Overall, the RAG shows great promise as a cost-effective, non-invasive biomarker for early disease screening.
This repository contains the code and instructions required to replicate the analyses presented in the paper, “The retinal age gap: An affordable and highly accessible biomarker for population-wide disease screening across the globe.” The work leverages fundus images and tabulated participant data from the UK Biobank and BRSET, along with disease coding information, to compute and analyze the retinal age gap (RAG) as a biomarker.
Table of Contents
- Data Required
- Running the Analysis
- Utility Scripts
- Environment Setup
Data Required
UK Biobank Data
- What: Fundus image data and tabulated participant data.
- How to Acquire:
- Register on the UK Biobank website.
- Follow the detailed instructions available in the UK Biobank Data Access Guide to download the necessary data.
BRSET Data
- What: Fundus images and associated tabulated participant data.
- How to Acquire:
- Register at PhysioNet – Brazilian Ophthalmological dataset.
- Once registered, you can download the data directly from the website.
ICD-10 Code Dictionary
- What: The `disease_icd10_code_dict.csv` file contains the ICD-10 codes associated with each disease and injury analyzed in this study.
- How to Acquire:
- Ensure this file is placed in the appropriate directory within the repository (e.g., the `data/` or `resources/` folder) so that it can be accessed by the analysis scripts; a loading sketch is shown below.
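A minimal sketch of how this dictionary might be loaded and used to assign participants to disease/injury groups. The column names (`disease_group`, `icd10_codes`) and the comma-separated code format are assumptions for illustration only; check the actual header of `disease_icd10_code_dict.csv`.

```python
import pandas as pd

# Load the ICD-10 code dictionary (path and column names are assumptions).
code_dict = pd.read_csv("data/disease_icd10_code_dict.csv")

# Map each disease/injury group to its set of ICD-10 codes, assuming the codes
# are stored as a comma-separated string per group.
group_to_codes = {
    row["disease_group"]: set(str(row["icd10_codes"]).split(","))
    for _, row in code_dict.iterrows()
}

def participant_in_group(participant_codes, group):
    """Return True if any of the participant's ICD-10 codes belong to `group`."""
    return bool(set(participant_codes) & group_to_codes[group])
```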
Running the Analysis
The analysis is structured into a series of Jupyter Notebooks. Follow the steps below to run the complete workflow, from data preprocessing to generating the figures presented in the paper.
1. Data Preprocessing
Notebook: 1_data_preprocessing.ipynb
Steps:
- Set Required Paths (an example of setting these is shown after this list):
- `path_to_ukbiobank_csv`: Path to the extracted UK Biobank CSV file.
- `path_to_ukbiobank_fundus_images`: Path to the directory containing the UK Biobank fundus images.
- `path_to_brset_fundus_images`: Path to the directory containing the BRSET fundus images.
- Additional Instructions:
- Download the EyeQ repository and follow the instructions in the EyeQ README to preprocess the fundus images.
- Execute the preprocessing scripts to process all fundus images located in the directories specified.
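A minimal sketch of setting the required path variables at the top of the notebook. The directory and file locations below are placeholders; point them at wherever you extracted the UK Biobank and BRSET data.

```python
from pathlib import Path

# Paths expected by 1_data_preprocessing.ipynb (placeholder locations).
path_to_ukbiobank_csv = Path("/data/ukbiobank/participants.csv")
path_to_ukbiobank_fundus_images = Path("/data/ukbiobank/fundus_images/")
path_to_brset_fundus_images = Path("/data/brset/fundus_photos/")

# Basic sanity checks before running the EyeQ-based preprocessing.
assert path_to_ukbiobank_csv.is_file(), "UK Biobank CSV not found"
assert path_to_ukbiobank_fundus_images.is_dir(), "UK Biobank image directory not found"
assert path_to_brset_fundus_images.is_dir(), "BRSET image directory not found"
```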
2. Training Models for RAG Computation
Notebook: 2_train_models_for_RAG_computation.ipynb
Steps:
- Set Required Paths:
- `path_to_ukbiobank_preprocessed_fundus_images`: Path to the directory containing the preprocessed UK Biobank fundus images.
- Execution:
- Run the notebook to train the models that will be used for retinal age gap prediction.
3. Computing Model Predictions
Notebook: 3_compute_model_predictions.ipynb
Steps:
- Set Required Paths:
- `path_to_ukbiobank_preprocessed_fundus_images`: Path to the directory containing the preprocessed UK Biobank fundus images.
- `path_to_brset_preprocessed_fundus_images`: Path to the directory containing the preprocessed BRSET fundus images.
- Execution:
- Run the notebook to compute age predictions using the best-performing age prediction model.
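A minimal inference sketch for this step, assuming a PyTorch regression model with a single age output. The architecture (ResNet-50), checkpoint filename, and `preprocessed_dataset` object are placeholders, not the repository's exact setup.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder architecture with one regression output (predicted age).
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 1)
state = torch.load("best_age_model.pth", map_location=device)  # hypothetical checkpoint name
model.load_state_dict(state)
model.to(device).eval()

# `preprocessed_dataset` is assumed to yield (image_tensor, chronological_age) pairs.
loader = DataLoader(preprocessed_dataset, batch_size=32, shuffle=False)

predicted_ages, true_ages = [], []
with torch.no_grad():
    for images, ages in loader:
        preds = model(images.to(device)).squeeze(1)
        predicted_ages.extend(preds.cpu().tolist())
        true_ages.extend(ages.tolist())
```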
4. Analyzing RAG Values and Generating Figures
Notebook: 4_analyze_rag_values.ipynb
Steps:
- Execution:
- Run the notebook to calculate the retinal age gap (RAG) values.
- The notebook also provides scripts for generating the figures that are included in the paper.
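A short sketch of the core quantity computed here: the RAG is the predicted retinal age minus the chronological age, and each disease/injury group's RAG distribution is compared against healthy controls. The Mann-Whitney U test below is an illustrative choice only; consult `4_analyze_rag_values.ipynb` for the exact statistics used in the paper.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Predicted and chronological ages from step 3 (same ordering of participants).
rag = np.asarray(predicted_ages) - np.asarray(true_ages)

def compare_group_to_controls(rag_group, rag_controls):
    """Test whether a disease group's RAG distribution differs from healthy controls."""
    stat, p_value = mannwhitneyu(rag_group, rag_controls, alternative="two-sided")
    return stat, p_value
```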
Utility Scripts
The repository includes several utility Python scripts that support the analysis workflow. These scripts provide essential functions and classes used by the Jupyter notebooks.
config.py
- Purpose: Contains configuration settings for training and evaluating models.
- Key Components:
- Training hyperparameters (learning rate, batch size, epochs)
- Device configuration (CPU/GPU)
- Data augmentation transformations for different image resolutions
- Normalization parameters for fundus images
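An illustrative sketch of the kinds of settings `config.py` holds. Every value below (learning rate, batch size, image size, normalization statistics) is a generic placeholder, not the repository's actual configuration.

```python
import torch
from torchvision import transforms

# Placeholder training hyperparameters.
LEARNING_RATE = 1e-4
BATCH_SIZE = 32
NUM_EPOCHS = 50
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Example augmentation/normalization pipeline for one image resolution.
TRAIN_TRANSFORMS = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```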
dataset.py
- Purpose: Defines the dataset class for handling fundus images.
- Key Components:
- `FundusDataset` class that inherits from PyTorch's `Dataset`
- Methods for loading and preprocessing fundus images
- Support for both age prediction and disease prediction tasks
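A skeleton of what a `FundusDataset`-style class looks like; the constructor arguments, metadata columns, and target handling are assumptions for illustration, not the class's actual interface.

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class FundusDataset(Dataset):
    """Illustrative fundus image dataset (interface details are assumptions)."""

    def __init__(self, image_dir, metadata, transform=None, target_column="age"):
        self.image_dir = Path(image_dir)
        self.metadata = metadata.reset_index(drop=True)  # pandas DataFrame assumed
        self.transform = transform
        self.target_column = target_column  # "age" or a disease label

    def __len__(self):
        return len(self.metadata)

    def __getitem__(self, idx):
        row = self.metadata.iloc[idx]
        image = Image.open(self.image_dir / row["filename"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, row[self.target_column]
```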
metadata_processing.py
- Purpose: Processes metadata associated with each fundus image.
- Key Components:
- Functions to compute and organize metadata from raw participant data
- Handling of demographic information, health status, and other relevant variables
- Data cleaning and transformation utilities
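A small example of the kind of cleaning step this module performs, such as deriving age at imaging. The column names (`imaging_date`, `birth_year`) are hypothetical; the real UK Biobank field names are resolved in `metadata_processing.py`.

```python
import pandas as pd

def compute_age_at_imaging(metadata: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning step: derive age at imaging from assumed columns."""
    metadata = metadata.copy()
    metadata["imaging_date"] = pd.to_datetime(metadata["imaging_date"])
    metadata["age_at_imaging"] = metadata["imaging_date"].dt.year - metadata["birth_year"]
    # Keep only ages within the study's 40-83 year range.
    return metadata[metadata["age_at_imaging"].between(40, 83)]
```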
train_utils.py
- Purpose: Contains utilities for training and evaluating models.
- Key Components:
- Functions for training models (`train_one_epoch`, `train_model`)
- Model evaluation functions (`evaluate_model`)
- Support for different model architectures and resolutions
- Handling of training checkpoints and model saving
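A sketch of the typical pattern behind a `train_one_epoch`-style helper. The signature, loss choice, and batch format are assumptions; the actual function in `train_utils.py` may differ.

```python
def train_one_epoch(model, loader, optimizer, criterion, device):
    """Illustrative single-epoch training loop for age regression."""
    model.train()
    running_loss = 0.0
    for images, ages in loader:
        images = images.to(device)
        ages = ages.to(device).float().unsqueeze(1)  # shape (batch, 1) to match model output

        optimizer.zero_grad()
        loss = criterion(model(images), ages)  # e.g., nn.L1Loss or nn.MSELoss
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
    return running_loss / len(loader.dataset)
```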
utils.py
- Purpose: General utility functions used throughout the analysis.
- Key Components:
- Model prediction functions (`make_prediction`)
- Accuracy checking and performance metrics (`check_accuracy`)
- Checkpoint saving and loading utilities
- Image processing functions (reading, writing, feature extraction)
- Metadata extraction from fundus filenames
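A minimal sketch of the checkpoint save/load pattern such utilities usually wrap. The function names and checkpoint layout below are illustrative, not the repository's actual helpers.

```python
import torch

def save_checkpoint(model, optimizer, epoch, path="checkpoint.pth"):
    """Write model and optimizer state to disk (illustrative layout)."""
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        path,
    )

def load_checkpoint(model, optimizer, path="checkpoint.pth", device="cpu"):
    """Restore state saved by save_checkpoint; returns the stored epoch."""
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"]
```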
Environment Setup
- Python Version: This code was developed and tested with Python 3.8.
- Dependencies: Install the required packages by running:
```bash
pip install -r requirements.txt
```