Code from: The retinal age gap: An affordable and highly accessible biomarker for population-wide disease screening across the globe
Data files (Mar 25, 2025 version, 123.97 KB total)
- README.md (6.61 KB)
- Software.zip (117.36 KB)
Abstract
Traditional biomarkers, such as those obtained from blood tests, are essential for early disease detection, improving health outcomes, and reducing healthcare costs. However, they often involve invasive procedures, specialized laboratory equipment, or special handling of biospecimens. The retinal age gap (RAG) has emerged as a promising new biomarker that can overcome these limitations, making it particularly suitable for disease screening in low- and middle-income countries. This study aimed to evaluate the potential of the RAG as a biomarker for broad disease screening across a vast spectrum of diseases. Fundus images were collected from 86,522 UK Biobank participants, aged 40 to 83 (mean age: 56.2±8.3 years). A deep learning model was trained to predict retinal age using 17,791 images from healthy participants. The remaining images were categorized into disease/injury groups based on clinical codes. Additionally, 8,524 participants from the Brazilian Multilabel Ophthalmological Dataset (BRSET) were used for external validation. Among the 159 disease/injury groups from the 2019 Global Burden of Disease Study, 56 groups (35.2%) exhibited RAG distributions significantly different from healthy controls. Notable examples included chronic kidney disease, cardiovascular disease, blindness, vision loss, and diabetes. Overall, the RAG shows great promise as a cost-effective, non-invasive biomarker for early disease screening.
This repository contains the code and instructions required to replicate the analyses presented in the paper, “The retinal age gap: An affordable and highly accessible biomarker for population-wide disease screening across the globe.” The work leverages fundus images and tabulated participant data from the UK Biobank and BRSET, along with disease coding information, to compute and analyze the retinal age gap (RAG) as a biomarker.
Table of Contents
- Data Required
- Running the Analysis
- Utility Scripts
- Environment Setup
Data Required
UK Biobank Data
- What: Fundus image data and tabulated participant data.
- How to Acquire:
- Register on the UK Biobank website.
- Follow the detailed instructions available in the UK Biobank Data Access Guide to download the necessary data.
BRSET Data
- What: Fundus images and associated tabulated participant data.
- How to Acquire:
- Register at PhysioNet – Brazilian Ophthalmological dataset.
- Once registered, you can download the data directly from the website.
ICD-10 Code Dictionary
- What: The `disease_icd10_code_dict.csv` file contains the ICD-10 codes associated with each disease and injury analyzed in this study.
- How to Acquire:
- Ensure this file is placed in the appropriate directory within the repository (e.g., the `data/` or `resources/` folder) so that it can be accessed by the analysis scripts; a loading sketch is shown below.
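A minimal sketch of how this dictionary might be loaded and used to assign participants to disease/injury groups. The column names (`disease_group`, `icd10_codes`) and the comma-separated code format are assumptions for illustration only; check the actual header of `disease_icd10_code_dict.csv`.

```python
import pandas as pd

# Load the ICD-10 code dictionary (path and column names are assumptions).
code_dict = pd.read_csv("data/disease_icd10_code_dict.csv")

# Map each disease/injury group to its set of ICD-10 codes, assuming the codes
# are stored as a comma-separated string per group.
group_to_codes = {
    row["disease_group"]: set(str(row["icd10_codes"]).split(","))
    for _, row in code_dict.iterrows()
}

def participant_in_group(participant_codes, group):
    """Return True if any of the participant's ICD-10 codes belong to `group`."""
    return bool(set(participant_codes) & group_to_codes[group])
```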
Running the Analysis
The analysis is structured into a series of Jupyter Notebooks. Follow the steps below to run the complete workflow, from data preprocessing to generating the figures presented in the paper.
1. Data Preprocessing
Notebook: 1_data_preprocessing.ipynb
Steps:
- Set Required Paths (an example of setting these is shown after this list):
- `path_to_ukbiobank_csv`: Path to the extracted UK Biobank CSV file.
- `path_to_ukbiobank_fundus_images`: Path to the directory containing the UK Biobank fundus images.
- `path_to_brset_fundus_images`: Path to the directory containing the BRSET fundus images.
- Additional Instructions:
- Download the EyeQ repository and follow the instructions in the EyeQ README to preprocess the fundus images.
- Execute the preprocessing scripts to process all fundus images located in the directories specified.
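A minimal sketch of setting the required path variables at the top of the notebook. The directory and file locations below are placeholders; point them at wherever you extracted the UK Biobank and BRSET data.

```python
from pathlib import Path

# Paths expected by 1_data_preprocessing.ipynb (placeholder locations).
path_to_ukbiobank_csv = Path("/data/ukbiobank/participants.csv")
path_to_ukbiobank_fundus_images = Path("/data/ukbiobank/fundus_images/")
path_to_brset_fundus_images = Path("/data/brset/fundus_photos/")

# Basic sanity checks before running the EyeQ-based preprocessing.
assert path_to_ukbiobank_csv.is_file(), "UK Biobank CSV not found"
assert path_to_ukbiobank_fundus_images.is_dir(), "UK Biobank image directory not found"
assert path_to_brset_fundus_images.is_dir(), "BRSET image directory not found"
```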
2. Training Models for RAG Computation
Notebook: 2_train_models_for_RAG_computation.ipynb
Steps:
- Set Required Paths:
- `path_to_ukbiobank_preprocessed_fundus_images`: Path to the directory containing the preprocessed UK Biobank fundus images.
- Execution:
- Run the notebook to train the models that will be used for retinal age gap prediction.
3. Computing Model Predictions
Notebook: 3_compute_model_predictions.ipynb
Steps:
- Set Required Paths:
- `path_to_ukbiobank_preprocessed_fundus_images`: Path to the directory containing the preprocessed UK Biobank fundus images.
- `path_to_brset_preprocessed_fundus_images`: Path to the directory containing the preprocessed BRSET fundus images.
- Execution:
- Run the notebook to compute age predictions using the best-performing age prediction model.
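A minimal inference sketch for this step, assuming a PyTorch regression model with a single age output. The architecture (ResNet-50), checkpoint filename, and `preprocessed_dataset` object are placeholders, not the repository's exact setup.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder architecture with one regression output (predicted age).
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 1)
state = torch.load("best_age_model.pth", map_location=device)  # hypothetical checkpoint name
model.load_state_dict(state)
model.to(device).eval()

# `preprocessed_dataset` is assumed to yield (image_tensor, chronological_age) pairs.
loader = DataLoader(preprocessed_dataset, batch_size=32, shuffle=False)

predicted_ages, true_ages = [], []
with torch.no_grad():
    for images, ages in loader:
        preds = model(images.to(device)).squeeze(1)
        predicted_ages.extend(preds.cpu().tolist())
        true_ages.extend(ages.tolist())
```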
4. Analyzing RAG Values and Generating Figures
Notebook: 4_analyze_rag_values.ipynb
Steps:
- Execution:
- Run the notebook to calculate the retinal age gap (RAG) values.
- The notebook also provides scripts for generating the figures that are included in the paper.
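A short sketch of the core quantity computed here: the RAG is the predicted retinal age minus the chronological age, and each disease/injury group's RAG distribution is compared against healthy controls. The Mann-Whitney U test below is an illustrative choice only; consult `4_analyze_rag_values.ipynb` for the exact statistics used in the paper.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Predicted and chronological ages from step 3 (same ordering of participants).
rag = np.asarray(predicted_ages) - np.asarray(true_ages)

def compare_group_to_controls(rag_group, rag_controls):
    """Test whether a disease group's RAG distribution differs from healthy controls."""
    stat, p_value = mannwhitneyu(rag_group, rag_controls, alternative="two-sided")
    return stat, p_value
```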
Utility Scripts
The repository includes several utility Python scripts that support the analysis workflow. These scripts provide essential functions and classes used by the Jupyter notebooks.
config.py
- Purpose: Contains configuration settings for training and evaluating models.
- Key Components:
- Training hyperparameters (learning rate, batch size, epochs)
- Device configuration (CPU/GPU)
- Data augmentation transformations for different image resolutions
- Normalization parameters for fundus images
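An illustrative sketch of the kinds of settings `config.py` holds. Every value below (learning rate, batch size, image size, normalization statistics) is a generic placeholder, not the repository's actual configuration.

```python
import torch
from torchvision import transforms

# Placeholder training hyperparameters.
LEARNING_RATE = 1e-4
BATCH_SIZE = 32
NUM_EPOCHS = 50
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Example augmentation/normalization pipeline for one image resolution.
TRAIN_TRANSFORMS = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```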
dataset.py
- Purpose: Defines the dataset class for handling fundus images.
- Key Components:
- `FundusDataset` class that inherits from PyTorch's `Dataset`
- Methods for loading and preprocessing fundus images
- Support for both age prediction and disease prediction tasks
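A skeleton of what a `FundusDataset`-style class looks like; the constructor arguments, metadata columns, and target handling are assumptions for illustration, not the class's actual interface.

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class FundusDataset(Dataset):
    """Illustrative fundus image dataset (interface details are assumptions)."""

    def __init__(self, image_dir, metadata, transform=None, target_column="age"):
        self.image_dir = Path(image_dir)
        self.metadata = metadata.reset_index(drop=True)  # pandas DataFrame assumed
        self.transform = transform
        self.target_column = target_column  # "age" or a disease label

    def __len__(self):
        return len(self.metadata)

    def __getitem__(self, idx):
        row = self.metadata.iloc[idx]
        image = Image.open(self.image_dir / row["filename"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, row[self.target_column]
```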
metadata_processing.py
- Purpose: Processes metadata associated with each fundus image.
- Key Components:
- Functions to compute and organize metadata from raw participant data
- Handling of demographic information, health status, and other relevant variables
- Data cleaning and transformation utilities
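A small example of the kind of cleaning step this module performs, such as deriving age at imaging. The column names (`imaging_date`, `birth_year`) are hypothetical; the real UK Biobank field names are resolved in `metadata_processing.py`.

```python
import pandas as pd

def compute_age_at_imaging(metadata: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning step: derive age at imaging from assumed columns."""
    metadata = metadata.copy()
    metadata["imaging_date"] = pd.to_datetime(metadata["imaging_date"])
    metadata["age_at_imaging"] = metadata["imaging_date"].dt.year - metadata["birth_year"]
    # Keep only ages within the study's 40-83 year range.
    return metadata[metadata["age_at_imaging"].between(40, 83)]
```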
train_utils.py
- Purpose: Contains utilities for training and evaluating models.
- Key Components:
- Functions for training models (`train_one_epoch`, `train_model`)
- Model evaluation functions (`evaluate_model`)
- Support for different model architectures and resolutions
- Handling of training checkpoints and model saving
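A sketch of the typical pattern behind a `train_one_epoch`-style helper. The signature, loss choice, and batch format are assumptions; the actual function in `train_utils.py` may differ.

```python
def train_one_epoch(model, loader, optimizer, criterion, device):
    """Illustrative single-epoch training loop for age regression."""
    model.train()
    running_loss = 0.0
    for images, ages in loader:
        images = images.to(device)
        ages = ages.to(device).float().unsqueeze(1)  # shape (batch, 1) to match model output

        optimizer.zero_grad()
        loss = criterion(model(images), ages)  # e.g., nn.L1Loss or nn.MSELoss
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
    return running_loss / len(loader.dataset)
```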
utils.py
- Purpose: General utility functions used throughout the analysis.
- Key Components:
- Model prediction functions (`make_prediction`)
- Accuracy checking and performance metrics (`check_accuracy`)
- Checkpoint saving and loading utilities
- Image processing functions (reading, writing, feature extraction)
- Metadata extraction from fundus filenames
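A minimal sketch of the checkpoint save/load pattern such utilities usually wrap. The function names and checkpoint layout below are illustrative, not the repository's actual helpers.

```python
import torch

def save_checkpoint(model, optimizer, epoch, path="checkpoint.pth"):
    """Write model and optimizer state to disk (illustrative layout)."""
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        path,
    )

def load_checkpoint(model, optimizer, path="checkpoint.pth", device="cpu"):
    """Restore state saved by save_checkpoint; returns the stored epoch."""
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"]
```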
Environment Setup
- Python Version: This code was developed and tested with Python 3.8.
- Dependencies: Install the required packages by running:
```bash
pip install -r requirements.txt
```