Dryad

Examples of misclassified images of papilledema severity by the Deep Learning System

Abstract

The objective of this study is to evaluate the performance of a deep learning system (DLS) in classifying the severity of papilledema associated with increased intracranial pressure on standard retinal fundus photographs.

A DLS was trained to automatically classify papilledema severity in 965 patients (2103 mydriatic fundus photographs), representing a multiethnic cohort of patients with confirmed elevated intracranial pressure. Training was performed on 1052 photographs with mild/moderate papilledema (MP) and 1051 photographs with severe papilledema (SP) classified by a panel of experts, and the performance of the DLS was tested in 111 patients (214 photographs, 92 with MP and 122 with SP).
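
As a quick sanity check on the figures above, the following minimal sketch tallies the reported photograph counts and the share of each class in the training and test sets. The counts are copied from this abstract, not read from the data files, and the variable and function names are illustrative assumptions.

```python
# Photograph counts as reported in the abstract (not derived from the data files).
train_counts = {"MP": 1052, "SP": 1051}  # mild/moderate vs. severe papilledema
test_counts = {"MP": 92, "SP": 122}


def summarize(counts):
    """Return the total number of photographs and each class's share of it."""
    total = sum(counts.values())
    shares = {label: round(n / total, 3) for label, n in counts.items()}
    return total, shares


train_total, train_shares = summarize(train_counts)
test_total, test_shares = summarize(test_counts)

print("training:", train_total, train_shares)
print("testing: ", test_total, test_shares)
```

The totals match the 2103 training and 214 test photographs stated above. The training classes are almost evenly balanced, whereas severe papilledema makes up a somewhat larger share of the test set.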

In this dataset, we provide illustrative examples of images misclassified by the DLS: two images wrongly classified as moderate papilledema instead of severe (figures 1A and 1B), and two images wrongly classified as severe papilledema instead of moderate (figures 2A and 2B).

Unsurprisingly, DLS errors occurred more often in patients with moderate papilledema (Frisén grade 3), a situation already encountered in clinical studies.