
Examples of misclassified images of papilledema severity by the Deep Learning System

Citation

Vasseneix, Caroline; Najjar, Raymond P; Milea, Dan (2022), Examples of misclassified images of papilledema severity by the Deep Learning System, Dryad, Dataset, https://doi.org/10.5061/dryad.66t1g1k1x

Abstract

The objective of the study was to evaluate the performance of a deep learning system (DLS) in classifying, from standard retinal fundus photographs, the severity of papilledema associated with increased intracranial pressure.

A DLS was trained to automatically classify papilledema severity using 2103 mydriatic fundus photographs from 965 patients, a multiethnic cohort with confirmed elevated intracranial pressure. The training set comprised 1052 photographs with mild/moderate papilledema (MP) and 1051 photographs with severe papilledema (SP), as classified by a panel of experts. The performance of the DLS was then tested on 214 photographs from 111 patients (92 with MP and 122 with SP).
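The reported counts can be cross-checked with simple arithmetic: the two training classes sum to the 2103 photographs, the training set is almost perfectly balanced, and the test set leans toward severe cases. The Python sketch below uses only the counts stated above; the names `SPLITS` and `class_proportions` are illustrative and not part of the study.

```python
# Photograph counts as reported in the abstract (MP = mild/moderate, SP = severe).
SPLITS = {
    "training": {"MP": 1052, "SP": 1051},
    "testing": {"MP": 92, "SP": 122},
}

def class_proportions(counts):
    """Return each class's share of a split, rounded to 3 decimals."""
    total = sum(counts.values())
    return {label: round(n / total, 3) for label, n in counts.items()}

for name, counts in SPLITS.items():
    # Print the split name, its total number of photographs, and class shares.
    print(name, sum(counts.values()), class_proportions(counts))
```

Roughly 43% of test photographs are MP and 57% are SP, so accuracy on the test set alone would slightly favor a model biased toward severe predictions; per-class metrics are a natural complement.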

This dataset provides illustrative examples of images misclassified by the DLS: two images of severe papilledema wrongly classified as mild/moderate (Figures 1A and 1B), and two images of mild/moderate papilledema wrongly classified as severe (Figures 2A and 2B).

As expected, DLS errors occurred more often in patients with moderate papilledema (Frisén grade 3), the grade closest to the boundary between the two severity classes; the same difficulty has been encountered in clinical studies.

Methods

The study included de-identified, unaltered digital fundus photographs of patients with confirmed intracranial hypertension and papilledema, collected at international neuro-ophthalmology centers. Two experts independently graded the photographs using a simple two-grade papilledema severity scale: 1/ mild to moderate papilledema (Frisén grades 1-3), defined as disc edema without obscuration of major blood vessels on the disc; 2/ severe papilledema (Frisén grades 4-5), defined as disc edema with any obscuration of major blood vessels on the disc. The photographs were divided into two datasets: a training dataset, used to teach the DLS to classify papilledema severity, and a testing dataset, used to evaluate its performance.
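The two-grade scale is a coarsening of the Frisén scale, so the mapping can be written as a small lookup. The Python sketch below is illustrative only: `Severity`, `two_grade_from_frisen` and `two_grade_from_vessels` are hypothetical names, not the study's code, and in the study the grading was performed by experts reading the photographs, not by a program.

```python
from enum import Enum

class Severity(Enum):
    MP = "mild/moderate papilledema"  # Frisén grades 1-3
    SP = "severe papilledema"         # Frisén grades 4-5

def two_grade_from_frisen(grade: int) -> Severity:
    """Collapse a Frisén papilledema grade (1-5) into the two-grade scale."""
    if grade not in range(1, 6):
        # Grade 0 (no papilledema) and out-of-range values are not part of this scale.
        raise ValueError(f"Frisén grade must be 1-5, got {grade}")
    return Severity.MP if grade <= 3 else Severity.SP

def two_grade_from_vessels(major_vessel_obscured: bool) -> Severity:
    """Apply the scale's definition: any obscuration of major disc vessels means severe."""
    return Severity.SP if major_vessel_obscured else Severity.MP

print(two_grade_from_frisen(3), two_grade_from_vessels(True))
```

Writing both functions side by side makes explicit that the grade-3/grade-4 cut and the vessel-obscuration criterion are meant to describe the same boundary.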

The fundus photographs in this dataset are illustrative examples of papilledema images whose severity the deep learning system misclassified relative to the gold-standard expert classification.
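Selecting such examples amounts to comparing each DLS prediction with the gold-standard label and keeping the disagreements, split by error direction (SP predicted as MP, as in Figures 1A-1B, or MP predicted as SP, as in Figures 2A-2B). The Python sketch below assumes string labels "MP"/"SP"; the function name `error_type` and the records are hypothetical placeholders, not data from this study.

```python
from typing import Optional

def error_type(gold: str, predicted: str) -> Optional[str]:
    """Describe a DLS error relative to the gold-standard label ('MP' or 'SP').

    Returns None when the prediction agrees with the gold standard.
    """
    for label in (gold, predicted):
        if label not in ("MP", "SP"):
            raise ValueError(f"unknown label: {label!r}")
    if gold == predicted:
        return None
    return f"{gold} misclassified as {predicted}"

# Hypothetical records: (image_id, gold-standard label, DLS prediction).
records = [("img_a", "SP", "MP"), ("img_b", "MP", "MP"), ("img_c", "MP", "SP")]

# Keep only the images where the DLS disagrees with the gold standard.
misclassified = {}
for image_id, gold, predicted in records:
    err = error_type(gold, predicted)
    if err is not None:
        misclassified[image_id] = err

print(misclassified)
```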

Funding

Singapore National Medical Research Council, Award: CIRG18Nov-0013

Duke-NUS Medical School, Award: 05/FY2019/P2/06-A60
