Deep learning chronic wasting disease (CWD) immunohistochemistry (IHC) image dataset
Data files
Oct 10, 2025 version files (172.04 GB total)
- images01.zip (7.52 GB)
- images02.zip (6.71 GB)
- images03.zip (7.75 GB)
- images04.zip (8.78 GB)
- images05.zip (8.59 GB)
- images06.zip (3.87 GB)
- images07.zip (3.78 GB)
- images08.zip (3.69 GB)
- images09.zip (3.27 GB)
- images10.zip (3.20 GB)
- images11.zip (3.60 GB)
- images12.zip (1.48 GB)
- images13.zip (5.77 GB)
- images14.zip (7.64 GB)
- images15.zip (6.94 GB)
- images16.zip (2.53 GB)
- images17.zip (3.52 GB)
- images18.zip (8.50 GB)
- images19.zip (7.84 GB)
- images20.zip (5.86 GB)
- images21.zip (6.11 GB)
- images22.zip (7.69 GB)
- images23.zip (6.19 GB)
- images24.zip (6.56 GB)
- images25.zip (6.54 GB)
- images26.zip (5.50 GB)
- images27.zip (6.06 GB)
- images28.zip (7.21 GB)
- images29.zip (6.56 GB)
- images30.zip (2.77 GB)
- qupath.zip (13.17 MB)
- README.md (2.98 KB)
Abstract
The dataset contains 143 whole-slide images (WSIs) containing a combination of central nervous system tissue, typically obex containing the dorsal motor nucleus of the vagus (DMNV; n = 137), and retropharyngeal lymph nodes (RPLN; n = 114), derived from surveillance diagnostic samples and farmed cervid depopulations. Species represented in the training dataset include white-tailed deer (n = 68), sheep (n = 54), elk (n = 14), goat (n = 4), and moose (n = 3). Of the 143 slides, 54 were identified as suspect (i.e., detected) and 89 as not detected. Lymphoid follicles in retropharyngeal lymph nodes and the DMNV in obex samples were manually annotated as ground truth by a transmissible spongiform encephalopathy (TSE)-trained, board-certified veterinary anatomic pathologist. Annotations were performed in QuPath 0.5 using the brush tool. In total, the training dataset contains 3,296 annotations, broken down into +/- DMNV regions (n = 224+/438-) and +/- lymphoid follicular regions (n = 1,295+/1,339-). The dataset was collected and annotated to train deep neural networks for tissue type and anatomical structure detection. The code is available in an accompanying GitHub repository.
Dataset DOI: 10.5061/dryad.w6m905r2d
Description of the data and file structure
This dataset was created to provide training data for a deep learning image analysis approach specifically tailored to reviewing slides from large-scale veterinary prion disease surveillance. The training dataset includes 143 prion IHC whole-slide images (WSIs) containing a total of 3,296 manual annotations. Annotated images were segmented into non-overlapping tiles and then used to fine-tune a pretrained convolutional neural network, enhancing the model’s ability to recognize prion-specific quality-control parameters and staining features. When tested on a separate, blinded dataset of 50 CWD IHC slides, the model achieved 100% concordance for tissue classification (brain vs. lymph node), 94% concordance for identifying relevant anatomical structures (lymphoid follicles and dorsal motor nucleus), and 100% concordance for chromogen staining when compared to evaluation by a trained veterinary pathologist.
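As a rough illustration of the non-overlapping tiling step described above, the following Python sketch generates tile origins over an image of a given pixel size. The function name `tile_origins`, the 512-pixel tile size, and the choice to skip partial tiles at the image edges are assumptions made for this example only; the parameters actually used for training are defined in the accompanying GitHub repository.

```python
from typing import Iterator, Tuple


def tile_origins(width: int, height: int, tile: int = 512) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) top-left corners of non-overlapping square tiles.

    Tiles are laid out row by row. Partial tiles at the right and bottom
    edges are skipped; this is an assumption for illustration, not
    necessarily what the published pipeline does.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield (x, y)


if __name__ == "__main__":
    # A hypothetical 2048 x 1024 pixel region split into 512-pixel tiles.
    for origin in tile_origins(2048, 1024):
        print(origin)
```

Because the step between origins equals the tile size, no two tiles share a pixel, which keeps each annotated region from contributing the same pixels to more than one training tile.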
Files and variables
The dataset consists of 143 images in SVS format (slide001.svs to slide143.svs). These are provided in 30 zip files (images01.zip to images30.zip). Slide resolution is in the megapixel range with approximately 0.264 micrometers per pixel. Each image contains several structures that are annotated using the QuPath software system for bioimage analysis. The annotations were created using the QuPath brush tool, and each is labeled as one of "Follicle", "Non-follicular", "Dorsal motor nucleus", "Not DMN", "Midline" or "Not midline". The QuPath project is included in the qupath.zip file.
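The naming scheme above lends itself to a quick completeness check after download. The sketch below builds the expected file names and reports any slides missing from a directory. The helper `missing_slides` is hypothetical, and it assumes all SVS files have been placed directly in one flat directory; the internal layout of the zip files is not stated here, so adjust accordingly.

```python
from pathlib import Path

# File names and annotation labels as listed in the dataset description.
SLIDE_NAMES = [f"slide{i:03d}.svs" for i in range(1, 144)]
ARCHIVE_NAMES = [f"images{i:02d}.zip" for i in range(1, 31)]
ANNOTATION_CLASSES = (
    "Follicle",
    "Non-follicular",
    "Dorsal motor nucleus",
    "Not DMN",
    "Midline",
    "Not midline",
)


def missing_slides(image_dir):
    """Return the expected slide file names not found directly in image_dir.

    Assumes a flat directory of .svs files (an assumption, see above).
    """
    present = {p.name for p in Path(image_dir).iterdir() if p.is_file()}
    return [name for name in SLIDE_NAMES if name not in present]
```

An empty list from `missing_slides` indicates that all 143 slides are present under the assumed layout.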
Code/software
Software for processing the images is available in the GitHub repo at https://github.com/holderlb/Deep-CWD-IHC. This repo includes code for extracting image annotations, generating image tiles, training deep learning models, and using the deep learning models to analyze new images. See the README file there for details.
QuPath version 0.5 (https://qupath.github.io) was used to create the annotations. The included QuPath project (qupath.zip) contains a standard QuPath project file (project.qpproj) and supporting files written by QuPath, including references to all images and their annotations. To view the images in QuPath, first extract the qupath.zip file and extract the images to a directory of your choice. Open QuPath, select File -> Project -> Open Project, and choose the project.qpproj file. Since your images will likely be in a different directory than on our platform, QuPath will report that it cannot find the images. Select the Search button in that window, navigate to the directory containing your images, and then select Apply Changes. Double-clicking on an image in the left menu will display the full image with annotations.
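The extraction step can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module; the function name `extract_archives` and the directory arguments are placeholders chosen for this example, not part of the dataset or the repository.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir, target_dir):
    """Extract every images*.zip found in download_dir into target_dir.

    Returns the names of the archives that were extracted, in sorted order.
    Any folder structure stored inside the archives is preserved.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(download_dir).glob("images*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.name)
    return extracted
```

After extraction, the target directory is the one to navigate to from the Search button when QuPath asks for the image locations. Note that the full set of archives is about 172 GB compressed, so the target volume needs at least that much free space.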
Dataset case selection
Formalin-fixed, paraffin-embedded (FFPE) tissues, including retropharyngeal lymph node and obex, submitted for TSE surveillance were retrospectively selected from the Washington Animal Disease Diagnostic Laboratory (WADDL) as well as from United States Department of Agriculture Agricultural Research Service Animal Disease Research Unit (USDA-ARS-ADRU) scrapie research cases. Inclusion criteria required that cases had been previously evaluated by a TSE-trained veterinary pathologist and assigned one of the following diagnostic categories based on immunohistochemistry (IHC): Detected, Not Detected, Location, or Insufficient Follicles. These categories reflect standard interpretive outcomes used in TSE surveillance programs and represent the full spectrum of tissue and staining conditions encountered in diagnostic practice. Cases were excluded if they had been assigned an unacceptable or unsuitable diagnostic code, typically due to poor sample fixation and subsequent postmortem autolysis.
Tissue processing and staining
All immunohistochemical staining and evaluation were performed according to the NVSL document "Detection of Scrapie and Chronic Wasting Disease by Immunohistochemistry" [22]. Briefly, all samples were fixed in 10% neutral buffered formalin with routine processing and embedding. Slides were cut at 4 µm thickness and mounted on NVSL-approved charged slides. Mounted slides were pretreated with 96% formic acid for 5 minutes and transferred to 0.05 M Tris buffer, pH 7.5, to rinse. Decloaking was performed using the Diva Decloaker solution (Cat# DV2004; Biocare Medical, Pacheco, CA) in combination with the Biocare antigen retrieval chamber pot (Cat# DCARC0001; Biocare Medical, Pacheco, CA).
IHC was performed on the Ventana Discovery Ultra autostainer (Roche Diagnostics, Indianapolis, IN), employing the ready-to-use (RTU) dispensers for F99 monoclonal antibody provided in the Anti-Prion Research Kit (Cat# 760-231; Roche Diagnostics, Indianapolis, IN). Individual RTUs additionally contain biotinylated secondary antibody, an alkaline phosphatase–streptavidin detection system, and a substrate chromogen composed of fast red A, naphthol, and fast red B, followed by hematoxylin counterstaining. Each staining run included a positive control section.
Digitization
All whole-slide images were acquired on a Leica GT450 scanner with default settings at a magnification of 400x (image resolution: 0.264 µm/pixel or 96,154 pixels/inch). All WSIs were saved as .SVS files and viewed or annotated using QuPath throughout the project.
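The two resolution figures above are related by the fact that one inch is 25,400 µm. The short Python sketch below shows the conversion and how a pixel measurement translates to physical length; the function names are illustrative and not taken from the project code.

```python
MICROMETERS_PER_INCH = 25_400.0  # 1 inch = 25.4 mm by definition


def ppi_to_mpp(pixels_per_inch):
    """Convert pixels per inch to micrometers per pixel."""
    return MICROMETERS_PER_INCH / pixels_per_inch


def pixels_to_micrometers(n_pixels, mpp=0.264):
    """Physical length in micrometers spanned by n_pixels at the given resolution."""
    return n_pixels * mpp


if __name__ == "__main__":
    # 96,154 pixels/inch corresponds to roughly 0.264 micrometers per pixel.
    print(round(ppi_to_mpp(96_154), 3))
    # Width of a hypothetical 512-pixel tile in micrometers.
    print(pixels_to_micrometers(512))
```

At this resolution, a structure measured in pixels in QuPath or in extracted tiles can be converted to micrometers by multiplying by 0.264.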
