Hematoxylin-and-eosin-stained bladder urothelial cell carcinoma versus inflammation digital histopathology image dataset
Ameen, Yusra; Badary, Dalia; Hussain, Khaled; Sewisy, Adel (2023), Hematoxylin-and-eosin-stained bladder urothelial cell carcinoma versus inflammation digital histopathology image dataset, Dryad, Dataset, https://doi.org/10.5061/dryad.0cfxpnw5q
Digital pathology requires large numbers of well-annotated image datasets to benefit from deep learning algorithms. Unfortunately, most available datasets are annotated at the slide level, which is less useful than patch-level or pixel-level annotation. Additionally, urinary bladder cancer is underrepresented in digital pathology deep learning studies. Here, we present a patch-level annotated image dataset obtained from 90 hematoxylin-and-eosin-stained histopathology slides of urinary bladder lesions. Non-overlapping photographs of all available tissue areas on each slide were systematically obtained and manually classified by the pathologist in our team as inflammation (5,948 images), urothelial cell carcinoma (UCC) (5,811 images), or invalid (3,132 images).
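As a quick illustration of the class balance, the short Python sketch below totals the three reported image counts and computes each class's share. It uses only the counts stated above; the dictionary and function names are illustrative and are not part of the dataset.

```python
# Image-level class counts as reported for this dataset.
CLASS_COUNTS = {
    "inflammation": 5948,
    "UCC": 5811,
    "invalid": 3132,
}


def class_proportions(counts):
    """Return the total image count and each class's share of it."""
    total = sum(counts.values())
    return total, {label: n / total for label, n in counts.items()}


total, shares = class_proportions(CLASS_COUNTS)
print(total)  # 14891
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

The two diagnostic classes are nearly balanced (about 39.9% inflammation and 39.0% UCC), while invalid images make up about 21.0% of the total.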
The dataset source was 90 formalin-fixed, paraffin-embedded, hematoxylin-and-eosin-stained histopathology slides (4-μm-thick sections) of urinary bladder lesions, each showing either cystitis (43 slides) or UCC (47 slides). The slides were obtained from 74 specimens from the Departments of Pathology of both the Faculty of Medicine and the Cancer Institute at our university. The UCC slides covered different pathological stages: pTa (five slides), pT1 (nine slides), pT2 (28 slides), and pT3a (five slides). Slides were photographed using an Olympus® E-330 digital camera mounted on an Olympus® CX31 light microscope via an Olympus® E330-ADU1.2X adapter. The microscope magnification was set to 20x. Camera settings were configured before photographing: the shutter speed, aperture value, International Organization for Standardization (ISO) sensitivity to light, and white balance were set to automatic, while the exposure compensation value, which controls brightness, was set to +1.0. Images were captured at a resolution of 3,136x2,352 pixels in Joint Photographic Experts Group (JPEG) format with a 1:2.7 compression ratio. Non-overlapping photographs of all available tissue areas on each slide were systematically obtained, yielding a total of 14,891 images.
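The slide composition and capture settings above can be cross-checked with a few lines of arithmetic. This Python sketch confirms that the stage breakdown sums to the 47 UCC slides and the 90 slides overall, and derives the pixel count and aspect ratio of each photograph. The file-size figure is only a rough estimate under an assumption not stated in the source: that the 1:2.7 ratio applies to uncompressed 24-bit RGB data. Actual JPEG sizes vary with image content.

```python
from fractions import Fraction

# Slide counts by pathological stage for the UCC slides, as reported.
UCC_STAGE_SLIDES = {"pTa": 5, "pT1": 9, "pT2": 28, "pT3a": 5}
CYSTITIS_SLIDES = 43

# Reported capture resolution of every photograph.
WIDTH_PX, HEIGHT_PX = 3136, 2352

# Assumption (not from the source): 24-bit RGB (3 bytes per pixel)
# before the 1:2.7 JPEG compression is applied.
BYTES_PER_PIXEL = 3
COMPRESSION_RATIO = 2.7

ucc_slides = sum(UCC_STAGE_SLIDES.values())
total_slides = ucc_slides + CYSTITIS_SLIDES
pixels_per_image = WIDTH_PX * HEIGHT_PX
aspect_ratio = Fraction(WIDTH_PX, HEIGHT_PX)
est_jpeg_mb = pixels_per_image * BYTES_PER_PIXEL / COMPRESSION_RATIO / 1e6

print(ucc_slides)        # 47
print(total_slides)      # 90
print(pixels_per_image)  # 7375872 (about 7.4 megapixels)
print(aspect_ratio)      # 4/3
print(f"{est_jpeg_mb:.1f} MB")  # 8.2 MB (rough estimate only)
```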
Regardless of the slide-level diagnoses, the pathologist in our team manually classified every obtained image into one of three categories: inflammation, UCC, or invalid. An image-level (also known as patch-level) diagnosis of inflammation was based on the presence of an inflammatory cell infiltrate of lymphocytes, plasma cells, eosinophils, and/or polymorphs in the absence of any malignant cells. An image-level diagnosis of UCC was based on the presence of malignant urothelial cells showing features of anaplasia: pleomorphism, hyperchromatism, increased nuclear-cytoplasmic ratio, and increased mitotic figures. These malignant cells may be arranged in papillae, sheets, or groups, or may be present as single cells. An image was considered invalid when it did not meet sufficient criteria for either of the other two categories, including images that contained only normal urinary bladder tissue. An image was also considered invalid if its tissue was too poorly processed to be diagnosed. The pathologist's classification yielded a total of 5,948 inflammation images, 5,811 UCC images, and 3,132 invalid images. Invalid images were not excluded from the dataset.
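The image-level labeling criteria can be summarized as a simple decision rule. The Python sketch below is a simplified, non-authoritative encoding of that rule. The `PatchFindings` record and `patch_label` function are hypothetical and are not part of the dataset. Checking for poorly processed tissue first is also an assumption about precedence. As described above, the label depends only on the findings in the image and not on the slide-level diagnosis. An image showing both inflammation and malignant cells is labeled UCC, because the inflammation label requires the absence of malignant cells.

```python
from dataclasses import dataclass


@dataclass
class PatchFindings:
    """Hypothetical per-image findings; the dataset ships only labels."""
    poorly_processed: bool            # tissue too poorly processed to diagnose
    malignant_urothelial_cells: bool  # anaplastic urothelial cells present
    inflammatory_infiltrate: bool     # lymphocytes, plasma cells, eosinophils and/or polymorphs


def patch_label(f: PatchFindings) -> str:
    """Map findings to one of the three image-level categories."""
    if f.poorly_processed:
        return "invalid"
    if f.malignant_urothelial_cells:
        # Malignant cells take precedence, even if inflammation is present.
        return "UCC"
    if f.inflammatory_infiltrate:
        return "inflammation"
    # Insufficient criteria, e.g. only normal urinary bladder tissue.
    return "invalid"


print(patch_label(PatchFindings(poorly_processed=False,
                                malignant_urothelial_cells=True,
                                inflammatory_infiltrate=True)))  # UCC
```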