Data from: On the objectivity, reliability, and validity of deep learning enabled bioimage analyses

Segebarth, Dennis1; Griebel, Matthias2; Stein, Nikolai2; R. von Collenberg, Cora1; Martin, Corinna1; Fiedler, Dominik3; Comeras, Lucas B.4; Sah, Anupam5; Schoeffler, Victoria1; Lüffe, Theresa1; Dürr, Alexander1; Gupta, Rohini1; Sasi, Manju1; Lillesaar, Christina1; Lange, Maren D.3; Tasan, Ramon O.4; Singewald, Nicolas5; Pape, Hans-Christian3; Flath, Christoph M.2; Blum, Robert2

Published Oct 20, 2020; Updated Oct 21, 2020 on Dryad. https://doi.org/10.5061/dryad.4b8gtht9d

Data files

Oct 20, 2020 version files, 7.80 GB:

   - bioimage_data.zip: 471.11 MB
   - model_library.zip: 7.26 GB
   - notebooks.zip: 20.89 KB
   - readme.txt: 2.29 KB
   - requirements.txt: 235 B
   - source_data.zip: 22.34 MB
   - test_data.zip: 151.34 KB
   - train_data.zip: 39.86 MB
   - unet.zip: 23.72 KB

Abstract

Bioimage analysis of fluorescent labels is widely used in the life sciences. Recent advances in deep learning (DL) allow automating time-consuming manual image analysis processes based on annotated training data. However, manual annotation of fluorescent features with a low signal-to-noise ratio is somewhat subjective. Training DL models on subjective annotations may be unstable or yield biased models. In turn, these models may be unable to reliably detect biological effects. An analysis pipeline integrating data annotation, ground truth estimation, and model training can mitigate this risk. To evaluate this integrated process, we compared different DL-based analysis approaches. With data from two model organisms (mice, zebrafish) and five laboratories, we show that ground truth estimation from multiple human annotators helps to establish objectivity in fluorescent feature annotations. Furthermore, ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible DL-based bioimage analyses.

This data repository contains the source code and source data of our study. Raw bioimages represent cFOS labeling in different brain areas of mice after behavioral analyses (Pavlovian fear conditioning paradigms). We provide the code and training datasets that we used to generate expert and consensus models and ensembles, a model library that contains our validated consensus ensembles, the source data and our code used for the analyses, and the complete bioimage datasets of two laboratories (Lab-Wue1 [283 images] and Lab-Mue [24 images]).

This is the official repository of our study "On the objectivity, reliability, and validity of deep learning enabled bioimage analyses." Our paper is published in eLife. In addition, we provide all code in our GitHub repository.

File organization:

bioimage_data.zip:

This folder contains the raw image data of all laboratories and an Excel sheet ("image_mapping.xlsx") that contains all metadata needed to associate the images with experimental data, such as genotype, treatment condition (see the treatment condition code below), or whether the image was used for model training.

Treatment condition code:

   - lab-wue1: homecage (H), context control (-), context conditioned (+)
   - lab-mue: early retrieval (Ext), late retrieval (Ret)
   - lab-inns1: control (Ctrl), extinction (Ext)
   - lab-inns2: Saline, L-DOPA responder, L-DOPA non-responder
   - lab-wue2: wildtype (WT), gad1b knock-down (KO)
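When working with the metadata programmatically, the codes above can be kept in a small lookup table. The following Python sketch is for illustration only: the names `TREATMENT_CODES` and `describe_condition` are hypothetical, and nothing is assumed about the column layout of "image_mapping.xlsx".

```python
# Treatment condition codes per laboratory, transcribed from the list above.
# TREATMENT_CODES and describe_condition are hypothetical names for illustration.
TREATMENT_CODES = {
    "lab-wue1": {"H": "homecage", "-": "context control", "+": "context conditioned"},
    "lab-mue": {"Ext": "early retrieval", "Ret": "late retrieval"},
    "lab-inns1": {"Ctrl": "control", "Ext": "extinction"},
    # lab-inns2 conditions are listed without abbreviations.
    "lab-inns2": {
        "Saline": "Saline",
        "L-DOPA responder": "L-DOPA responder",
        "L-DOPA non-responder": "L-DOPA non-responder",
    },
    "lab-wue2": {"WT": "wildtype", "KO": "gad1b knock-down"},
}


def describe_condition(lab: str, code: str) -> str:
    """Return the full condition name for a laboratory-specific code."""
    try:
        return TREATMENT_CODES[lab][code]
    except KeyError:
        raise ValueError(f"unknown lab/code combination: {lab!r}, {code!r}") from None
```

The code "Ext" is used by both lab-mue and lab-inns1 with different meanings, so a code should always be interpreted together with its laboratory.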

For each laboratory, we provide all labels predicted by the different models or ensembles as indicated with the path names: "*/labels/initialization_variant/model_type/model_or_ensemble/identifier/", and all regions in which bioimage analysis was performed. For two laboratories (lab-wue1 and lab-mue), we also provide all microscopy images.
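The path pattern above can be split into its named levels with a few lines of Python. This is a minimal sketch, assuming the four levels directly follow a "labels" directory as the pattern suggests. The helper `parse_label_path` and the class `LabelLocation` are hypothetical names, and the example path in the usage comment uses placeholder values, not actual folder names from the archive.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class LabelLocation(NamedTuple):
    """The four path levels that follow the 'labels' directory."""
    initialization_variant: str
    model_type: str
    model_or_ensemble: str
    identifier: str


def parse_label_path(path: str) -> LabelLocation:
    """Split a path of the form */labels/<four levels>/... into named parts."""
    parts = PurePosixPath(path).parts
    if "labels" not in parts:
        raise ValueError(f"no 'labels' component in {path!r}")
    start = parts.index("labels") + 1
    levels = parts[start:start + 4]
    if len(levels) < 4:
        raise ValueError(f"expected four levels after 'labels' in {path!r}")
    return LabelLocation(*levels)


# Usage with placeholder folder names (not taken from the archive):
# parse_label_path("some_lab/labels/variant_x/type_y/ensemble_z/id_01/")
```

Returning a named tuple keeps the level names from the pattern attached to the values, which makes it easy to filter or group label folders by model type or identifier.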

model_library.zip:

This folder contains one validated consensus ensemble for each of the five bioimage datasets.

source_data.zip:

This folder contains the source data of our study and is organized according to the individual figures in which the data is presented. In each figure folder, you find a readme file that provides more detailed information about the respective files and which notebook was used to perform the analysis.

test_data.zip:

This folder contains the test dataset of lab-wue1.

train_data.zip:

This folder contains all training datasets that were generated in the course of this study. This includes all microscopy images, the labels of the individual experts, and the computed consensus labels.
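To illustrate how labels from several experts can be merged into one consensus label, the sketch below applies pixel-wise majority voting to binary masks. This description does not state how the consensus labels in this folder were computed, so majority voting is swapped in here as a simple stand-in and is not claimed to be the procedure used in the study. The function name `majority_vote` and the list-of-rows mask representation are assumptions made for the example.

```python
from typing import List, Sequence

Mask = List[List[int]]  # binary mask as rows of 0/1 values (assumed format)


def majority_vote(masks: Sequence[Mask]) -> Mask:
    """Pixel-wise majority vote over equally sized binary masks.

    A pixel is foreground (1) when more than half of the annotators marked it.
    This is an illustrative stand-in, not the study's consensus method.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    n = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != height or any(len(row) != width for row in m):
            raise ValueError("all masks must have the same shape")
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(width)]
        for r in range(height)
    ]


# Three hypothetical expert annotations of a 2x2 image.
expert_masks = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[0, 0], [1, 1]],
]
consensus = majority_vote(expert_masks)  # [[1, 0], [1, 1]]
```

With an even number of annotators, a tie counts as background under this rule; a different tie-breaking choice would change the result for contested pixels.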

requirements.txt:

This file lists all packages, with their versions, that are required to install and run our code locally.
