Coelho, Luis Pedro; Kangas, Joshua D.; Naik, Armaghan W.; Osuna-Highley, Elvira; Glory-Afshar, Estelle; Fuhrman, Margaret; Simha, Ramanuja; Berget, Peter B.; Jarvik, Jonathan W.; Murphy, Robert F.

Published Aug 30, 2013 on Dryad. https://doi.org/10.5061/dryad.2vm70

Abstract

Motivation: Previous systems for automatically determining subcellular location from microscope images have been evaluated on datasets in which each location class consisted of multiple images of the same representative protein. Here, we frame a more challenging and more useful problem in which previously unseen proteins must be classified.

Results: Using CD-tagging, we generated two new image datasets for evaluating this problem, each containing several different proteins per location class. Evaluating previous methods on these datasets showed that it is much harder to train a classifier that generalizes across different proteins than one that simply recognizes a protein it was trained on. We therefore developed and evaluated additional approaches that incorporate novel modifications of local-feature techniques. These approaches extend the notion of local features to exploit both the protein image and any reference markers imaged in parallel. With them, we obtained a large improvement in accuracy over existing methods on our new datasets. These features also improve classification on other, previously studied datasets.
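Because the task is to classify proteins the classifier has never seen, images of a given protein should not be split between training and test data; otherwise the evaluation reduces to recognizing a protein already seen during training. The following is a minimal sketch of such a leave-proteins-out split, assuming each image is described by a hypothetical `(image_id, protein, location)` record. The function name, the record layout and the default 25% test fraction are illustrative choices, not the authors' evaluation protocol.

```python
import random
from collections import defaultdict


def leave_proteins_out_split(samples, test_fraction=0.25, seed=0):
    """Split hypothetical (image_id, protein, location) records so that
    no protein appears in both the training set and the test set.

    Proteins are held out separately within each location class, and at
    least one protein per class is always kept for training.
    """
    rng = random.Random(seed)
    proteins_by_class = defaultdict(set)
    for _, protein, location in samples:
        proteins_by_class[location].add(protein)

    test_proteins = set()
    for location, proteins in proteins_by_class.items():
        proteins = sorted(proteins)  # sort first so the shuffle is reproducible
        rng.shuffle(proteins)
        # Hold out roughly test_fraction of the proteins (at least one when
        # possible), but never every protein of the class.
        n_test = max(1, int(round(len(proteins) * test_fraction)))
        n_test = min(n_test, len(proteins) - 1)
        test_proteins.update(proteins[:n_test])

    # All images of a held-out protein go to the test side together.
    train = [s for s in samples if s[1] not in test_proteins]
    test = [s for s in samples if s[1] in test_proteins]
    return train, test
```

Rotating which proteins are held out (as in grouped cross-validation) would let every protein be tested once while keeping the same guarantee.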

Labeled RandTag Widefield Images (Unlabeled Subset)

This dataset consists of fluorescence microscope images of GFP-tagged proteins that localize to different organelles. The organelles were annotated by visual inspection. For each organelle, multiple proteins localizing to it were identified, and for each protein, multiple images were acquired. Images contain two channels: the GFP-tagged protein and a nuclear marker (Hoechst). NIH 3T3 cells were tagged using CD tagging. These data were acquired on a widefield microscope.
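Since the GFP and Hoechst channels are imaged in parallel, a position in one channel corresponds to the same position in the other. One simple way to let a local feature draw on both channels, shown here purely as a hypothetical illustration, is to cut the same window from each channel around a keypoint and concatenate the two normalized patches. The function name, the raw-pixel patch descriptor, the keypoint list and the `radius` parameter are all assumptions; the sketch is not meant to reproduce the local features used in the study.

```python
import numpy as np


def two_channel_patch_descriptors(protein, marker, keypoints, radius=8):
    """Hypothetical sketch: for each (row, col) keypoint, take the same
    (2*radius+1) x (2*radius+1) window from the protein (GFP) channel and
    the reference-marker (Hoechst) channel, L2-normalize each patch, and
    concatenate them into one descriptor vector."""
    if protein.shape != marker.shape:
        raise ValueError("channels must be registered and have the same shape")
    h, w = protein.shape
    descriptors = []
    for r, c in keypoints:
        # Skip keypoints whose window would extend past the image border.
        if r - radius < 0 or c - radius < 0 or r + radius >= h or c + radius >= w:
            continue
        parts = []
        for channel in (protein, marker):
            patch = channel[r - radius:r + radius + 1,
                            c - radius:c + radius + 1].astype(float).ravel()
            norm = np.linalg.norm(patch)
            parts.append(patch / norm if norm > 0 else patch)
        descriptors.append(np.concatenate(parts))
    # One row per kept keypoint; shape (0, d) if none were kept.
    return np.array(descriptors).reshape(-1, 2 * (2 * radius + 1) ** 2)
```

Normalizing each channel separately keeps a bright protein signal from swamping the marker half of the descriptor.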

unlabeled.tar.bz2

Labeled RandTag Confocal Images

labeled-confocal.tar.bz2

Labeled RandTag Widefield Images (Nuclear Subset)

This is part of the labeled widefield dataset.

nuclear.tar.bz2

Labeled RandTag Widefield Images (Cytoskeleton Subset)

This is part of the labeled widefield dataset.

cytoskeleton.tar.bz2

Labeled RandTag Widefield Images (Nucleoli Subset)

This is part of the labeled widefield dataset.

nucleoli.tar.bz2

Labeled RandTag Widefield Images (Golgi Subset)

This file is part of the labeled widefield dataset.

Golgi.tar.bz2

Labeled RandTag Widefield Images (Mitochondria Subset)

This is part of the labeled widefield dataset.

mitochondria.tar.bz2

Labeled RandTag Widefield Images (Cytoplasmic Subset)

This is part of the labeled widefield dataset.

cytoplasmic.tar.bz2

Labeled RandTag Widefield Images (Lysosomal Subset)

This is part of the labeled widefield dataset.

lysosome.tar.bz2

Labeled RandTag Widefield Images (Plasma Membrane Subset)

This file is part of the labeled widefield dataset.

membrane.tar.bz2

Labeled RandTag Widefield Images (ER Subset)

This is part of the labeled widefield dataset.

ER.tar.bz2
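The labeled widefield data are distributed as one bzip2-compressed tar archive per location class. The sketch below uses Python's standard `tarfile` module to unpack whichever of these archives have been downloaded into one directory per class, taking the archive name as the class label. The function name and directory arguments are hypothetical, and because the internal layout of the archives is not described here, the function simply lists every extracted file rather than assuming a naming scheme for the GFP and Hoechst channels.

```python
import tarfile
from pathlib import Path

# Per-class archive names as listed on this page; the class label is the stem.
WIDEFIELD_ARCHIVES = [
    "nuclear.tar.bz2", "cytoskeleton.tar.bz2", "nucleoli.tar.bz2",
    "Golgi.tar.bz2", "mitochondria.tar.bz2", "cytoplasmic.tar.bz2",
    "lysosome.tar.bz2", "membrane.tar.bz2", "ER.tar.bz2",
]


def extract_widefield_subsets(download_dir, out_dir):
    """Extract each downloaded per-class archive into out_dir/<class>/ and
    return a dict mapping class label to the sorted extracted file paths."""
    download_dir, out_dir = Path(download_dir), Path(out_dir)
    files_by_class = {}
    for name in WIDEFIELD_ARCHIVES:
        label = name[: -len(".tar.bz2")]
        archive = download_dir / name
        if not archive.exists():
            continue  # this subset was not downloaded; skip it
        target = out_dir / label
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, mode="r:bz2") as tar:
            # Use the safer "data" extraction filter where Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
        files_by_class[label] = sorted(p for p in target.rglob("*") if p.is_file())
    return files_by_class
```

The unlabeled widefield archive (`unlabeled.tar.bz2`) and the confocal archive (`labeled-confocal.tar.bz2`) are left out of the list because they are not split by location class.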

Data from: Determining the subcellular location of new proteins from microscope images using local features
