Motivation: Previous systems for automated determination of subcellular location from microscope images have been evaluated on datasets in which each location class consisted of multiple images of the same representative protein. Here, we frame a more challenging and useful problem in which previously unseen proteins are to be classified.
Results: Using CD-tagging, we generated two new image datasets for evaluation of this problem, which contain several different proteins for each location class. Evaluation of previous methods on these new datasets showed that it is much harder to train a classifier that generalizes across different proteins than one that simply recognizes a protein it was trained on. We therefore developed and evaluated additional approaches, incorporating novel modifications of local feature techniques. These extended the notion of local features to exploit both the protein image and any reference markers that were imaged in parallel. With these, we obtained a large accuracy improvement on our new datasets over existing methods. Additionally, these features help achieve classification improvements on other previously studied datasets.
Labeled RandTag Widefield Images (Unlabeled Subset)
This dataset consists of fluorescence microscope images of GFP-tagged proteins localizing to different organelles. The organelles were annotated by visual inspection. For each organelle, multiple proteins localizing to it were identified, and for each protein, multiple images were acquired. Images contain two channels: the GFP-tagged protein and a nuclear marker (Hoechst). NIH 3T3 cells were tagged using CD-tagging. This data was acquired on a widefield microscope.
unlabeled.tar.bz2
Labeled RandTag Confocal Images
This dataset consists of fluorescence microscope images of GFP-tagged proteins localizing to different organelles. The organelles were annotated by visual inspection. For each organelle, multiple proteins localizing to it were identified, and for each protein, multiple images were acquired. Images contain two channels: the GFP-tagged protein and a nuclear marker (Hoechst). NIH 3T3 cells were tagged using CD-tagging. This data was acquired on a confocal microscope.
labeled-confocal.tar.bz2
Labeled RandTag Widefield Images (Nuclear Subset)
This is part of the labeled widefield dataset.
nuclear.tar.bz2
Labeled RandTag Widefield Images (Cytoskeleton Subset)
This is part of the labeled widefield dataset.
cytoskeleton.tar.bz2
Labeled RandTag Widefield Images (Nucleoli Subset)
This is part of the labeled widefield dataset.
nucleoli.tar.bz2
Labeled RandTag Widefield Images (Golgi Subset)
This file is part of the labeled widefield dataset.
Golgi.tar.bz2
Labeled RandTag Widefield Images (Mitochondria Subset)
This is part of the labeled widefield dataset.
mitochondria.tar.bz2
Labeled RandTag Widefield Images (Cytoplasmic Subset)
This is part of the labeled widefield dataset.
cytoplasmic.tar.bz2
Labeled RandTag Widefield Images (Lysosomal Subset)
This is part of the labeled widefield dataset.
lysosome.tar.bz2
Labeled RandTag Widefield Images (Plasma Membrane Subset)
This file is part of the labeled widefield dataset.
membrane.tar.bz2
Labeled RandTag Widefield Images (ER Subset)
This is part of the labeled widefield dataset.
ER.tar.bz2
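The files listed above are bzip2-compressed tar archives. As a minimal sketch, assuming an archive has already been downloaded to the local disk, it could be inspected and unpacked with Python's standard tarfile module. The function names list_members and extract_archive are hypothetical helpers introduced here for illustration, and the internal directory layout of the archives is not described on this page, so the code only lists and extracts whatever members are present.

```python
import tarfile
from pathlib import Path


def list_members(archive_path):
    """Return the names of the regular files stored in a .tar.bz2 archive."""
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.bz2 archive into dest_dir and return the destination path.

    Only extract archives obtained from a trusted source: tar members can
    carry arbitrary paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        tar.extractall(dest)
    return dest
```

For example, list_members("labeled-confocal.tar.bz2") would return the file names in the confocal archive, and extract_archive("nuclear.tar.bz2", "widefield") would unpack the nuclear subset into a local directory named widefield (a directory name chosen here only as an example).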