Data from: Determining the subcellular location of new proteins from microscope images using local features
Data files
Aug 30, 2013 version (files total 2.77 GB)
- cytoplasmic.tar.bz2 (155.35 MB)
- cytoskeleton.tar.bz2 (211.27 MB)
- ER.tar.bz2 (94.86 MB)
- Golgi.tar.bz2 (180.46 MB)
- labeled-confocal.tar.bz2 (646.23 MB)
- lysosome.tar.bz2 (149.52 MB)
- membrane.tar.bz2 (108.26 MB)
- mitochondria.tar.bz2 (178.10 MB)
- nuclear.tar.bz2 (256.23 MB)
- nucleoli.tar.bz2 (182.05 MB)
- README_for_cytoplasmic.tar.bz2.txt (1.92 KB)
- README_for_cytoskeleton.tar.bz2.txt (1.92 KB)
- README_for_ER.tar.bz2.txt (1.92 KB)
- README_for_Golgi.tar.bz2.txt (1.92 KB)
- README_for_labeled-confocal.tar.rst (1.38 KB)
- README_for_lysosome.tar.bz2.txt (1.92 KB)
- README_for_membrane.tar.bz2.txt (1.92 KB)
- README_for_mitochondria.tar.bz2.txt (1.92 KB)
- README_for_nuclear.tar.txt (1.83 KB)
- README_for_nucleoli.tar.bz2.txt (1.92 KB)
- README_for_unlabeled.tar.txt (1.83 KB)
- unlabeled.tar.bz2 (603.98 MB)
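The archives listed above are bzip2-compressed tarballs, so they can be unpacked with standard tools. As a minimal sketch using only Python's standard library, the helper below extracts one archive into a directory of your choice. The function name `extract_archive` and the destination path in the usage comment are illustrative assumptions, not part of the dataset; the internal layout of each archive is described in its README rather than assumed here.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.bz2 archive into dest_dir and return its member names.

    Hypothetical helper for illustration only. Extract only archives you
    trust, since tar members can carry arbitrary paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:bz2" opens the tarball for reading with bzip2 decompression.
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Example usage (assumes ER.tar.bz2 has been downloaded to the working directory):
# members = extract_archive("ER.tar.bz2", "data/ER")
# print(len(members), "entries extracted")
```

Returning the member names makes it easy to check what an archive contains before pointing any analysis code at the extracted files.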
Abstract
Motivation: Evaluation of previous systems for automated determination of subcellular location from microscope images has been done using datasets in which each location class consisted of multiple images of the same representative protein. Here, we frame a more challenging and useful problem where previously unseen proteins are to be classified.
Results: Using CD-tagging, we generated two new image datasets for evaluation of this problem, which contain several different proteins for each location class. Evaluation of previous methods on these new datasets showed that it is much harder to train a classifier that generalizes across different proteins than one that simply recognizes a protein it was trained on. We therefore developed and evaluated additional approaches, incorporating novel modifications of local features techniques. These extended the notion of local features to exploit both the protein image and any reference markers that were imaged in parallel. With these, we obtained a large accuracy improvement in our new datasets over existing methods. Additionally, these features help achieve classification improvements for other previously studied datasets.
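The distinction drawn in the abstract, between recognizing a protein seen during training and classifying a previously unseen one, comes down to how images are split for evaluation. The following is a minimal sketch of a leave-one-protein-out split. It is not the authors' evaluation code; the function name and the protein identifiers in the usage example are hypothetical.

```python
from collections import defaultdict


def leave_one_protein_out(proteins):
    """Yield (held_out, train_indices, test_indices) for each distinct protein.

    proteins[i] is the protein identifier of image i. All images of the
    held-out protein go to the test set, so a classifier is never tested
    on a protein it saw during training.
    """
    by_protein = defaultdict(list)
    for idx, protein in enumerate(proteins):
        by_protein[protein].append(idx)
    for held_out in sorted(by_protein):
        test_idx = by_protein[held_out]
        train_idx = [i for i, p in enumerate(proteins) if p != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical example: five images drawn from three made-up proteins.
image_proteins = ["protA", "protA", "protB", "protC", "protC"]
for held_out, train_idx, test_idx in leave_one_protein_out(image_proteins):
    print(held_out, "train:", train_idx, "test:", test_idx)
```

A random per-image split, by contrast, would usually place images of the same protein on both sides, which corresponds to the easier setting the abstract describes for earlier datasets.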