Training and test data for: Not getting in too deep: A practical deep learning approach to routine crystallisation image classification
Data files
Jan 31, 2023 version files 13.96 GB
- README.md (7.71 KB)
- Test1.zip (935.46 MB)
- Test2.zip (928.39 MB)
- Test3.zip (1.79 GB)
- TrainingClear.zip (2.19 GB)
- TrainingCrystalline.zip (1.45 GB)
- TrainingHeavyPrecipitate.zip (1.64 GB)
- TrainingLightPrecipitate.zip (2.46 GB)
- TrainingNull.zip (193.02 MB)
- TrainingOptimisable.zip (726.60 MB)
- TrainingPhaseSeparation.zip (1.03 GB)
- TrainingShootable.zip (609.94 MB)
Abstract
These data were used to classify crystallisation experiments in Milne et al. (https://doi.org/10.1101/2022.09.28.509868). In that study, four of the most widely used convolutional deep-learning network architectures that can be implemented without extensive computational resources were compared. The classifiers were shown to have different strengths, which can be combined into an ensemble classifier whose classification accuracy is comparable to that obtained by a large consortium initiative (Bruno et al., PLOS ONE, 13(6), 2018). Eight classes were used to rank the experimental outcomes, providing detailed information that can be used in routine crystallography experiments to identify crystal formation automatically for drug discovery, and paving the way for further exploration of the relationship between crystal formation and crystallisation conditions.
The images in this dataset were collected at AstraZeneca UK using a Rock Imager (Formulatrix) and cropped from the original 1028x960 pixels to 800x800 pixels.
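A quick way to confirm that extracted images have the stated 800x800 size is to read the width and height from each PNG header. The following is a minimal sketch using only the Python standard library; the function names `png_size` and `find_unexpected_sizes` are hypothetical helpers, not part of the dataset or the associated publication, and the sketch assumes the images have already been extracted to a local directory.

```python
import struct
from pathlib import Path

# Fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type (IHDR),
    # 16-19: width, 20-23: height (both big-endian unsigned integers).
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def find_unexpected_sizes(image_dir, expected=(800, 800)):
    """List PNG files under image_dir whose size differs from `expected`."""
    return [
        p for p in sorted(Path(image_dir).rglob("*.png"))
        if png_size(p) != expected
    ]
```

Calling `find_unexpected_sizes("path/to/extracted/images")` on an extraction directory of your choosing should return an empty list if every image matches the stated dimensions.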
The data files in this submission are PNG images compressed as .zip archives. The images for the three independent test sets are compressed separately (Test1.zip, Test2.zip, Test3.zip), whilst the training-set images are compressed by class (e.g. TrainingClear.zip contains the training-set images from the class 'Clear').
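Because the archive names encode the split and, for the training set, the class, the downloaded files can be unpacked into a labelled directory tree. The sketch below is one possible approach, not a procedure prescribed by the authors: `label_from_archive` and `extract_archives` are hypothetical helper names, the `<split>/<label>/` output layout is an assumption, and the internal folder structure of each archive is not described here, so extracted files may sit in subfolders.

```python
import zipfile
from pathlib import Path


def label_from_archive(archive_name):
    """Derive (split, label) from an archive name.

    'TrainingClear.zip' -> ('training', 'Clear')
    'Test1.zip'         -> ('test', '1')
    """
    stem = Path(archive_name).stem
    if stem.startswith("Training"):
        return "training", stem[len("Training"):]
    if stem.startswith("Test"):
        return "test", stem[len("Test"):]
    raise ValueError(f"Unrecognised archive name: {archive_name}")


def extract_archives(archive_dir, out_dir):
    """Extract every .zip in archive_dir into out_dir/<split>/<label>/.

    Returns a dict mapping (split, label) to the extraction directory.
    The output layout is an assumption made for this sketch.
    """
    extracted = {}
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        split, label = label_from_archive(archive.name)
        target = Path(out_dir) / split / label
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted[(split, label)] = target
    return extracted
```

For example, `extract_archives("downloads", "data")` would place the contents of TrainingClear.zip under `data/training/Clear/` and those of Test1.zip under `data/test/1/`, where `downloads` and `data` are placeholder directory names.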