Reference Information
=====================

Provenance for this README
--------------------------

* File name: README.md
* Authors: Jamie Milne
* Other contributors: Julie Wilson, Yinhai Wang, Chen Qian, David Hargreaves
* Date created: 2023-01-10
* Date modified: 2023-01-31

Dataset Version and Release History
-----------------------------------

* Current Version:
    * Number: 1.0.0
    * Date: 2023-01-31
    * Persistent identifier: DOI: 10.5061/dryad.0k6djhb45
    * Summary of changes: More in-depth look into the data
* Embargo Provenance: n/a
    * Scope of embargo: n/a
    * Embargo period: n/a

Dataset Attribution and Usage
-----------------------------

* Dataset Title: Training and test data for: Not getting in too deep: A practical deep learning approach to routine crystallisation image classification
* Persistent Identifier: https://doi.org/10.5061/dryad.0k6djhb45
* Dataset Contributors:
    * Creators: Jamie Milne, Julie Wilson, Yinhai Wang, Chen Qian, David Hargreaves
    * Image Creators: Members of the Protein Crystallography team at AstraZeneca, Darwin Building, Cambridge.
* Date of Issue: 2022-10-30
* Publisher: bioRxiv
* Note: This publication is currently under review for publication in PLOS ONE.

Contact Information
-------------------

* Name: Jamie Milne
* Affiliations: Department of Mathematics, University of York; AstraZeneca, Darwin Building, Cambridge Science Park, Cambridge
* ORCID ID: https://orcid.org/0000-0002-2726-874X
* Email: jm2163@york.ac.uk
* Alternate Email: jamie.milne@astrazeneca.com
* Address: e-mail preferred
* Alternative Contact: Supervisor, Professor at the University of York
    * Name: Julie Wilson
    * Affiliations: Department of Mathematics, University of York
    * ORCID ID: https://orcid.org/0000-0002-5171-8480
    * Email: julie.wilson@york.ac.uk
    * Address: e-mail preferred
* Contributor ORCID IDs:
    * Jamie Milne: https://orcid.org/0000-0002-2726-874X
    * Julie Wilson: https://orcid.org/0000-0002-5171-8480
    * David Hargreaves: https://orcid.org/0000-0003-4441-0039
    * Chen Qian: https://orcid.org/0000-0003-2663-7694
    * Yinhai Wang: https://orcid.org/0000-0002-3671-4932

- - -

Additional Dataset Metadata
===========================

Acknowledgements
----------------

* Funding sources: Engineering and Physical Sciences Research Council (EPSRC), awarded to JM (award EP/V519807/1). DH, CQ and YW are employed by AstraZeneca.

Dates and Locations
-------------------

* Dates of data collection: Images of experiments were captured between 2015 and 2022.
* Geographic locations of data collection: Experiments were conducted using a Formulatrix Rock Imager at the Darwin Building, Cambridge, UK.

- - -

Methodological Information
==========================

* Methods of data collection/generation: All data were collected using a Formulatrix Rock Imager and stored on a server.
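As an alternative to extracting the archives by hand, the following is a minimal Python sketch (standard library only) that unpacks the archives listed under Data and File Overview and pairs each image with a label. It is an illustration, not the authors' pipeline. The folder names `downloads` and `data` and the functions `extract_archives` and `index_images` are hypothetical. It assumes that all archives sit in one local folder, that each Training archive holds images of a single class (the label is taken from the archive name, so "TrainingHeavyPrecipitate" yields "HeavyPrecipitate"), and that each Test image sits inside a subfolder numbered 1 to 8. The mapping from these numbers to class names is given in the preprint cited under File/Folder Details and is not reproduced here.

```python
import zipfile
from collections import Counter
from pathlib import Path


def extract_archives(archive_dir, out_dir):
    """Extract every .zip in archive_dir into out_dir/<archive name without .zip>."""
    archive_dir, out_dir = Path(archive_dir), Path(out_dir)
    extracted = []
    for archive in sorted(archive_dir.glob("*.zip")):
        target = out_dir / archive.stem
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted


def index_images(extracted_dir):
    """Return (path, split, label) tuples for the .png files in one extracted archive.

    Assumption: "Training<Class>" archives hold images of a single class, and
    "Test<N>" archives hold images inside subfolders named 1-8.
    """
    extracted_dir = Path(extracted_dir)
    name = extracted_dir.name
    records = []
    for path in sorted(extracted_dir.rglob("*")):
        # Match .png and .PNG alike; skip directories and non-image files.
        if not path.is_file() or path.suffix.lower() != ".png":
            continue
        if name.startswith("Training"):
            # Label is the archive name minus its prefix, e.g. "Clear".
            records.append((path, "train", name[len("Training"):]))
        elif name.startswith("Test"):
            # Label is the numbered class subfolder (1-8) the image sits in.
            records.append((path, name, path.parent.name))
    return records


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the archives were downloaded.
    for folder in extract_archives("downloads", "data"):
        counts = Counter(label for _, _, label in index_images(folder))
        print(folder.name, dict(sorted(counts.items())))
```

Taking each Test label from the image's parent folder, rather than from a fixed path depth, is meant to tolerate an extra top-level folder inside a Test archive, since the exact internal layout of the archives is an assumption here.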
- - -

Data and File Overview
======================

Summary Metrics
---------------

* File count: 11
* Total file size: 12.727 GB
* Range of individual file sizes: 193.02 MB to 2.46 GB
* File formats: .zip (containing .png images)

Naming Conventions
------------------

* File naming scheme: Files with the prefix "Test" contain images that were used to test the models, whereas files with the prefix "Training" contain images that were used to train the models.

Table of Contents
-----------------

* Test1.zip
* Test2.zip
* Test3.zip
* TrainingClear.zip
* TrainingNull.zip
* TrainingHeavyPrecipitate.zip
* TrainingLightPrecipitate.zip
* TrainingCrystalline.zip
* TrainingPhaseSeparation.zip
* TrainingOptmisable.zip
* TrainingShootable.zip

Setup
-----

* Unpacking instructions: Right-click each archive and select "Extract All".
* Relationships between files/folders: All zip files with the prefix "Training" were used to train the models, and those with the prefix "Test" were used to test the trained models.
* Recommended software/tools: n/a

- - -

File/Folder Details
===================

Details for: Test1.zip
---------------------------------------

* Description: A zipped folder containing subfolders labelled 1 to 8, corresponding to the classes defined in https://doi.org/10.1101/2022.09.28.509868. Each subfolder contains images of 800 x 800 pixels in .png format. These images are very similar to the images used in training.
* Format(s): .zip
* Size(s): 935.46 MB
* Subfolder(s): 8

Details for: Test2.zip
---------------------------------------

* Description: A zipped folder containing subfolders labelled 1 to 8, corresponding to the classes defined in https://doi.org/10.1101/2022.09.28.509868. Each subfolder contains images of 800 x 800 pixels in .png format. These images are loosely related to the images used in training.
* Format(s): .zip
* Size(s): 928.39 MB
* Subfolder(s): 8

Details for: Test3.zip
---------------------------------------

* Description: A zipped folder containing subfolders labelled 1 to 8, corresponding to the classes defined in https://doi.org/10.1101/2022.09.28.509868. Each subfolder contains images of 800 x 800 pixels in .png format. These images are unrelated to the images used in training.
* Format(s): .zip
* Size(s): 1.79 GB
* Subfolder(s): 8

Details for: TrainingNull.zip
---------------------------------------

* Description: A zipped folder containing images of 800 x 800 pixels that were assigned to the class "Null" for training.
* Format(s): .zip
* Size(s): 193.02 MB
* Subfolder(s): n/a

Details for: TrainingClear.zip
---------------------------------------

* Description: A zipped folder containing images of 800 x 800 pixels that were assigned to the class "Clear" for training.
* Format(s): .zip
* Size(s): 2.19 GB
* Subfolder(s): n/a

Details for: TrainingHeavyPrecipitate.zip
---------------------------------------

* Description: A zipped folder containing images of 800 x 800 pixels that were assigned to the class "Heavy Precipitate" for training.
* Format(s): .zip
* Size(s): 1.64 GB
* Subfolder(s): n/a

Details for: TrainingLightPrecipitate.zip
---------------------------------------

* Description: A zipped folder containing images of 800 x 800 pixels that were assigned to the class "Light Precipitate" for training.
* Format(s): .zip
* Size(s): 2.46 GB
* Subfolder(s): n/a

Details for: TrainingPhaseSeparation.zip
---------------------------------------

* Description: A zipped folder containing images of 800 x 800 pixels that were assigned to the class "Phase Separation" for training.
* Format(s): .zip
* Size(s): 1.03 GB
* Subfolder(s): n/a

Details for: TrainingCrystalline.zip
---------------------------------------

* Description: A zipped folder containing images of 800 x 800 pixels that were assigned to the class "Crystalline" for training.
* Format(s): .zip
* Size(s): 1.45 GB
* Subfolder(s): n/a

Details for: TrainingOptimisable.zip
---------------------------------------

* Description: A zipped folder containing images of 800 x 800 pixels that were assigned to the class "Optimisable" for training.
* Format(s): .zip
* Size(s): 726.60 MB
* Subfolder(s): n/a

Details for: TrainingShootable.zip
---------------------------------------

* Description: A zipped folder containing images of 800 x 800 pixels that were assigned to the class "Shootable" for training.
* Format(s): .zip
* Size(s): 609.94 MB
* Subfolder(s): n/a

- - -

END OF README