Berkeley Single-Cell Computational Microscopy (BSCCM) dataset
Data files
Feb 08, 2024 version files 244.67 GB
-
BSCCM-coherent-tiny.tar.gz_chunk00000.bin
-
BSCCM-coherent.tar.gz_chunk00000.bin
-
BSCCM-coherent.tar.gz_chunk00001.bin
-
BSCCM-coherent.tar.gz_chunk00002.bin
-
BSCCM-coherent.tar.gz_chunk00003.bin
-
BSCCM-coherent.tar.gz_chunk00004.bin
-
BSCCM-coherent.tar.gz_chunk00005.bin
-
BSCCM-coherent.tar.gz_chunk00006.bin
-
BSCCM-coherent.tar.gz_chunk00007.bin
-
BSCCM-coherent.tar.gz_chunk00008.bin
-
BSCCM-coherent.tar.gz_chunk00009.bin
-
BSCCM-coherent.tar.gz_chunk00010.bin
-
BSCCM-coherent.tar.gz_chunk00011.bin
-
BSCCM-coherent.tar.gz_chunk00012.bin
-
BSCCM-coherent.tar.gz_chunk00013.bin
-
BSCCM-coherent.tar.gz_chunk00014.bin
-
BSCCM-coherent.tar.gz_chunk00015.bin
-
BSCCM-coherent.tar.gz_chunk00016.bin
-
BSCCM-coherent.tar.gz_chunk00017.bin
-
BSCCM-coherent.tar.gz_chunk00018.bin
-
BSCCM-coherent.tar.gz_chunk00019.bin
-
BSCCM-coherent.tar.gz_chunk00020.bin
-
BSCCM-coherent.tar.gz_chunk00021.bin
-
BSCCM-coherent.tar.gz_chunk00022.bin
-
BSCCM-coherent.tar.gz_chunk00023.bin
-
BSCCM-coherent.tar.gz_chunk00024.bin
-
BSCCM-tiny.tar.gz_chunk00000.bin
-
BSCCM.tar.gz_chunk00000.bin
-
BSCCM.tar.gz_chunk00001.bin
-
BSCCM.tar.gz_chunk00002.bin
-
BSCCM.tar.gz_chunk00003.bin
-
BSCCM.tar.gz_chunk00004.bin
-
BSCCM.tar.gz_chunk00005.bin
-
BSCCM.tar.gz_chunk00006.bin
-
BSCCM.tar.gz_chunk00007.bin
-
BSCCM.tar.gz_chunk00008.bin
-
BSCCM.tar.gz_chunk00009.bin
-
BSCCM.tar.gz_chunk00010.bin
-
BSCCM.tar.gz_chunk00011.bin
-
BSCCM.tar.gz_chunk00012.bin
-
BSCCM.tar.gz_chunk00013.bin
-
BSCCM.tar.gz_chunk00014.bin
-
BSCCM.tar.gz_chunk00015.bin
-
BSCCM.tar.gz_chunk00016.bin
-
BSCCM.tar.gz_chunk00017.bin
-
BSCCM.tar.gz_chunk00018.bin
-
BSCCM.tar.gz_chunk00019.bin
-
BSCCM.tar.gz_chunk00020.bin
-
BSCCM.tar.gz_chunk00021.bin
-
BSCCM.tar.gz_chunk00022.bin
-
BSCCM.tar.gz_chunk00023.bin
-
BSCCM.tar.gz_chunk00024.bin
-
BSCCM.tar.gz_chunk00025.bin
-
BSCCM.tar.gz_chunk00026.bin
-
BSCCM.tar.gz_chunk00027.bin
-
BSCCM.tar.gz_chunk00028.bin
-
BSCCM.tar.gz_chunk00029.bin
-
BSCCM.tar.gz_chunk00030.bin
-
BSCCM.tar.gz_chunk00031.bin
-
BSCCM.tar.gz_chunk00032.bin
-
BSCCM.tar.gz_chunk00033.bin
-
BSCCM.tar.gz_chunk00034.bin
-
BSCCM.tar.gz_chunk00035.bin
-
BSCCM.tar.gz_chunk00036.bin
-
BSCCM.tar.gz_chunk00037.bin
-
BSCCM.tar.gz_chunk00038.bin
-
BSCCM.tar.gz_chunk00039.bin
-
BSCCM.tar.gz_chunk00040.bin
-
BSCCM.tar.gz_chunk00041.bin
-
BSCCM.tar.gz_chunk00042.bin
-
BSCCM.tar.gz_chunk00043.bin
-
BSCCM.tar.gz_chunk00044.bin
-
BSCCM.tar.gz_chunk00045.bin
-
BSCCM.tar.gz_chunk00046.bin
-
BSCCM.tar.gz_chunk00047.bin
-
BSCCM.tar.gz_chunk00048.bin
-
BSCCM.tar.gz_chunk00049.bin
-
BSCCM.tar.gz_chunk00050.bin
-
BSCCM.tar.gz_chunk00051.bin
-
BSCCM.tar.gz_chunk00052.bin
-
BSCCM.tar.gz_chunk00053.bin
-
BSCCM.tar.gz_chunk00054.bin
-
BSCCM.tar.gz_chunk00055.bin
-
BSCCM.tar.gz_chunk00056.bin
-
BSCCM.tar.gz_chunk00057.bin
-
BSCCM.tar.gz_chunk00058.bin
-
BSCCM.tar.gz_chunk00059.bin
-
BSCCM.tar.gz_chunk00060.bin
-
BSCCM.tar.gz_chunk00061.bin
-
BSCCM.tar.gz_chunk00062.bin
-
BSCCM.tar.gz_chunk00063.bin
-
BSCCM.tar.gz_chunk00064.bin
-
BSCCM.tar.gz_chunk00065.bin
-
BSCCM.tar.gz_chunk00066.bin
-
BSCCM.tar.gz_chunk00067.bin
-
BSCCM.tar.gz_chunk00068.bin
-
BSCCM.tar.gz_chunk00069.bin
-
BSCCM.tar.gz_chunk00070.bin
-
BSCCM.tar.gz_chunk00071.bin
-
BSCCM.tar.gz_chunk00072.bin
-
BSCCM.tar.gz_chunk00073.bin
-
BSCCM.tar.gz_chunk00074.bin
-
BSCCM.tar.gz_chunk00075.bin
-
BSCCM.tar.gz_chunk00076.bin
-
BSCCM.tar.gz_chunk00077.bin
-
BSCCM.tar.gz_chunk00078.bin
-
BSCCM.tar.gz_chunk00079.bin
-
BSCCM.tar.gz_chunk00080.bin
-
BSCCM.tar.gz_chunk00081.bin
-
BSCCM.tar.gz_chunk00082.bin
-
BSCCM.tar.gz_chunk00083.bin
-
BSCCM.tar.gz_chunk00084.bin
-
BSCCM.tar.gz_chunk00085.bin
-
BSCCM.tar.gz_chunk00086.bin
-
BSCCM.tar.gz_chunk00087.bin
-
BSCCM.tar.gz_chunk00088.bin
-
BSCCM.tar.gz_chunk00089.bin
-
BSCCM.tar.gz_chunk00090.bin
-
BSCCM.tar.gz_chunk00091.bin
-
BSCCM.tar.gz_chunk00092.bin
-
BSCCM.tar.gz_chunk00093.bin
-
BSCCM.tar.gz_chunk00094.bin
-
BSCCM.tar.gz_chunk00095.bin
-
BSCCM.tar.gz_chunk00096.bin
-
BSCCM.tar.gz_chunk00097.bin
-
BSCCM.tar.gz_chunk00098.bin
-
BSCCM.tar.gz_chunk00099.bin
-
BSCCM.tar.gz_chunk00100.bin
-
BSCCM.tar.gz_chunk00101.bin
-
BSCCM.tar.gz_chunk00102.bin
-
BSCCM.tar.gz_chunk00103.bin
-
BSCCM.tar.gz_chunk00104.bin
-
BSCCM.tar.gz_chunk00105.bin
-
BSCCM.tar.gz_chunk00106.bin
-
BSCCM.tar.gz_chunk00107.bin
-
BSCCM.tar.gz_chunk00108.bin
-
BSCCM.tar.gz_chunk00109.bin
-
BSCCM.tar.gz_chunk00110.bin
-
BSCCM.tar.gz_chunk00111.bin
-
BSCCM.tar.gz_chunk00112.bin
-
BSCCM.tar.gz_chunk00113.bin
-
BSCCM.tar.gz_chunk00114.bin
-
BSCCM.tar.gz_chunk00115.bin
-
BSCCM.tar.gz_chunk00116.bin
-
BSCCM.tar.gz_chunk00117.bin
-
BSCCM.tar.gz_chunk00118.bin
-
BSCCM.tar.gz_chunk00119.bin
-
BSCCM.tar.gz_chunk00120.bin
-
BSCCM.tar.gz_chunk00121.bin
-
BSCCM.tar.gz_chunk00122.bin
-
BSCCM.tar.gz_chunk00123.bin
-
BSCCM.tar.gz_chunk00124.bin
-
BSCCM.tar.gz_chunk00125.bin
-
BSCCM.tar.gz_chunk00126.bin
-
BSCCM.tar.gz_chunk00127.bin
-
BSCCM.tar.gz_chunk00128.bin
-
BSCCM.tar.gz_chunk00129.bin
-
BSCCM.tar.gz_chunk00130.bin
-
BSCCM.tar.gz_chunk00131.bin
-
BSCCM.tar.gz_chunk00132.bin
-
BSCCM.tar.gz_chunk00133.bin
-
BSCCM.tar.gz_chunk00134.bin
-
BSCCM.tar.gz_chunk00135.bin
-
BSCCM.tar.gz_chunk00136.bin
-
BSCCM.tar.gz_chunk00137.bin
-
BSCCM.tar.gz_chunk00138.bin
-
BSCCM.tar.gz_chunk00139.bin
-
BSCCM.tar.gz_chunk00140.bin
-
BSCCM.tar.gz_chunk00141.bin
-
BSCCM.tar.gz_chunk00142.bin
-
BSCCM.tar.gz_chunk00143.bin
-
BSCCM.tar.gz_chunk00144.bin
-
BSCCM.tar.gz_chunk00145.bin
-
BSCCM.tar.gz_chunk00146.bin
-
BSCCM.tar.gz_chunk00147.bin
-
BSCCM.tar.gz_chunk00148.bin
-
BSCCM.tar.gz_chunk00149.bin
-
BSCCM.tar.gz_chunk00150.bin
-
BSCCM.tar.gz_chunk00151.bin
-
BSCCM.tar.gz_chunk00152.bin
-
BSCCM.tar.gz_chunk00153.bin
-
BSCCM.tar.gz_chunk00154.bin
-
BSCCM.tar.gz_chunk00155.bin
-
BSCCM.tar.gz_chunk00156.bin
-
BSCCM.tar.gz_chunk00157.bin
-
BSCCM.tar.gz_chunk00158.bin
-
BSCCM.tar.gz_chunk00159.bin
-
BSCCM.tar.gz_chunk00160.bin
-
BSCCM.tar.gz_chunk00161.bin
-
BSCCM.tar.gz_chunk00162.bin
-
BSCCM.tar.gz_chunk00163.bin
-
BSCCM.tar.gz_chunk00164.bin
-
BSCCM.tar.gz_chunk00165.bin
-
BSCCM.tar.gz_chunk00166.bin
-
BSCCM.tar.gz_chunk00167.bin
-
BSCCM.tar.gz_chunk00168.bin
-
BSCCM.tar.gz_chunk00169.bin
-
BSCCM.tar.gz_chunk00170.bin
-
BSCCM.tar.gz_chunk00171.bin
-
BSCCM.tar.gz_chunk00172.bin
-
BSCCM.tar.gz_chunk00173.bin
-
BSCCM.tar.gz_chunk00174.bin
-
BSCCM.tar.gz_chunk00175.bin
-
BSCCM.tar.gz_chunk00176.bin
-
BSCCM.tar.gz_chunk00177.bin
-
BSCCM.tar.gz_chunk00178.bin
-
BSCCM.tar.gz_chunk00179.bin
-
BSCCM.tar.gz_chunk00180.bin
-
BSCCM.tar.gz_chunk00181.bin
-
BSCCM.tar.gz_chunk00182.bin
-
BSCCM.tar.gz_chunk00183.bin
-
BSCCM.tar.gz_chunk00184.bin
-
BSCCM.tar.gz_chunk00185.bin
-
BSCCM.tar.gz_chunk00186.bin
-
BSCCM.tar.gz_chunk00187.bin
-
BSCCM.tar.gz_chunk00188.bin
-
BSCCM.tar.gz_chunk00189.bin
-
BSCCM.tar.gz_chunk00190.bin
-
BSCCM.tar.gz_chunk00191.bin
-
BSCCM.tar.gz_chunk00192.bin
-
BSCCM.tar.gz_chunk00193.bin
-
BSCCM.tar.gz_chunk00194.bin
-
BSCCM.tar.gz_chunk00195.bin
-
BSCCM.tar.gz_chunk00196.bin
-
BSCCMNIST-tiny.tar.gz_chunk00000.bin
-
BSCCMNIST.tar.gz_chunk00000.bin
-
BSCCMNIST.tar.gz_chunk00001.bin
-
BSCCMNIST.tar.gz_chunk00002.bin
-
BSCCMNIST.tar.gz_chunk00003.bin
-
BSCCMNIST.tar.gz_chunk00004.bin
-
BSCCMNIST.tar.gz_chunk00005.bin
-
README.md
Abstract
Computational microscopy, in which hardware and algorithms of an imaging system are jointly designed, shows promise for making imaging systems that cost less, perform more robustly, and collect new types of information. Often, the performance of computational imaging systems, especially those that incorporate machine learning, is sample-dependent. Thus, standardized datasets are an essential tool for comparing the performance of different approaches. Here, we introduce the Berkeley Single Cell Computational Microscopy (BSCCM) dataset, which contains over 400,000 images of individual white blood cells. The dataset contains images captured with multiple illumination patterns on an LED array microscope and fluorescent measurements of the abundance of surface proteins that mark different cell types. We hope this dataset will provide a valuable resource for the development and testing of new algorithms in computational microscopy and computer vision with practical biomedical applications.
README: Berkeley Single-Cell Computational Microscopy (BSCCM) Dataset
https://doi.org/10.5061/dryad.sxksn038s
This dataset contains the raw data for the Berkeley Single Cell Computational Microscopy (BSCCM) dataset. The data are compressed and split into chunks to facilitate downloading. The easiest way to download and use the data is through the bsccm Python package. The code for this package can be found at https://github.com/Waller-Lab/BSCCM/blob/main/Getting_started.ipynb and is archived with a DOI at https://zenodo.org/doi/10.5281/zenodo.10392182
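For readers who download the chunk files by hand instead of using the package, the sketch below shows one way to rebuild and unpack an archive. It assumes the chunks are plain byte-wise splits of a single .tar.gz file, named as in the file list above; that is an assumption made here, not something this README states. The helper names reassemble_archive and extract_archive are hypothetical.

```python
import re
import shutil
import tarfile
from pathlib import Path

# Assumption: each chunk is named "<archive>_chunkNNNNN.bin" and the chunks are
# byte-wise pieces of one .tar.gz archive, to be concatenated in index order.
CHUNK_PATTERN = re.compile(r"^(?P<archive>.+\.tar\.gz)_chunk(?P<index>\d+)\.bin$")


def reassemble_archive(chunk_dir, archive_name, output_dir):
    """Concatenate the chunks of `archive_name` in index order; return the archive path."""
    chunk_dir, output_dir = Path(chunk_dir), Path(output_dir)
    chunks = []
    for path in chunk_dir.iterdir():
        match = CHUNK_PATTERN.match(path.name)
        if match and match.group("archive") == archive_name:
            chunks.append((int(match.group("index")), path))
    if not chunks:
        raise FileNotFoundError(f"no chunks found for {archive_name} in {chunk_dir}")
    chunks.sort()
    indices = [index for index, _ in chunks]
    # Refuse to build an archive if a chunk is missing or duplicated.
    if indices != list(range(len(chunks))):
        raise ValueError(f"missing or duplicate chunk indices: {indices}")
    output_dir.mkdir(parents=True, exist_ok=True)
    archive_path = output_dir / archive_name
    with open(archive_path, "wb") as out:
        for _, path in chunks:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return archive_path


def extract_archive(archive_path, output_dir):
    """Extract a reassembled .tar.gz archive into `output_dir`."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(output_dir)


# Example (paths are placeholders):
# archive = reassemble_archive("/path/to/chunks", "BSCCM-tiny.tar.gz", "/path/to/output")
# extract_archive(archive, "/path/to/output")
```

The index check is deliberate: concatenating an incomplete set of chunks would otherwise produce an archive that only fails later, during extraction.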
Loading data
The Getting Started Jupyter notebook provides full documentation on how to use this dataset, including installation, downloading, image/metadata querying, and more. Here we reproduce the first few steps of the notebook:
- First, install the bsccm Python package with pip install bsccm
- Then download the data:
from bsccm import download_dataset
dataset_path = download_dataset('/path/to/download', tiny=True)
print('Downloaded dataset to ' + dataset_path)
- Then open an image:
from bsccm import BSCCM

dataset = BSCCM(dataset_path)
valid_indices = dataset.get_indices()
an_index = valid_indices[0]
image = dataset.read_image(an_index, channel='DPC_Left')
Data organization
After the data are downloaded and decompressed as described in the Getting_Started notebook, the dataset is organized as described in this section: which files contain which data, what metadata is available, and so on. A full understanding of these details is not necessary for using the dataset, because the Python package we provide abstracts away many of them.
File structure and organization
All image data are stored in Zarr datasets using Blosc/zstd compression. Tabular (per-cell) metadata are stored in .csv files. Global metadata, which pertains to the whole dataset rather than to individual cells, is stored in text files in JavaScript Object Notation (JSON) format.
Each top-level BSCCM directory (regular, coherent, tiny, or coherent-tiny) contains up to 5 items:
- BSCCM_images.zarr: A Zarr dataset containing all the images of cells
- BSCCM_backgrounds.zarr: A Zarr dataset containing the background intensity over the full field of view, for each LED array illumination pattern.
- BSCCM_global_metadata.json: A text file containing metadata about the full dataset (pixel size, wavelength, channel names, etc.) in JSON format
- BSCCM_index.csv: A comma-separated value (CSV) file containing metadata specific to each cell in the dataset
- BSCCM_surface_markers.csv: A comma-separated value (CSV) file containing information about the surface protein marker levels of each cell, along with many measurements derived from the fluorescence images and intermediate values used in computing these levels
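Because a given variant contains "up to" five items, it can be useful to check which ones are present before reading anything. The following is a minimal sketch using only the item names listed above; the function name list_dataset_items is hypothetical, and whether every variant uses exactly these file names is an assumption.

```python
from pathlib import Path

# Item names copied from the list above; a given variant may lack some of them.
EXPECTED_ITEMS = (
    "BSCCM_images.zarr",
    "BSCCM_backgrounds.zarr",
    "BSCCM_global_metadata.json",
    "BSCCM_index.csv",
    "BSCCM_surface_markers.csv",
)


def list_dataset_items(dataset_dir):
    """Map each expected item name to True if it exists under `dataset_dir`."""
    dataset_dir = Path(dataset_dir)
    return {name: (dataset_dir / name).exists() for name in EXPECTED_ITEMS}


# Example (the path is a placeholder for wherever the dataset was extracted):
# for name, present in list_dataset_items("/path/to/download/BSCCM-tiny").items():
#     print(f"{name}: {'found' if present else 'missing'}")
```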
BSCCM_images.zarr
Zarr datasets contain a hierarchy of directories. For the BSCCM_images.zarr file, this has the following structure:
+-- antibodies_CD16
|   +-- batch_0
|   |   +-- slide_replicate_0
|   |   |   +-- dpc
|   |   |   |   +-- cell_0
|   |   |   |   +-- cell_1
|   |   |   |   ...
|   |   |   +-- fluor
|   |   |   |   ...
|   |   |   +-- led_array
|   |   |   |   ...
|   |   |   +-- histology
|   ...
+-- antibodies_CD45
The outermost directory contains "antibodies_" followed by the name of the antibody used to stain the cells, or "unstained"/"all" for the no-antibody and all-antibody conditions, respectively.
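The naming rule for the outermost directory can be expressed as a small parser. This is an illustrative sketch only: the function name and the returned dictionary layout are hypothetical, and the only facts it relies on are the "antibodies_" prefix and the two special labels described above.

```python
def parse_antibody_directory(name):
    """Split an outermost directory name into its staining condition and antibody."""
    prefix = "antibodies_"
    if not name.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}, got {name!r}")
    label = name[len(prefix):]
    # "unstained" and "all" are conditions, not names of a single antibody.
    if label in ("unstained", "all"):
        return {"condition": label, "antibody": None}
    return {"condition": "single", "antibody": label}


for directory in ("antibodies_CD16", "antibodies_unstained", "antibodies_all"):
    print(directory, "->", parse_antibody_directory(directory))
```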
BSCCM_backgrounds.zarr
This file contains the background images for each channel across the full field of view (2056x2056 pixels). Each top-level directory is named for a channel. The structure is as shown below.
+-- Brightfield
| +-- 5_percentile
| +-- 10_percentile
| +-- 20_percentile
| +-- 40_percentile
| +-- 50_percentile
+-- DF_50
BSCCM_global_metadata.json
This file contains metadata specific to the full dataset, including names of channels, collection settings like exposure, and useful information for calibration like wavelength, objective NA, etc. It is a text file with JSON structure.
{
"led_array":
{
"image_shape": [128, 128],
"channel_names": ["Brightfield", ... "LED119"],
"channel_indices": {"Brightfield": 0, ... "LED119": 200},
"exposure_ms": {"Brightfield": 8, ... "LED119": 200},
"camera": {
"offset": 30,
"gain_db": 4,
"quantum_efficiency": 0.68
},
"wavelength_nm": 515,
"pixel_size_um": 0.166,
"objective": {"NA": 0.5, "magnification": 20}},
"fluorescence":
{
"image_shape": ...
...
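As an example of how these calibration values might be used, the sketch below derives two quantities from the led_array block: the physical size of an image crop and the textbook Abbe limit, wavelength / (2 NA), for the objective alone. The excerpt above is not valid JSON as printed because of its ellipses, so the sketch parses a trimmed copy containing only the values shown. It assumes pixel_size_um is the pixel size at the sample plane, which the README does not state, and summarize_optics is a hypothetical helper name.

```python
import json

# Values copied from the led_array excerpt above; elided entries are left out.
LED_ARRAY_EXCERPT = """
{
  "led_array": {
    "image_shape": [128, 128],
    "wavelength_nm": 515,
    "pixel_size_um": 0.166,
    "objective": {"NA": 0.5, "magnification": 20}
  }
}
"""


def summarize_optics(led_array):
    """Derive the crop size and the Abbe limit (both in micrometers) from the metadata."""
    height_px, width_px = led_array["image_shape"]
    pixel_um = led_array["pixel_size_um"]  # assumed to be the sample-plane pixel size
    wavelength_um = led_array["wavelength_nm"] / 1000.0
    numerical_aperture = led_array["objective"]["NA"]
    return {
        "crop_size_um": (height_px * pixel_um, width_px * pixel_um),
        "abbe_limit_um": wavelength_um / (2.0 * numerical_aperture),
    }


metadata = json.loads(LED_ARRAY_EXCERPT)
summary = summarize_optics(metadata["led_array"])
print("crop size (um): %.3f x %.3f" % summary["crop_size_um"])
print("Abbe limit (um): %.3f" % summary["abbe_limit_um"])
```

With the real file, the same function can be applied to json.load(open(path))["led_array"], provided the keys match the excerpt.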
BSCCM_index.csv
This contains per-cell metadata in a single, large CSV file with one row per cell. It has the following columns:
"global_index": An integer uniquely identifying the cell
"position_in_fov_y_pix"/"position_in_fov_x_pix": Location of the cell center within the image field of view
"detection_radius": The radius reported by the blob finding algorithm that initially located the cell, which gives a rough estimate of its size
"has_matched_histology_cell": Whether or not the cell has a matching cell in histology contrast
"fov_center_x"/"fov_center_y"/"fov_center_z": The microscope stage coordinates of the field of view from which the cell was drawn
"batch": The index of the cell isolation experiment the cells were drawn from (either 0 or 1)
"antibodies": The name of the single antibody used to stain the cells, or 'all' or 'unstained' if every antibody or no antibodies were used
"imaging_date": The date the slide of cells was imaged
"data_path": The path to the image data within the BSCCM_images.zarr file
"slide_replicate": The index of the slide replicate within the same antibody/batch conditions (either 0 or 1)
BSCCM_surface_markers.csv
This contains per-cell calculations of fluorescence surface marker levels, derived entirely from the fluorescence imaging data. It is a single, large CSV file with one row per cell.
"global_index": An integer uniquely identifying the cell
// raw measurements derived from fluorescence images
"Fluor_426-446_total_raw" // Raw foreground fluorescence in 426-446nm channel
"Fluor_500-550_total_raw" // Raw foreground fluorescence in 500-550nm channel
...
"Fluor_426-446_background" // Raw background fluorescence in 426-446nm channel
...
(Many more intermediate calculations)
...
// protein level estimates from unmixing procedure
"CD45_single_antibody_model_unmixed" // CD45 protein levels using 2-spectrum unmixing model
"CD123_single_antibody_model_unmixed"
...
"CD123/HLA-DR/CD14_full_model_unmixed" // Combined CD123/HLA-DR/CD14 protein levels using 4-spectrum unmixing model
...
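Both CSV files share the "global_index" column, so per-cell metadata and marker levels can be combined without the bsccm package. The sketch below uses only the Python standard library and made-up rows; the sample values (including the assumption that the "antibodies" column holds strings such as CD45) and the helper names are illustrative, not taken from the real files.

```python
import csv
import io


def load_rows(csv_file):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))


def marker_levels_for(index_rows, marker_rows, column, antibodies=None, batch=None):
    """Return {global_index: level} for cells that pass the index filters."""
    levels_by_cell = {row["global_index"]: row.get(column) for row in marker_rows}
    result = {}
    for row in index_rows:
        if antibodies is not None and row["antibodies"] != antibodies:
            continue
        if batch is not None and int(row["batch"]) != batch:
            continue
        value = levels_by_cell.get(row["global_index"])
        if not value:
            continue  # no marker row for this cell, or an empty CSV field
        result[int(row["global_index"])] = float(value)
    return result


# Made-up rows for illustration only; the real files have many more columns and rows.
INDEX_CSV = """global_index,batch,antibodies,slide_replicate
0,0,CD45,0
1,0,CD45,1
2,1,unstained,0
"""
MARKERS_CSV = """global_index,CD45_single_antibody_model_unmixed
0,1.5
1,
2,0.25
"""

index_rows = load_rows(io.StringIO(INDEX_CSV))
marker_rows = load_rows(io.StringIO(MARKERS_CSV))
print(marker_levels_for(index_rows, marker_rows,
                        "CD45_single_antibody_model_unmixed", antibodies="CD45"))
# prints {0: 1.5}
```

Cell 1 is dropped because its marker field is empty, and cell 2 is dropped by the antibody filter. For the real files, replace the io.StringIO objects with open file handles.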