An imaging flow cytometry dataset for profiling the immunological synapse of therapeutic antibodies
Abstract
Therapeutic antibodies are widely used to treat severe diseases. Most of them alter immune cells and act within the immunological synapse, an essential cell-to-cell interaction that directs the humoral immune response. Although many antibody designs are generated and evaluated, a high-throughput tool for systematic antibody characterization and function prediction is lacking. Here, we generate the largest publicly available imaging flow cytometry (IFC) dataset of the human immunological synapse, containing over 2.8 million images.
In addition to the dataset, we introduce the first comprehensive open-source framework, scifAI (single-cell imaging flow cytometry AI, https://github.com/marrlab/scifAI), for preprocessing, feature engineering, and explainable, predictive machine learning on IFC data. Using scifAI, we analyze changes in class frequency and morphology under different immune stimulation conditions. scifAI is universally applicable to IFC data and, given its modular architecture, straightforward to incorporate into existing workflows and analysis pipelines, e.g., for rapid antibody screening and functional characterization.
Cell line culture
The EBV-transformed B-lymphoblastoid cell line (B-LCL) from donor 333 was obtained from Astarte Biologics (# 1038-3161JN16). Cells were cultured in RPMI-1640 medium (PAN-Biotech; cat # P04-17500) supplemented with 10% FBS (Anprotec; cat # AC-SM-0014Hi) and 2 mM L-glutamine (PAN-Biotech; cat # P04-80100).
Immune synapse formation and imaging flow cytometry
To analyze immune synapses, human memory CD4+ T cells (Tmem) were isolated from PBMCs of nine healthy human donors using the negative-selection EasySep enrichment kit from STEMCELL Technologies (cat # 19157). T cells and B-LCL cells were stained separately with the fixable viability dye eF780 (eBioscience; cat # 65-0865-14) for 15 min at RT. Cells were then resuspended in RPMI-1640 medium supplemented with 10% FBS (Anprotec; cat # AC-SM-0014Hi), 5% Penicillin-Streptomycin (Gibco; cat # 15140-122) and 2 mM L-glutamine (PAN-Biotech; cat # P04-80100). Next, B-LCL cells were transferred into the wells of a 96-well round-bottom plate (300,000 cells per well) and either pre-incubated with the superantigen Staphylococcal enterotoxin A (SEA) (Sigma-Aldrich; cat # S9399) for 15 min at 37°C or left untreated. Human CD4+ Tmem cells (250,000 cells per well) were added to the prepared B-LCL cells to give a final ratio of 4:3 (B-LCL:Tmem), and the appropriate in-house compounds (10 µg/mL of Isotype Ctrl or Teplizumab, or 1 µg/mL (5 nM) of Ctrl-TCB, CD19-TCB or CD20-TCB) were then added to the B-LCL/Tmem co-culture. To promote conjugate formation between B-LCL and T cells, cells were centrifuged at 300 × g for 30 s and then transferred directly to a 37°C incubator for 45 min. Afterward, the medium in each well was carefully aspirated with a pipette, and cells were immediately fixed for 12 min at RT and permeabilized using the Foxp3/Transcription Factor Staining Buffer Set from eBioscience (cat # 00-5523-00). Intracellular staining was performed for 40 min at 4°C in permeabilization buffer containing the following fluorescently labeled reagents: CD3-BV421 (clone UCHT1, Biolegend; cat # 300433), HLA-DR-PE-Cy7 (clone L243, Biolegend; cat # 307616), Phalloidin AF594 (ThermoFisher; cat # A12381) and P-CD3ζ Y142-AF647 (clone K25-407.69, BD; cat # 558489).
After washing, cells were resuspended in FACS buffer (PBS supplemented with 2% FBS) and acquired on an Amnis ImageStreamX Mark II imaging flow cytometer (Luminex) equipped with five lasers (405, 488, 561, 592 and 640 nm). On average, about 55,000 images were collected per sample at 60× magnification with the low-speed setting. IDEAS software (version 6.2.187.0, EMD Millipore) was used for data analysis and cell labeling. Immune synapses were identified in IDEAS using the gating strategy shown in Supplementary Fig. 1a. Cells were first gated on in-focus, live, CD3+ MHCII+ cells. Within this population, images showing a single CD3+ T cell and a single MHCII+ B-LCL cell were selected using the area and aspect ratio features. Next, to exclude non-interacting cells, the CD3 intensity was measured within a custom synapse mask, defined as a combination of the CD3 and MHCII morphology masks dilated by 3 pixels. Only conjugates with a CD3 signal inside this mask were retained. Finally, T cell/B-LCL pairs that overlapped in a single layer were excluded using the height and area features of the brightfield (BF) channel, and the remaining single T cell/B-LCL synapses were analyzed.
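The synapse-mask step can be sketched in NumPy on per-object masks. This is a minimal illustration, not the IDEAS implementation: it assumes that "combination" means the overlap of the CD3 and MHCII morphology masks after each is dilated by 3 pixels with a square structuring element. The function names `dilate`, `synapse_mask` and `cd3_in_synapse` are hypothetical.

```python
import numpy as np


def dilate(mask, radius):
    """Binary dilation with a 3x3 square structuring element, applied `radius` times."""
    out = np.asarray(mask, dtype=bool)
    h, w = out.shape
    for _ in range(radius):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # OR together the 3x3 neighbourhood of every pixel
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                grown |= padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        out = grown
    return out


def synapse_mask(cd3_mask, mhcii_mask, radius=3):
    """Hypothetical synapse mask: overlap of the two dilated morphology masks."""
    return dilate(cd3_mask, radius) & dilate(mhcii_mask, radius)


def cd3_in_synapse(cd3_image, cd3_mask, mhcii_mask, radius=3):
    """Summed CD3 intensity inside the synapse mask (0.0 if the mask is empty)."""
    syn = synapse_mask(cd3_mask, mhcii_mask, radius)
    return float(np.asarray(cd3_image, dtype=float)[syn].sum())
```

In this sketch, CD3 signal falls inside the synapse mask only when the cells are close enough for the dilated MHCII mask to reach the T-cell border; for distant cells the summed intensity is zero, which is the property the gating step relies on.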
Each image is saved as an `.h5` file containing the keys `image` and `mask`. If a label is available, it is stored under the `label` key as a `str`. The image is a 16-bit image with one brightfield channel and several fluorescent channels; the mask contains the corresponding segmentation for each channel.
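A single object can be read with `h5py`. The sketch below is a minimal reader under two assumptions not stated above: `label` is a scalar string dataset, and the arrays are returned in whatever axis order they were stored in. The helper name `load_ifc_object` is hypothetical.

```python
import h5py
import numpy as np


def load_ifc_object(path):
    """Read one per-object .h5 file and return (image, mask, label).

    `label` is None when the file has no `label` key (unlabeled object).
    The axis order of `image` and `mask` is returned as stored.
    """
    with h5py.File(path, "r") as f:
        image = np.asarray(f["image"][()])  # 16-bit brightfield + fluorescent channels
        mask = np.asarray(f["mask"][()])    # one segmentation per channel
        label = None
        if "label" in f:
            raw = f["label"][()]
            # h5py 3 returns stored strings as bytes; decode them to str
            label = raw.decode("utf-8") if isinstance(raw, bytes) else str(raw)
    return image, mask, label
```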
Each file name also includes a number, which is the object number from the IFC experiment. Beyond the individual files, the data are organized by experiment (four), donor (nine), and condition (-SEA, +SEA, Teplizumab, CD19-TCB, and CD20-TCB):
```text
data_path/Experiment_1/Donor_1/-SEA/*.h5
data_path/Experiment_1/Donor_1/+SEA/*.h5
...
data_path/Experiment_1/Donor_2/-SEA/*.h5
data_path/Experiment_1/Donor_2/+SEA/*.h5
...
data_path/Experiment_4/Donor_M/CD20-TCB/*.h5
```
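To make the folder convention concrete, here is a small sketch that indexes the files by experiment, donor and condition with `pathlib`. It assumes the layout `data_path/<experiment>/<donor>/<condition>/<file>.h5` and, hypothetically, that the object number is the last run of digits in the file name. The helper `index_dataset` is illustrative only; on the real data, `scifAI.metadata_generator` is the supported way to build the metadata table.

```python
import re
from pathlib import Path

import pandas as pd


def index_dataset(data_path):
    """Return one row per .h5 file with experiment, donor, condition and object number."""
    rows = []
    for path in sorted(Path(data_path).glob("*/*/*/*.h5")):
        # The three parent folders encode experiment, donor and condition
        experiment, donor, condition = path.parts[-4:-1]
        # Hypothetical convention: object number = last group of digits in the name
        digits = re.findall(r"\d+", path.stem)
        rows.append({
            "file": str(path),
            "experiment": experiment,
            "donor": donor,
            "condition": condition,
            "object_number": int(digits[-1]) if digits else None,
        })
    return pd.DataFrame(rows)
```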
For feature extraction, first generate the `metadata` DataFrame by passing the path to the data folder:
```python
import scifAI

data_path = "<PATH TO THE DATA FOLDER>"  # replace with the root folder of the dataset
metadata = scifAI.metadata_generator(data_path)
```
Next, define a scikit-learn `FeatureUnion` with the desired features, for example:
```python
from sklearn.pipeline import FeatureUnion
from scifAI.ml import features

feature_union = FeatureUnion([
    ("MaskBasedFeatures", features.MaskBasedFeatures()),
    ("GLCMFeatures", features.GLCMFeatures()),
    ("GradientRMS", features.GradientRMS()),
    ("BackgroundMean", features.BackgroundMean()),
    ("PercentileFeatures", features.PercentileFeatures()),
    ("CellShape", features.CellShape()),
    ("Collocalization", features.Collocalization()),
    ("IntersectionProperties", features.IntersectionProperties()),
    ("CenterOfCellsDistances", features.CenterOfCellsDistances()),
])
```
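`FeatureUnion` runs every transformer on the same input and concatenates their outputs column-wise. The toy sketch below shows that behavior with generic scikit-learn transformers on flattened images; `mean_intensity` and `max_intensity` are hypothetical stand-ins, not scifAI feature classes, and the scifAI extractors may expect a different input format.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer


def mean_intensity(X):
    """Toy feature: mean pixel value per flattened image, shape (n, 1)."""
    return X.mean(axis=1, keepdims=True)


def max_intensity(X):
    """Toy feature: maximum pixel value per flattened image, shape (n, 1)."""
    return X.max(axis=1, keepdims=True)


# Each named transformer contributes its own columns to the combined output
toy_union = FeatureUnion([
    ("mean", FunctionTransformer(mean_intensity)),
    ("max", FunctionTransformer(max_intensity)),
])

# Two toy 4x4 "images", flattened to shape (2, 16)
images = np.arange(32, dtype=float).reshape(2, 16)
toy_features = toy_union.fit_transform(images)
print(toy_features.shape)  # (2, 2)
```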
Then wrap the feature union in a scikit-learn `Pipeline` and pass it to `FeatureExtractor`:
```python
from sklearn.pipeline import Pipeline
from scifAI.ml import FeatureExtractor
pipeline = Pipeline([("features", feature_union)])
feature_extractor = FeatureExtractor(pipeline)
list_of_features = feature_extractor.extract_features(metadata)
```
`extract_features` returns a list with one dictionary of features per row of `metadata`. To convert `list_of_features` to a DataFrame, run:
```python
import pandas as pd

df_features = pd.DataFrame(list_of_features)
```
Each row of `df_features` contains the features for the corresponding row of `metadata`.
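Because each dictionary becomes one row, and keys missing from a dictionary become `NaN`, the row alignment with `metadata` can be made explicit by reusing its index and concatenating the two tables. The sketch below uses made-up stand-ins (`toy_metadata`, `toy_list_of_features`) with hypothetical feature names.

```python
import pandas as pd

# Hypothetical stand-ins for `metadata` and the output of `extract_features`
toy_metadata = pd.DataFrame({"file": ["obj_1.h5", "obj_2.h5", "obj_3.h5"]})
toy_list_of_features = [
    {"area_ch1": 120.0, "mean_ch1": 0.8},
    {"area_ch1": 95.0},  # a feature missing for this object becomes NaN
    {"area_ch1": 110.0, "mean_ch1": 0.6},
]

# Reuse the metadata index so row i of the features matches row i of the metadata
toy_df_features = pd.DataFrame(toy_list_of_features, index=toy_metadata.index)
toy_df_all = pd.concat([toy_metadata, toy_df_features], axis=1)
print(toy_df_all)
```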
Because the number of features is large, we suggest removing features with zero variance. For missing values, we recommend imputing with `0.`, which is consistent with the biological assumptions of the feature extraction process. The code imputes first and then drops the zero-variance columns:
```python
df_features = df_features.fillna(0.)
df_features = df_features.loc[:, df_features.std() > 0.]
```
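The order of the two steps matters: `DataFrame.std()` skips `NaN` by default, so a column that is constant apart from missing values would be dropped if filtered before imputation. The toy sketch below reproduces the filter on made-up data and, as an alternative swapped in from scikit-learn, shows that `VarianceThreshold(threshold=0.0)` keeps the same columns.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Made-up feature table for illustration only
toy_df = pd.DataFrame({
    "a": [1.0, np.nan, 1.0],  # constant among non-missing values
    "b": [2.0, 2.0, 2.0],     # truly constant: carries no information
    "c": [0.5, 1.5, 2.5],
})

toy_df = toy_df.fillna(0.)  # "a" becomes [1, 0, 1] and now varies
kept_std = toy_df.loc[:, toy_df.std() > 0.]

# Alternative: scikit-learn drops features whose variance is not above 0
selector = VarianceThreshold(threshold=0.0).fit(toy_df)
kept_vt = toy_df.loc[:, selector.get_support()]
print(list(kept_std.columns), list(kept_vt.columns))  # ['a', 'c'] ['a', 'c']
```

The scikit-learn selector can also be placed inside a `Pipeline`, which keeps the filtering step reproducible when new data are added.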
For more examples, see the [docs](docs) folder.