Causal identification of single-cell experimental perturbation effects with CINEMA-OT
Data files
Jul 24, 2023 version files (4.12 GB):
- 041719_CSE_filtered_feature_bc_matrix.zip (71.70 MB)
- 041719_MOCK_filtered_feature_bc_matrix.zip (64.78 MB)
- 041719_RV_filtered_feature_bc_matrix.zip (67.40 MB)
- 041719_RVCSE_filtered_feature_bc_matrix.zip (85.89 MB)
- H1D2.zip (25.43 MB)
- H1D7.zip (163.77 MB)
- H2D2.zip (246.12 MB)
- H2D7.zip (294.87 MB)
- H3D2.zip (287.57 MB)
- H3D7.zip (232.65 MB)
- Integrated_.h5ad (1.54 GB)
- README.md (565 B)
- rvcse_221021.h5ad (1.04 GB)
Sep 22, 2023 version files (5.28 GB)
Abstract
Recent advancements in single-cell technologies allow characterization of experimental perturbations at single-cell resolution. While methods have been developed to analyze such experiments, the application of a strict causal framework has not yet been explored for the inference of treatment effects at the single-cell level. In this work, we present a causal inference-based approach to single-cell perturbation analysis, termed CINEMA-OT (Causal INdependent Effect Module Attribution + Optimal Transport). CINEMA-OT separates confounding sources of variation from perturbation effects to obtain an optimal transport matching that reflects counterfactual cell pairs. These cell pairs represent causal perturbation responses permitting a number of novel analyses, such as individual treatment effect analysis, response clustering, attribution analysis, and synergy analysis. We benchmark CINEMA-OT on an array of treatment effect estimation tasks for several simulated and real datasets and show that it outperforms other single-cell perturbation analysis methods. Finally, we perform CINEMA-OT analysis of two newly generated datasets: (1) rhinovirus and cigarette smoke-exposed airway organoids, and (2) combinatorial cytokine stimulation of immune cells. In these experiments, CINEMA-OT reveals potential mechanisms by which cigarette smoke exposure dulls the airway antiviral response, as well as the logic that governs chemokine secretion and peripheral immune cell recruitment.
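The abstract describes matching control and treated cells by optimal transport once confounding variation has been separated from the perturbation. The rough illustration below computes an entropy-regularized OT coupling between two sets of cells using plain Sinkhorn iterations in NumPy. It then turns that coupling into per-cell treatment effects by barycentric projection. This is not the CINEMA-OT implementation. The confounder embedding (`x_ctrl`, `x_trt`), the regularization `eps`, and the function names are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_coupling(x_ctrl, x_trt, eps=0.1, n_iter=1000):
    """Entropy-regularized OT coupling between control and treated cells.

    x_ctrl: (n_ctrl, d) and x_trt: (n_trt, d) are a hypothetical embedding in
    which confounding variation is shared between conditions.
    Returns an (n_ctrl, n_trt) matrix whose entries sum to 1.
    """
    # Squared Euclidean cost for every control/treated pair.
    cost = ((x_ctrl[:, None, :] - x_trt[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / (cost.max() or 1.0)  # rescale to [0, 1] so eps is comparable
    kernel = np.exp(-cost / eps)
    # Uniform mass on each cell in both conditions.
    a = np.full(len(x_ctrl), 1.0 / len(x_ctrl))
    b = np.full(len(x_trt), 1.0 / len(x_trt))
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (kernel @ v)    # match row marginals (control cells)
        v = b / (kernel.T @ u)  # match column marginals (treated cells)
    return u[:, None] * kernel * v[None, :]


def individual_effects(coupling, y_ctrl, y_trt):
    """Per-treated-cell effect: observed profile minus matched control profile."""
    # Normalize each column so it is a distribution over control cells.
    weights = coupling / coupling.sum(axis=0, keepdims=True)
    counterfactual = weights.T @ y_ctrl  # (n_trt, n_genes) barycentric projection
    return y_trt - counterfactual
```

A smaller `eps` gives a sharper, closer-to-one-to-one matching, but Sinkhorn converges more slowly. The resulting effect matrix (one row per treated cell) is the kind of object that response clustering would operate on.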
README: Dataset Readme
The Rhinovirus infection dataset
The raw count matrix files are "041719_RVCSE_filtered_feature_bc_matrix.zip", "041719_RV_filtered_feature_bc_matrix.zip", "041719_MOCK_filtered_feature_bc_matrix.zip", and "041719_CSE_filtered_feature_bc_matrix.zip". The preprocessed Scanpy object is provided as "rvcse_221021.h5ad".
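According to the Methods section, the four rhinovirus archives form a two-factor design: mock or rhinovirus infection, with or without cigarette smoke extract (CSE). The minimal sketch below recovers those factors from the file names. It assumes the condition tag is always the second underscore-separated field. The helper names and the factor labels are hypothetical.

```python
# Hypothetical mapping from the condition tag to the two treatment factors.
CONDITION_FACTORS = {
    "MOCK": {"rhinovirus": False, "cse": False},
    "RV": {"rhinovirus": True, "cse": False},
    "CSE": {"rhinovirus": False, "cse": True},
    "RVCSE": {"rhinovirus": True, "cse": True},
}

ARCHIVES = [
    "041719_RVCSE_filtered_feature_bc_matrix.zip",
    "041719_RV_filtered_feature_bc_matrix.zip",
    "041719_MOCK_filtered_feature_bc_matrix.zip",
    "041719_CSE_filtered_feature_bc_matrix.zip",
]


def condition_from_filename(filename: str) -> str:
    """Extract the condition tag, e.g. '041719_RV_filtered...zip' -> 'RV'."""
    return filename.split("_")[1]


def factors_for(filename: str) -> dict:
    """Return the infection / smoke-exposure factors for one archive."""
    return CONDITION_FACTORS[condition_from_filename(filename)]
```

Together, the four archives cover every combination of the two factors. This full 2×2 layout is what makes a synergy-style comparison of virus and smoke effects possible.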
The combinatorial interferon stimulation dataset
The raw count matrix files are "H1D2.zip", "H1D7.zip", "H2D2.zip", "H2D7.zip", "H3D2.zip", and "H3D7.zip". The integrated raw Scanpy object is provided as "Integrated_raw.h5ad". The preprocessed Scanpy object is provided as "Integrated_.h5ad".
Methods
The rhinovirus infection data:
Primary human bronchial epithelial cells from healthy adult donors were obtained from a commercial vendor (Lonza) and cultured at an air-liquid interface according to the manufacturer's instructions (Stem Cell Technologies), using reduced hydrocortisone. Cells were kept at the air-liquid interface for 4 weeks before the experiment; maturation of beating cilia and mucus production was confirmed by light microscopy. Cells were then mock-infected or infected with 10⁵ PFU of human rhinovirus 1A per organoid, with or without exposure to 2% cigarette smoke extract (CSE). Single-cell suspensions were collected by trypsin digestion 5 days post-infection and processed for single-cell RNA sequencing using the 10X Genomics single-cell 3′ protocol.
The combinatorial interferon stimulation data:
The study was approved by the Institutional Review Board at Yale University (under the Yale melanoma skin SPORE IRB protocol). Healthy donors consented to donation of peripheral blood for research use. Human PBMC were isolated using Lymphoprep density gradient medium (STEMCELL). PBMC were plated at 1 million cells per ml and stimulated for up to 48 hours. The single-cytokine conditions were:
- 1000 U/ml human IFNα2 (R&D Systems)
- 1000 U/ml human IFNβ (PBL Assay Science 11415)
- 1000 U/ml human IFNγ (PBL Assay Science)
- 1 µg/ml human IFN-III/IL-29 (R&D Systems)
- 100 ng/ml human IL-6 (NCI Biological Resources Branch Preclinical Biologics Repository)
- 20 ng/ml human TNFα (R&D Systems)
The combination conditions were IFNβ + IL-6, IFNβ + TNFα, and IFNβ + IFNγ, each at the concentrations given above. Single-cell RNA sequencing libraries were sequenced on an Illumina NovaSeq with 150 bp paired-end reads at a depth of 300 million reads per sample.
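Each of the three cytokine combinations pairs IFNβ with one other stimulus that was also applied alone. That makes a simple synergy check possible. The sketch below scores each combination as its effect minus the sum of its two single-cytokine effects, which is an additive null model. This is an assumed, simplified definition and not necessarily the synergy measure used by CINEMA-OT. Labels such as `"IFNb"` are hypothetical shorthand. The effects are assumed to be measured against an unstimulated baseline, which the description above does not list explicitly.

```python
import numpy as np

# Each combination from the Methods paired with its two single-cytokine arms.
# Labels are hypothetical shorthand, not names taken from the data files.
COMBINATIONS = {
    "IFNb+IL6": ("IFNb", "IL6"),
    "IFNb+TNFa": ("IFNb", "TNFa"),
    "IFNb+IFNg": ("IFNb", "IFNg"),
}


def additive_synergy(effects):
    """Combination effect minus the sum of its single-cytokine effects.

    effects maps a condition label to a per-gene mean effect relative to an
    assumed unstimulated baseline. Positive entries mean the pair raises a gene
    more than the two single cytokines would if their effects simply added.
    """
    scores = {}
    for combo, (first, second) in COMBINATIONS.items():
        expected = np.asarray(effects[first]) + np.asarray(effects[second])
        scores[combo] = np.asarray(effects[combo]) - expected
    return scores
```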
Usage notes
Both the 10x format count matrix files and the preprocessed Scanpy objects are provided for both datasets.
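The minimal sketch below turns one of the raw-count archives into an AnnData object. It assumes each zip contains a standard 10x matrix folder (`matrix.mtx.gz`, `barcodes.tsv.gz`, `features.tsv.gz`). This page does not confirm that layout, so the helper searches the extracted tree instead of hard-coding a path. The helper names are hypothetical. `scanpy.read_10x_mtx` is Scanpy's reader for this format.

```python
import os
import zipfile


def extract_10x_archive(zip_path: str, dest: str) -> str:
    """Unzip a 10x archive and return the folder that holds matrix.mtx(.gz)."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # The internal layout is not documented, so search the extracted tree.
    for root, _dirs, files in os.walk(dest):
        if "matrix.mtx.gz" in files or "matrix.mtx" in files:
            return root
    raise FileNotFoundError(f"no matrix.mtx(.gz) found in {zip_path}")


def load_counts(zip_path: str, dest: str):
    """Read one raw-count archive into an AnnData object with Scanpy."""
    # Imported lazily so the extraction helper works without Scanpy installed.
    import scanpy as sc

    return sc.read_10x_mtx(extract_10x_archive(zip_path, dest), var_names="gene_symbols")
```

The preprocessed objects need no extraction step. They can be opened directly, for example with `scanpy.read_h5ad("rvcse_221021.h5ad")` or `scanpy.read_h5ad("Integrated_.h5ad")`.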