Long-lasting, subtype-specific regulation of somatostatin interneurons during sensory learning
Data files (Jul 25, 2025 version; 10.93 GB total)
- Github_repository_clone.zip (630.55 MB)
- README.md (13.24 KB)
- SST_Manuscript_Data.zip (10.30 GB)
Abstract
Somatostatin (SST)-expressing inhibitory neurons are a major class of neocortical gamma-aminobutyric acid (GABA) neurons, for which morphological, electrophysiological, and transcriptomic analyses indicate more than a dozen different subtypes. However, whether this diversity is related to specific roles in cortical computations and plasticity remains unclear. Here we identify learning-dependent, subtype-specific plasticity in layer 2/3 somatostatin neurons of the mouse somatosensory cortex. Martinotti-type somatostatin neurons expressing calbindin-2 show a selective decrease in excitatory synaptic input and stimulus-evoked calcium responses as mice learn a stimulus-reward association. Using these insights, we develop a label-free classifier using basal activity from in vivo imaging that accurately predicts learning-associated response plasticity. Our data indicate that molecularly-defined SST neuron subtypes play specific and highly-regulated roles in sensory information processing and learning.
Description of the data and file structure
Calcium imaging data were collected by two-photon imaging and processed with Suite2p, as described in the Methods section. The files for each animal contain raw extracted fluorescence values and trial information across the training conditions.
Fixed-tissue data were collected by confocal fluorescence imaging and processed in Imaris 8.4.1 to obtain volume measurements.
Files and variables
File: SST_Manuscript_Data.zip
Description: This folder contains the calcium imaging data and confocal fluorescence imaging data used for analysis.
Calcium Imaging
Ai148_ACC: Genetically encoded calcium indicator in somatostatin neurons during Acclimation
Ai148_PSE: Genetically encoded calcium indicator in somatostatin neurons during Pseudotraining
Ai148_SAT: Genetically encoded calcium indicator in somatostatin neurons during Sensory Association Training
Calb2_SAT: Virally transduced calcium indicator in somatostatin neurons during Sensory Association Training, differentiated by calretinin x somatostatin intersectional labeling
Calb2_PSE: Virally transduced calcium indicator in somatostatin neurons during Pseudotraining, differentiated by calretinin x somatostatin intersectional labeling
CR ID Sheet by eye and raw values: This sheet contains the estimates used to determine CR identity for in vivo imaging during pseudotraining. 'CR ID' is the CR identity estimated by eye (not used for analysis). 'ACC6 raw' contains pixel values from the color channel that imaged the cell fill of the recorded cells.
M### indicates the mouse identity.
FOV# indicates the field of view from which the cells were collected.
Fall.mat contains the raw imaging data:
- F - somatic signal
- Fneu - neuropil signal
- iscell - whether the ROI was classified as a cell in Suite2p
- spks - deconvolved spikes (not used for analysis)
- ops - Suite2p processing options (not used for analysis)
- stat - ROI statistics that allow the data to be reloaded into Suite2p, the calcium imaging analysis program used to extract cell ROIs (not used for analysis)
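As an illustration of how these arrays fit together, the following minimal Python sketch loads F, Fneu, and iscell from a Fall.mat file and subtracts scaled neuropil from the ROIs labeled as cells. It is not the repository's code: the function names are hypothetical, and the neuropil coefficient of 0.7 is Suite2p's default, assumed here rather than taken from the manuscript's Methods.

```python
import numpy as np


def load_fall(path):
    """Load the F, Fneu, and iscell arrays from a Suite2p Fall.mat file."""
    # Imported lazily so the numeric helper below works without SciPy.
    from scipy.io import loadmat

    mat = loadmat(path)
    # Suite2p stores traces as (n_rois, n_frames); iscell is (n_rois, 2), with
    # the 0/1 cell label in column 0 and the classifier probability in column 1.
    return mat["F"], mat["Fneu"], mat["iscell"]


def corrected_cell_traces(F, Fneu, iscell, neucoeff=0.7):
    """Subtract scaled neuropil and keep only ROIs labeled as cells.

    neucoeff=0.7 is Suite2p's default neuropil coefficient; it is an
    assumption here, not necessarily the value used in the manuscript.
    """
    F = np.asarray(F, dtype=float)
    Fneu = np.asarray(Fneu, dtype=float)
    is_cell = np.asarray(iscell)[:, 0].astype(bool)
    return (F - neucoeff * Fneu)[is_cell]
```

For example, corrected_cell_traces(*load_fall("path/to/Fall.mat")) returns one row per ROI labeled as a cell. Importing SciPy inside load_fall keeps the correction step usable on arrays that were loaded some other way.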
Arduino
Each sheet contains the trial type (real or fake, i.e., puff or no puff). Some files collected later also contain columns for trial number, solenoid on time, Arduino clock time, and solenoid off time (all in milliseconds). These columns were not used for analysis in this manuscript and can be ignored.
Each pair of sheets corresponds to one day of imaging (e.g., Sheets 1 and 2 are two sessions from day 1, Sheets 3 and 4 are two sessions from day 2, and so on). If only one session was recorded for a particular FOV, the sheet was duplicated to keep the file structure consistent.
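The sheet-pairing convention can be handled mechanically. The sketch below uses hypothetical helpers that are not part of the repository: it maps 1-based sheet numbers to (day, session) and drops a day's second sheet when it is an exact copy of the first. It assumes each workbook is read with pandas.read_excel(..., sheet_name=None), which returns all sheets as a dict in workbook order.

```python
import pandas as pd


def load_arduino_workbook(path):
    """Read every sheet of an Arduino workbook as {sheet name: DataFrame}."""
    return pd.read_excel(path, sheet_name=None)


def sheet_to_day_session(sheet_number):
    """Map a 1-based sheet number to (imaging day, session within that day)."""
    if sheet_number < 1:
        raise ValueError("sheet numbers start at 1")
    return (sheet_number - 1) // 2 + 1, (sheet_number - 1) % 2 + 1


def group_sheets_by_day(sheets):
    """Group an ordered {sheet name: DataFrame} mapping into {day: [sessions]}.

    If a day's second sheet is identical to its first, it is treated as a
    duplicated sheet and only one session is kept for that day.
    """
    days = {}
    for number, df in enumerate(sheets.values(), start=1):
        day, session = sheet_to_day_session(number)
        sessions = days.setdefault(day, [])
        if session == 2 and sessions and sessions[0].equals(df):
            continue  # duplicated sheet: only one session was recorded
        sessions.append(df)
    return days
```

The duplicate check is a heuristic: two genuine sessions with identical sheet contents would also be collapsed into one.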
Arduino Time Point
Each sheet contains the time, in milliseconds, at which the solenoid was activated.
Each pair of sheets corresponds to one day of imaging (e.g., Sheets 1 and 2 are two sessions from day 1, Sheets 3 and 4 are two sessions from day 2, and so on). If only one session was recorded for a particular FOV, the sheet was duplicated to keep the file structure consistent.
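To relate solenoid times to imaging frames, the times must be shifted to the imaging clock and scaled by the frame rate. This is a minimal sketch under two stated assumptions: the imaging frame rate and the Arduino-clock time of the first imaging frame are hypothetical inputs, since neither value is given in this README. Frame k is taken to span [k/f, (k+1)/f) seconds, so the frame index is the floor of the scaled time.

```python
import numpy as np


def solenoid_times_to_frames(times_ms, frame_rate_hz, imaging_start_ms=0.0):
    """Convert solenoid activation times (ms) to 0-based imaging frame indices.

    frame_rate_hz and imaging_start_ms (the Arduino-clock time of the first
    imaging frame) are hypothetical inputs; neither value is given here.
    """
    times_ms = np.asarray(times_ms, dtype=float)
    frames = np.floor((times_ms - imaging_start_ms) * frame_rate_hz / 1000.0)
    if np.any(frames < 0):
        raise ValueError("a solenoid time precedes the start of imaging")
    return frames.astype(int)
```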
Fixed Tissue
Fixed Tissue SST Manuscript data
This Excel file contains data for the reconstructed PSD95.FingR puncta from somatostatin neurons imaged by confocal microscopy. Volumes are in cubic micrometers (µm³). The first sheet summarizes the animals, the experimental conditions under which they were tested, and whether their tissue was stained for calbindin2 (denoted CR). In the 'CR Stained?' column, N means not stained and Y means stained. In the remaining columns, Y/N indicates whether the corresponding data are available. Data may be unavailable if they were not collected for the manuscript or if the tissue did not contain transduced cells in that layer.
The following sheets contain data from volumetric confocal images captured in different layers of the cortex across various experimental conditions. The sheet names describe the type of data they contain.
L#rawUnlabeled: Each column contains the volumes of individually reconstructed surfaces from one animal trained in Acclimation (ACC), Sensory Association Training for 1 day (SAT1), or Sensory Association Training for 5 days (SAT5). Each animal's training condition is indicated by its heading, and the column label identifies the animal.
L#CR: These sheets contain reconstructed puncta separated by cell identity. Each heading indicates whether the surfaces belong to calbindin2-positive (CR+) or calbindin2-negative (CR-) dendrites: negative surfaces are in the left columns and positive surfaces in the right columns. The average of each animal's puncta is placed next to that animal's data. 'All puncta' refers to all surfaces plotted in the cumulative distributions.
hM4Di: This sheet contains puncta from animals transduced with hM4Di. The left columns contain the mean puncta size from cells that were positive or negative for calbindin2 (CR+/-) and for hM4Di (H+/-). The right columns contain all puncta from these four conditions.
L2 Pseudo: This sheet contains mean sizes of PSD95 puncta surfaces from layer 2 of barrel cortex in animals that underwent either pseudotraining or the control acclimation condition. This tissue was stained for calbindin2 (CR), and puncta were separated accordingly.
V1: This sheet contains the sizes of all collected puncta from layer 2 of primary visual cortex (V1), each labeled as belonging to either calretinin-positive or calretinin-negative dendrites.
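The cumulative distributions and per-animal averages described for these sheets can be recomputed from a sheet laid out with one column of puncta volumes per animal. This sketch assumes that layout (columns of unequal length come back padded with NaN when read by pandas); the header structure of the actual sheets, such as condition headings above the animal labels, may need a different header argument to pandas.read_excel. All function names are hypothetical.

```python
import numpy as np
import pandas as pd


def load_volume_sheet(path, sheet_name, header=0):
    """Read one sheet of the fixed-tissue workbook (header layout is an assumption)."""
    return pd.read_excel(path, sheet_name=sheet_name, header=header)


def ecdf(values):
    """Return sorted values and the empirical CDF P(X <= x), ignoring NaNs."""
    x = np.asarray(values, dtype=float).ravel()
    x = np.sort(x[~np.isnan(x)])
    return x, np.arange(1, x.size + 1) / x.size


def summarize_sheet(df):
    """Per-animal mean puncta volume (um^3) and the pooled ECDF of all puncta.

    Assumes one column of volumes per animal, padded with NaN where columns
    differ in length; pandas skips those NaNs when averaging.
    """
    per_animal_mean = df.mean(axis=0)
    pooled_x, pooled_y = ecdf(df.to_numpy())
    return per_animal_mean, pooled_x, pooled_y
```

Pooling all puncta mirrors the 'All puncta' columns used for the cumulative distributions, while the per-animal means treat each animal as one observation.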
Code/software
We used MATLAB and Python to analyze the raw data. Github_repository_clone.zip contains the analysis scripts used for the calcium imaging and behavior datasets.
The scripts used for all analyses are also available on GitHub:
https://github.com/barthlab/Long-lasting-subtype-specific-regulation-of-somatostatin-interneurons-during-sensory-learning
Markers used in the folder tree:
(**) Imported from SST Manuscript Data
(***) Created by analysis code
Quick Start
- Unzip the downloaded dataset into the data/ directory.
- Run scripts 1-9 in sequential order.
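The sequential run can be scripted. This is a hypothetical helper, not part of the repository; it assumes the scripts are run from the repository root (so that relative paths such as data/ resolve) and orders them by the number in their file name, so a script10 would follow script9 rather than script1.

```python
import re
import subprocess
import sys
from pathlib import Path


def numbered_scripts(repo_dir):
    """Return script<N>_*.py files in repo_dir, sorted by N."""
    pattern = re.compile(r"script(\d+)_.*\.py")
    found = []
    for path in Path(repo_dir).glob("script*_*.py"):
        match = pattern.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def run_all(repo_dir):
    """Run each numbered script in order from repo_dir; stop at the first failure."""
    for script in numbered_scripts(repo_dir):
        print(f"Running {script.name} ...")
        # check=True raises subprocess.CalledProcessError if a script fails.
        subprocess.run([sys.executable, script.name], cwd=repo_dir, check=True)
```

Calling run_all(".") from the repository root runs script1 through script9 in order and halts at the first script that exits with an error.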
Project Structure
The folder structure should look like this:
├── data/
│ ├── **Calcium imaging/
│ │ ├── **Ai148_PSE/
│ │ ├── **Ai148_SAT/
│ │ ├── **Calb2_SAT/
│ │ └── **Calb2_PSE/
│ ├── Behavior/
│ ├── Feature/
│ ├── ***Extracted Feature/
│ ├── ***Best Clustering/
│ └── ***Clustering Result/
├── figures/
│ ├── 1_raw_data/
│ ├── 2_overview/
│ ├── 3_plasticity_manifold/
│ ├── 4_diagram/
│ ├── 5_features/
│ ├── 6_main_figure/
│ ├── 7_examples/
│ ├── 8_justification/
│ └── 9_behavior/
├── src/
│ ├── basic/
│ ├── feature/
│ ├── ploter/
│ ├── behavior/
│ ├── data_manager.py
│ └── config.py
├── script1_overview.py
├── script2_plasticity_manifold.py
├── script3_diagram.py
├── script4_feature_candidate.py
├── script5_prediction.py
├── script6_example.py
├── script7_clustering_distance.py
├── script8_confusion_matrix.py
└── script9_behavior.py
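Before running the scripts, it may help to confirm that the unzipped data landed where the tree expects. This minimal sketch checks only the folders marked ** (imported from the dataset); the folder names are copied from the tree, and the function name is hypothetical.

```python
from pathlib import Path

# Input folders marked ** in the tree; they come from the downloaded dataset.
REQUIRED_INPUTS = [
    "Calcium imaging/Ai148_PSE",
    "Calcium imaging/Ai148_SAT",
    "Calcium imaging/Calb2_SAT",
    "Calcium imaging/Calb2_PSE",
]


def missing_inputs(repo_dir, required=REQUIRED_INPUTS):
    """List required data/ subfolders that are absent under repo_dir."""
    data_dir = Path(repo_dir) / "data"
    return [rel for rel in required if not (data_dir / rel).is_dir()]
```

An empty list from missing_inputs(".") means every imported folder is in place; folders marked *** are created by the analysis code and are not checked.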
Analysis Scripts Documentation
Main Analysis Scripts
- script1_overview.py - Raw data visualization and overview analysis
  - Purpose: Generates overview visualizations of calcium imaging data, including heatmaps and peak response analysis
  - Outputs: Raw data plots, heatmap overviews by cell type, peak complex visualizations
  - Dependencies: All experiments (Calb2_PSE, Calb2_SAT, Ai148_SAT, Ai148_PSE)
- script2_plasticity_manifold.py - Plasticity manifold analysis
  - Purpose: Analyzes plasticity changes between the Acclimation period and the training periods (SAT/PSE)
  - Outputs: Plasticity manifold plots comparing baseline to the training periods
  - Dependencies: Calb2_SAT and Calb2_PSE experiments only
- script3_diagram.py - Diagram generation for figures
  - Purpose: Creates specific diagram visualizations for publication figures
  - Outputs: Overview diagrams, large view plots, trial diagrams
  - Dependencies: Calb2_SAT experiment, a specific example FOV
- script4_feature_candidate.py - Feature extraction and ranking
  - Purpose: Extracts features from calcium imaging data and ranks them by statistical significance
  - Outputs: JSON files of sorted feature names, feature hierarchy plots, distribution plots
  - Dependencies: Calb2_SAT experiment (used for feature ranking)
- script5_prediction.py - Main clustering and prediction analysis
  - Purpose: Performs dimensionality reduction and clustering, and generates the main figure visualizations
  - Outputs: Embedding plots, clustering visualizations, feature summaries, fold-change analysis
  - Dependencies: All experiments; requires the pre-computed feature rankings from script4_feature_candidate.py
- script6_example.py - Example cell visualizations
  - Purpose: Generates example visualizations of individual cells and clusters
  - Outputs: Individual cell examples colored by cluster ID and cell type
  - Dependencies: Ai148_PSE and Calb2_PSE experiments
- script7_clustering_distance.py - Clustering validation analysis
  - Purpose: Analyzes clustering quality and distance distributions
  - Outputs: Neighbor distribution plots for clustering validation
  - Dependencies: Calb2_SAT experiment, top features
- script8_confusion_matrix.py - Classification performance evaluation
  - Purpose: Generates a confusion matrix of cell type classification performance
  - Outputs: Confusion matrix heatmap with F1 score
  - Dependencies: None (used only for revision figures)
- script9_behavior.py - Behavioral data analysis
  - Purpose: Analyzes behavioral performance data and correlates it with imaging data
  - Outputs: Daily behavior summary plots, performance bars by cell cluster
  - Dependencies: Ai148_SAT experiment; requires clustering results
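To illustrate the ranking, clustering, and confusion-matrix steps described for scripts 4, 5, and 8, here is a simplified sketch. It is not the repository's method: the Mann-Whitney U test used for ranking and two-cluster KMeans, used in place of the repository's own dimensionality reduction and clustering, are stand-ins chosen for brevity. It assumes a cells-by-features matrix X with binary labels such as CR- (0) and CR+ (1); the function names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix, f1_score


def rank_features(X, labels):
    """Order feature columns by two-sided Mann-Whitney U p-value (smallest first)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    pvals = np.array([
        mannwhitneyu(X[labels == 0, j], X[labels == 1, j],
                     alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ])
    return np.argsort(pvals), pvals


def cluster_and_score(X, labels, top_k=5, seed=0):
    """Cluster cells on the top-ranked features; score clusters against labels.

    Each KMeans cluster is assigned the majority label of its cells, and the
    resulting predictions are compared with the true labels.
    """
    labels = np.asarray(labels, dtype=int)
    order, _ = rank_features(X, labels)
    Z = np.asarray(X, dtype=float)[:, order[:top_k]]
    std = Z.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing constant features by zero
    Z = (Z - Z.mean(axis=0)) / std  # z-score each selected feature
    clusters = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(Z)
    predicted = np.empty_like(labels)
    for c in np.unique(clusters):
        members = clusters == c
        predicted[members] = np.bincount(labels[members]).argmax()
    return confusion_matrix(labels, predicted), f1_score(labels, predicted)
```

Because this sketch ranks features with the same labels it later scores, its F1 score is optimistic; ranking on one dataset and scoring on another avoids that circularity.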
Supporting Modules (src/ Directory)
- src/config.py - Configuration parameters and experimental settings
- src/data_manager.py - Core data structures and experiment management
- src/data_status.py - Data status tracking and validation
- src/basic/ - Basic utilities, terminology, and data operations
- src/feature/ - Feature extraction, clustering, and dimensionality reduction
- src/ploter/ - Visualization and plotting functions
- src/behavior/ - Behavioral data processing and analysis
Additional Scripts (other_scripts/ Directory)
- matlab_pertrial_analysis.m - MATLAB per-trial calcium imaging analysis
  - Processes Suite2p output for per-trial analysis of calcium signals
  - Requires MATLAB with the Statistics and Machine Learning Toolbox and the Signal Processing Toolbox
- sat_cage_code_arduino/sat_cage_code_arduino.ino - Arduino behavioral control code
  - Controls the behavioral apparatus for somatosensory learning experiments
  - Requires the Arduino IDE with the FileIO library
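The MATLAB script is not reproduced here, but the core of a per-trial analysis can be sketched in Python: average the fluorescence in a window before each stimulus onset as the baseline F0 and in a window after it as the response F, then compute dF/F = (F - F0) / F0. The window lengths and the function name are hypothetical, not values from the manuscript.

```python
import numpy as np


def trial_dff(trace, onset_frames, pre_frames, post_frames):
    """Per-trial dF/F: mean over post-onset frames relative to pre-onset baseline.

    Window lengths are hypothetical parameters, not the manuscript's values.
    Trials whose windows run past either end of the trace are returned as NaN.
    """
    trace = np.asarray(trace, dtype=float)
    out = np.full(len(onset_frames), np.nan)
    for i, onset in enumerate(onset_frames):
        if onset - pre_frames < 0 or onset + post_frames > trace.size:
            continue  # window falls outside the recording
        baseline = trace[onset - pre_frames:onset].mean()
        response = trace[onset:onset + post_frames].mean()
        out[i] = (response - baseline) / baseline
    return out
```

Baselines near zero, which can occur after neuropil subtraction, make dF/F unstable, so the baseline definition is worth checking against the Methods before relying on the values.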
Software Requirements
Python Environment
- Python Version: 3.11
- Required Packages:
  numpy>=1.21.0
  matplotlib>=3.5.0
  scipy>=1.7.0
  pandas>=1.3.0
  scikit-learn>=1.0.0
  seaborn>=0.11.0
  umap-learn>=0.5.0
  tqdm>=4.62.0
  colorist>=1.4.0
  xlwt>=1.3.0
  xlrd>=2.0.0
  openpyxl>=3.0.0
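An environment can be checked against this list with a short standard-library script. The sketch below uses importlib.metadata and compares only the leading numeric parts of each version, so pre-release tags such as rc1 are treated as the corresponding release; the helper names are hypothetical.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the Required Packages list.
REQUIREMENTS = {
    "numpy": "1.21.0", "matplotlib": "3.5.0", "scipy": "1.7.0",
    "pandas": "1.3.0", "scikit-learn": "1.0.0", "seaborn": "0.11.0",
    "umap-learn": "0.5.0", "tqdm": "4.62.0", "colorist": "1.4.0",
    "xlwt": "1.3.0", "xlrd": "2.0.0", "openpyxl": "3.0.0",
}


def parse_version(text):
    """Leading numeric release parts, padded to three: '0.11' -> (0, 11, 0)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(g) if g else 0 for g in match.groups())


def check_environment(requirements=REQUIREMENTS, get_version=version):
    """Return a list of problems: missing packages or versions below the minimum."""
    problems = []
    if sys.version_info[:2] != (3, 11):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} found; the README lists 3.11")
    for name, minimum in requirements.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{name} {installed} < {minimum}")
    return problems
```

Printing check_environment() lists anything that needs attention; an empty list means every package meets its minimum and Python is 3.11.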
MATLAB Environment (for matlab_pertrial_analysis.m)
- MATLAB Version: R2019b or later
- Required Toolboxes:
- Statistics and Machine Learning Toolbox
- Signal Processing Toolbox
- Input Files: Suite2p output files (F.npy, Fneu.npy, iscell.npy, ops.npy)
- Additional Files: Arduino timing data in Excel format
Arduino Environment (for behavioral control)
- Arduino IDE: 1.8.0 or later
- Required Libraries: FileIO library
- Hardware: Arduino-compatible board with relay control capabilities
Expected Outputs
Each script generates specific outputs in the figures/ directory:
- 1_raw_data/ - Raw data visualizations
- 2_overview/ - Overview heatmaps and peak analysis
- 3_plasticity_manifold/ - Plasticity analysis plots
- 4_diagram/ - Publication diagrams
- 5_features/ - Feature analysis plots
- 6_main_figure/ - Main clustering and embedding results
- 7_examples/ - Individual cell examples
- 8_justification/ - Clustering validation plots
- 9_behavior/ - Behavioral analysis results
Notes and Support
- The full analysis workflow may take several hours, depending on system specifications.
- Consider running scripts individually for debugging.
- This README.md was generated by an LLM, then manually checked and modified.
- For any issues or questions, please contact Max at xma3@andrew.cmu.edu
Access information
Data were collected using different experimental preparations. Figures 1, 3, and 4 use calcium imaging data. Figure 2 uses confocal fluorescence imaging data processed with the image analysis program Imaris 8.4.1.