Complementary cortical and thalamic contributions to cell-type-specific striatal activity dynamics during movement
Data files (Dec 22, 2025 version; 5.19 GB total)
- neural_activity_data.zip (2.05 GB)
- paw_position_data.zip (2.40 GB)
- README.md (8.73 KB)
- TrainedModels.zip (738.97 MB)
Abstract
Coordinated motor behavior emerges from information flow across brain regions. How long-range inputs influence cell-type-specific activity within motor circuits remains unclear. The dorsolateral striatum (DLS) contains direct- and indirect-pathway medium spiny neurons (dMSNs and iMSNs) that exhibit distinct roles in movement control, and receives converging cortical and thalamic inputs. We performed 2-photon imaging from dMSNs, iMSNs, and their cortical and thalamic inputs identified by monosynaptic rabies tracing, as mice executed a skilled locomotion task. We used recurrent neural network (RNN) classifiers and hierarchical clustering analyses to reveal functionally heterogeneous subpopulations in each population. We found that dMSNs were preferentially active at movement onset and offset, and iMSNs during execution. Cortical and thalamic inputs were preferentially active during onset/offset and execution, respectively. dMSN- and iMSN-projecting neurons in each region showed similar trial-averaged activity patterns, although single-trial features might contribute to cell-type-specific differences. Furthermore, a subset of thalamic neurons projecting to dMSNs encoded rhythmic limb movements in a locomotion phase-specific manner, a pattern also found in a small subset of dMSNs. Inactivation of either cortex or thalamus substantially reduced MSN activity. These results suggest that corticostriatal and thalamostriatal inputs contribute complementary motor-related information via shared and cell-type-specific pathways.
Dryad DOI: https://doi.org/10.5061/dryad.np5hqc07j
This repository contains the full analysis pipeline for the paper:
Gjoni E., Sristi R. D., Liu H., Dror S., Lin X., O’Neil K., Arroyo O. M.,
Hong S. W., Kim H., Liu J., Blumenstock S., Lim B., Mishne G., & Komiyama T. (2025).
Complementary cortical and thalamic contributions to cell-type-specific striatal activity dynamics during movement.
The code reproduces Figures 1–4 from the manuscript.
Data Requirements
Before running any notebooks, download and unzip the following datasets:
- Neural activity data: unzip neural_activity_data.zip into data/
- Paw-position (DeepLabCut) data: unzip paw_position_data.zip into data/
- Trained TEA-Net models: unzip TrainedModels.zip into data/
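If you prefer to script this step, the following is a minimal Python sketch. It assumes the three archives sit in a single download folder; `extract_archives` is a hypothetical helper, not part of this repository. Whether each archive unpacks into its own subfolder (e.g. data/neural_activity_data/) depends on how the archive was built, so check the result against the repository structure.

```python
import zipfile
from pathlib import Path

# Archive names as listed in the Dryad file listing
ARCHIVES = ["neural_activity_data.zip", "paw_position_data.zip", "TrainedModels.zip"]


def extract_archives(download_dir=".", data_dir="data", archives=ARCHIVES):
    """Unzip each archive found in download_dir into data_dir.

    Returns the names of the archives that were actually extracted.
    """
    download_dir, data_dir = Path(download_dir), Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in archives:
        archive_path = download_dir / name
        if not archive_path.exists():
            print(f"Missing archive: {archive_path}")
            continue
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(data_dir)
        extracted.append(name)
    return extracted
```

For example, `extract_archives("downloads", "data")` would unpack whichever of the three archives are present in a local downloads/ folder.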
Detailed descriptions of each dataset are provided below.
Repository Structure
- data/
  - neural_activity_data/
  - paw_position_data/
  - TrainedModels/
- figure1_raw_data.ipynb
- figure2_classifier_TEAnet.ipynb
- figure3_clustering.ipynb
- figure4_rhythmicity.ipynb
- data_loader.py
- utils/
- models/
Neural Activity Data Format
Region and Cell-Type Naming Conventions
This repository uses two naming conventions.
Brain regions:
- Paper terminology: DLS, M1, M2, PF
- Code / filenames: Str, m1, Ctx, Tha
Mapping:
- DLS → Str
- M1 → m1
- M2 → Ctx
- PF → Tha
Cell types:
- dMSNs (direct pathway neurons) → D1R
- iMSNs (indirect pathway neurons) → A2A
All data files use the code naming convention.
The manuscript uses the paper naming convention.
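Because the files use code names while the manuscript uses paper names, a small lookup table helps when moving between the two. The dictionaries below restate the mapping listed above; `stack_filename` is a hypothetical helper that builds a path following the Stacked/allTrials_stack_{nType}_{region}.npy pattern documented in the detailed data description.

```python
# Paper terminology -> code/filename terminology, as listed above
REGION_PAPER_TO_CODE = {"DLS": "Str", "M1": "m1", "M2": "Ctx", "PF": "Tha"}
CELLTYPE_PAPER_TO_CODE = {"dMSN": "D1R", "iMSN": "A2A"}

# Reverse lookups, e.g. for labeling plots with paper terminology
REGION_CODE_TO_PAPER = {v: k for k, v in REGION_PAPER_TO_CODE.items()}
CELLTYPE_CODE_TO_PAPER = {v: k for k, v in CELLTYPE_PAPER_TO_CODE.items()}


def stack_filename(region, cell_type):
    """Build an allTrials_stack filename from paper-style names, e.g. ("PF", "iMSN")."""
    n_type = CELLTYPE_PAPER_TO_CODE[cell_type]
    code_region = REGION_PAPER_TO_CODE[region]
    return f"Stacked/allTrials_stack_{n_type}_{code_region}.npy"


print(stack_filename("PF", "iMSN"))  # Stacked/allTrials_stack_A2A_Tha.npy
```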
Reproducing Figures
- Figure 1: figure1_raw_data.ipynb
- Figure 2: figure2_classifier_TEAnet.ipynb
- Figure 3: figure3_clustering.ipynb
- Figure 4: figure4_rhythmicity.ipynb
Detailed Data Description
Contents of neural_activity_data/
For each region (Str, m1, Ctx, Tha) and each cell type (D1R, A2A), the folder contains four .npy files describing sampling frequency, behavioral alignment, neuron metadata, and neural activity matrices.
1. avgFreq_{nType}_{region}.npy
Example: avgFreq_D1R_Str.npy
- Contains a single scalar value.
- Represents the average sampling frequency (in Hz) at which neural activity was collected for that region and cell type.
- Used to convert frame indices to time in seconds.
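A minimal sketch of reading this scalar and using it to convert frame indices to seconds, assuming NumPy and that the file holds a plain numeric scalar (no pickled objects). The helper names and the data/neural_activity_data path are illustrative assumptions; the repository's own data_loader.py may already provide equivalent functions.

```python
from pathlib import Path

import numpy as np


def load_avg_freq(data_dir, n_type, region):
    """Return the average sampling frequency (Hz) stored in avgFreq_{nType}_{region}.npy."""
    path = Path(data_dir) / f"avgFreq_{n_type}_{region}.npy"
    # The file holds a single value; .item() works for 0-d and 1-element arrays
    return float(np.load(path).item())


def frames_to_seconds(frame_indices, avg_freq):
    """Convert frame indices to seconds by dividing by the sampling frequency."""
    return np.asarray(frame_indices, dtype=float) / avg_freq


# Usage (directory is an assumption about where the archive was unzipped):
# fs = load_avg_freq("data/neural_activity_data", "D1R", "Str")
# t = frames_to_seconds(np.arange(100), fs)
```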
2. GoCueOffset_{nType}_{region}.npy
Example: GoCueOffset_A2A_m1.npy
- Contains an array of integers.
- Each value indicates the frame index of the Go Cue for that neuron within the extracted time window.
- Provides a consistent behavioral alignment reference across neurons and trials.
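Combining the Go Cue frame with the sampling frequency gives a time axis in which the Go Cue falls at t = 0. The sketch below assumes zero-based frame indices and a single Go Cue frame per neuron; `go_cue_time_axis` is a hypothetical name.

```python
import numpy as np


def go_cue_time_axis(n_frames, go_cue_frame, avg_freq):
    """Time (s) of each frame relative to the Go Cue, which maps to t = 0."""
    frames = np.arange(n_frames)
    return (frames - go_cue_frame) / avg_freq


# Illustrative values: 570 frames at 30 Hz with the Go Cue at frame 150
t = go_cue_time_axis(570, 150, 30.0)
```

With these illustrative numbers, frame 0 maps to -5 s and frame 150 to 0 s.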
3. Stacked/allTrials_index_{nType}_{region}.npy
Example: Stacked/allTrials_index_D1R_Ctx.npy
- Contains a matrix where each row describes one neuron.
- Columns correspond to:
- Animal_ID
- Session_ID
- Neuron_ID
- The nth row corresponds exactly to the nth row in allTrials_stack for the same region and cell type.
4. Stacked/allTrials_stack_{nType}_{region}.npy
Example: Stacked/allTrials_stack_A2A_Tha.npy
- Contains the neural activity matrix for all neurons across all trials for a given region and cell type.
- Shape:
- (number_of_neurons × number_of_trials, number_of_time_frames)
- Each row corresponds to the neuron described in the same row of allTrials_index.
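Because each row of allTrials_stack is one neuron on one trial, trial-averaged traces can be obtained by grouping rows that share the same (Animal_ID, Session_ID, Neuron_ID) triplet in allTrials_index. The sketch below assumes the index array has the same number of rows as the stack and a numeric or string dtype (np.unique with axis=0 does not support object arrays); it is an illustration, not the averaging code used for the figures. NaN values, if present, would propagate into the means.

```python
import numpy as np


def trial_average_by_neuron(stack, index):
    """Average the rows of allTrials_stack that belong to the same neuron.

    stack: (n_neurons * n_trials, n_frames) activity matrix
    index: (n_neurons * n_trials, 3) array of Animal_ID, Session_ID, Neuron_ID,
           row-aligned with stack
    Returns (unique_ids, mean_activity), where mean_activity[i] is the trial
    average of the neuron identified by unique_ids[i].
    """
    stack = np.asarray(stack, dtype=float)
    index = np.asarray(index)
    if stack.shape[0] != index.shape[0]:
        raise ValueError("stack and index must have the same number of rows")
    unique_ids, inverse = np.unique(index, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep 1-D across NumPy versions
    sums = np.zeros((len(unique_ids), stack.shape[1]))
    np.add.at(sums, inverse, stack)  # accumulate each row into its neuron's sum
    counts = np.bincount(inverse, minlength=len(unique_ids))
    return unique_ids, sums / counts[:, None]
```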
The neural activity window spans approximately 19 seconds, consisting of:
- 5 seconds pre-ITI
- 1 second between Go Cue and ladder movement onset
- 8 seconds of ladder traversal
- 5 seconds post-ITI
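These durations sum to 19 s. The sketch below turns them into boolean masks over a Go Cue-relative time vector. It assumes that the Go Cue marks the end of the 5 s pre-ITI period, so that ladder onset falls at t = 1 s and traversal ends at t = 9 s; the epoch names are illustrative, and the real window may differ slightly since it is described as approximate.

```python
import numpy as np

# Epoch boundaries in seconds relative to the Go Cue (t = 0), assuming the
# Go Cue falls at the end of the 5 s pre-ITI period
EPOCHS = {
    "pre_ITI": (-5.0, 0.0),
    "cue_to_onset": (0.0, 1.0),
    "ladder_traversal": (1.0, 9.0),
    "post_ITI": (9.0, 14.0),
}


def epoch_masks(time_s, epochs=EPOCHS):
    """Boolean mask per epoch over a Go Cue-relative time vector, using [start, end)."""
    time_s = np.asarray(time_s)
    return {name: (time_s >= start) & (time_s < end)
            for name, (start, end) in epochs.items()}
```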
Summary
Together, these four files provide a complete description of:
- Sampling frequency
- Behavioral alignment
- Neuron metadata
- Neural activity time series
Paw Position and Neural Activity Aligned Data for Figure 4 Rhythmicity Analysis
This repository includes paw position–aligned neural activity data used for rhythmicity and behavior–neural coupling analyses.
File Overview
- df_neural_paw_{region}pca_direction_aligned_centroid{forepaw}.pickle: Contains paw position and corresponding neural activity data for valid trials.
- df_movement_start_end_{region}.pickle: Contains trial-wise movement onset and offset information.
Movement File (df_movement_start_end_*)
- Each row corresponds to a single trial.
- Columns include:
- 0: Animal ID
- 1: Session ID
- 2: Recording date (DDMMYY format)
- 3: nType
- 4: Trial ID
- 5: Animal movement start time (relative to GoCue, where GoCue = 0)
- 6: Animal movement end time (relative to GoCue, where GoCue = 0)
- This file specifies the start and end of animal movement for each trial.
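A minimal pandas sketch for labeling these columns and computing per-trial movement duration. It selects columns by position, assumes the pickle loads as a DataFrame or 2-D array (e.g. via pd.read_pickle), and uses illustrative column labels rather than the names stored in the file. The duration comes out in whatever units the start and end times use.

```python
import pandas as pd

# Illustrative labels for the seven documented columns (not the stored names)
MOVEMENT_LABELS = ["animal_id", "session_id", "date", "n_type", "trial_id",
                   "move_start", "move_end"]


def movement_table(df_movement):
    """Relabel df_movement_start_end_* columns 0-6 and add a movement duration."""
    # pd.DataFrame() accepts either a DataFrame or a 2-D array-like
    out = pd.DataFrame(df_movement).iloc[:, :7].copy()
    out.columns = MOVEMENT_LABELS
    # Start and end are both relative to the Go Cue, so their difference is
    # the movement duration in the same time units
    out["move_duration"] = out["move_end"] - out["move_start"]
    return out
```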
Neural–Paw Aligned Data (df_neural_paw_*)
The neural–paw file is a dictionary with the following entries.
Neural Activity Data (df_neural)
- Stored as a pandas DataFrame.
- Columns represent:
- 0: Animal
- 1: session1d
- 2: Trial1d
- 3: NeuronId
- 4: Date and session number (DDMMYY_F{sess_num})
- 5: Neuron count in session
- 6: nType (0 = iMSN, 1 = dMSN)
- 7 onward: Neural activity time series
Paw Position Data (df_paw_left and df_paw_right)
- Stored as pandas DataFrames.
- df_paw_left and df_paw_right are identical in structure; only the paw-position values differ (left vs right forepaw).
- Columns represent:
- 0: Animal
- 1: session1d
- 2: Date (DDMMYY)
- 3: D1R/A2A (string)
- 4: TrialID
- 5: boolean cell type (0 = iMSN, 1 = dMSN)
- 6 onward: Paw position time series
- session: Date and session number (DDMMYY_F{sess_num})
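The cell-type flag sits in a different column in each DataFrame (column 6 in df_neural, column 5 in the paw DataFrames). Below is a small sketch for filtering either one by position; `select_cell_type` is a hypothetical helper, and it assumes the flag is stored as 0/1 or as a boolean.

```python
import pandas as pd

# Positional columns holding the cell-type flag (0 = iMSN, 1 = dMSN), per the lists above
NEURAL_CELLTYPE_COL = 6
PAW_CELLTYPE_COL = 5


def select_cell_type(df: pd.DataFrame, celltype_col: int, dmsn: bool) -> pd.DataFrame:
    """Keep rows whose flag matches: dmsn=True selects dMSNs (1), False selects iMSNs (0)."""
    flag = df.iloc[:, celltype_col].astype(int)
    return df[flag == int(dmsn)]
```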
PCA Alignment
- Paw position data consists of 2D trajectories (x and y coordinates) for each forepaw.
- Principal Component Analysis (PCA) is applied to the paw trajectory to extract the dominant direction of movement.
- The first principal component defines the movement direction used for alignment.
- If forepaw = left, the PCA direction is computed from the left forepaw position.
- If forepaw = right, the PCA direction is computed from the right forepaw position.
Paw position is then projected onto this PCA-derived movement axis, so that all trials are aligned to the dominant direction of limb movement.
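One way to implement the projection described above, sketched with NumPy's SVD on a mean-centered (n_frames, 2) trajectory. This is an illustrative PCA, not necessarily the repository's exact procedure: whether the axis is fit per trial, per session, or on a centroid trajectory is not specified here, and the sign of the first principal component is arbitrary, so a sign convention may be needed to make trials comparable.

```python
import numpy as np


def project_on_pc1(xy):
    """Project a 2-D paw trajectory onto its first principal component.

    xy: (n_frames, 2) array of x and y positions for one forepaw.
    Returns (projection, pc1), where projection is the 1-D position along pc1.
    """
    xy = np.asarray(xy, dtype=float)
    centered = xy - xy.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    return centered @ pc1, pc1
```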
Additional Metadata
- start_data_idx_paw: Column index from which paw position data begins in the paw DataFrames.
- start_data_idx: Column index from which neural activity data begins in the neural DataFrame.
- time: Time vector relative to GoCue; its length matches the neural and paw time series.
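Putting these entries together, the sketch below loads one df_neural_paw_* file with the standard-library pickle module and splits metadata columns from the time series. It assumes the dictionary keys are exactly the entry names listed above (df_neural, df_paw_left, df_paw_right, start_data_idx, start_data_idx_paw, time); `load_neural_paw` is a hypothetical helper. Each slice is bounded by the length of time as a precaution in case a non-numeric column (such as session) follows the series. Only unpickle files from a trusted source.

```python
import pickle

import numpy as np


def load_neural_paw(path):
    """Load a df_neural_paw_* pickle and split metadata columns from time series."""
    with open(path, "rb") as f:
        d = pickle.load(f)
    time = np.asarray(d["time"], dtype=float)
    n_t = time.size
    n0, p0 = d["start_data_idx"], d["start_data_idx_paw"]
    # Bound each slice by len(time) in case non-numeric columns follow the series
    neural = d["df_neural"].iloc[:, n0:n0 + n_t].to_numpy(dtype=float)
    paw_left = d["df_paw_left"].iloc[:, p0:p0 + n_t].to_numpy(dtype=float)
    paw_right = d["df_paw_right"].iloc[:, p0:p0 + n_t].to_numpy(dtype=float)
    if any(a.shape[1] != n_t for a in (neural, paw_left, paw_right)):
        raise ValueError("time series length does not match the time vector")
    return {
        "neural_meta": d["df_neural"].iloc[:, :n0],
        "paw_meta": d["df_paw_left"].iloc[:, :p0],
        "neural": neural,
        "paw_left": paw_left,
        "paw_right": paw_right,
        "time": time,
    }
```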
Temporal Alignment and Trial Selection
- Neural and paw position data are aligned to the ladder movement epoch.
- Trials with missing or unavailable paw positions are removed.
TrainedModels Directory
The TrainedModels/ directory contains all trained TEA-net models used for neuron type classification across brain regions and control conditions.
Folder Structure
Each subfolder follows the naming pattern:
{region}_shuffle_{boolean}
Where:
- region ∈ {Str, m1, Ctx, Tha}
- boolean ∈ {True, False}
The meaning of boolean is:
- shuffle_False: Models trained on true labels (main results reported in the paper).
- shuffle_True: Models trained under a shuffle control, in which neuron labels are randomly permuted to estimate chance-level performance.
Example folders:
- Str_shuffle_False
- Ctx_shuffle_True
- Tha_shuffle_False
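A folder name can be split back into its two parts with a short helper. This is a hypothetical sketch that only accepts the region codes and boolean strings listed above.

```python
REGIONS = ("Str", "m1", "Ctx", "Tha")


def parse_model_folder(name):
    """Split a '{region}_shuffle_{boolean}' folder name into (region, shuffled)."""
    region, sep, flag = name.partition("_shuffle_")
    if not sep or region not in REGIONS or flag not in ("True", "False"):
        raise ValueError(f"Unexpected folder name: {name}")
    return region, flag == "True"


print(parse_model_folder("Ctx_shuffle_True"))  # ('Ctx', True)
```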
Model Files Inside Each Folder
Within each {region}_shuffle_{boolean} folder are multiple trained TEA-net models saved from different random initializations and cross-validation folds.
Each model file follows the naming convention:
{ML_model_name}_{region}_model_lr_{lr}_numsample_{num_samples}_{rand_init_iter}_{cv}_{batch_size}.model
Where:
- ML_model_name: Name of the architecture (e.g., rnn_attention_model)
- region: Brain region (Str, m1, Ctx, Tha)
- lr: Learning rate used for training
- num_samples: Number of neurons sampled during training
- rand_init_iter: Random seed initialization index for model training
- cv: Cross-validation fold index
- batch_size: Batch size used during training
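Because ML_model_name can itself contain underscores (as in rnn_attention_model), splitting the filename on underscores is unreliable; anchoring a regular expression on the region code is one workable alternative. The pattern and `parse_model_filename` below are illustrative assumptions based only on the naming convention above, and the example filename is made up.

```python
import re

# ML_model_name may contain underscores, so the region code anchors the split
MODEL_RE = re.compile(
    r"^(?P<ml_model_name>.+)_(?P<region>Str|m1|Ctx|Tha)_model"
    r"_lr_(?P<lr>[^_]+)_numsample_(?P<num_samples>\d+)"
    r"_(?P<rand_init_iter>\d+)_(?P<cv>\d+)_(?P<batch_size>\d+)\.model$"
)


def parse_model_filename(filename):
    """Return the training parameters encoded in a TEA-net model filename."""
    m = MODEL_RE.match(filename)
    if m is None:
        raise ValueError(f"Filename does not match convention: {filename}")
    info = m.groupdict()
    info["lr"] = float(info["lr"])
    for key in ("num_samples", "rand_init_iter", "cv", "batch_size"):
        info[key] = int(info[key])
    return info


# Hypothetical filename following the documented pattern
info = parse_model_filename("rnn_attention_model_Str_model_lr_0.001_numsample_100_2_7_64.model")
```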
Key Training Indices
The two most important indices in the filename are:
- rand_init_iter:
  - Controls random weight initialization and data sampling
  - Multiple values are used to assess robustness across random seeds
- cv:
  - Indicates the cross-validation fold
  - Values range from 0 to 9, corresponding to 10-fold cross-validation
  - Each fold uses a different train/test split of neurons
Together, these ensure that classification performance is evaluated across multiple random initializations and multiple train/test splits.
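Before aggregating results, it can be useful to confirm that every random initialization has all ten folds. The sketch below takes already-parsed filename fields (dicts with rand_init_iter and cv keys) and reports any missing fold indices per seed; `folds_by_seed` is a hypothetical helper.

```python
from collections import defaultdict


def folds_by_seed(parsed_models, n_folds=10):
    """Group parsed model fields by rand_init_iter and list missing CV folds.

    parsed_models: iterable of dicts with 'rand_init_iter' and 'cv' keys.
    Returns {rand_init_iter: sorted list of missing fold indices}.
    """
    seen = defaultdict(set)
    for info in parsed_models:
        seen[info["rand_init_iter"]].add(info["cv"])
    expected = set(range(n_folds))
    return {seed: sorted(expected - folds) for seed, folds in seen.items()}
```

An empty list for a seed means all folds 0 to 9 are present for that initialization.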
