Multiple modes of cholesterol translocation in the human Smoothened receptor

Bansal, Prateek 1; Kinnebrew, Maia 2; Rohatgi, Rajat 2; Shukla, Diwakar 1

Published Jan 08, 2025 on Dryad. https://doi.org/10.5061/dryad.76hdr7t4w

Data files

Jan 08, 2025 version files (24.70 GB total):
- Fig2_Dryad.tar.gz (12.14 GB)
- Fig4_Dryad.tar.gz (12.55 GB)
- Fig7_dryad.tar.gz (3.14 MB)
- README.md (7.99 KB)
- Smo_mutants_QPCR_data_dryad_formatted.xlsx (24.77 KB)

Abstract

Smoothened (SMO), a member of the G Protein-Coupled Receptor superfamily, mediates Hedgehog signaling and is linked to cancer and birth defects. SMO responds to accessible cholesterol in the ciliary membrane, translocating it via a longitudinal tunnel to its extracellular domain. A complete mechanistic understanding of the cholesterol translocation process would help in the development of cancer therapies. Competing hypotheses based on available structures support entry of cholesterol from the outer and inner membrane leaflets, but the exact mechanism of translocation remains unclear. Using atomistic molecular dynamics simulations (∼2 milliseconds of simulation) and biochemical assays of SMO mutants, we assess the energetic feasibility of the proposed hypotheses. We show that the energetic barriers for cholesterol translocation from either leaflet are comparable. Mutagenesis experiments and complementary simulations of SMO mutants validate the role of critical amino acid residues along the translocation pathways. Our data suggest that cholesterol can take either pathway to enter SMO, thus explaining contradictory experimental observations in the literature. Our results illuminate the energetics and provide a first molecular description of cholesterol translocation in SMO.

Description of the data and file structure

The following is an explanation of the overall dataset submitted to this repository:

The repository contains the data (.npy and .pkl files) used to plot the figures in the manuscript "Energetics of cholesterol perception and translocation in the human Smoothened receptor". The data are shared to improve reproducibility.

Accessing Data within Files

The data within any .pkl file in the dataset can be accessed using the pickle module, which is part of the Python standard library. For example:

import pickle
with open('./totdist_2ms.pkl', 'rb') as f:  # the file is closed once loading finishes
    all_distances = pickle.load(f)  # all the distances used to build the MSM

The data within any .npy file in the dataset can be accessed using the NumPy package. NumPy is a third-party package, not part of the Python standard library, and must be installed separately. For example:

import numpy as np
trajectory_distance = np.load('./p14960_run0_clone104_strip.npy')  #Imports a particular .npy associated with an MD trajectory

The code can be edited to access the data within any .pkl or .npy file associated with this repository.
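
As a convenience, the two loading patterns above can be wrapped in a single helper that picks the loader from the file extension. This is a minimal sketch and is not part of the repository's code; the function name load_data_file is hypothetical. Because unpickling can execute arbitrary code, only .pkl files from a trusted source should be loaded.

```python
import pickle
from pathlib import Path

import numpy as np


def load_data_file(path):
    """Load a .pkl or .npy file, choosing the loader from the extension.

    Hypothetical helper for illustration; not shipped with the dataset.
    """
    path = Path(path)
    if path.suffix == ".pkl":
        # pickle can run arbitrary code on load: use trusted files only.
        with path.open("rb") as handle:
            return pickle.load(handle)
    if path.suffix == ".npy":
        return np.load(path)
    raise ValueError(f"Unsupported file type: {path.suffix}")
```

For example, load_data_file('./totdist_2ms.pkl') would return the same object as the pickle snippet above.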

Files and variables

Fig7_dryad.tar.gz (3.14 MB): Compressed .tar.gz with the data used to construct Figure 7 of the manuscript. The data were computed by sampling 10000 frames belonging to each minimum (along the translocation pathway) from the MSM, using the microstate probabilities. The code used to plot the data is available here. Here is a detailed explanation of the files and folders inside the .tar.gz:
- entered_protein_holeframes_to_trajframes.pkl contains the trajectories and frames used to construct the hole plot for Fig 7a in the manuscript.
- entered_protein_x.npy contains the x-coordinate data for the average plot radius for Fig 7a.
- entered_protein_y.npy contains the y-coordinate data for the average plot radius for Fig 7a.
- bottom_holeframes_to_trajframes.pkl contains the trajectories and frames used to construct the hole plot for Fig 7c in the manuscript.
- bottom_x.npy contains the x-coordinate data for the average plot radius for Fig 7c.
- bottom_y.npy contains the y-coordinate data for the average plot radius for Fig 7c.
- membrane_holeframes_to_trajframes.pkl contains the trajectories and frames used to construct the hole plot for Fig 7e in the manuscript.
- membrane_x.npy contains the x-coordinate data for the average plot radius for Fig 7e.
- membrane_y.npy contains the y-coordinate data for the average plot radius for Fig 7e.
- left_holeframes_to_trajframes.pkl contains the trajectories and frames used to construct the hole plot for the case where the cholesterol is in the membrane.
- left_x.npy contains the x-coordinate data for the average plot radius for the case where the cholesterol is in the membrane.
- left_y.npy contains the y-coordinate data for the average plot radius for the case where the cholesterol is in the membrane.
- distI_refined_1p4ms_final.pkl contains the names of the trajectories that were used to make Fig 7 (a, c, e) (Common Pathway).
- msmweights_200_95.pkl contains the MSM weights used to reweight each free energy plot in Fig 7.
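
The probability-weighted frame sampling described for Figure 7 can be illustrated with a short sketch. It assumes a flat array of per-frame MSM weights and a boolean mask marking the frames that belong to one minimum; both inputs, the function name, and the choice to sample with replacement are assumptions for illustration and may differ from the procedure used in the manuscript.

```python
import numpy as np


def sample_frames_by_weight(frame_weights, in_minimum, n_samples=10000, seed=0):
    """Draw frame indices from one minimum, proportional to their weights.

    frame_weights: 1D array of per-frame weights (assumed layout).
    in_minimum: boolean mask of the same length selecting the minimum.
    Sampling is done with replacement (an assumption).
    """
    frame_weights = np.asarray(frame_weights, dtype=float)
    candidates = np.flatnonzero(np.asarray(in_minimum, dtype=bool))
    if candidates.size == 0:
        raise ValueError("No frames fall inside the requested minimum.")
    # Renormalize the weights over the frames inside the minimum.
    probabilities = frame_weights[candidates]
    probabilities = probabilities / probabilities.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(candidates, size=n_samples, replace=True, p=probabilities)
```

The returned indices refer to positions in the flat weight array and would still need to be mapped back to trajectory and frame numbers.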
Fig2_Dryad.tar.gz (12.14 GB): Compressed .tar.gz with the data used to construct Figure 2 of the manuscript. The plotting code is available here. Here is a detailed explanation of the files and folders inside the .tar.gz:
- The dataset contains all the .npy files used to plot Fig 2a and 2b from the manuscript.
- distI_2ms.pkl contains the file names of all the trajectories simulated.
- totdist_2ms.pkl contains the corresponding .npy files of the 89 distances used to construct the Markov State Model. The first distance (index 0) is the negative of the z-coordinate of the cholesterol being transported.
- weights_msm_200_95_300_23.pkl contains the weights used to construct the final MSM weighted free energy plot.
- pkl_Fig2a_angle contains the 11663 .npy files of the x- and y-coordinates of the cholesterol which undergoes translocation.
- pkl_Fig2b contains the 11663 .npy files containing the angle of the cholesterol with the x-y plane (-180 to 180).
Fig4_Dryad.tar.gz (12.55 GB): Compressed .tar.gz with the data used to construct Figure 4 of the manuscript. The plotting code is available here. Here is a detailed explanation of the files and folders inside the .tar.gz:
- The dataset contains all the .npy files used to plot Fig 4a and 4b from the manuscript.
- distI_2ms.pkl contains the file names of all the trajectories simulated.
- totdist_2ms.pkl contains the corresponding .npy files of the 89 distances used to construct the Markov State Model. The first distance (index 0) is the negative of the z-coordinate of the cholesterol being transported.
- weights_msm_400_80_300.pkl contains the weights used to construct the final MSM weighted free energy plot.
- pkl_Fig4a_angle contains the 12513 .npy files of the x- and y-coordinates of the cholesterol which undergoes translocation.
- pkl_Fig4b contains the 12513 .npy files containing the angle of the cholesterol with the x-y plane (-180 to 180).
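
An MSM-weighted free energy profile along one of the distances can be sketched as follows. The sketch assumes that the distance data load as a sequence of per-trajectory arrays of shape (n_frames, 89) and that the weights load as a matching sequence of per-frame weight arrays. These layouts, the function name, the 300 K temperature, and the kcal/mol units are assumptions, not details confirmed by the dataset.

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def weighted_free_energy_1d(features, weights, column=0, bins=50, temperature=300.0):
    """Return bin centers and F = -kT ln(P) along one feature column.

    features: sequence of (n_frames, n_features) arrays (assumed layout).
    weights: sequence of 1D per-frame weight arrays (assumed layout).
    """
    values = np.concatenate([np.asarray(traj)[:, column] for traj in features])
    frame_weights = np.concatenate([np.asarray(w, dtype=float) for w in weights])
    hist, edges = np.histogram(values, bins=bins, weights=frame_weights)
    probability = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Empty bins have zero probability and therefore infinite free energy.
    with np.errstate(divide="ignore"):
        free_energy = -KB_KCAL_PER_MOL_K * temperature * np.log(probability)
    # Shift so that the lowest finite value is zero.
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return centers, free_energy
```

Because index 0 stores the negative of the cholesterol z-coordinate, negating the returned bin centers recovers z.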

Smo_mutants_QPCR_data_dryad_formatted.xlsx (24.77 KB): Excel file containing the qPCR data plotted in Fig 3 and Fig 5 in the manuscript. The file has three sheets (CRD pathway, TMDPathway1 and TMDPathway2), one for each of the pathways explored in the manuscript (Pathway 1, Pathway 2, Common Pathway). The column layout is the same in all three sheets. Below is a detailed explanation of each column:
- Column A (Mutant) represents the mutants tested.
- Column B (Condition) represents the condition under which each mutant was tested (unt = no Shh added, low shh = low amount of Shh added, high shh = saturating amount of Shh added).
- Columns C-F (Gli1, Gli2, Gli3, Gli4): Raw quantitative PCR (qPCR) measurements of Gli1 transcript levels from four replicates. Gli1 is a readout of Hedgehog pathway activity; higher values indicate greater pathway activation.
- Columns G-J (Gap1, Gap2, Gap3, Gap4): qPCR measurements of Gapdh transcript levels from four replicates, serving as a normalization control for Gli1 expression.
- Column K (average gap): The mean of gap1, gap2, gap3, and gap4.
- Column L (values): Raw Gli1 expression values, adjusted against Gapdh.
- Columns M-P (dct1, dct2, dct3, dct4): The ΔCt values for Gli1 relative to Gapdh for each replicate.
- Columns Q-T (abundance1, abun2, abun3, abun4): The normalized Gli1 expression levels calculated from the ΔCt values, expressed as fold changes.
- Column U (average): The mean of the normalized Gli1 expression levels.
- Column V (average untreated): The mean normalized Gli1 expression level in untreated conditions.
- Columns W-Z (fc1, fc2, fc3, fc4), highlighted in green: Fold changes (FC) in Gli1 expression for each replicate under specific conditions (unt, low or high SHH) compared to untreated conditions. These are the values plotted in the manuscript.
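
The chain from raw measurements to fold change can be illustrated with the widely used 2^(-ΔCt) calculation. This is a sketch under stated assumptions: it treats the Gli and Gap columns as Ct values and normalizes each replicate against its own Gapdh value, whereas the spreadsheet may normalize differently (for example against the mean in Column K). The function name is hypothetical, and the numbers in the usage note are made up.

```python
import numpy as np


def fold_change_from_ct(gli1_ct, gapdh_ct, untreated_gli1_ct, untreated_gapdh_ct):
    """Fold change of Gli1 relative to the untreated mean, via 2^(-dCt).

    All inputs are arrays of Ct values, one entry per replicate (assumed).
    """
    # dCt: Gli1 Ct relative to the Gapdh normalization control.
    dct = np.asarray(gli1_ct, dtype=float) - np.asarray(gapdh_ct, dtype=float)
    abundance = 2.0 ** (-dct)
    untreated_dct = (np.asarray(untreated_gli1_ct, dtype=float)
                     - np.asarray(untreated_gapdh_ct, dtype=float))
    # Mean normalized abundance in the untreated condition.
    untreated_mean = np.mean(2.0 ** (-untreated_dct))
    return abundance / untreated_mean
```

With made-up Ct values of 24 (Gli1) and 18 (Gapdh) for a treated sample and 26 and 18 for the untreated sample, ΔCt drops from 8 to 6, which corresponds to a 4-fold increase.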

Code/software

Code used to plot the figures in the paper is available on GitHub.

Access information

Other publicly accessible locations of the data:

All trajectories simulated for this manuscript (> 2 TB) are available on Box.