Energetics of cholesterol perception and translocation in the human Smoothened receptor
Data files
Jan 08, 2025 version files 24.70 GB
- Fig2_Dryad.tar.gz (12.14 GB)
- Fig4_Dryad.tar.gz (12.55 GB)
- Fig7_dryad.tar.gz (3.14 MB)
- README.md (7.99 KB)
- Smo_mutants_QPCR_data_dryad_formatted.xlsx (24.77 KB)
Abstract
Smoothened (SMO), a member of the G Protein-Coupled Receptor superfamily, mediates Hedgehog signaling and is linked to cancer and birth defects. SMO responds to accessible cholesterol in the ciliary membrane, translocating it via a longitudinal tunnel to its extracellular domain. Reaching a complete mechanistic understanding of the cholesterol translocation process would help in the development of cancer therapies. Competing hypotheses based on available structures support entry of cholesterol from the outer and inner membrane leaflets, but the exact mechanism of translocation remains unclear. Using atomistic molecular dynamics simulations (∼2 milliseconds of simulation) and biochemical assays of SMO mutants, we assess the energetic feasibility of the proposed hypotheses. We show that the energetic barriers for cholesterol translocation from either leaflet are comparable. Mutagenesis experiments and complementary simulations of SMO mutants validate the role of critical amino acid residues along the translocation pathways. Our data suggest that cholesterol can take either pathway to enter SMO, thus explaining contradictory experimental observations in the literature. Overall, our results illuminate the energetics and provide a first molecular description of cholesterol translocation in SMO.
README: A mechanism for cholesterol transport in the human Smoothened Receptor
https://doi.org/10.5061/dryad.76hdr7t4w
Description of the data and file structure
The following is an explanation of the overall dataset submitted to this repository:
The repository contains all the data (.npy files) that were used to plot the figures in the manuscript: Energetics of cholesterol perception and translocation in the human Smoothened receptor. The data presented here is being shared to increase reproducibility.
Accessing Data within Files
The data within any `.pkl` file presented in the dataset can be accessed using the `pickle` package, a part of the standard Python library. For example:
```python
import pickle
with open('./totdist_2ms.pkl', 'rb') as f:  # close the file once loading is done
    all_distances = pickle.load(f)  # all the distances used to make the MSM
```
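The data within any `.npy` file presented in the dataset can be accessed using the `numpy` package, a third-party library that is not part of the standard Python library and must be installed separately (for example with pip or conda). For example: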
```python
import numpy as np
trajectory_distance = np.load('./p14960_run0_clone104_strip.npy')  # a particular .npy associated with an MD trajectory
```
The code can be edited to access the data within any `.pkl` or `.npy` file associated with this repository.
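As a convenience, the two loading patterns above can be wrapped in one helper that dispatches on the file extension. This is only a minimal sketch: `load_data_file` is a hypothetical name and is not part of the deposited code. Pickle files can execute arbitrary code on load, so only open `.pkl` files from a source you trust.

```python
import pickle
from pathlib import Path

import numpy as np


def load_data_file(path):
    """Load a .pkl file with pickle or a .npy file with NumPy, based on the extension."""
    path = Path(path)
    if path.suffix == ".pkl":
        with open(path, "rb") as handle:
            return pickle.load(handle)
    if path.suffix == ".npy":
        return np.load(path)
    raise ValueError(f"Unsupported file type: {path.suffix}")
```

For example, `load_data_file('./totdist_2ms.pkl')` and `load_data_file('./p14960_run0_clone104_strip.npy')` would load the two files used in the examples above.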
Files and variables
- Fig7_dryad.tar.gz (3.14 MB): Compressed .tar.gz with the data used to construct Figure 7 of the manuscript. The data were computed by sampling 10000 frames belonging to each minimum (along the translocation pathway) from the MSM, using the microstate probabilities. The code used for plotting the data is available here. Here is a detailed explanation of the files and folders inside the .tar.gz:
  - entered_protein_holeframes_to_trajframes.pkl: contains the trajectories and frames used to construct the hole plot for Fig 7a in the manuscript.
  - entered_protein_x.npy: contains the x-coordinate data for the average plot radius for Fig 7a.
  - entered_protein_y.npy: contains the y-coordinate data for the average plot radius for Fig 7a.
  - bottom_holeframes_to_trajframes.pkl: contains the trajectories and frames used to construct the hole plot for Fig 7c in the manuscript.
  - bottom_x.npy: contains the x-coordinate data for the average plot radius for Fig 7c.
  - bottom_y.npy: contains the y-coordinate data for the average plot radius for Fig 7c.
  - membrane_holeframes_to_trajframes.pkl: contains the trajectories and frames used to construct the hole plot for Fig 7e in the manuscript.
  - membrane_x.npy: contains the x-coordinate data for the average plot radius for Fig 7e.
  - membrane_y.npy: contains the y-coordinate data for the average plot radius for Fig 7e.
  - left_holeframes_to_trajframes.pkl: contains the trajectories and frames used to construct the hole plot for the case where the cholesterol is in the membrane.
  - left_x.npy: contains the x-coordinate data for the average plot radius for the case where the cholesterol is in the membrane.
  - left_y.npy: contains the y-coordinate data for the average plot radius for the case where the cholesterol is in the membrane.
  - distI_refined_1p4ms_final.pkl: contains the names of the trajectories that were used to make Fig 7 (a, c, e) (Common Pathway).
  - msmweights_200_95.pkl: MSM weights used to reweight each free energy plot in Fig 7.
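The Fig 7 description mentions sampling frames from the MSM using the microstate probabilities. The exact procedure is in the authors' plotting code, not in this README, so the following is only a sketch of one common way to do it. The function name, the argument names, and the assumption that every frame carries an integer microstate label (with each label being a valid index into the probability array) are all hypothetical.

```python
import numpy as np


def sample_frames(state_of_frame, state_probs, n_samples, seed=0):
    """Draw frame indices so that each microstate is visited in proportion to its probability."""
    state_of_frame = np.asarray(state_of_frame)
    state_probs = np.asarray(state_probs, dtype=float)
    # Number of frames assigned to each microstate.
    counts = np.bincount(state_of_frame, minlength=state_probs.size)
    # Each frame gets its microstate's probability, shared equally among that state's frames.
    frame_weights = state_probs[state_of_frame] / counts[state_of_frame]
    frame_weights /= frame_weights.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(state_of_frame.size, size=n_samples, p=frame_weights)
```

Sampling is done with replacement, so a frame from a sparsely populated but highly probable microstate can be drawn many times.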
- Fig2_Dryad.tar.gz (12.14 GB): Compressed .tar.gz with the data used to construct Figure 2 of the manuscript. The plotting code is available here. Here is a detailed explanation of the files and folders inside the .tar.gz:
  - The dataset contains all the .npy files used to plot Fig 2a and 2b from the manuscript.
  - distI_2ms.pkl: contains the file names of all the trajectories simulated.
  - totdist_2ms.pkl: contains the corresponding .npy files of the 89 distances used to construct the Markov State Model. The first distance (index 0) is the negative of the z-coordinate of the cholesterol being transported.
  - weights_msm_200_95_300_23.pkl: contains the weights used to construct the final MSM-weighted free energy plot.
  - pkl_Fig2a_angle: contains the 11663 .npy files of the x- and y-coordinates of the cholesterol that undergoes translocation.
  - pkl_Fig2b: contains the 11663 .npy files containing the angle of the cholesterol with the x-y plane (-180 to 180).
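The MSM weights listed above are used to reweight per-frame data into a free energy plot. As a generic illustration (not the authors' plotting code), a weighted two-dimensional histogram can be converted to a free energy surface with F = -kT ln P. The function name, the bin count, and the 300 K temperature below are assumptions, as is the layout of the inputs (one value of each coordinate and one weight per frame).

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def weighted_free_energy(x, y, weights, bins=50, temperature=300.0):
    """Return a 2D free energy surface (kcal/mol) and bin edges from weighted samples."""
    hist, x_edges, y_edges = np.histogram2d(x, y, bins=bins, weights=weights)
    prob = hist / hist.sum()
    # Empty bins have zero probability and therefore infinite free energy.
    with np.errstate(divide="ignore"):
        free_energy = -KB_KCAL_PER_MOL_K * temperature * np.log(prob)
    # Shift so the global minimum is zero; empty bins stay at +inf.
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return free_energy, x_edges, y_edges
```

Passing uniform weights recovers the unweighted histogram, which is a quick way to see how much the MSM reweighting changes the landscape.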
- Fig4_Dryad.tar.gz (12.55 GB): Compressed .tar.gz with the data used to construct Figure 4 of the manuscript. The plotting code is available here. Here is a detailed explanation of the files and folders inside the .tar.gz:
  - The dataset contains all the .npy files used to plot Fig 4a and 4b from the manuscript.
  - distI_2ms.pkl: contains the file names of all the trajectories simulated.
  - totdist_2ms.pkl: contains the corresponding .npy files of the 89 distances used to construct the Markov State Model. The first distance (index 0) is the negative of the z-coordinate of the cholesterol being transported.
  - weights_msm_400_80_300.pkl: contains the weights used to construct the final MSM-weighted free energy plot.
  - pkl_Fig4a_angle: contains the 12513 .npy files of the x- and y-coordinates of the cholesterol that undergoes translocation.
  - pkl_Fig4b: contains the 12513 .npy files containing the angle of the cholesterol with the x-y plane (-180 to 180).
- Smo_mutants_QPCR_data_dryad_formatted.xlsx (24.77 KB): Excel file containing the qPCR data plotted in Fig 3 and Fig 5 in the manuscript. There are three sheets in the file (`CRD pathway`, `TMDPathway1`, and `TMDPathway2`), each corresponding to one of the pathways explored in the manuscript (Pathway 1, Pathway 2, Common Pathway). The following explanation of each column applies to all three sheets:
  - Column A (`Mutant`): the mutants tested.
  - Column B (`Condition`): the condition at which each mutant was tested (unt = no Shh added, low shh = low amount of Shh added, high shh = saturating amount of Shh added).
  - Columns C-F (`Gli1, Gli2, Gli3, Gli4`): raw quantitative PCR (qPCR) measurements of Gli1 transcript levels from four replicates. Gli1 is a readout of Hedgehog pathway activity; higher values indicate greater pathway activation.
  - Columns G-J (`Gap1, Gap2, Gap3, Gap4`): qPCR measurements of Gapdh transcript levels from four replicates, serving as a normalization control for Gli1 expression.
  - Column K (`average gap`): the mean of `gap1`, `gap2`, `gap3`, and `gap4`.
  - Column L (`values`): raw Gli1 expression values, adjusted against Gapdh.
  - Columns M-P (`dct1, dct2, dct3, dct4`): the ΔCt values for Gli1 relative to Gapdh for each replicate.
  - Columns Q-T (`abundance1, abun2, abun3, abun4`): the normalized Gli1 expression levels calculated from the ΔCt values, expressed as fold changes.
  - Column U (`average`): mean of the normalized Gli1 expression levels.
  - Column V (`average untreated`): mean normalized Gli1 expression level in untreated conditions.
  - Columns W-Z (`fc1, fc2, fc3, fc4`), highlighted in green: fold changes (FC) in Gli1 expression for each replicate under specific conditions (unt, low or high SHH) compared to untreated conditions. These are the values plotted in the manuscript.
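The README does not state the formulas behind the ΔCt, abundance, and fold-change columns. The sketch below shows the standard 2^-ΔCt calculation that such columns usually follow. It assumes the Gli1 and Gapdh columns hold Ct values and that each replicate is normalized against its own Gapdh value. The spreadsheet may instead normalize against the average in column K, so check the cell formulas before relying on this. The function and argument names are hypothetical.

```python
import numpy as np


def fold_change(ct_gli1, ct_gapdh, ct_gli1_untreated, ct_gapdh_untreated):
    """2^-dCt abundance of each replicate divided by the mean untreated abundance."""
    # dCt: Gli1 Ct relative to the Gapdh normalization control.
    dct = np.asarray(ct_gli1, dtype=float) - np.asarray(ct_gapdh, dtype=float)
    abundance = 2.0 ** (-dct)
    # Same calculation for the untreated replicates, then averaged.
    dct_untreated = (
        np.asarray(ct_gli1_untreated, dtype=float)
        - np.asarray(ct_gapdh_untreated, dtype=float)
    )
    average_untreated = np.mean(2.0 ** (-dct_untreated))
    return abundance / average_untreated
```

With this convention, a sample whose ΔCt is two cycles lower than the untreated ΔCt has a fold change of 4.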
Code/software
Code used to plot the figures in the paper is available on GitHub.
Access information
Other publicly accessible locations of the data:
- All trajectories simulated for this manuscript (> 2 TB) are available on Box.
Methods
The dataset was collected using long-timescale molecular dynamics (MD) simulations. Simulations were started from four starting points, each consisting of the Smoothened (SMO) protein embedded in a membrane, with the cholesterol at a different site in SMO. Simulations were performed adaptively, in multiple rounds of sampling with hundreds of trajectories run in parallel in each round. After every round of sampling, we gathered the entire dataset collected up to that point and clustered the trajectory frames according to metrics that differentiate between the stages of cholesterol transport. New simulations were then seeded from the least-populated clusters, and the next round of simulations was run. In total, the dataset consists of 2 ms of unbiased simulations. The files deposited in this repository contain the .npy files computed from each MD trajectory, which were used to plot the free energy landscapes in the manuscript. Simulations were run using the AMBER and OpenMM MD simulation engines.
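The seeding step described above, where each new round starts from the least-populated clusters, can be sketched as follows. This is an illustration only, not the authors' pipeline: the clustering itself is assumed to have been done already, and the function name, the one-frame-per-cluster choice, and the random tie handling are all assumptions.

```python
import numpy as np


def pick_seed_frames(cluster_labels, n_seeds, seed=0):
    """Pick one random frame from each of the n_seeds least-populated clusters."""
    cluster_labels = np.asarray(cluster_labels)
    clusters, counts = np.unique(cluster_labels, return_counts=True)
    # Stable sort keeps tied clusters in ascending label order.
    least_populated = clusters[np.argsort(counts, kind="stable")[:n_seeds]]
    rng = np.random.default_rng(seed)
    # For each selected cluster, choose one of its member frames at random.
    return [int(rng.choice(np.flatnonzero(cluster_labels == c))) for c in least_populated]
```

The returned frame indices would then be mapped back to trajectory files and frames to build the starting structures for the next round.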
Another set of biased simulations was run with different SMO mutants to describe the overall transport process for cholesterol, and the results were compared to wild-type (WT) SMO. The .pmf files generated by NAMD are submitted to this repository for easy replication.
Python was extensively used to analyse the dataset generated. This work has been performed by Prateek Bansal and Diwakar Shukla at the University of Illinois Urbana-Champaign.
This Dryad dataset has been made available in compliance with the "minimum dataset"/source data requirement per the Data Availability guidelines by Nature Communications.