Data from: Elucidation of molecular mechanisms of sex-based arrhythmias
Data files
Feb 05, 2024 version files 148.38 GB
- herg_o2_dht_zm50_namd_u01_r1_70_partial.tar.gz
- herg_o2_dht_zm50_namd06_pullx_us0x.tar.gz
- herg_o2_e2b_zm50_namd_u01_r1_50.tar.gz.partaa
- herg_o2_e2b_zm50_namd_u01_r1_50.tar.gz.partab
- herg_o2_e2b_zm50_namd06_pullx_us0x.tgz
- hormone_herg_silcs_docking_.tar.gz
- README.md
Abstract
Female sex has been shown to be an independent risk factor for both inherited and acquired heart rhythm abnormalities, such as long QT syndrome (LQTS) and associated arrhythmias. Notably, female sex is a key element in up to 70% prevalence of drug-induced acquired LQTS. However, fundamental molecular mechanisms that explain this phenomenon are not well understood. Previous experimental and clinical studies suggested that it is likely related to differential levels of sex hormones (estradiol, progesterone and testosterone) playing opposite roles in pro-arrhythmia proclivities, exacerbating or mitigating effects of mutations or drugs on cardiac ion channels. In the American Heart Association (AHA) sponsored career development award 19CDA34770101 "Elucidation of molecular mechanisms of sex-based arrhythmias" we focused on hormone interactions with the human Kv11.1 potassium channel (encoded by hERG, the human Ether-à-go-go-Related Gene), a major contributor to cardiac action potential repolarization and an anti-target for diverse drug molecules. We performed a comprehensive set of in silico atomistic modeling and simulations on hERG structure and function modulation by sex hormones in combination with hERG channel blockers with different proclivities for arrhythmogenesis. These studies were informed by and will also guide electrophysiological experiments on cardiomyocytes and hERG-expressing HEK cells by our collaborators. Molecular dynamics (MD) simulation and molecular docking data, presented here and validated by electrophysiological recordings, will provide us with quantitative estimates of such hormone modulatory effects. They will be used to elucidate molecular mechanisms of sex-dependent heart rhythm abnormalities and thus the ways to combat them via rational design of sex-specific cardiac-safe pharmaceuticals and/or their dose adjustments. The dataset contains MD simulations of hERG channel - hormone and drug interactions as well as their analyses.
Multi-microsecond-long umbrella sampling MD simulations and fragment-based Site-Identification by Ligand Competitive Saturation (SILCS) docking calculations were performed, and their results are presented here.
README: Title of Dataset
Data for the grant "Elucidation of molecular mechanisms of sex-based arrhythmias"
This dataset is associated with the American Heart Association (AHA) sponsored career development award 19CDA34770101 "Elucidation of molecular mechanisms of sex-based arrhythmias", in which we focused on hormone and/or drug interactions with the human Kv11.1 potassium channel (encoded by hERG, the human Ether-à-go-go-Related Gene), a major contributor to cardiac action potential repolarization and an anti-target for diverse drug molecules. The dataset contains MD simulations of hERG channel - hormone and/or drug interactions as well as their analyses. Multi-microsecond-long umbrella sampling MD simulations and fragment-based Site-Identification by Ligand Competitive Saturation (SILCS) docking calculations were performed, and their results are presented here.
Description of the data and file structure
The dataset consists of gzipped tar ball files, each of which contains multiple files that need to be extracted and decompressed.
herg_o2_e2b_zm50_namd_u01_r1_50.tar.gz.partaa - gzipped tar ball file (part 1/2) for the hERG channel - estradiol (E2b) interactions using all-atom umbrella sampling molecular dynamics (US-MD) simulations
herg_o2_e2b_zm50_namd_u01_r1_50.tar.gz.partab - gzipped tar ball file (part 2/2) for the hERG channel - estradiol (E2b) interactions using all-atom umbrella sampling molecular dynamics (US-MD) simulations
herg_o2_dht_zm50_namd_u01_r1_70_partial.tar.gz - gzipped tar ball file for the hERG channel - dihydrotestosterone (DHT) interactions using all-atom umbrella sampling molecular dynamics (US-MD) simulations.
herg_o2_dht_zm50_namd06_pullx_us0x.tar.gz - gzipped tar ball file for the hERG channel - dihydrotestosterone (DHT) interactions using initial 50 ns-long equilibration and five 90-ns-long pulling (steered MD) runs.
herg_o2_e2b_zm50_namd06_pullx_us0x.tar.gz - gzipped tar ball file for the hERG channel - estradiol (E2b) interactions using initial 50 ns-long equilibration and five 90-ns-long pulling (steered MD) runs.
hormone_herg_silcs_docking_.tar.gz - gzipped tar ball file for the hERG channel - hormone and/or drug interactions using SILCS MC fragment-based molecular docking.
Extraction and decompression instructions for .tar.gz:
tar xvfz FILE_NAME.tar.gz
Or
gunzip FILE_NAME.tar.gz
tar xvf FILE_NAME.tar
Split tar.gz files (.partaa and .partab) need to be combined first:
cat FILE_NAME.tar.gz.partaa FILE_NAME.tar.gz.partab > FILE_NAME.tar.gz
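The same two steps can also be scripted, which may help on systems where cat and tar are not available. The following is a minimal sketch using only the Python standard library; the function names join_parts and extract_archive are hypothetical helpers, not part of the dataset, and the sketch assumes the parts sort alphabetically (.partaa before .partab).

```python
import shutil
import tarfile
from pathlib import Path


def join_parts(parts, combined):
    """Concatenate split archive pieces (e.g. *.partaa, *.partab) in sorted order."""
    with open(combined, "wb") as out:
        for part in sorted(parts):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams in chunks, so large parts are fine
    return Path(combined)


def extract_archive(archive, dest="."):
    """Extract a gzipped tar ball into dest and return its top-level entry names."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
        return sorted({member.name.split("/")[0] for member in tar.getmembers()})
```

The "r:gz" mode reads any gzip-compressed tar file regardless of whether it is named .tar.gz or .tgz, so the same call should apply to the archive that appears as .tgz in the data file list.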
After extraction of all those files the following folders will be created:
herg_o2_e2b_zm50_namd_u01_r1_50/
herg_o2_dht_zm50_namd_u01_r1_70_partial/
herg_o2_dht_zm50_namd06_pullx_us0x/
herg_o2_e2b_zm50_namd06_pullx_us0x/
hormone_herg_docking/
Below is the description of each of these folders as well as subfolders and files within them. There are also README files within some of those folders with more detailed information.
Folder herg_o2_e2b_zm50_namd_u01_r1_50/ - data for all-atom umbrella sampling molecular dynamics (US-MD) simulations of hERG channel - estradiol (E2b) interactions.
90 US-MD windows were used with 50 ns per window, 4,500 ns of MD simulation time in total. Initial and final coordinates in the PDB and NAMD binary (.coor) formats, binary velocity (.vel) and system size (.xsc) files, protein structure files (.psf), NAMD input (.inp) and output (.out or .log) files, trajectory files for collective variables (.traj), and force field parameter files (.par) are included for each US-MD run. Binary trajectory (.dcd) files are not included.
Subfolders 1/ to 90/ contain data for US-MD windows 1 to 90, corresponding to Z values of -50 to -5.5 Angstrom. The Z distance is measured with respect to the center of mass (COM) of C-alpha atoms of selectivity filter (SF) residues 624-628. Each folder contains runs 1-50, each of which corresponds to 1 ns of the simulation.
File names are: step9.[run_number]_production.[extension]
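As an illustration of the folder numbering, the sketch below maps a window number to its nominal Z position. It assumes that window 1 sits at -50 Angstrom, window 90 at -5.5 Angstrom, and that the windows are evenly spaced (which gives the 0.5 Angstrom interval quoted in the Methods). The function name window_center is hypothetical; the actual restraint centers should be confirmed from the production_restraint.namd.col file in each window folder.

```python
Z_FIRST = -50.0    # Angstrom, window 1 (from the README)
Z_LAST = -5.5      # Angstrom, window 90 (from the README)
N_WINDOWS = 90
Z_STEP = (Z_LAST - Z_FIRST) / (N_WINDOWS - 1)  # 0.5 Angstrom, assuming even spacing


def window_center(window):
    """Return the nominal Z (Angstrom) of a US-MD window numbered 1..90."""
    if not 1 <= window <= N_WINDOWS:
        raise ValueError(f"window must be in 1..{N_WINDOWS}, got {window}")
    return Z_FIRST + (window - 1) * Z_STEP
```

Under these assumptions, window_center(5) evaluates to -48.0.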
For instance, for window 5 and run 10 files are:
5/step9.10_production.col - NAMD collective variable (colvar) file, text format, used by NAMD software and can be read by NAMD or text editor
5/step9.10_production.colvars.state - NAMD collective variable (colvar) state file, text format, output of NAMD software, contains latest values of colvars at the end of each run
5/step9.10_production.colvars.traj - NAMD collective variable (colvar) trajectory file, text format, output of NAMD software, contains trajectory of colvars for the whole run
5/step9.10_production.coor - NAMD binary coordinate file at the end of each run, can be read by VMD or NAMD software after reading PDB or PSF file containing atom names and topology
5/step9.10_production.inp - NAMD input file, text format, can be read using text editor and by NAMD software
5/step9.10_production.log - NAMD output file, text format, can be read using text editor
5/step9.10_production.vel - NAMD binary velocity file at the end of each run, can be read by VMD or NAMD software after reading PDB or PSF file containing atom names and topology.
5/step9.10_production.xsc - NAMD unit cell file (x, y, z and angles) at the end of each run, text file, can be read using text editor, NAMD and VMD software
5/step9.10_production.xst - NAMD unit cell file (x, y, z and angles) trajectory for the entire run, text file, can be read using text editor, NAMD and VMD software
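Since the collective variable trajectories are plain text, they can be read without NAMD. The sketch below builds a file path from the naming pattern above and parses a .colvars.traj file, skipping the comment lines that start with "#". It assumes every colvar is a scalar (one number per column), as would be the case for a single Z-distance variable; the helper names run_file and read_colvars_traj are hypothetical.

```python
from pathlib import Path


def run_file(root, window, run, extension):
    """Build the path <root>/<window>/step9.<run>_production.<extension>."""
    return Path(root) / str(window) / f"step9.{run}_production.{extension}"


def read_colvars_traj(path):
    """Return (steps, rows); each row is the list of colvar values on one line."""
    steps, rows = [], []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue  # skip blank lines and header lines
            steps.append(int(fields[0]))
            rows.append([float(value) for value in fields[1:]])
    return steps, rows
```

For example, run_file("herg_o2_e2b_zm50_namd_u01_r1_50", 5, 10, "colvars.traj") points at the trajectory for window 5, run 10.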
For each window we also have initial coordinates and velocity files
5/step8.5_production.coor - initial NAMD binary coordinate file for each US-MD window, can be read by VMD or NAMD software after reading PDB or PSF file containing atom names and topology
5/step8.5_production.vel - initial NAMD binary velocity file for each US-MD window, can be read by VMD or NAMD software after reading PDB or PSF file containing atom names and topology
5/step8.5_production.xsc - initial NAMD unit cell file (x, y, z and angles) for each US-MD window, text file, can be read using text editor, NAMD and VMD software.
5/step5_assembly.namd.str - NAMD stream file, text format, can be read by NAMD software and text editor
5/checkfft.str - NAMD stream file for fast Fourier transform (FFT) parameters, text format, can be read by NAMD software and text editor
5/production_restraint.namd.col - NAMD collective variable (colvar) initial file, text format, used by NAMD software and can be read by NAMD or text editor
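The unit cell (.xsc) files are short text files whose column names are given on a line starting with "#$LABELS". A minimal sketch for reading one into a dictionary is shown below; the function names read_xsc and box_lengths are hypothetical, and box_lengths assumes an orthorhombic cell aligned with the x, y and z axes.

```python
def read_xsc(path):
    """Parse a NAMD .xsc file into a dict keyed by the names on its '#$LABELS' line."""
    labels, values = None, None
    with open(path) as handle:
        for line in handle:
            if line.startswith("#$LABELS"):
                labels = line.split()[1:]
            elif line.strip() and not line.startswith("#"):
                values = [float(x) for x in line.split()]  # keep the last data line
    if labels is None or values is None:
        raise ValueError(f"{path} does not look like an .xsc file")
    return dict(zip(labels, values))


def box_lengths(cell):
    """Return the (a, b, c) box lengths in Angstrom for an axis-aligned orthorhombic cell."""
    return cell["a_x"], cell["b_y"], cell["c_z"]
```

Because the function keeps the last data line it finds, applying it to an .xst trajectory file would return the final cell recorded in that run.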
Subfolder all_inp/ contains input files and job submission scripts that are either common to all US-MD windows or are templates (.templ) used to generate files specific to each US-MD run. All files are in text format and can be read by any text editor.
all_inp/checkfft.str - NAMD stream file for fast Fourier transform (FFT) parameters, text format, can be read by NAMD software and text editor
all_inp/herg_mx0_cometgpu_us_wx_02.qsub.templ - SLURM job submission script template for SDSC Comet GPU nodes
all_inp/herg_mx0_comet_us_wx_02.qsub.templ - SLURM job submission script template for SDSC Comet CPU nodes
all_inp/production_restraint.namd.col.templ - NAMD collective variable (colvar) template file, text format, can be read by NAMD software and text editor
all_inp/step5_assembly.namd.str - NAMD stream file, text format, can be read by NAMD software and text editor
all_inp/step9.1_production.inp.templ - NAMD input file template for run #1
all_inp/step9_production.inp - NAMD input file template for runs #2-50
Subfolder toppar/ contains CHARMM parameter files in text format; all files can be read by any text editor, NAMD or CHARMM software. par_*.prm - parameter files for different biomolecular systems (proteins, lipids etc.). toppar_*.str - auxiliary parameter files for specific molecules (e.g., water).
Subfolder restraints/ contains NAMD files for restraints used in the MD simulations. All files are in text format and can be read by any text editor, NAMD or CHARMM software. *.ref files are reference coordinates (in the PDB text format) with restraints written in the second-to-last column (e.g., 1.00 - restraint, 0.00 - no restraint) for each atom. write_ca_rest.tcl is a VMD script in the text format, which generates those restraint files. dihe.txt is the restraint file for dihedral angles.
Subfolder e2b/ contains CHARMM topology (e2b.rtf) and parameter (e2b.prm) files for estradiol in the text format, can be read by NAMD or CHARMM software.
Subfolder pmf_convergence/ contains text files of US-MD collective variables used to compute the free energy or potential of mean force (PMF) profile with the weighted histogram analysis method (WHAM).
pmf_[win_num]_[run_num].dat are text data files for each US-MD window and run number.
wham_old2_newf.f90 - WHAM program source code written in Fortran 90, can be compiled using GNU Fortran.
wham_old2_newf.exe - WHAM program executable in the binary format, can be run on x86 computers using Linux
wham.inp - text input file for WHAM program
wham_clean.run - shell script in text format to remove unneeded files.
win_ene_hist2o.run - shell script in text format to generate PMF histogram.
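For readers unfamiliar with WHAM, the sketch below shows the standard self-consistent iteration for harmonically restrained windows in one dimension. It is an independent illustration written with the Python standard library, not a port of wham_old2_newf.f90; the input layout (pre-binned histograms), the restraint form U = 0.5 * k * (z - center)^2 and the spring constant value are all assumptions that would need to be matched to the colvar settings actually used.

```python
import math

KB = 0.0019872041  # Boltzmann constant, kcal/(mol*K)


def wham(hist, centers, k_spring, bin_z, temperature=310.0, tol=1e-9, max_iter=20000):
    """1D WHAM for harmonic umbrella windows.

    hist[i][b] : counts of window i in bin b
    centers[i] : restraint center of window i (Angstrom)
    k_spring   : force constant k in U = 0.5 * k * (z - center)**2 (kcal/mol/A^2)
    bin_z[b]   : midpoint of bin b (Angstrom)
    Returns the PMF per bin in kcal/mol, shifted so that its minimum is zero.
    """
    beta = 1.0 / (KB * temperature)
    n_win, n_bin = len(hist), len(bin_z)
    n_tot = [sum(h) for h in hist]
    # Boltzmann factor of each window's bias at each bin midpoint
    bias = [[math.exp(-beta * 0.5 * k_spring * (z - c) ** 2) for z in bin_z] for c in centers]
    counts = [sum(hist[i][b] for i in range(n_win)) for b in range(n_bin)]
    f = [0.0] * n_win  # free-energy offset of each window
    for _ in range(max_iter):
        weight = [n_tot[i] * math.exp(beta * f[i]) for i in range(n_win)]
        # Unbiased probability estimate for every bin
        prob = []
        for b in range(n_bin):
            denom = sum(weight[i] * bias[i][b] for i in range(n_win))
            prob.append(counts[b] / denom if denom > 0.0 else 0.0)
        # Updated window offsets, referenced to the first window
        new_f = [-math.log(sum(p * w for p, w in zip(prob, bias[i]))) / beta
                 for i in range(n_win)]
        new_f = [x - new_f[0] for x in new_f]
        converged = max(abs(a - b) for a, b in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    pmf = [-math.log(p) / beta if p > 0.0 else math.inf for p in prob]
    lowest = min(pmf)
    return [x - lowest for x in pmf]
```

The histograms would be built by binning the colvar values from each window; bins that no window visits come out with an infinite PMF value rather than being interpolated.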
Files in the parent subfolder ./
check_max_run.run - shell script in text format to check max run for each US-MD window
convert_NAMD2WHAM*.sh - shell script in text format to convert colvar trajectories from NAMD to WHAM program format
copy_run_qsub_us_windows.run - shell script in text format to copy US-MD window files
e2b*.qsub - queue submission scripts for different platforms and for different US-MD windows (w#).
edit_pmf.sh - shell script in text format to edit information in WHAM colvar trajectory files if needed.
make_all_us_windows.run - shell script in text format to generate all US-MD windows
random_num.txt - text file with random numbers.
README.txt - text file with description of MD simulation procedure.
shuffle_windows_new.py - Python script in text format to generate non-repeating random numbers for subsequent US-MD windows.
shuffle_windows.py - Python script in text format to generate random numbers for subsequent US-MD windows (not used).
step5_assembly.namd.pdb - Initial molecular coordinates in the PDB text NAMD compatible format.
step5_assembly.pdb - Initial molecular coordinates in the PDB text original format.
step5_assembly.psf - CHARMM protein structure file (PSF) in the text format.
step5_assembly.xplor_ext.psf - CHARMM protein structure file (PSF) in the extended text format.
Folder herg_o2_dht_zm50_namd_u01_r1_70_partial/ - data for all-atom umbrella sampling molecular dynamics (US-MD) simulations of the hERG channel - dihydrotestosterone (DHT) interactions.
90 US-MD windows were used with up to 70 ns per window (up to 6,300 ns of MD simulation time in total). Initial and final coordinates in the PDB and NAMD binary (.coor) formats, binary velocity (.vel) and system size (.xsc) files, protein structure files (.psf), NAMD input (.inp) and output (.out or .log) files, trajectory files for collective variables (.traj), and force field parameter files (.par) are included for each US-MD run. Binary trajectory (.dcd) files are not included. Some files that are not important for the free energy and diffusion coefficient calculations are missing but can be regenerated with new runs if needed.
Subfolders 1/ to 90/ contain data for each of US-MD runs 1 to 90 corresponding to Z value of -50 to -5.5 Angstrom. Z distance is counted with respect to center of mass (COM) of C-alpha atoms of selectivity filter (SF) residues 624-628. In each folder there are runs 1-30 (windows 1-50) or 1-15 (windows 51-90), each of them corresponds to 1 ns of the simulation.
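Because this folder is partial and the number of runs differs between windows, it can be useful to check how many runs are actually present before analysis. The sketch below scans the window folders for colvar trajectory files that follow the step9.[run_number]_production naming pattern and reports the highest run number found in each. It is a hypothetical helper, not the check_max_run.run script shipped with the dataset.

```python
import re
from pathlib import Path

RUN_PATTERN = re.compile(r"^step9\.(\d+)_production\.colvars\.traj$")


def max_run(window_dir):
    """Return the highest run number with a colvars trajectory in window_dir, or 0 if none."""
    runs = [int(match.group(1)) for path in Path(window_dir).iterdir()
            if (match := RUN_PATTERN.match(path.name))]
    return max(runs, default=0)


def summarize(root, n_windows=90):
    """Map window number -> highest run found; missing window folders map to 0."""
    summary = {}
    for window in range(1, n_windows + 1):
        folder = Path(root) / str(window)
        summary[window] = max_run(folder) if folder.is_dir() else 0
    return summary
```

Since each run corresponds to 1 ns, the highest run number gives an upper estimate of the sampling available per window, provided no intermediate runs are missing.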
Subfolder all_inp/ contains input files and job submission scripts that are either common to all US-MD windows or are templates (.templ) used to generate files specific to each US-MD run. All files are in text format and can be read by any text editor.
Subfolder toppar/ contains CHARMM parameter files in text format; all files can be read by any text editor, NAMD or CHARMM software. par_*.prm - parameter files for different biomolecular systems (proteins, lipids etc.). toppar_*.str - auxiliary parameter files for specific molecules (e.g., water).
Subfolder restraints/ contains NAMD files for restraints used in the MD simulations. All files are in text format and can be read by any text editor, NAMD or CHARMM software.
Subfolder dht/ contains CHARMM topology (dht.rtf) and parameter (dht.prm) files for dihydrotestosterone in the text format, can be read by NAMD or CHARMM software.
Subfolder pmf_convergence/ contains text files of US-MD collective variables used to compute the free energy or potential of mean force (PMF) profile with the weighted histogram analysis method (WHAM).
Files in the parent subfolder ./
check_max_run.run - shell script in text format to check max run for each US-MD window
convert_NAMD2WHAM*.sh - shell script in text format to convert colvar trajectories from NAMD to WHAM program format
copy_run_qsub_us_windows.run - shell script in text format to copy US-MD window files
dht*.qsub - queue submission scripts for different platforms and for different US-MD windows (w#).
edit_pmf.sh - shell script in text format to edit information in WHAM colvar trajectory files if needed.
make_all_us_windows.run - shell script in text format to generate all US-MD windows
random_num.txt - text file with random numbers.
README.txt - text file with description of MD simulation procedure.
shuffle_windows_new.py - Python script in text format to generate non-repeating random numbers for subsequent US-MD windows.
shuffle_windows.py - Python script in text format to generate random numbers for subsequent US-MD windows (not used).
step5_assembly.namd.pdb - Initial molecular coordinates in the PDB text NAMD compatible format.
step5_assembly.pdb - Initial molecular coordinates in the PDB text original format.
step5_assembly.psf - CHARMM protein structure file (PSF) in the text format.
step5_assembly.xplor_ext.psf - CHARMM protein structure file (PSF) in the extended text format.
all_ligand_tumble_hERG.tcl - example VMD Tcl script for ligand tumbling analysis (it was used for a different ligand and needs to be modified for DHT if needed).
Please see detailed file type description for the folder herg_o2_e2b_zm50_namd_u01_r1_50/ above.
Folder herg_o2_dht_zm50_namd06_pullx_us0x/ - data for initial 50 ns-long equilibration and five 90-ns-long pulling (steered MD) runs of the hERG channel - dihydrotestosterone (DHT) interactions.
Initial and final coordinates in the PDB and NAMD binary (.coor) formats, binary velocity (.vel) and system size (.xsc) files, protein structure files (.psf), NAMD input (.inp) and output (.out or .log) files, trajectory files for collective variables (.traj), and force field parameter files (.par) are included for each run. Binary trajectory (.dcd) files are not included.
Subfolder CHARMM-GUI/ has files in text format generated by the input generator (bilayer builder) of www.charmm-gui.org
Subfolder initial/ contains CHARMM scripts (all in text format) to generate initial hERG channel - hormone structure
Subfolder namd06/ contains NAMD files for 50 ns-long equilibration
Subfolders namd_pullf1[1-5]/ contain NAMD files for five 90-ns-long pulling (steered MD) runs.
Subfolder namd_us02/ contains NAMD files for US-MD simulation (initial files and templates only).
Please see the detailed description of file types for the folder herg_o2_e2b_zm50_namd_u01_r1_50/ above, and the detailed instructions on setting up and running simulations in the file README.txt
Folder herg_o2_e2b_zm50_namd06_pullx_us0x/ - data for initial 50 ns-long equilibration and five 90-ns-long pulling (steered MD) runs of the hERG channel - estradiol (E2b) interactions.
Initial and final coordinates in the PDB and NAMD binary (.coor) formats, binary velocity (.vel) and system size (.xsc) files, protein structure files (.psf), NAMD input (.inp) and output (.out or .log) files, trajectory files for collective variables (.traj), and force field parameter files (.par) are included for each run. Binary trajectory (.dcd) files are not included.
Subfolder CHARMM-GUI/ has files in text format generated by the input generator (bilayer builder) of www.charmm-gui.org
Subfolder initial/ contains CHARMM scripts (all in text format) to generate initial hERG channel - hormone structure
Subfolder namd06/ contains NAMD files for 50 ns-long equilibration
Subfolders namd_pullf1[1-5]/ contain NAMD files for five 90-ns-long pulling (steered MD) runs.
Subfolder namd_us02/ contains NAMD files for US-MD simulation (initial files and templates only).
Please see the detailed description of file types for the folder herg_o2_e2b_zm50_namd_u01_r1_50/ above, and the detailed instructions on setting up and running simulations in the file README.txt
Folder hormone_herg_docking/ - data for SILCS-MC fragment-based molecular docking of hERG channel - hormone and/or drug interactions.
Only the final PDB files containing the hERG channel and drug and/or hormone molecules for the top-scoring (lowest interaction energy) binding poses are included. All files are in the text PDB format and can be read by any text editor or by molecular visualization software (VMD, UCSF Chimera or ChimeraX, PyMOL etc.)
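As a quick way to inspect a docked pose without a visualization program, the sketch below reads ATOM and HETATM records using the fixed PDB column layout and computes the geometric center of one residue, for example the docked ligand. The helper names are hypothetical, and the residue name to pass in has to be looked up in the PDB file itself because the ligand naming used in the docking output is not specified here. Columns 18-21 are read as the residue name to allow for four-character CHARMM-style names.

```python
def read_pdb_atoms(path):
    """Yield (record, atom_name, res_name, x, y, z) for ATOM/HETATM lines (fixed PDB columns)."""
    with open(path) as handle:
        for line in handle:
            record = line[0:6].strip()
            if record in ("ATOM", "HETATM"):
                yield (record, line[12:16].strip(), line[17:21].strip(),
                       float(line[30:38]), float(line[38:46]), float(line[46:54]))


def residue_center(path, res_name):
    """Return the geometric center (x, y, z) of all atoms whose residue name is res_name."""
    coords = [(x, y, z) for _, _, name, x, y, z in read_pdb_atoms(path) if name == res_name]
    if not coords:
        raise ValueError(f"no atoms with residue name {res_name!r} in {path}")
    n = len(coords)
    return tuple(sum(axis) / n for axis in zip(*coords))
```

The result is an unweighted geometric center, not a mass-weighted center of mass.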
Sharing/Access information
Links to other publicly accessible locations of the data: None
Data was derived from the following sources: None
Code/Software
All-atom molecular dynamics (MD) simulations were performed using nanoscale molecular dynamics (NAMD) software, version 2.13 or 2.14 and were analyzed using visual molecular dynamics (VMD) software version 1.9.3.
Fragment-based Site-Identification by Ligand Competitive Saturation (SILCS) molecular docking calculations used SILCS software version 2022.2. SILCS fragment-generation MD simulation runs used Gromacs software version 2020.2 or later.
Methods
The dataset was collected using molecular dynamics (MD) simulations and fragment-based Site-Identification by Ligand Competitive Saturation (SILCS) molecular docking calculations of the wild-type hERG potassium channel model based on the cryogenic electron microscopy (cryo-EM) structure with PDB ID: 5VA2. Pore and voltage sensing domain residues 405-668 were used in the model. Rosetta structural modeling was used for de novo missing loop building.
hERG channel models in a ~260 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) lipid bilayer soaked by 0.15 M aqueous KCl solution and a hormone molecule initially placed in aqueous solution under . All simulations were run using NAMD 2.13 or later in the NPT ensemble at 310 K and 1 atm pressure using Nose-Hoover thermostat and Langevin piston barostat. Standard cutoff scheme and particle mesh Ewald (PME) were used for non-bonded interactions. Standard CHARMM biomolecular force fields (C36 for lipids, CHARMM36m for protein and standard CHARMM ion parameters) and TIP3P water model were used for compatibility with our previous studies. One equilibration and 5 steered MD (SMD) "pulling" 90-ns long simulations were performed for each hormone (estradiol and dihydrotestosterone) using our previously developed protocol. The final frames from these simulations were used as starting points for umbrella sampling MD (US-MD) runs. The simulation length was chosen to reach a well equilibrated system based on our prior experience. We needed to perform at least 5 SMD simulations with different initial conditions to avoid a bias in initial drug orientations in subsequent US-MD runs.
For US-MD simulations, we used the hERG channel model in a POPC membrane solvated by 0.15 M aqueous KCl + 1 drug placed at different z positions along the channel axis in 0.5 Å intervals. The same force field and general MD simulation parameters as described above were used. US-MD simulations with harmonic restraints on drug center of mass (COM) with respect to Cα COM of hERG selectivity filter (SF) (624SVGFG) residues were performed to compute drug free energy and diffusion coefficient profiles across the channel pore. Weak 0.2 kcal/mol/Å² positional restraints for the SF backbone and the whole pore domain Cα atoms were used, as was done in our previous simulations, to preserve the channel conformational state while minimizing their effect on drug binding. We used 90 US-MD simulation windows covering a range -50 ≤ z ≤ -5.5 Å, from the bottom of the SF and down through the pore, extending far enough into the solvent to get bulk-like free energy and diffusion coefficient values. Starting structures were taken from the 5 SMD simulations described above, choosing one of them at random for each z position to avoid an initial bias. We used 40 ns per window, with an initial ~10 ns as equilibration, equivalent to the amount of sampling used in similar studies and in our previous calculations, where 40 ns per US-MD window was barely sufficient to reach a desired convergence.
For SILCS simulations using SILCS software version 2022.2, we used hERG channel models in a ~220 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) & 13 cholesterol lipid bilayer soaked by 0.15 M aqueous KCl solution along with multiple molecules of 9 different co-solvents (methanol, benzene etc.). All simulations were run using Gromacs 2020.2 or later in the NPT ensemble at 310 K and 1 atm pressure. Standard cutoff scheme and particle mesh Ewald (PME) were used for non-bonded interactions. Standard CHARMM biomolecular force fields (C36 for lipids, CHARMM36m for protein and standard CHARMM27 ion parameters) and TIP3P water model were used for compatibility with our previous studies. Ten 100-ns MD simulations were performed for each channel model as dictated by the established SILCS protocol. Each 1 ns run was interleaved with 200,000 steps of Grand Canonical Monte Carlo (GCMC) for co-solvent molecule insertion/deletion and translations/rotations. This was followed by SILCS MC docking of drug and/or hormone molecules into the hERG channel pore. Top-scoring (most favorable binding free energy) hERG channel - drug structures are available for further analysis.
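The random choice of a starting SMD run for each window can be sketched as follows. This is an illustration only and is not the shuffle_windows_new.py script from the dataset; the function name assign_smd_runs is hypothetical, and the avoid_repeat option (never reusing the same SMD run for two consecutive windows) is an assumed variant rather than a documented feature of the original procedure.

```python
import random


def assign_smd_runs(n_windows=90, n_smd=5, seed=None, avoid_repeat=False):
    """Pick one SMD run (1..n_smd) at random for each US-MD window (1..n_windows).

    With avoid_repeat=True, consecutive windows never start from the same SMD run.
    """
    rng = random.Random(seed)
    picks, previous = {}, None
    for window in range(1, n_windows + 1):
        choices = [run for run in range(1, n_smd + 1)
                   if not (avoid_repeat and run == previous)]
        previous = rng.choice(choices)
        picks[window] = previous
    return picks
```

Passing a fixed seed makes the assignment reproducible, which helps when window inputs need to be regenerated later.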
Usage notes
PDB files containing molecular structures can be opened with multiple molecular modeling and visualization software such as VMD, UCSF Chimera or ChimeraX, PyMOL etc.
Protein structure files (PSF) containing system topologies and molecular dynamics (MD) simulation trajectory files in the binary DCD format can be opened using VMD or PyMOL software.
Input and output files can be opened using any text editor.