Data and code from: Decoding protein–membrane binding interfaces from surface-fingerprint-based geometric deep learning and molecular dynamics simulations
Data files
Feb 11, 2026 version files 10.49 GB
- data.tar.gz (10.49 GB)
- README.md (5.42 KB)
Abstract
Predicting protein–membrane interactions is a formidable challenge due to the subtle physicochemical features that distinguish membrane-binding regions of a protein surface, as well as the scarcity of experimentally resolved membrane-bound protein conformations. Here, we present MaSIF-PMP, a geometric deep learning model that leverages molecular surface fingerprints to predict interfacial binding sites (IBSs) of peripheral membrane proteins (PMPs). MaSIF-PMP integrates geometric and chemical surface features to produce spatially resolved IBS predictions. Compared to existing models, MaSIF-PMP achieves superior performance for IBS classification, while feature ablation studies and transfer learning analyses reveal distinct determinants governing protein–membrane versus protein–protein interactions. We further show that molecular dynamics (MD) simulations can validate model predictions, refine IBS labels, and capture composition-dependent membrane binding patterns. These results establish MaSIF-PMP as an effective framework for IBS prediction and highlight the potential of incorporating conformational dynamics from MD to improve both model accuracy and biological interpretability.
Author: ByungUk Park
README generated on: 10/21/2025
This archive (data.tar.gz) contains all data relevant to molecular dynamics (MD) simulations of peripheral membrane proteins (PMPs) with HMMM (Highly Mobile Membrane Mimetic) model systems, as described in the following work:
B. Park and R. C. Van Lehn. (2025). Decoding protein-membrane binding interfaces from surface-fingerprint-based geometric deep learning and molecular dynamics simulations. bioRxiv. https://doi.org/10.1101/2025.10.14.682447.
The archive also includes scripts for running the simulations and generating MD-based membrane-binding interface labels. All simulations were performed using GROMACS 2021.5.
File Types & Software Requirements
Among the simulation files, some (.gro, .ndx, .top, .itp, .mdp and .xvg) are plain text files that can be opened with standard text editors, while others (.tpr, .xtc) are binary files that require GROMACS software to read or execute.
- .tpr: Simulation input (coordinates, parameters, etc.)
- .ndx: List of indices of atom/residue groups
- .top: System topology and parameters
- .itp: Modular components of a full topology (parameters, atom types, bonds, angles, dihedrals, etc.)
- .mdp: Simulation control parameters (simulation time, temperature, pressure, etc.)
- .gro: System structure (coordinates, atom type, box size)
- .xtc: Compressed trajectory
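The plain-text .xvg outputs can be inspected without GROMACS. The following is a minimal sketch, assuming the usual .xvg layout in which lines starting with `#` (comments), `@` (plot directives), or `&` (dataset separators) surround whitespace-separated numeric columns. The function name `read_xvg` and the sample content are hypothetical and are not part of the archive.

```python
def read_xvg(text):
    """Parse the numeric rows of an .xvg file given as a string.

    Lines beginning with '#', '@', or '&' are treated as metadata and skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@&":
            continue
        rows.append([float(value) for value in line.split()])
    return rows


# Hypothetical file content for illustration only.
sample = """# example comment line
@    title "Example"
@    xaxis  label "Time (ps)"
0.000000  -1500.5
1.000000  -1502.25
"""

rows = read_xvg(sample)
print(rows)
```

To read a file from the archive, the same function could be called on the file contents, for example `read_xvg(open(path).read())`, where `path` points to any .xvg output.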
Software and packages (tested versions in parentheses):
- GROMACS (2021.5)
- Python (3.10)
- MDTraj (1.10.1)
- NumPy (1.24.4)
- pandas (1.5.3)
Directory Structure
1.alpha-tocopherol_transfer_protein/
Contains all input and output files for simulations of α-tocopherol transfer protein (α-TTP) with HMMM membranes.
Includes .gro, .itp, .mdp, .xtc files, and the interface labels (.csv) derived from production-phase trajectories.
Structure:
- input_files/
  - Input files for replica simulations of each initial orientation (orientation_#/ subdirectories)
  - Each orientation_#/ directory includes:
    - Initial structure (.gro), index (.ndx), topology (.top), and simulation parameter (.mdp) files generated by CHARMM-GUI
    - toppar/: force field and topology parameter files
- output_files/
  - Output files for replica simulations of each orientation (orientation_#/replica# subdirectories)
  - Contains energy minimization, equilibration, and production outputs
- traj_based_iface/
  - Consensus binding interface labels determined from production-phase trajectories, generated via traj_based_pmp_iface_consensus_label.ipynb
  - Subdirectories:
    - orientation_#/: consensus labels per orientation (3W67_A_consensus_iface.csv) in MaSIF-PMP-compatible format
    - union/: union of consensus labels across all orientations (3W67_A_union_iface.csv)
- 3W67_A__iface_0.5nm.csv
  - Ground-truth membrane-binding interface derived from the final snapshot of MD simulation (via pmp_iface_from_coord.ipynb)
  - Formatted for MaSIF-PMP compatibility
- 3w67_mem_bound.pdb
  - Membrane-bound conformation of α-TTP (orientation 1, replica 1)
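The relationship between per-orientation consensus labels and the union label can be pictured with plain sets. This README does not state the rule used by traj_based_pmp_iface_consensus_label.ipynb, so the sketch below is only an illustration: the majority-style threshold `min_fraction`, both function names, and all residue numbers are assumptions.

```python
from collections import Counter


def consensus_residues(replica_sets, min_fraction=0.5):
    """Residues present in at least `min_fraction` of the replica label sets.

    The threshold rule is an assumption for illustration, not the notebook's method.
    """
    counts = Counter(res for residues in replica_sets for res in set(residues))
    needed = min_fraction * len(replica_sets)
    return {res for res, n in counts.items() if n >= needed}


def union_residues(orientation_sets):
    """Union of the per-orientation consensus sets."""
    merged = set()
    for residues in orientation_sets:
        merged |= set(residues)
    return merged


# Hypothetical residue numbers, not taken from the dataset.
orientation_1 = consensus_residues([{10, 11, 12}, {11, 12, 40}, {11, 12, 13}])
orientation_2 = consensus_residues([{12, 55}, {55, 56}, {55, 12}])
union = union_residues([orientation_1, orientation_2])

print(sorted(orientation_1))
print(sorted(orientation_2))
print(sorted(union))
```

Keeping the consensus step per orientation and merging afterwards mirrors the orientation_#/ and union/ split of the traj_based_iface/ directory.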
2.phospholipase_A2/
Contains all input and output files for phospholipase A2 simulations with anionic and zwitterionic HMMM membranes.
Subdirectories:
- anionic_memb/
- zwitterionic_memb/
Each subdirectory follows the same organization as described in 1.alpha-tocopherol_transfer_protein.
(Only single-replica simulations were run for each orientation; thus directories are named orientation_#/ instead of orientation_#/replica#.)
3.glycosyl_hydrolase/
Contains input and output files for simulations of glycosyl hydrolase with HMMM membranes, following the same organizational structure described above.
4.oxysterol-binding_protein_homologue/TGN/
Contains input and output files for simulations of oxysterol-binding protein homologue (Osh4) with TGN-like HMMM membranes.
Same structure as above, with additional files:
- 1ZHZ_A_TGN_iface_0.5nm.csv
  - Ground-truth interface from the final MD snapshot (pmp_iface_from_coord.ipynb)
  - MaSIF-PMP compatible format
- 1ZHZ_A_TGN_iface_paper.csv
  - Ground-truth interface from previous MD studies of Osh4–TGN membrane interactions
- 1zhz_tgn_hmmm.pdb
  - Membrane-bound conformation sampled from the simulation
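The `_iface_0.5nm.csv` file names suggest a 0.5 nm distance criterion applied to a coordinate snapshot. The exact rule in pmp_iface_from_coord.ipynb is not described here, so the following is only a sketch of one plausible approach: a residue is flagged when any of its atoms lies within a cutoff of any membrane atom. It assumes the fixed-width .gro atom record layout, ignores periodic boundary conditions, and uses hypothetical names throughout (`parse_gro`, `interface_residues`, and the residue name `LIP` standing in for a lipid).

```python
import math


def gro_line(resnum, resname, atomname, atomnum, x, y, z):
    """Format one atom record using the fixed-width .gro column layout."""
    return f"{resnum:5d}{resname:<5s}{atomname:>5s}{atomnum:5d}{x:8.3f}{y:8.3f}{z:8.3f}"


def parse_gro(text):
    """Return (resnum, resname, atomname, (x, y, z)) tuples; coordinates in nm."""
    lines = text.splitlines()
    n_atoms = int(lines[1])  # second line holds the atom count
    atoms = []
    for line in lines[2:2 + n_atoms]:
        resnum = int(line[0:5])
        resname = line[5:10].strip()
        atomname = line[10:15].strip()
        xyz = (float(line[20:28]), float(line[28:36]), float(line[36:44]))
        atoms.append((resnum, resname, atomname, xyz))
    return atoms


def interface_residues(atoms, membrane_resnames, ignore_resnames=(), cutoff_nm=0.5):
    """Residue numbers with any atom within cutoff_nm of any membrane atom.

    Residues named in ignore_resnames (e.g. solvent, ions) are skipped.
    No periodic-image handling is done in this sketch.
    """
    membrane = [xyz for _, resname, _, xyz in atoms if resname in membrane_resnames]
    hits = set()
    for resnum, resname, _, xyz in atoms:
        if resname in membrane_resnames or resname in ignore_resnames:
            continue
        if any(math.dist(xyz, m) <= cutoff_nm for m in membrane):
            hits.add(resnum)
    return sorted(hits)


# Toy system with hypothetical residue names and coordinates.
toy = "\n".join([
    "toy system",
    "    4",
    gro_line(1, "LYS", "CA", 1, 1.0, 1.0, 1.0),
    gro_line(2, "GLY", "CA", 2, 1.0, 1.0, 3.0),
    gro_line(3, "LIP", "P", 3, 1.0, 1.0, 1.4),
    gro_line(4, "SOL", "OW", 4, 1.0, 1.0, 1.5),
    "   5.00000   5.00000   5.00000",
])

atoms = parse_gro(toy)
result = interface_residues(atoms, {"LIP"}, ignore_resnames={"SOL"})
print(result)
```

For real systems, a library such as MDTraj (listed in the software requirements) would handle selections and periodic images more robustly than this hand-written parser.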
CSV files contain the following variables:
- cathpdb
- pdb
- residue_name
- IBS
- chain_id
- residue_number
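A label file with these columns can be read using only the Python standard library. The sketch below assumes that `IBS` holds a boolean-like flag (for example `True`/`False` or `1`/`0`); that encoding is not stated in this README. The function name `ibs_residues` and every value in the sample rows are placeholders, not copied from the archive.

```python
import csv
import io

# Strings accepted as a positive IBS flag (an assumption about the encoding).
TRUTHY = {"1", "true", "yes"}


def ibs_residues(csv_text):
    """Return (chain_id, residue_number, residue_name) for rows flagged as IBS."""
    reader = csv.DictReader(io.StringIO(csv_text))
    selected = []
    for row in reader:
        if row["IBS"].strip().lower() in TRUTHY:
            selected.append(
                (row["chain_id"], int(row["residue_number"]), row["residue_name"])
            )
    return selected


# Hypothetical rows for illustration only.
sample = """cathpdb,pdb,residue_name,IBS,chain_id,residue_number
example,3W67,LYS,True,A,10
example,3W67,GLY,False,A,11
example,3W67,PHE,True,A,12
"""

selected = ibs_residues(sample)
print(selected)
```

Because `csv.DictReader` looks columns up by header name, the sketch does not depend on the column order in the actual files.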
Scripts
Scripts for running simulations and generating MD-based membrane-binding interface labels are provided below.
- pmp_iface_from_coord.ipynb
  - Generates interface labels from simulation coordinate files (.gro)
  - Update variables before use: root, sys_name, PDB_ID, hmmm_type, etc.
- traj_based_pmp_iface_consensus_label.ipynb
  - Generates production-phase trajectory-based interface labels (.xtc)
  - Update variables before use: root, sys_name, PDB_ID, hmmm_type, etc.
- submit_runs_charmmgui.sh
  - Bash script for running energy minimization, equilibration, and production using CHARMM-GUI–generated inputs
  - Update parameters before running:
    - root_dir: path to the target system
    - OMP_NUM_THREADS: number of threads to match GPU/CPU configuration
  - Run command:
    $ bash submit_runs_charmmgui.sh
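The minimization, equilibration, and production sequence follows the usual GROMACS pattern of preprocessing each stage with `gmx grompp` and running it with `gmx mdrun`. The sketch below only builds and prints such command lines; it does not reproduce submit_runs_charmmgui.sh. The stage names, file names (`topol.top`, `index.ndx`, `initial.gro`), and function names are hypothetical and should be replaced with those of the CHARMM-GUI inputs in each orientation_#/ directory.

```python
def build_stage_commands(stage, previous, top="topol.top", ndx="index.ndx"):
    """Return the grompp and mdrun argument lists for one simulation stage.

    File names are placeholders; adjust them to the actual input files.
    """
    grompp = [
        "gmx", "grompp",
        "-f", f"{stage}.mdp",     # simulation control parameters
        "-c", f"{previous}.gro",  # starting coordinates
        "-r", f"{previous}.gro",  # reference coordinates for restraints, if any
        "-p", top,                # system topology
        "-n", ndx,                # index groups
        "-o", f"{stage}.tpr",     # run input written by grompp
    ]
    mdrun = ["gmx", "mdrun", "-deffnm", stage]
    return [grompp, mdrun]


def build_plan(stages, initial="initial"):
    """Chain stages so each one starts from the previous stage's final structure."""
    plan = []
    previous = initial
    for stage in stages:
        plan.extend(build_stage_commands(stage, previous))
        previous = stage
    return plan


# Hypothetical stage names for illustration.
plan = build_plan(["minimization", "equilibration", "production"])
for command in plan:
    print(" ".join(command))
```

Each argument list could be passed to `subprocess.run(command, check=True)` on a machine with GROMACS installed; printing first makes it easy to review the sequence before launching anything.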
