Data from: Bayesian estimation of muscle mechanisms and therapeutic targets using variational autoencoders
Data files (Mar 06, 2025 version, 5.42 GB total)
- Experimental_stress_data.csv (390.36 KB)
- README.md (5.15 KB)
- std_scaler.bin (23.01 KB)
- testing_dataset.pt (54.18 MB)
- training_dataset.pt (5.36 GB)
Abstract
Cardiomyopathies, often caused by mutations in genes encoding muscle proteins, are traditionally treated by phenotyping hearts and addressing symptoms after irreversible damage has occurred. With advancements in genotyping, early diagnosis is now possible, potentially preventing such damage. However, the intricate structure of muscle and its myriad proteins make treatment predictions challenging. Here we approach the problem of estimating therapeutic targets for a mutation in mouse muscle using a spatially explicit half-sarcomere muscle model. We selected 9 rate parameters in our model linked to both small molecules and cardiomyopathy-causing mutations. We then randomly varied these rate parameters and simulated an isometric twitch for each combination to generate a large training dataset. We used this dataset to train a Conditional Variational Autoencoder (CVAE), a technique used in Bayesian parameter estimation. This repository contains the training and testing datasets we used in the associated research article.
https://doi.org/10.5061/dryad.d51c5b0bj
Description of the data and file structure
Data From: Identifying mechanisms and therapeutic targets in muscle using Bayesian parameter estimation with conditional variational autoencoders
These simulations were performed using our model, located at https://github.com/travistune3/multifil_five_state.
Each rate combination was simulated 50 times, with each twitch 1000 ms in length; the force traces (pN) were then averaged into a single twitch and converted to stress (mN/mm2) using the cross-sectional area of the simulated half-sarcomere. We split off 1% of the simulations as testing data. We then computed the mean and standard deviation of the training set and used them to scale both the training and testing datasets. The transformed datasets are what is recorded here, along with the scaling factors, which can be used to restore the data to stress (mN/mm2).
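As an illustration (not the authors' code) of the averaging step described above, the sketch below averages 50 simulated force traces point-wise into one twitch and divides by a cross-sectional area to obtain a stress-like quantity. The random traces, the cross-sectional area value, and the omitted unit conversion to mN/mm2 are all placeholders.

```python
import numpy as np

# 50 stochastic twitch simulations, each 1000 ms long (placeholder values).
rng = np.random.default_rng(0)
twitches_pN = rng.normal(loc=100.0, scale=5.0, size=(50, 1000))

# Average point-wise into a single twitch, then divide force by
# cross-sectional area (hypothetical value; unit conversion omitted).
mean_force_pN = twitches_pN.mean(axis=0)
cross_sectional_area = 500.0
stress = mean_force_pN / cross_sectional_area
print(stress.shape)
```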
Data are provided as .pt files associated with PyTorch, which is also the framework our ML method is written in. PyTorch is free, and instructions for installing it can be found at https://pytorch.org/get-started/locally/. The .pt dataset files can be opened with PyTorch and contain a list of tensors corresponding to the data and labels. Data vectors are single-column time series, and labels are a vector of the 9 rate factors. E.g., data, labels = dataset[i] corresponds to observation i, with 'data' being stress and 'labels' being the rate factors.
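The access pattern above can be sketched as follows. The torch.load call is commented out since it requires PyTorch and the downloaded files (the path shown is hypothetical); the mock list simply mimics the structure, pairing a single-column stress time series with a 9-element vector of rate factors.

```python
# import torch
# dataset = torch.load("training_dataset.pt")  # hypothetical local path

# Mock stand-in with the same (data, labels) structure per item:
mock_dataset = [
    ([0.0, 0.5, 1.0, 0.4], [0.0] * 9),  # abbreviated trace, 9 rate factors
    ([0.1, 0.7, 0.9, 0.3], [1.0] * 9),
]
data, labels = mock_dataset[0]  # observation 0: 'data' = stress, 'labels' = rate factors
print(len(labels))  # 9
```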
The scaler file is a scaler object from Python's scikit-learn (sklearn) library, which is also free and available at https://scikit-learn.org/stable/install.html.
Once the datasets, PyTorch, and scikit-learn are downloaded, you can load the datasets with torch.load and the scaler (.bin file) with Python's built-in load functionality (e.g., pickle.load). Alternatively, we have provided code at https://github.com/travistune3/CVAE. The script 'cvae_multifil.py' contains all the code necessary to view the data, load the pre-trained model (CVAE.py, also in the GitHub repository), or train new models; just change the file paths to point to the downloaded files on your computer.
Both the training and testing datasets were scaled (z-scored) using the training dataset's mean and variance, and the data provided here have already been transformed. The datasets can be restored to 'real' units of mN/mm2 using the function scaler.inverse_transform().
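A minimal numpy illustration of what scaler.inverse_transform() undoes: z-scoring stores z = (x - mean) / std, so the real stress is recovered as z * std + mean. The mean and standard deviation below are made-up values; the real ones are stored in std_scaler.bin.

```python
import numpy as np

# Hypothetical scaling parameters (the actual ones live in std_scaler.bin).
mean, std = 12.0, 4.0

# Undo the z-score transform: real stress = z * std + mean.
z_scored = np.array([-1.0, 0.0, 2.0])
stress_mN_mm2 = z_scored * std + mean
print(stress_mN_mm2.tolist())  # [8.0, 12.0, 20.0]
```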
The rate factors indicated are the log10 of the actual factors, since we sampled on a log-uniform scale. Therefore the label [0, 0, ..., 0] corresponds to the default rates, e.g., 10^0 = 1, indicating the base rates are multiplied by 1x.
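Converting a stored label back to linear rate multipliers is just 10 raised to each value; the example label below is illustrative.

```python
# Stored labels are base-10 logs of the rate multipliers.
label = [0.0, -1.0, 2.0]           # illustrative label values
multipliers = [10.0 ** v for v in label]
print(multipliers)  # [1.0, 0.1, 100.0]
```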
The file 'Experimental_stress_data.csv' contains experimental data taken from mouse cardiac muscle. The columns are stress and label, with label corresponding to the treatment: 'control' (or wild type), 'i61Q', or 'danicamtiv'. 'Old' indicates data first published in https://doi.org/10.1161/circresaha.123.322629; 'new' indicates data first reported in the associated article: https://doi.org/10.1016/j.bpj.2024.11.3310.
Files and variables
File: std_scaler.bin
Description: scaler containing the mean and variance of the data, which we used to z-score the dataset prior to training. Requires sklearn.
File: testing_dataset.pt
Description: PyTorch dataset object containing the test dataset. Open with torch.load from PyTorch: https://pytorch.org/get-started/locally/
File: training_dataset.pt
Description: PyTorch dataset object containing the training dataset. Open with torch.load from PyTorch: https://pytorch.org/get-started/locally/
File: Experimental_stress_data.csv
Description: experimental dataset to which we compared our simulations
Variables
- stress: stress in mN/mm2
- label: the rate factors we tried to infer. The values recorded are the log10 of the actual factors, since we sampled on a log-uniform scale; the label [0, 0, ..., 0] therefore corresponds to the default rates, e.g., 10^0 = 1, indicating the base rates are multiplied by 1x.
Code/software
Python
PyTorch
scikit-learn
https://github.com/travistune3/CVAE
Access information
Data was derived from the following sources:
We generated this dataset using our spatially explicit muscle model, available at https://github.com/travistune3/multifil_five_state. In this model, the myosin-containing thick filaments and actin-containing thin filaments are composed of a series of springs, and crossbridge formation and state changes are tracked for each myosin-actin pair individually. The crossbridge kinetics of each head can be modified. We randomly generated rate factors over a log-uniform scale from 10^-1 to 10^2 and multiplied those rate factors by the 'default' rates. We did this for 9 total rates from both the myosin motors and the actin binding sites. For each rate factor combination we simulated the resulting twitch 50 times and averaged the runs into a final twitch. We repeated this 10^6 times to form our training dataset.
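The log-uniform sampling described above can be sketched as follows (an assumed illustration, not the authors' script): draw each of the 9 exponents uniformly on [-1, 2] and exponentiate to obtain multipliers in [10^-1, 10^2].

```python
import random

# Draw 9 log-uniform rate factors over [10^-1, 10^2]:
# sample the base-10 exponent uniformly, then exponentiate.
random.seed(0)  # fixed seed for reproducibility of this sketch
log_factors = [random.uniform(-1.0, 2.0) for _ in range(9)]
multipliers = [10.0 ** f for f in log_factors]
print(all(0.1 <= m <= 100.0 for m in multipliers))  # True
```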
We record here the rate factors, the 'default' rates, and the training/testing split used.
