Data from: Bayesian estimation of muscle mechanisms and therapeutic targets using variational autoencoders
Data files (Mar 06, 2025 version, 5.42 GB total)
- Experimental_stress_data.csv (390.36 KB)
- README.md (5.15 KB)
- std_scaler.bin (23.01 KB)
- testing_dataset.pt (54.18 MB)
- training_dataset.pt (5.36 GB)
Abstract
Cardiomyopathies, often caused by mutations in genes encoding muscle proteins, are traditionally treated by phenotyping hearts and addressing symptoms after irreversible damage has occurred. With advancements in genotyping, early diagnosis is now possible, potentially preventing such damage. However, the intricate structure of muscle and its myriad proteins make treatment predictions challenging. Here we approach the problem of estimating therapeutic targets for a mutation in mouse muscle using a spatially explicit half-sarcomere muscle model. We selected 9 rate parameters in our model linked to both small molecules and cardiomyopathy-causing mutations. We then randomly varied these rate parameters and simulated an isometric twitch for each combination to generate a large training dataset. We used this dataset to train a Conditional Variational Autoencoder (CVAE), a technique used in Bayesian parameter estimation. This repository contains the training and testing datasets we used in the associated research article.
https://doi.org/10.5061/dryad.d51c5b0bj
Description of the data and file structure
Data From: Identifying mechanisms and therapeutic targets in muscle using Bayesian parameter estimation with conditional variational autoencoders
These simulations were performed using our model, located at https://github.com/travistune3/multifil_five_state.
Each rate combination was simulated 50 times, with each twitch 1000 ms in length; the force traces (pN) were then averaged into a single twitch and converted to stress (mN/mm2) using the cross-sectional area of the simulated half-sarcomere. We split off 1% of the simulations as testing data. We then computed the mean and standard deviation of the training set and used them to scale both the training and testing datasets. The transformed datasets are what is recorded here, along with the scaling factors, which can be used to restore the data to stress (mN/mm2).
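As an illustration (not the authors' code) of the averaging step described above, the sketch below averages 50 simulated force traces point-wise into one twitch and divides by a cross-sectional area to obtain a stress-like quantity. The random traces, the cross-sectional area value, and the omitted unit conversion to mN/mm2 are all placeholders.

```python
import numpy as np

# 50 stochastic twitch simulations, each 1000 ms long (placeholder values).
rng = np.random.default_rng(0)
twitches_pN = rng.normal(loc=100.0, scale=5.0, size=(50, 1000))

# Average point-wise into a single twitch, then divide force by
# cross-sectional area (hypothetical value; unit conversion omitted).
mean_force_pN = twitches_pN.mean(axis=0)
cross_sectional_area = 500.0
stress = mean_force_pN / cross_sectional_area
print(stress.shape)
```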
Data are provided as .pt files associated with PyTorch, which is also the framework our ML method is written in. PyTorch is free, and instructions for installing it can be found at https://pytorch.org/get-started/locally/. The .pt dataset files can be opened with PyTorch and contain a list of tensors corresponding to the data and labels. Data vectors are single-column time series, and labels are a vector of the 9 rate factors. E.g., data, labels = dataset[i] corresponds to observation i, with 'data' being stress and 'labels' being the rate factors.
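The access pattern above can be sketched as follows. The torch.load call is commented out since it requires PyTorch and the downloaded files (the path shown is hypothetical); the mock list simply mimics the structure, pairing a single-column stress time series with a 9-element vector of rate factors.

```python
# import torch
# dataset = torch.load("training_dataset.pt")  # hypothetical local path

# Mock stand-in with the same (data, labels) structure per item:
mock_dataset = [
    ([0.0, 0.5, 1.0, 0.4], [0.0] * 9),  # abbreviated trace, 9 rate factors
    ([0.1, 0.7, 0.9, 0.3], [1.0] * 9),
]
data, labels = mock_dataset[0]  # observation 0: 'data' = stress, 'labels' = rate factors
print(len(labels))  # 9
```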
The scaler file is a scaler object from Python's scikit-learn (sklearn) library, which is also free and available at https://scikit-learn.org/stable/install.html.
Once the datasets, PyTorch, and scikit-learn are downloaded, you can load the datasets with torch.load and the scaler (.bin file) with Python's built-in load functionality (e.g., pickle.load). Alternatively, we have provided code at https://github.com/travistune3/CVAE. The script 'cvae_multifil.py' contains all the code necessary to view the data, load the pre-trained model (CVAE.py, also in the GitHub repository), or train new models; just change the file paths to point to the downloaded files on your computer.
Both the training and testing datasets were scaled (z-scored) using the training dataset's mean and variance, and the data provided here have already been transformed. The datasets can be restored to 'real' units of mN/mm2 using the function scaler.inverse_transform().
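A minimal numpy illustration of what scaler.inverse_transform() undoes: z-scoring stores z = (x - mean) / std, so the real stress is recovered as z * std + mean. The mean and standard deviation below are made-up values; the real ones are stored in std_scaler.bin.

```python
import numpy as np

# Hypothetical scaling parameters (the actual ones live in std_scaler.bin).
mean, std = 12.0, 4.0

# Undo the z-score transform: real stress = z * std + mean.
z_scored = np.array([-1.0, 0.0, 2.0])
stress_mN_mm2 = z_scored * std + mean
print(stress_mN_mm2.tolist())  # [8.0, 12.0, 20.0]
```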
The rate factors indicated are the log10 of the actual factors, since we sampled on a log-uniform scale. Therefore the label [0, 0, ..., 0] corresponds to the default rates, e.g., 10^0 = 1, indicating the base rates are multiplied by 1x.
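Converting a stored label back to linear rate multipliers is just 10 raised to each value; the example label below is illustrative.

```python
# Stored labels are base-10 logs of the rate multipliers.
label = [0.0, -1.0, 2.0]           # illustrative label values
multipliers = [10.0 ** v for v in label]
print(multipliers)  # [1.0, 0.1, 100.0]
```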
The file 'Experimental_stress_data.csv' contains experimental data taken from mouse cardiac muscle. The columns are stress and label, with label corresponding to the treatment: 'control' (or wild type), 'i61Q', or 'danicamtiv'. 'Old' indicates data first published in https://doi.org/10.1161/circresaha.123.322629; 'new' indicates data first reported in the associated article: https://doi.org/10.1016/j.bpj.2024.11.3310.
Files and variables
File: std_scaler.bin
Description: scaler containing the mean and variance of the data, which we used to z-score the dataset prior to training. Requires sklearn.
File: testing_dataset.pt
Description: PyTorch dataset object containing the test dataset. Open with torch.load from PyTorch: https://pytorch.org/get-started/locally/
File: training_dataset.pt
Description: PyTorch dataset object containing the training dataset. Open with torch.load from PyTorch: https://pytorch.org/get-started/locally/
File: Experimental_stress_data.csv
Description: experimental dataset to which we compared our simulations
Variables
- stress: stress in mN/mm2
- label: the rate factors we tried to infer. The values recorded are the log10 of the actual factors, since we sampled on a log-uniform scale; the label [0, 0, ..., 0] therefore corresponds to the default rates, e.g., 10^0 = 1, indicating the base rates are multiplied by 1x.
Code/software
Python
PyTorch
scikit-learn
https://github.com/travistune3/CVAE
Access information
Data was derived from the following sources:
We generated this dataset using our spatially explicit muscle model, available at https://github.com/travistune3/multifil_five_state. In this model, the myosin-containing thick filaments and actin-containing thin filaments are composed of a series of springs, and crossbridge formation and state changes are tracked for each myosin-actin pair individually. The crossbridge kinetics of each head can be modified. We randomly generated rate factors over a log-uniform scale from 10^-1 to 10^2 and multiplied those rate factors by the 'default' rates. We did this for 9 total rates from both the myosin motors and the actin binding sites. For each rate factor combination we simulated the resulting twitch 50 times and averaged the runs into a final twitch. We repeated this 10^6 times to form our training dataset.
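The log-uniform sampling described above can be sketched as follows (an assumed illustration, not the authors' script): draw each of the 9 exponents uniformly on [-1, 2] and exponentiate to obtain multipliers in [10^-1, 10^2].

```python
import random

# Draw 9 log-uniform rate factors over [10^-1, 10^2]:
# sample the base-10 exponent uniformly, then exponentiate.
random.seed(0)  # fixed seed for reproducibility of this sketch
log_factors = [random.uniform(-1.0, 2.0) for _ in range(9)]
multipliers = [10.0 ** f for f in log_factors]
print(all(0.1 <= m <= 100.0 for m in multipliers))  # True
```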
We record here the rate factors, the 'default' rates, and the training/testing split used.
