Predicting arrhythmia recurrence post-ablation in atrial fibrillation using explainable machine learning: Code repository
Data files
Jul 15, 2025 version files, 47.50 KB total:
- README.md (3.42 KB)
- xML_repo.zip (44.08 KB)
Abstract
Background: Following atrial fibrillation ablation, it is challenging to distinguish patients who will remain arrhythmia-free from those at risk for recurrence. New explainable machine learning (xML) techniques allow for systematic assessment of arrhythmia recurrence risk following catheter ablation. We aim to develop an xML algorithm that predicts recurrence and reveals key risk factors to facilitate better follow-up strategy after an ablation procedure.
Methods: We reconstructed pre- and post-ablation models of the left atrium (LA) from late gadolinium enhancement magnetic resonance imaging (LGE-MRI) for 67 patients. Patient-specific features (LGE-based measurements of pre/post-ablation arrhythmogenic substrate, LA geometry metrics, computational simulation results, and clinical risk factors) trained a random forest classifier to predict recurrent arrhythmia. We calculated each risk factor's marginal contribution to model decision making via SHapley Additive exPlanations (SHAP). Here we provide code for xML model training, validation, and explanation in our associated publication "Predicting arrhythmia recurrence post-ablation in atrial fibrillation using explainable machine learning" in Communications Medicine. This code trains and tests a random forest classifier and then applies SHAP analysis to explain model classifications.
Results: The classifier accurately predicts post-ablation arrhythmia recurrence (mean receiver operating characteristic [ROC] area under the curve [AUC]: 0.80±0.04; mean precision-recall [PR] AUC: 0.82±0.08). SHAP analysis reveals that of 89 features tested, the key population risk factors for recurrence are: large left atrium, low LGE-quantified post-ablation scar in the atrial floor region, and previous attempts at direct current cardioversion. We also examine patient-specific recurrence predictions, since xML allows us to understand why a particular individual can have large prediction weights for some categories without tipping the balance towards an incorrect prediction. Finally, we validate our model in a completely new, 15-patient retrospective holdout cohort (80% correct).
Conclusion: Our SHAP-based explainable machine learning approach is a proof-of-concept clinical tool to explain arrhythmia recurrence risk in patients who underwent ablation by combining patient-specific clinical profiles and LGE-derived data.
Dataset DOI: 10.5061/dryad.4tmpg4fp9
Description of the data and file structure
Explainable machine learning code
Code used for random forest classifier model development, testing, and SHAP analysis-based explanations through the associated publication "Predicting arrhythmia recurrence post-ablation in atrial fibrillation using explainable machine learning" published in Communications Medicine.
Installation
Written for Python version 3.10.11.
Use pip (https://pypi.org/) to install dependencies from requirements.txt.
Model development, explanations, and testing
Train a model and create explanations by updating the BASE_DIR (lines 34-36) of train_explain_cv.py.
Test a model on an entirely new dataset by updating the BASE_DIR (lines 34-36) of external_validation.py.
These scripts are meant to provide minimum working examples to aid in reproducing the analysis presented in the referenced publication.
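The configuration step above can be sketched as follows. The variable names (BASE_DIR, INPUT_PATH, SHEET_NAME) follow the README's description of train_explain_cv.py, but the values shown here are illustrative placeholders, not the repository defaults:

```python
# Hypothetical sketch of the configuration edits described above.
import os

# Parent directory of this script, as the README suggests via
# os.path.split(__file__)[0] (falls back to the working directory
# when __file__ is unavailable, e.g. in an interactive session).
BASE_DIR = (
    os.path.dirname(os.path.abspath(__file__))
    if "__file__" in globals()
    else os.getcwd()
)

# Training data: an .xlsx workbook inside BASE_DIR and the sheet to
# read from it (placeholder names).
INPUT_PATH = os.path.join(BASE_DIR, "input_data.xlsx")
SHEET_NAME = "Sheet1"  # placeholder sheet name
```

After setting these three values, the scripts read training data from INPUT_PATH and write results under BASE_DIR.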
Files and variables
File: xML_repo.zip
Unzipped file contents and descriptions:
Note: runs in Python version 3.10.11.
* `README.md`: A simplified README file.
* `requirements.txt`: Dependencies and version information.
* `input_data.xlsx`: An empty data file demonstrating the expected data formatting.
* `train_explain_cv.py`: Code to train a random forest classifier and explain classifications via SHAP analysis.
    * `BASE_DIR`: Replace the `...` ellipsis with the path to a directory to read data from and output results to. Can be `os.path.split(__file__)[0]` to reference the script's parent directory.
    * `INPUT_PATH`: The path to a .xlsx file of training data.
    * `SHEET_NAME`: The sheet within the `INPUT_PATH` .xlsx file to read training data from.
* `external_validation.py`: Similar to `train_explain_cv.py`. Reads a new dataset to test a random forest model developed by `train_explain_cv.py`.
Other file information:
* `data_formatting/`: Contains scripts to help with pre-processing data (e.g., converting "Yes" and "No" to numerical values 1 and 0).
* `feature_selection/`: Contains scripts to use LASSO regression for feature selection.
* `train/`: Contains scripts for training a random forest model.
* `predict/`: Contains scripts for using a random forest model to make predictions.
* `explain/`: Contains scripts for explaining the random forest model with SHAP analysis.
* `utils/`: Contains various other supporting scripts.
Code/software
Python version 3.10.11 (https://www.python.org/) is required to run this software. Other dependencies and their versions are provided in the requirements.txt file.
The only scripts that should be run directly are train_explain_cv.py and external_validation.py. Other scripts provide methods and objects to support the operations of these two scripts.
Access information
Python (https://www.python.org/) is a free and openly accessible programming language. Review the licensing information for Python and the dependencies in requirements.txt before using this software. This code demonstrates one way to use the openly available Python packages scikit-learn and shap for explainable machine learning, although these packages are already user-friendly on their own.