Code from: Negative frequency-dependent selection: A positive outlook with deep learning
Data files
May 15, 2026 version files 81.77 KB
- anc_predict_parameters.py (5.24 KB)
- confusion_multiROC.R (6.41 KB)
- confusion_roc.R (9.32 KB)
- get_avg_img.py (2.47 KB)
- get_indivs_ms.py (3.69 KB)
- gravel_AFR_neutral.slim (2.62 KB)
- gravel_AFR_NFDS.slim (3.90 KB)
- gravel_AFR_overdom.slim (3.80 KB)
- image_generation_ms2.py (5.17 KB)
- image_generation_vcf2.py (3.67 KB)
- predict_ancient_s.py (6.15 KB)
- predict_s.py (6.15 KB)
- predict_s2.py (6.17 KB)
- predict_x.py (6.16 KB)
- README.md (2.36 KB)
- select_vcf.py (1.12 KB)
- Snakefile (1.40 KB)
- train_multiclass.py (5.98 KB)
Abstract
Balancing selection is a mode of natural selection that maintains genetic diversity through an array of mechanisms, including negative frequency-dependent selection. However, discriminating genomic footprints of negative frequency-dependent selection from those of other forms of balancing selection mechanisms is a difficult task. In this perspective, we will present directions on how to enhance the modeling of genomic signals expected from negative frequency-dependent selection to better distinguish it from neutrality and other forms of balancing selection, such as overdominance. Specifically, we demonstrate how deep learning can facilitate detection and characterization of this process through novel data preprocessing and modeling of genomic and temporal autocovariation. We also provide a series of recommendations to empiricists and method developers on how to positively approach the problem of identifying genomic footprints of negative frequency-dependent selection in the future.
Dataset DOI: 10.5061/dryad.w3r22813q
Description of the data and file structure
Genomic footprints of negative frequency-dependent selection
This repository accompanies the study exploring how negative frequency-dependent selection (NFDS) leaves distinct genomic signatures that can be detected and distinguished from other forms of balancing selection and neutrality. Our work focuses on improving the modeling of genomic and temporal autocovariation patterns through deep learning approaches.
Contents
The repository includes:
- SLiM simulation scripts for generating temporal genomic datasets under:
  - Negative frequency-dependent selection (NFDS): gravel_AFR_NFDS.slim
  - Overdominance: gravel_AFR_overdom.slim
  - Neutrality: gravel_AFR_neutral.slim
- Snakemake workflow to automate simulation and preprocessing: Snakefile
- Custom-adjusted TrIdent software for extracting and formatting temporal autocovariance features for machine learning:
  - anc_predict_parameters.py
  - image_generation_ms2.py
  - image_generation_vcf2.py
  - predict_ancient_s.py
  - predict_s.py
  - predict_s2.py
  - predict_x.py
  - train_multiclass.py
  - confusion_multiROC.R (Plotting script)
  - confusion_roc.R (Plotting script)
  - get_avg_img.py (Heatplot script)
- Scripts to sample from simulated datasets reflecting YRI-like demographic scenarios with:
  - Present-day samples
  - Ancient samples
  - Combined ancient and present-day samples
  - Scripts: get_indivs_ms.py, select_vcf.py
Each scenario includes:
- 1000 replicates for training
- 1000 replicates for validation
- 1000 replicates for testing
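As a rough illustration of how the replicate counts above add up, the following Python sketch builds a manifest of simulated replicates. It assumes that a "scenario" means one selection class under one sampling scheme; that reading, along with every label and field name in the code, is hypothetical and not taken from the repository scripts.

```python
from itertools import product

# All labels below are illustrative assumptions, not names used by the repository.
SELECTION_CLASSES = ["neutral", "NFDS", "overdominance"]
SAMPLING_SCHEMES = ["present_day", "ancient", "combined"]
SPLITS = {"train": 1000, "validation": 1000, "test": 1000}


def build_manifest(classes=SELECTION_CLASSES, schemes=SAMPLING_SCHEMES, splits=SPLITS):
    """Return one record per simulated replicate.

    Assumes every (selection class, sampling scheme) pair has its own
    training, validation, and test replicates.
    """
    manifest = []
    for cls, scheme, (split, n_reps) in product(classes, schemes, splits.items()):
        for rep in range(1, n_reps + 1):
            manifest.append(
                {"class": cls, "sampling": scheme, "split": split, "replicate": rep}
            )
    return manifest


manifest = build_manifest()
print(len(manifest))  # 3 classes x 3 schemes x 3 splits x 1000 replicates = 27000
```

Keeping the split label in each record makes it easy to check that training, validation, and test replicates never overlap before any model is trained.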
Purpose
These resources support our investigation into distinguishing NFDS genomic signals using deep learning, with an emphasis on:
- Data preprocessing innovations
- Modeling temporal and spatial autocovariation
- Recommendations for future empirical and methodological research
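To make the notion of temporal autocovariation concrete, here is a minimal Python sketch of one generic summary: the covariance, across loci, of allele frequency changes between consecutive sampling times. It is not taken from the repository scripts, which may represent the data differently; the function name `temporal_autocovariance` and the toy frequency trajectories are assumptions made for illustration only.

```python
import numpy as np


def temporal_autocovariance(freqs):
    """Covariance matrix of allele frequency changes between time intervals.

    freqs: array of shape (n_timepoints, n_loci) holding allele frequencies,
           with at least three time points.
    Returns an (n_timepoints - 1, n_timepoints - 1) matrix whose entry (i, j)
    is the covariance, taken across loci, between the frequency change in
    interval i and the frequency change in interval j.
    """
    freqs = np.asarray(freqs, dtype=float)
    deltas = np.diff(freqs, axis=0)  # frequency change per interval, shape (T-1, L)
    # np.cov treats each row as a variable (an interval) and each column
    # as an observation (a locus).
    return np.cov(deltas)


# Toy data (hypothetical): 5 time points, 200 loci drifting as random walks.
rng = np.random.default_rng(0)
steps = rng.normal(0.0, 0.01, size=(5, 200))
freqs = np.clip(0.5 + np.cumsum(steps, axis=0), 0.0, 1.0)

cov = temporal_autocovariance(freqs)
print(cov.shape)  # (4, 4)
```

The diagonal holds the variance of frequency change within each interval, while the off-diagonal entries describe how changes in different intervals covary.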
Citation
If you use this repository, please cite our associated publication:
Santander CG, Campelo dos Santos AL, Arnab SP, Fumagalli M, DeGiorgio M (2025). Negative frequency-dependent selection: a positive outlook with deep learning. Philosophical Transactions of the Royal Society B.
