Data from: Analysis of statistical correlations between properties of adaptive walks in fitness landscapes

Published Jan 16, 2020 on Dryad. https://doi.org/10.5061/dryad.41ns1rn9r

Data files

Jan 16, 2020 version files 8.96 MB

Abstract

The fitness landscape metaphor has been central to our way of thinking about adaptation. In this scenario, adaptive walks are an idealized dynamics that mimics the uphill movement of an evolving population towards a fitness peak of the landscape. Recent work in experimental evolution has demonstrated that the constraints imposed by epistasis reduce the number of accessible mutational pathways towards fitness peaks. Here we exhaustively analyze the statistical properties of adaptive walks for two empirical fitness landscapes and for theoretical NK landscapes. Some general conclusions can be drawn from our simulation study. Regardless of the dynamics, we observe that the shortest paths are used more often. Although the accessibility of a given fitness peak is reasonably well correlated with the number of monotonic pathways towards it, the two quantities are not exactly proportional. A negative correlation between predictability and mean path divergence is established, so that a decrease in the number of effective mutational pathways is accompanied by the convergence of the attraction basin of fitness peaks. On the other hand, other features are not conserved among fitness landscapes, such as the relationship between accessibility and predictability.

This repository contains the datasets for the different fitness landscapes and the code used to generate the adaptive walks presented in the paper Analysis of statistical correlations between properties of adaptive walks in fitness landscapes.

There are three zipped folders in this repository, namely AdaptiveWalks_Codes, Empirical_Landscapes and NK_Samples. In the following, we describe the contents of each folder.

AdaptiveWalks_Codes:

1) The code AdaptiveWalk-random-hsp90.cpp generates random adaptive walks in the Hsp90 landscape. The empirical data for the Hsp90 fitness landscape can be obtained from C. Bank et al., PNAS 113, 14085 (2016).

The output consists of estimates of the mean walk length, predictability, mean path divergence and accessibility for each local optimum, together with the fitness values of those local optima.

The fitness values and connectivities of the sequences are already embedded in the code, whereas the Hamming distances between all pairs of sequences, which are needed to estimate the mean path divergence, are read from the file hamming_distance_tres_colunas.txt.

To compile the code:

c++ -O3 AdaptiveWalk-random-hsp90.cpp -o AdaptiveWalk-random-hsp90 -lm -lgsl -lgslcblas

To run the code:

./script-random-hsp90

To change the number of adaptive walks, edit the script.
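The pairwise Hamming distances used by code 1) can be illustrated with a short Python sketch. It is not the procedure used to build hamming_distance_tres_colunas.txt. The three-column layout (index of the first sequence, index of the second sequence, distance) is only an assumption suggested by the file name, and the toy sequences are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(sequences):
    """Yield (i, j, distance) for every unordered pair of sequence indices."""
    for (i, a), (j, b) in combinations(enumerate(sequences), 2):
        yield i, j, hamming(a, b)


if __name__ == "__main__":
    toy = ["AAA", "AAC", "GAC"]  # hypothetical sequences, not Hsp90 data
    for i, j, d in pairwise_hamming(toy):
        # Assumed layout: one pair per line, three columns.
        print(i, j, d)
```

For the toy input the sketch prints one line per pair, i.e. three lines for three sequences.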

2) The code AdaptiveWalk-prob-hsp90.cpp generates probabilistic adaptive walks in the Hsp90 landscape.

The instructions are the same as the ones for the random version.
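The difference between the two kinds of walk can be sketched in Python as follows. This README does not define the transition rules, so the rules below are assumptions: the random walk picks uniformly among the fitter neighbours, and the probabilistic walk picks a fitter neighbour with probability proportional to its fitness gain. All function names and the toy landscape are hypothetical, and the C++ codes remain the reference.

```python
import random


def fitter_neighbors(seq, fitness, neighbors):
    """Neighbours of seq with strictly higher fitness."""
    return [n for n in neighbors[seq] if fitness[n] > fitness[seq]]


def step_random(seq, fitness, neighbors, rng):
    """Assumed rule: uniform choice among fitter neighbours."""
    ups = fitter_neighbors(seq, fitness, neighbors)
    return rng.choice(ups) if ups else None


def step_probabilistic(seq, fitness, neighbors, rng):
    """Assumed rule: choice weighted by the fitness gain."""
    ups = fitter_neighbors(seq, fitness, neighbors)
    if not ups:
        return None
    gains = [fitness[n] - fitness[seq] for n in ups]
    return rng.choices(ups, weights=gains, k=1)[0]


def adaptive_walk(start, fitness, neighbors, step, rng):
    """Follow uphill steps until a local optimum is reached."""
    path = [start]
    while True:
        nxt = step(path[-1], fitness, neighbors, rng)
        if nxt is None:
            return path
        path.append(nxt)


if __name__ == "__main__":
    # Hypothetical two-site binary landscape with a single peak at "11".
    fitness = {"00": 1.0, "01": 2.0, "10": 1.5, "11": 3.0}
    neighbors = {"00": ["01", "10"], "01": ["00", "11"],
                 "10": ["00", "11"], "11": ["01", "10"]}
    rng = random.Random(0)
    print(adaptive_walk("00", fitness, neighbors, step_random, rng))
    print(adaptive_walk("00", fitness, neighbors, step_probabilistic, rng))
```

Passing the step rule as an argument keeps the walk loop identical for both dynamics, so only the choice among fitter neighbours differs.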

3) The code AdaptiveWalk-hsp90-randomfreq.cpp generates random adaptive walks in the Hsp90 landscape to calculate the frequency of the mutational pathways produced by the dynamics. The same information is produced by the code in 1), but this code is more appropriate when better statistics are needed for the path frequencies, because it guarantees that a minimum number of walks terminates at every local optimum. The input is this minimum number of walks terminating at the least visited local optimum.

To compile the code:

c++ -O3 AdaptiveWalk-hsp90-randomfreq.cpp -o AdaptiveWalk-hsp90-randomfreq -lm -lgsl -lgslcblas

To run the code:

./script-random-hsp90-freq
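The stopping criterion described for code 3) can be sketched in Python as follows: walks are generated until the least visited local optimum has been reached a minimum number of times, while the frequency of each pathway is counted. This is a minimal illustration under stated assumptions, not the C++ implementation. The uniform choice among fitter neighbours, the max_walks safety limit, the function names and the toy landscape are all hypothetical.

```python
import random
from collections import Counter


def random_walk(start, fitness, neighbors, rng):
    """Uphill walk with a uniform choice among fitter neighbours (assumed rule)."""
    path = [start]
    while True:
        here = path[-1]
        ups = [n for n in neighbors[here] if fitness[n] > fitness[here]]
        if not ups:
            return tuple(path)
        path.append(rng.choice(ups))


def sample_until_min(start, optima, fitness, neighbors, min_walks, rng,
                     max_walks=1_000_000):
    """Run walks until every optimum in `optima` ends at least min_walks walks."""
    path_counts = Counter()                      # pathway -> number of walks
    end_counts = Counter({o: 0 for o in optima})  # optimum -> number of walks
    total = 0
    while min(end_counts.values()) < min_walks:
        if total >= max_walks:  # hypothetical guard against unreachable optima
            raise RuntimeError("max_walks reached before the minimum was met")
        path = random_walk(start, fitness, neighbors, rng)
        path_counts[path] += 1
        end_counts[path[-1]] += 1
        total += 1
    return path_counts, end_counts


if __name__ == "__main__":
    # Hypothetical landscape: start "A" with two fitter neighbours, both optima.
    fitness = {"A": 1.0, "B": 2.0, "C": 3.0}
    neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
    paths, ends = sample_until_min("A", ["B", "C"], fitness, neighbors,
                                   min_walks=5, rng=random.Random(1))
    total = sum(paths.values())
    for path, count in paths.items():
        print("-".join(path), count / total)
```

Because the loop stops as soon as the least visited optimum reaches the minimum, the total number of walks is not fixed in advance and grows when one optimum is rarely reached.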

4) The code AdaptiveWalk-hsp90-probfreq.cpp generates probabilistic adaptive walks in the Hsp90 landscape to calculate the frequency of the mutational pathways produced by the dynamics. Everything else is exactly the same as in 3).

To compile the code:

c++ -O3 AdaptiveWalk-hsp90-probfreq.cpp -o AdaptiveWalk-hsp90-probfreq -lm -lgsl -lgslcblas

To run the code:

./script-probabilistic-hsp90-freq

5) The code AdaptiveWalk-GB1-randomfreq.cpp generates random adaptive walks in the GB1 fitness landscape to calculate the frequency of the mutational pathways produced by the dynamics, as well as the mean walk length, predictability, mean path divergence and accessibility for each local optimum. As before, the code guarantees that a minimum number of walks, specified in the script, terminates at every local optimum. This is a bit tricky, as one of the local optima of the GB1 landscape is rarely reached by walks starting at the wild-type sequence. All the information about the GB1 landscape is contained in the processed files elife_seq_number.txt, elife_sequence_degree.txt, elife_sequence_fitness.txt and elife_sequence_neighbors_correta.txt. The last of these could not be uploaded (430 MB), but all of these files can be generated with the code available in the folder Gb1_input_files (see the file readme.md), which processes the original data from Wu et al., eLife 5, e16965 (2016).

To compile the code:

c++ -O3 AdaptiveWalk-GB1-randomfreq.cpp -o AdaptiveWalk-GB1-randomfreq -lm -lgsl -lgslcblas

To run the code:

./script-GB1-randomfreq

6) The code AdaptiveWalk-GB1-probfreq.cpp generates probabilistic adaptive walks in the GB1 fitness landscape to calculate the frequency of the mutational pathways produced by the dynamics, as well as the mean walk length, predictability, mean path divergence and accessibility for each local optimum. Everything else is exactly the same as in 5).

To compile the code:

c++ -O3 AdaptiveWalk-GB1-probfreq.cpp -o AdaptiveWalk-GB1-probfreq -lm -lgsl -lgslcblas

To run the code:

./script-GB1-probabilistic_freq

Empirical_Landscapes:

In this folder we present the empirical landscapes used in our study. Each file has two columns: the first column gives the sequence and the second column gives the corresponding fitness value.

The file HSP90_fitness_landscape.txt contains the Hsp90 empirical landscape from C. Bank, S. Matuszewski, R. T. Hietpas, and J. D. Jensen, Proceedings of the National Academy of Sciences 113, 14085 (2016).

The code cleaning_file_HSP90.py reads the data of the Hsp90 fitness landscape and creates the ring structure seen in Figure 1 of the manuscript. The figure can be generated by placing the files HSP90_fitness_landscape.txt and cleaning_file_HSP90.py in the same folder and running the script cleaning_file_HSP90.py.

The file GB1_fitness_landscape.txt contains the GB1 empirical landscape from N. C. Wu, L. Dai, C. A. Olson, J. O. Lloyd-Smith, and R. Sun, eLife 5, e16965 (2016).
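A two-column landscape file of this kind could be loaded in Python along the following lines. The sketch assumes whitespace-separated columns and no header line, neither of which is stated in this README, so the parsing may need to be adapted to the actual files. The function name and the toy data are hypothetical.

```python
import io


def read_landscape(handle):
    """Parse 'sequence fitness' lines into a dict (assumed layout)."""
    landscape = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seq, value = line.split()[:2]
        landscape[seq] = float(value)
    return landscape


if __name__ == "__main__":
    # Hypothetical content standing in for a real landscape file.
    toy_file = io.StringIO("AAA 1.00\nAAC 0.85\nGAC 1.20\n")
    landscape = read_landscape(toy_file)
    fittest = max(landscape, key=landscape.get)
    print(len(landscape), fittest, landscape[fittest])
```

For a real file, the io.StringIO object would be replaced by the result of open() on the file path.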

NK_Samples:

This folder contains the 10 samples of the NK landscapes used to generate the correlation matrix, in which N = 8 and K = 1, 2 or 3. Each file has two columns.

The first column gives the decimal representation of the binary sequence of length N. The second column gives the fitness value of the respective sequence.
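The decimal encoding can be handled with a few lines of Python, sketched below. The bit order (most significant bit first) is an assumption, as are the function names and the toy fitness values. Neighbours are taken to be the sequences that differ by a single bit flip, and a local optimum is a sequence that is strictly fitter than all of its neighbours.

```python
def to_bits(index: int, n: int = 8) -> str:
    """Binary string of length n for a decimal index (MSB first, assumed)."""
    return format(index, f"0{n}b")


def neighbors(index: int, n: int = 8):
    """Indices of the n sequences that differ from `index` by one bit."""
    return [index ^ (1 << i) for i in range(n)]


def local_optima(fitness, n):
    """Sequences that are strictly fitter than all single-bit neighbours."""
    return sorted(s for s, f in fitness.items()
                  if all(f > fitness[m] for m in neighbors(s, n)))


if __name__ == "__main__":
    # Hypothetical N = 2 landscape, not one of the NK samples.
    toy = {0: 0.1, 1: 0.5, 2: 0.4, 3: 0.2}
    for s in local_optima(toy, n=2):
        print(s, to_bits(s, 2), toy[s])
```

With N = 8, each file would be expected to list 2^8 = 256 sequences, with indices from 0 to 255.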
