The origin of life is believed to have progressed through an RNA world, in which RNA acted as both genetic material and functional molecules. The structure of the evolutionary fitness landscape of RNA would determine natural selection for the first functional sequences. Fitness landscapes are the subject of much speculation, but their structure is essentially unknown. Here we describe a comprehensive map of a fitness landscape, exploring nearly all of sequence space, for short RNAs surviving selection in vitro. With the exception of a small evolutionary network, we find that fitness peaks are largely isolated from one another, highlighting the importance of historical contingency and indicating that natural selection would be constrained to local exploration in the RNA world.
Replicate 'm'; FASTQ data for selection rounds 0-4
This archive holds FASTQ-formatted sequencing data from replicate 'm' (May; replicate 1 of 2) of a selection for GTP-binding aptamers. It contains three files, each with sequences from one or two rounds of the selection. See README.txt for more information.
may-fastq.tar.bz2
Replicate 'j'; FASTQ data for selection rounds 0-4
This archive holds FASTQ-formatted sequencing data from replicate 'j' (July; replicate 2 of 2) of a selection for GTP-binding aptamers. It contains two files, each with sequences from two or three rounds of the selection. See README.txt for more information.
july-fastq.tar.bz2
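The two FASTQ archives above can be read without unpacking them to disk. The following is a minimal sketch, assuming the members are plain-text, four-line-per-record FASTQ files; the member file names are not given here, so the `suffixes` filter and the function names are hypothetical and should be adjusted after consulting README.txt.

```python
import io
import tarfile
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        if not header.startswith("@"):
            raise ValueError(f"not a FASTQ header: {header!r}")
        seq = handle.readline().strip()
        handle.readline()       # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual


def iter_archive_reads(
    archive_path: str,
    suffixes: Tuple[str, ...] = (".fastq", ".fq"),  # assumed extensions
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (member_name, read_id, sequence, quality) for each FASTQ member."""
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive:
            # Skip directories and non-FASTQ files such as a README.
            if not member.isfile() or not member.name.endswith(suffixes):
                continue
            raw = archive.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="ascii")
            for read_id, seq, qual in read_fastq(text):
                yield member.name, read_id, seq, qual
```

Calling `iter_archive_reads("may-fastq.tar.bz2")` streams reads one at a time, which keeps memory use low for large sequencing files. If the members turn out to be individually compressed or to use a different extension, the filter and the text wrapper would need to change accordingly.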
Selection Pool Statistics
This archive contains six files with statistics on the first (May) and second (July) replicate selection pools, including transition probabilities (used to model the selection pools) and the frequencies of 5′ and 3′ hexamers (used to correct for ligation bias). See README.txt for more information.
r0-stats.tar
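To illustrate what the two kinds of pool statistics measure, the sketch below tabulates terminal hexamer frequencies and base-to-base transition probabilities from a list of sequences. It is an assumption-laden illustration, not the procedure used to produce the archived files: the description above does not state the order of the transition model or whether it is position-dependent, so a simple first-order, position-independent estimate is shown, and both function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def end_kmer_frequencies(
    sequences: Iterable[str], k: int = 6
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return relative frequencies of the first and last k bases (k=6: hexamers)."""
    five_prime: Counter = Counter()
    three_prime: Counter = Counter()
    for seq in sequences:
        if len(seq) < k:
            continue  # too short to have a terminal k-mer
        five_prime[seq[:k]] += 1
        three_prime[seq[-k:]] += 1

    def normalize(counts: Counter) -> Dict[str, float]:
        total = sum(counts.values())
        return {kmer: n / total for kmer, n in counts.items()} if total else {}

    return normalize(five_prime), normalize(three_prime)


def transition_probabilities(sequences: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next base | current base), assuming a first-order model."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        current: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for current, nxt in counts.items()
    }
```

In a uniformly synthesized pool every hexamer and every transition would be roughly equally likely, so deviations in these tables indicate the kind of compositional and ligation bias that the statistics are meant to capture.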
Fitness Landscapes
This Microsoft Excel document contains two spreadsheets, each holding the fitness landscape from one of the two replicate selections (May and July). In each landscape, fitness peaks are collections of related sequences and their associated fitness values. Fitness is derived from sequence abundance after correcting for sequencing errors, bias introduced by sequencing adapter ligation, and non-uniform sequence distribution in the initial pool. Each sequence is listed with the following information: peak rank, sequence rank (within its peak), label (indicating that peak's rank in both replicates), fitness value (with standard deviation), and edit distance from the peak center. Alignments for each peak are also given. Fitness peaks are presented in descending order of fitness (measured at the peak center), as are sequences within each peak.
MayJuly_FitnessLandscapes.xlsx
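The "edit distance from the peak center" column can be reproduced or checked with a short routine. The sketch below assumes the standard Levenshtein distance (single-base substitutions, insertions, and deletions each cost 1); the description above does not name the exact metric, so this choice and the helper names are assumptions.

```python
from typing import Dict, Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences, using two-row dynamic programming."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                         # deletion from a
                    current[j - 1] + 1,                      # insertion into a
                    previous[j - 1] + (base_a != base_b),    # substitution or match
                )
            )
        previous = current
    return previous[-1]


def distances_from_center(center: str, members: Iterable[str]) -> Dict[str, int]:
    """Map each sequence in a peak to its edit distance from the peak center."""
    return {seq: edit_distance(center, seq) for seq in members}
```

Applied to one peak, `distances_from_center(center, members)` gives the number of mutational steps separating each member from the highest-fitness sequence, which is the quantity that determines how isolated a peak is from its neighbors in sequence space.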