Molecular evolution can be conceptualized as a walk over a 'fitness landscape', or the function of fitness (e.g., catalytic activity) over the space of all possible sequences. Understanding evolution requires knowing the structure of the fitness landscape and identifying the viable evolutionary pathways through the landscape. However, the fitness landscape for any catalytic biomolecule is largely unknown. The evolution of catalytic RNA is of special interest because RNA is believed to have been foundational to early life. In particular, an essential activity leading to the genetic code would be the reaction of ribozymes with activated amino acids, such as 5(4H)-oxazolones, to form aminoacyl-RNA. Here we combine in vitro selection with a massively parallel kinetic assay to map a fitness landscape for self-aminoacylating RNA, with nearly complete coverage of sequence space in a central 21-nucleotide region. The method (SCAPE: sequencing to measure catalytic activity paired with in vitro evolution) shows that the landscape contains three major ribozyme families (landscape peaks). An analysis of evolutionary pathways shows that, while local optimization within a ribozyme family would be possible, optimization of activity over the entire landscape would be frustrated by large valleys of low activity. The sequence motifs associated with each peak represent different solutions to the problem of catalysis, so the inability to traverse the landscape globally corresponds to an inability to restructure the ribozyme without losing activity. The frustrated nature of the evolutionary network suggests that chance emergence of a ribozyme motif would be more important than optimization by natural selection.
Correlation of Fitness Effects - script
Python script to calculate the correlation of fitness effects in a given ribozyme family.
ActivityCorrelationGamma.py
Correlation of Fitness Effects - source file
Excel source file used as input for the script ActivityCorrelationGamma.py.
ActivityObservedData.xlsx
k-seq Analysis - script
Python script to calculate catalytic kinetics for a population of sequences, using the k-seq methodology.
kseq_tools_v01.py
k-seq Analysis - example input files
Input files for k-seq Analysis: kseq_rounds is example-rounds.txt, normalization_list is example-normalization.txt, substrate_concs is example-subst-concs.txt, rounds_to_average is example-rnds-to-avg.txt and rounds_to_error is example-rnds-to-err.txt.
example_input_files.zip
Pathways between Peak Sequences - script
Python script to search for the shortest pathway between two sequences along a fitness landscape.
peak_pather_v01.py
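For orientation, the idea of a shortest-pathway search can be illustrated with a minimal sketch. This is not the code in peak_pather_v01.py. It assumes, purely for illustration, that one step is a single-base substitution, that a sequence is viable when its measured activity meets a chosen threshold, and that sequences absent from the data have zero activity. The names shortest_path, neighbors, activity and min_activity are hypothetical.

```python
from collections import deque

# Assumption: sequencing reads use the DNA alphabet.
ALPHABET = "ACGT"


def neighbors(seq, alphabet=ALPHABET):
    """Yield every sequence that differs from seq by one substitution."""
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def shortest_path(start, goal, activity, min_activity):
    """Breadth-first search from start to goal through viable sequences.

    activity maps sequence -> measured activity (hypothetical input);
    sequences missing from the mapping are treated as inactive.
    Returns the path as a list of sequences, or None if no path exists.
    """
    def viable(seq):
        return activity.get(seq, 0.0) >= min_activity

    if not (viable(start) and viable(goal)):
        return None

    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in neighbors(current):
            if nxt not in parent and viable(nxt):
                parent[nxt] = current
                queue.append(nxt)
    return None
```

Because breadth-first search explores sequences in order of their distance from the start, the first path that reaches the goal uses the fewest mutational steps among paths that stay at or above the threshold. Raising min_activity removes low-activity intermediates and can disconnect two peaks entirely, in which case the function returns None.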
Fastq files for original selection
Fastq files pertaining to sequences present in rounds of selection, from the original BYO aminoacylase ribozyme selection. Conserved primer regions have been removed; sequences correspond only to the randomized internal region of the library.
selection-branch1-fastqs.zip
Count reads files for original selection
Text files pertaining to sequences present in rounds of selection, from the original BYO aminoacylase ribozyme selection. Each file contains round information in its header, followed by sequences and the number of times they appeared in that round.
selection-branch1-counts.zip
Fastq files for second selection
Fastq files pertaining to sequences present in rounds of selection, from the second BYO aminoacylase ribozyme selection.
selection-branch2-fastqs.zip
Count reads files for second selection
Text files pertaining to sequences present in rounds of selection, from the second BYO aminoacylase ribozyme selection. Each file contains round information in its header, followed by sequences and the number of times they appeared in that round.
selection-branch2-counts.zip
Count reads files for post-kseq populations
Text files pertaining to sequences present in post-k-seq populations, from the k-seq assay performed at the end of selection. A, B, C, etc. denote replicates; 1 indicates 250 µM BYO, 2 indicates 50 µM, 3 indicates 10 µM, and 4 indicates 2 µM. Each file contains round information in its header, followed by sequences and the number of times they appeared in that round.
kseq-counts.zip