Emergent Properties as By-products of Prebiotic Evolution of Aminoacylation Ribozymes
Data files
Jun 08, 2022 version files 85.40 GB
- BFO_k-seq.zip (12 GB)
- BFO_selection.zip (4.38 GB)
- BIO_k-seq.zip (11.56 GB)
- BLO_k-seq.zip (13.60 GB)
- BLO_selection.zip (4.18 GB)
- BMO_k-seq.zip (11.51 GB)
- BVO_k-seq.zip (10.67 GB)
- BWO_k-seq.zip (11.90 GB)
- ClusterBoss.zip (4 KB)
- fastq-trim.zip (1.14 KB)
- k-seq_inputs.zip (5.59 GB)
- k-seq-fitting.zip (132.10 KB)
- WFLIVM-k-seq-analysis.zip (8.68 MB)
Jun 11, 2024 version files 85.40 GB: the same files and sizes as the Jun 08, 2022 version, plus README.md (5.54 KB).
Abstract
The emergence of the genetic code was a major transition in the evolution from a prebiotic RNA world to the earliest modern cells. A prominent feature of the standard genetic code is error minimization, or the tendency of mutations to be unusually conservative in preserving biophysical features of the amino acid. While error minimization is often assumed to result from natural selection, it has also been speculated that error minimization may be a by-product of the emergence of the genetic code. During establishment of the genetic code in an RNA world, self-aminoacylating ribozymes would enforce the mapping of amino acids to anticodons. Here we show that expansion of the genetic code, through co-option of ribozymes for new substrates, could result in error minimization as an emergent property. Using self-aminoacylating ribozymes previously identified during an exhaustive search of sequence space, we measured the activity of thousands of candidate ribozymes on alternative substrates (activated analogs of tryptophan, phenylalanine, leucine, isoleucine, valine, and methionine). Related ribozymes exhibited preferences for biophysically similar substrates, indicating that co-option of existing ribozymes to adopt additional amino acids into the genetic code would itself lead to error minimization. Furthermore, ribozyme activity was positively correlated with specificity, indicating that selection for increased activity would also lead to increased specificity. These results demonstrate that by-products of the evolution and functional expansion of the ribozyme system would lead to apparently adaptive properties of the genetic code.
https://doi.org/10.25349/D92C9C
Description of the data and file structure
This dataset contains all collected data for this project and the scripts necessary for its analysis:
- BXO_k-seq: Folders for each of the six k-Seq experiments, performed with BWO, BFO, BLO, BIO, BVO, and BMO (the activated tryptophan, phenylalanine, leucine, isoleucine, valine, and methionine substrates, respectively), each containing:
  - raw.reads: compressed paired-end FASTQ files from triplicate reactions (replicates A, B, and C, or D, E, and F) at five substrate concentrations (2, 10, 50, 250, and 1250 μM)
  - counts: joined and enumerated reads for each sample, generated using EasyDIVER (https://link.springer.com/article/10.1007/s00239-020-09954-0)
  - bxo-results: analysis of the counts files, generated using the k-Seq package (https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkab199/6194417)
- k-seq_inputs: A folder containing raw reads and counts files for six input samples (A-F) used in k-Seq experiments
- fastq-trim: A folder containing a Bash script (Trimming.sh) and an associated readme file for preprocessing (trimming) the FASTQ files from the k-Seq experiments before further processing with EasyDIVER
- k-seq-fitting: A folder containing a preprocessing script (count-data-preprocessing.py) and an associated readme file for preparing counts files for k-Seq fitting, plus two csv files: the qPCR-measured RNA concentration of each k-Seq sample (rna-ng.csv) and the median RNA concentration of each sequence (the wildtype and the single and double mutants of each of five families) in the input samples (input-rna-median-ng.csv)
- WFLIVM-k-seq-analysis: A folder containing scripts for processing the k-Seq fitting results and an associated readme file. These scripts can be used to produce the included output file WFLIVM-k-seq_merged_+r+I.csv, which merges the k-Seq fitting results from all experiments and adds:
  - The family associated with each sequence
  - Calculated catalytic enhancement values with their 95% confidence intervals
  - Additional promiscuity metrics, such as aromatic preference and the promiscuity index (I)
- BXO_selection: Two folders (BFO_selection and BLO_selection) with results from the aminoacylation selections performed with BFO and BLO, each containing:
  - raw.reads: compressed paired-end FASTQ files (four lanes) from the input samples (R0) and five rounds of selection (R1-R5) in two replicate experiments (A and B)
  - counts: joined and enumerated reads for each sample, generated using EasyDIVER
  - clusters: clustered counts files in which sequences are grouped by similarity, generated using ClusterBOSS (https://github.com/ichen-lab-ucsb/ClusterBOSS)
- ClusterBOSS: A folder containing the script (ClusterBOSS.py) and readme files for ClusterBOSS
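The counts files and the qPCR measurements in k-seq-fitting are combined to estimate how much of each sequence reacted in each sample. The following is a minimal Python sketch of that step, not the deposited count-data-preprocessing.py. It assumes (i) that the data lines of a counts file are whitespace-separated `sequence count` pairs following a short summary header, and (ii) that the amount of a sequence in a reacted sample equals its relative read abundance times the total RNA measured by qPCR. `parse_counts` and `fraction_reacted` are hypothetical names.

```python
from typing import Dict, Iterable


def parse_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse `sequence count` lines, skipping headers and blank lines.

    Assumed layout (verify against the actual EasyDIVER counts files).
    """
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # summary header lines or blank lines
        seq, n = fields[0], int(fields[1])
        counts[seq] = counts.get(seq, 0) + n
    return counts


def fraction_reacted(
    counts: Dict[str, int],
    total_rna_ng: float,
    input_ng: Dict[str, float],
) -> Dict[str, float]:
    """Estimate the fraction of each sequence that reacted (hypothetical helper).

    counts: reads per sequence in one reacted sample
    total_rna_ng: qPCR-measured RNA in that sample (cf. rna-ng.csv)
    input_ng: amount of each sequence in the input pool
              (cf. input-rna-median-ng.csv)
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("sample has no reads")
    result: Dict[str, float] = {}
    for seq, n in counts.items():
        if input_ng.get(seq, 0.0) > 0:
            reacted_ng = n / total_reads * total_rna_ng  # abundance x total RNA
            result[seq] = reacted_ng / input_ng[seq]
    return result
```

Sequences without a positive input amount are skipped, because their fraction reacted cannot be estimated from this ratio.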
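Both the input-rna-median-ng.csv description (wildtype, single, and double mutants of each family) and the "associated family" column in the merged output involve assigning sequences to families. One simple way to do this, sketched below under stated assumptions, is to compare each sequence with a set of family center sequences by Hamming distance and label it by how many positions differ. The center sequences and the names `hamming` and `classify` are hypothetical; the deposited scripts may assign families differently.

```python
from typing import Dict, Optional, Tuple

# Labels by number of differing positions relative to the family center.
LABELS = {0: "wildtype", 1: "single mutant", 2: "double mutant"}


def hamming(a: str, b: str) -> Optional[int]:
    """Number of mismatched positions; None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def classify(
    seq: str, centers: Dict[str, str], max_dist: int = 2
) -> Optional[Tuple[str, str]]:
    """Return (family, label) for the nearest center within max_dist.

    Ties go to the first center encountered; sequences farther than
    max_dist from every center return None.
    """
    best: Optional[Tuple[str, int]] = None
    for family, center in centers.items():
        d = hamming(seq, center)
        if d is not None and d <= max_dist and (best is None or d < best[1]):
            best = (family, d)
    if best is None:
        return None
    family, d = best
    return family, LABELS.get(d, f"{d}-point mutant")


# Hypothetical toy centers, for illustration only.
centers = {"fam1": "ACGUACGU", "fam2": "UUUUGGGG"}
print(classify("ACGAACGU", centers))
```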
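The merged output includes a promiscuity index (I). A widely used definition, from Nath and Atkins, is the Shannon entropy of the normalized activities across N substrates divided by log N, so that I = 1 when all substrates are used equally and I = 0 when only one is used. The sketch below implements that definition; whether the WFLIVM-k-seq-analysis scripts use exactly this formula, and which activity measure they feed into it, is an assumption to check against that folder's readme.

```python
import math
from typing import Sequence


def promiscuity_index(activities: Sequence[float]) -> float:
    """Normalized Shannon entropy of activities across substrates.

    activities: one non-negative activity value per substrate
    (for example, one value for each of W, F, L, I, V, and M).
    """
    n = len(activities)
    if n < 2:
        raise ValueError("need at least two substrates")
    if any(a < 0 for a in activities):
        raise ValueError("activities must be non-negative")
    total = sum(activities)
    if total <= 0:
        raise ValueError("at least one activity must be positive")
    entropy = 0.0
    for a in activities:
        if a > 0:  # 0 * log(0) is taken as 0
            p = a / total
            entropy -= p * math.log(p)
    return entropy / math.log(n)
```

For example, equal activity on all six substrates gives I = 1, while activity on only two of six substrates gives log 2 / log 6, about 0.39.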
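For the BFO and BLO selections, a common first look at the R0 to R5 counts is the enrichment of each sequence: its frequency in a selected round divided by its frequency in an earlier round. The sketch below computes that ratio from two counts dictionaries; it is a generic metric, not necessarily the analysis used for this deposit, and `frequencies` and `enrichment` are hypothetical names. In a highly diverse R0 pool most sequences are seen rarely or not at all, so ratios against R0 can be noisy, and comparing consecutive rounds is a reasonable alternative.

```python
from typing import Dict


def frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert read counts to relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty counts table")
    return {seq: n / total for seq, n in counts.items()}


def enrichment(
    round_counts: Dict[str, int], reference_counts: Dict[str, int]
) -> Dict[str, float]:
    """Frequency in a selected round divided by frequency in a reference round.

    Sequences absent from the reference round are skipped because the
    ratio is undefined for them.
    """
    f_round = frequencies(round_counts)
    f_ref = frequencies(reference_counts)
    return {seq: f / f_ref[seq] for seq, f in f_round.items() if seq in f_ref}


# Toy counts, for illustration only.
r0 = {"A": 50, "B": 50}
r5 = {"A": 90, "B": 10, "C": 5}
print(enrichment(r5, r0))
```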
Additional scripts for processing these data can be found at https://github.com/ichen-lab-ucsb/WFLIVM_k-SeqCode/Software
Data were collected from k-Seq experiments using BWO, BFO, BLO, BIO, BVO, or BMO as substrates for aminoacylating ribozymes, following methods similar to those described in two of the supplementary articles linked to this deposit (https://doi.org/10.1021/jacs.8b13298 and https://doi.org/10.1093/nar/gkab199/6194417).
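k-Seq estimates kinetic parameters for each sequence by fitting its fraction reacted against substrate concentration. The sketch below assumes the pseudo-first-order form F = A(1 − exp(−αtk[S])) used in the k-Seq literature, where A is the maximum reactive fraction, k a rate constant, t the reaction time, and α a correction factor for the effective substrate concentration. `ALPHA_T` is a hypothetical placeholder for the product αt, and `fraction_model` and `fit_kseq` are illustrative names, not functions from the k-Seq package; SciPy's `curve_fit` stands in for whatever fitting routine the deposited scripts use.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical placeholder for alpha * t (t in minutes); the real value
# depends on the reaction time and the substrate.
ALPHA_T = 30.0


def fraction_model(conc_molar, A, k):
    """Pseudo-first-order model: A * (1 - exp(-alpha * t * k * [S])).

    conc_molar: substrate concentration(s) in mol/L
    A: maximum fraction of the sequence that can react (0 to 1)
    k: rate constant in M^-1 min^-1
    """
    return A * (1.0 - np.exp(-ALPHA_T * k * conc_molar))


def fit_kseq(conc_um, frac):
    """Fit A and k for one sequence; returns (A, k, kA)."""
    conc_molar = np.asarray(conc_um, dtype=float) * 1e-6  # uM -> M
    frac = np.asarray(frac, dtype=float)
    popt, _ = curve_fit(
        fraction_model,
        conc_molar,
        frac,
        p0=[0.5, 100.0],
        bounds=([0.0, 0.0], [1.0, np.inf]),  # keep A in [0, 1], k >= 0
    )
    A, k = popt
    return A, k, k * A


# The five substrate concentrations used in the k-Seq experiments (uM).
conc_um = np.array([2, 10, 50, 250, 1250], dtype=float)
```

Reporting the product kA alongside A and k is a common choice: when the highest concentration does not approach saturation, A and k are hard to separate, while their product remains well constrained.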