From messy chemistry to ecology: Autocatalysis and heritability in prebiotically plausible chemical systems
Data files
Nov 22, 2024 version files (1.14 GB total)
README.md, 12.73 KB
Sokolskyi-et-al-SupData2024.zip, 1.14 GB
Abstract
A key question in origins-of-life research is whether heritability, and thus evolution, could have preceded genes. Out-of-equilibrium chemical reaction networks with multiple autocatalytic motifs may provide chemical "memory" and serve as units of heritability, but experimental validation is lacking. We established conditions that may be conducive to the emergence of heritable variation and developed methods to search for heritability and autocatalysis. We prepared a food set (FS) of three organic species, three inorganic salts and pyrite. We conducted a serial dilution experiment in which FS was incubated for 24 hours, after which a 20% fraction was transferred into freshly prepared FS that then went through the same procedure; this was repeated for 10 generations. As controls, we also incubated fresh FS solutions without a transfer in each generation. We compared the chemical composition of transfer vials and no-transfer controls using liquid chromatography-mass spectrometry (LCMS), with metrics adapted from ecology and evolutionary biology. While variability was high, focusing on a subset of chemicals with more consistent patterns revealed evidence of heritable variation among vials. Using rule-based chemical reaction network inference constrained by the LCMS data, we identified a plausible FS-driven chemical reaction network that contains numerous autocatalytic cycles.
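To give a sense of scale for the serial dilution design, the sketch below computes how much material from the first vial would remain after repeated transfers if nothing were replenished by chemistry. It assumes that the 20% transfer makes up 20% of the volume of each new vial; the function name `carryover_fraction` is hypothetical and is not part of the dataset's code.

```python
def carryover_fraction(transfer_fraction: float, generations: int) -> float:
    """Fraction of generation-0 material left after passive dilution only.

    Assumes each new vial is made of `transfer_fraction` old solution and
    the rest fresh food set, and that no compound is produced or consumed.
    """
    if not 0.0 <= transfer_fraction <= 1.0:
        raise ValueError("transfer_fraction must be between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return transfer_fraction ** generations


if __name__ == "__main__":
    for g in (1, 5, 10):
        print(f"after {g} transfers: {carryover_fraction(0.2, g):.3e}")
```

Under these assumptions, purely passive carryover shrinks by a factor of five per generation, so differences between transfer vials and no-transfer controls that persist over many generations are unlikely to be explained by dilution alone.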
README: From messy chemistry to ecology: Autocatalysis and heritability in prebiotically plausible chemical systems
https://doi.org/10.5061/dryad.hhmgqnkrt
Description of the data
This is a supplementary dataset for the preprint titled "From messy chemistry to ecology: Autocatalysis and heritability in prebiotically plausible chemical systems" (https://www.biorxiv.org/content/10.1101/2024.08.03.606486v3.full). It includes 1) raw and processed experimental data generated in this study; 2) Python and R code used to analyze the data; 3) inputs and outputs for the computational analyses in this study.
File structure
-code, code used in this study
--autocatalysis, code used for AC detection
---food-search.py, searches the list of ACs for ACs with a specific food set
---rxn-drawer-smiles.py, draws a reaction network with chemical structures from an input list of reactions
---smiles-conv.py, converts compound numbers from the autocatalyticsubnetworks output into SMILES using a guide file
---smiles-filter.py, filters a list of SMILES to remove compounds that don't fit set criteria (e.g., presence of atoms that are irrelevant such as Br)
--guttenberg et al., 2015, code to implement the heritable states calculation
---computemetric.py, original code from Guttenberg et al., 2015 with updated libraries
---splitter.py, python code to split a filtered CompoundDiscoverer output by lineage, resulting files can be analyzed by computemetric.py
--LCMS spreadsheet analyses, code for analyzing CompoundDiscoverer (CD) outputs
---compound-lookup-ratios.py, plot the TR/NTC ratio of a selected compound in an input CD spreadsheet
---compound-lookup.py, plot the TR and NTC values separately of a selected compound in an input CD spreadsheet
---compound-search-plot.py, plot TR/NTC ratios of every compound in a dataset whose abundance differs significantly (p<0.05) between TRs and NTCs in a given number of generations
---element-counter.py, calculate number of atoms of a specific type for every compound in a list of molecules
---filter-name-formula.py, filter the raw output CD file to 1) remove compounds with no identified name AND formula, 2) sum duplicate areas, 3) remove irrelevant columns
---PCA-maker.py, make 4 PCA plots (PC1 vs PC2,3,4,5) for one generation TRs and NTCs
---Sokolskyi-et-al_permanova.qmd, R code to 1) calculate distances between TRs and NTCs, 2) calculate PERMANOVA for a selected distance metric, 3) bootstrap standard deviations for group distances; made by Cecile Ane, updated by Tymofii Sokolskyi
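The filtering steps listed for filter-name-formula.py can be sketched as follows. This is a minimal illustration, not the script shipped in the archive: it works on rows already loaded as dictionaries, the column names `Name` and `Formula` are assumptions, and it drops rows lacking a formula, which is one reading of the filtering rule. The sample titles and values in the usage example are invented.

```python
from collections import defaultdict


def filter_compounds(rows, area_columns):
    """Keep compounds with a formula, sum duplicate areas, drop other columns.

    rows: list of dicts, one per detected peak (hypothetical layout).
    area_columns: names of the per-sample peak-area columns to keep.
    """
    summed = defaultdict(lambda: defaultdict(float))
    for row in rows:
        formula = (row.get("Formula") or "").strip()
        if not formula:
            continue  # no calculated formula: row is discarded
        name = (row.get("Name") or "").strip()
        for col in area_columns:
            # Duplicate (name, formula) entries are merged by summing areas.
            summed[(name, formula)][col] += float(row.get(col) or 0.0)
    return [
        {"Name": name, "Formula": formula, **{c: areas[c] for c in area_columns}}
        for (name, formula), areas in summed.items()
    ]


if __name__ == "__main__":
    raw = [
        {"Name": "compound A", "Formula": "C2H4O2", "RT": 1.2, "1NTC1": 10.0, "1TR1": 5.0},
        {"Name": "compound A", "Formula": "C2H4O2", "RT": 1.3, "1NTC1": 2.0, "1TR1": 1.0},
        {"Name": "", "Formula": "", "RT": 4.0, "1NTC1": 7.0, "1TR1": 7.0},
    ]
    for record in filter_compounds(raw, ["1NTC1", "1TR1"]):
        print(record)
```

Grouping on the (name, formula) pair rather than on the formula alone keeps isomers with different assigned names separate; whether the original script does the same is not stated here.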
-LCMS data, raw and filtered CD outputs
--generation experiments
This folder contains a set of CD output files.
Filtered files have multiple columns: the first two are compound name and compound formula, and the rest are the peak areas of each compound in each analyzed sample. Filtering removes compounds that don't have a calculated formula, so every compound in a filtered file has an assigned formula, but not every compound has an assigned name. Column titles for control samples always start with the generation number followed by the letter "C" or "NTC"; transfer samples are labeled "N", "TR" or "T20".
Raw CD files are unprocessed by our code and include additional data for each detected compound, for example, retention times, m/z, sources for the identification, etc. For more information on these variables please consult the CD manual: https://assets.thermofisher.com/TFS-Assets/CMD/manuals/XCALI-98478-Compound-Discoverer-User-Guide-LC-Studies-XCALI98478-en.pdf. Some values in these files are empty because CD did not assign a formula or name to some of the detected peaks. Column titles in the raw output files are highlighted in blue.
---expt1-filtered.xlsx, CD output from replicate 10-generation experiment 1 filtered with filter-name-formula.py
---expt1-raw.xlsx, raw CD output from replicate 10-generation experiment 1 before filtering
---expt2-filtered.xlsx, CD output from replicate 10-generation experiment 2 filtered with filter-name-formula.py
---expt2-raw.xlsx, raw CD output from replicate 10-generation experiment 2 before filtering
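When working with the filtered generation files, the sample columns usually need to be split into controls and transfers. The sketch below parses a column title under the naming convention described above (generation number, then "C"/"NTC" for controls or "N"/"TR"/"T20" for transfers). The exact title layout, including separators and trailing replicate labels, is an assumption, as are the names `parse_title` and `Sample`.

```python
import re
from typing import NamedTuple, Optional

# Longer labels come first so "NTC" is not read as the transfer label "N".
_TITLE = re.compile(r"^\s*(\d+)[\s_\-]*(NTC|T20|TR|C|N)")
_CONTROL_LABELS = {"C", "NTC"}


class Sample(NamedTuple):
    generation: int
    kind: str   # "control" or "transfer"
    label: str  # label as written in the column title


def parse_title(title: str) -> Optional[Sample]:
    """Return the parsed sample, or None for non-sample columns."""
    match = _TITLE.match(title)
    if match is None:
        return None
    label = match.group(2)
    kind = "control" if label in _CONTROL_LABELS else "transfer"
    return Sample(int(match.group(1)), kind, label)


if __name__ == "__main__":
    for title in ["Name", "Formula", "3NTC2", "3TR2", "10T20-1"]:
        print(title, "->", parse_title(title))
```

Columns that return `None` (for example the name and formula columns) can be treated as metadata, and the remaining columns grouped by generation and kind before computing TR/NTC comparisons.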
--incubation experiments
This folder contains data from three incubation experiments:
Experiment 1 - FS was autoclaved and then sampled every 2 hours for 24 hours. t=0 is immediately after autoclaving. Incubation conditions are the same as in the main experiment.
Experiment 2 - vials were incubated and sampled at t=0 and t=24 hrs, with and without pyrite, autoclaving, and FS. FP = FS + pyrite; OF = only FS; WP = water + pyrite; A = autoclaved; NA = not autoclaved.
Experiment 3 - FS components were each incubated individually; in addition, all inorganic components (ammonia, TMP, bicarbonate) and all organic components (methanol, formic and acetic acids) were incubated as mixtures.
---inc-expt1-filtered.xlsx
---inc-expt1-raw.xlsx
---inc-expt2-filtered.csv
---inc-expt2-raw.xlsx
---inc-expt3-filtered.csv
---inc-expt3-raw.xlsx
-networks, inputs and outputs for Rule-it and autocatalyticsubnetworks analyses
--AC outputs, outputs of autocatalyticsubnetworks program
---4it-pr-all-noOCNS-ACs-SMILES.txt, list of detected ACs in a 4-iteration Rule-it-produced network pruned with all LCMS-detected compounds (H2O, CO2, NH3, H2S rows removed from the stoichiometric matrix input)
---4it-pr-cntrlcore-noOCNS-ACs-SMILES.txt, list of detected ACs in a 4-iteration Rule-it-produced network pruned with control core LCMS-detected compounds (H2O, CO2, NH3, H2S rows removed from the stoichiometric matrix input)
---4it-pr-hercore-noOCNS-ACs-SMILES.txt, list of detected ACs in a 4-iteration Rule-it-produced network pruned with heritable core LCMS-detected compounds (H2O, CO2, NH3, H2S rows removed from the stoichiometric matrix input)
---example ACs
Images of the reaction networks for a few ACs from the three AC lists above (produced by rxn-drawer-smiles.py). ACs are numbered as in the .txt files: prall = pruned with all detected compounds; prher = pruned with heritable core; prcntrl = pruned with control core.
--ruleit outputs, outputs of the Rule-it program
---4it-network-pruned-allcmpnds.json
---4it-network-pruned-cntrlcore.json
---4it-network-pruned-hercore.json
4-iteration networks pruned with all LCMS-detected compounds, the control core and the heritable core, respectively
---4it-network.json, original 4-iteration network used as input for pruning
---pruning-output-detected cmpnds.xlsx, list of compounds detected in the network with the three pruning attempts
---stoi_mat-4it-pr-all.xlsx
---stoi_mat-4it-pr-cntrlcore.xlsx
---stoi_mat-4it-pr-hercore.xlsx
Stoichiometric matrices (inputs for autocatalyticsubnetworks) produced from each of the three pruned networks
---cmpnd inputs for pruning
Lists of compound SMILES used for pruning in Rule-it
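The stoichiometric matrices listed above have one row per compound, and the rows for H2O, CO2, NH3 and H2S were removed before the autocatalysis search. The sketch below shows one way such a matrix can be assembled from a reaction list. It is an illustration under stated assumptions, not the code used to produce the stoi_mat files: the function name, the reaction representation (pairs of dictionaries) and the toy reaction are all hypothetical.

```python
def stoichiometric_matrix(reactions, excluded=frozenset()):
    """Build a species-by-reaction matrix of net stoichiometric coefficients.

    reactions: list of (reactants, products) pairs, each a dict mapping a
        species identifier to its coefficient.
    excluded: species whose rows are left out of the matrix.
    Returns (species, matrix) where matrix[i][j] is the net coefficient of
    species[i] in reaction j (negative if consumed, positive if produced).
    """
    all_species = {s for reactants, products in reactions
                   for s in (*reactants, *products)}
    species = sorted(all_species - set(excluded))
    row = {s: i for i, s in enumerate(species)}
    matrix = [[0] * len(reactions) for _ in species]
    for j, (reactants, products) in enumerate(reactions):
        for s, coeff in reactants.items():
            if s in row:
                matrix[row[s]][j] -= coeff
        for s, coeff in products.items():
            if s in row:
                matrix[row[s]][j] += coeff
    return species, matrix


if __name__ == "__main__":
    # Toy autocatalytic step: A + B -> 2 B + H2O (invented for illustration).
    toy = [({"A": 1, "B": 1}, {"B": 2, "H2O": 1})]
    small_molecules = {"H2O", "CO2", "NH3", "H2S"}
    names, S = stoichiometric_matrix(toy, excluded=small_molecules)
    for name, coefficients in zip(names, S):
        print(name, coefficients)
```

In the toy reaction, B appears on both sides and has a net coefficient of +1, which is the kind of net self-amplification that autocatalysis detection looks for; the files in this dataset identify compounds differently (for example through a guide file mapping compound numbers to SMILES), so the plain names here are only placeholders.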
More information on the methods is available at https://doi.org/10.1101/2024.08.03.606486
Code/software
Rule-it software: https://github.com/brunocuevas/ruleit
Autocatalyticsubnetworks software: https://github.com/vblancoOR/autocatatalyticsubnetworks
Methods
This is the supplementary data for a bioRxiv preprint, https://doi.org/10.1101/2024.08.03.606486. It is split into 3 folders:
1) Python and R code to process LCMS data in the ways described in the article;
2) raw and filtered outputs of the CompoundDiscoverer software for the LCMS data from the two replicate 10-generation experiments and the incubation tests described in the article and Appendix A;
3) inputs and outputs for the computational pipeline using the Rule-it and autocatalyticsubnetworks software described in the article and Appendix C.