Data from: Why is inverse symmetry the fundamental architecture of DNA?
Data files
Mar 26, 2021 version files 43.95 KB
WarrHattonRSOS_Mar2021.zip
43.95 KB
Abstract
A striking global property of genomes observable across all domains of life is their universal inverse symmetry, manifest as equivalent frequencies of inverse complementary sequence motifs on the same strand of duplex DNA, as originally stated in Chargaff’s Second Parity Rule (CPR2) for mononucleotides. Simple mechanistic explanations of CPR2 have proven unsatisfactory.
In contrast, we use a conservation principle to explain not only inverse symmetry and its global nature, but also how it breaks down. CoHSI (Conservation of Hartley-Shannon Information) theory, when applied to the structure of dsDNA considered as a homogeneous discrete system, predicts a power-law relationship in frequency versus rank-order of sequence motifs (n-tuples) in a single strand. We show how this combines with inter-strand Watson-Crick base-pairing to predict a genome-wide power-law relationship in frequency versus rank-order of stepped-pairs of inverse complementary motifs (i.e. universal inverse symmetry). These predictions were tested and validated on 175 genomes drawn from the three domains of life plus viruses. We find that CPR2 holds closely for genomes over 10^5 bp in length, and that CPR2 compliance declines in shorter genomes in inverse proportion to genome length and in direct proportion to n-tuple size, regardless of whether the genome is DNA or RNA, single- or double-stranded.
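As a rough illustration of what inverse symmetry means in practice, the Python sketch below counts n-tuples on a single strand and compares each motif's count with that of its inverse (reverse) complement. It is not taken from the reproducibility package. The function names and the deviation score are hypothetical choices made for this example and are not the compliance measure used in the study. The sketch assumes an uppercase sequence containing only A, C, G and T.

```python
from collections import Counter

# Watson-Crick complement table (assumes uppercase A, C, G, T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif: str) -> str:
    """Return the inverse complement of a motif, e.g. AAC -> GTT."""
    return motif.translate(_COMPLEMENT)[::-1]


def ntuple_counts(strand: str, n: int) -> Counter:
    """Count all overlapping n-tuples on one strand."""
    return Counter(strand[i:i + n] for i in range(len(strand) - n + 1))


def cpr2_deviation(strand: str, n: int) -> float:
    """Illustrative score: 0.0 means every motif is exactly as frequent
    as its inverse complement; larger values mean weaker symmetry.

    This is a hypothetical measure for demonstration only.
    """
    counts = ntuple_counts(strand, n)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    seen = set()
    diff = 0
    for motif in counts:
        partner = reverse_complement(motif)
        pair = frozenset((motif, partner))
        if pair in seen:
            continue  # each motif/partner pair is scored once
        seen.add(pair)
        # A Counter returns 0 for a partner that never occurs.
        diff += abs(counts[motif] - counts[partner])
    return diff / total
```

Applied to a real genome for increasing values of n, a score like this would be expected to stay near zero for long sequences and to grow for short ones, in line with the trend described in the abstract.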
Methods
This is a complete computational reproducibility package that allows every result, table and diagram in the study to be reproduced individually. It also performs verification checks on the machine environment, the availability of essential open source packages and the quality of arithmetic, and runs regression tests on the outputs.
It has been tested locally to recreate each figure and statistical computation.
Usage notes
The package is self-contained. All information needed to use it is included.