Data from: Strong evidence of an Information-Theoretical Conservation Principle linking all discrete systems
Hatton, Les; Warr, Gregory (2019), Data from: Strong evidence of an Information-Theoretical Conservation Principle linking all discrete systems, Dryad, Dataset, https://doi.org/10.5061/dryad.gm957n1
Diverse discrete systems share common global properties that lack a unifying theoretical explanation. However, constraining the simplest measure of total information (Hartley-Shannon) in a statistical mechanics framework reveals a principle, the Conservation of Hartley-Shannon Information (CoHSI), that directly predicts both known and unsuspected common properties of discrete systems, as borne out in the diverse systems of computer software, proteins and music. Discrete systems fall into two categories distinguished by their structure: heterogeneous systems, in which there is a distinguishable order of assembly of the system's components from an alphabet of unique tokens (e.g. proteins assembled from an alphabet of amino acids), and homogeneous systems, in which unique tokens are simply binned, counted and rank-ordered. Heterogeneous systems are characterized by an implicit distribution of component lengths, with a sharp unimodal peak (containing the majority of components) and a power-law tail, whereas homogeneous systems reduce naturally to Zipf's Law but with a drooping tail in the distribution. We also confirm predictions that very long components are inevitable for heterogeneous systems; that discrete systems can simultaneously exhibit both heterogeneous and homogeneous behavior; and that in systems with more than one consistent token alphabet (e.g. digital music), the alphabets themselves show a power-law relationship.
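The homogeneous case described above (tokens binned, counted and rank-ordered) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names `rank_frequency` and `loglog_slope` are hypothetical, the token list is toy data rather than anything from the dataset, and the ordinary least-squares fit is an assumed, simple way to summarize a Zipf-like trend on log-log axes.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Bin and count tokens, then rank them by descending count.

    Ties are broken alphabetically so the result is deterministic.
    Returns a list of (rank, token, count) tuples, with rank starting at 1.
    """
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, tok, n) for rank, (tok, n) in enumerate(ordered, start=1)]


def loglog_slope(table):
    """Least-squares slope of log(count) against log(rank).

    A slope near -1 is the classic Zipf signature. Requires at least
    two distinct tokens, otherwise the slope is undefined.
    """
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy input (an assumption for illustration only).
tokens = "the cat and the dog and the bird".split()
table = rank_frequency(tokens)
slope = loglog_slope(table)

for rank, tok, count in table:
    print(rank, tok, count)
print("log-log slope:", slope)
```

On real data, a single straight-line fit would hide the drooping tail mentioned in the abstract, so plotting the full rank-frequency table is likely more informative than the slope alone.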