Phylogenetic results of 6 morphological character matrices

Cite this dataset

Yu, Congyu (2022). Phylogenetic results of 6 morphological character matrices [Dataset]. Dryad.


The construction of morphological character matrices, which extracts paleontological information from fossils, is central to paleontological systematics. Although the word "information" appears repeatedly across a wide array of paleontological systematic studies, its meaning has rarely been clarified, and there has been no standard for measuring paleontological information because of the incompleteness of fossils, the difficulty of recognizing homologous and homoplastic structures, and other factors. Here, based on information theory, we show the deep connections between paleontological systematic study and communication system engineering. It is information, the decrease of uncertainty, carried by morphological characters that distinguishes operational taxonomic units (OTUs) and allows evolutionary history to be reconstructed. We propose that the communication-engineering concepts of source coding and channel coding correspond, in paleontological studies, to the construction of diagnostic features and of the entire character matrix, respectively. Because these two steps serve different purposes, they should be kept distinct, as they are in the engineering of typical communication systems. Using character matrices from six vertebrate groups, we analyzed their information properties, including source entropy, mutual information, and channel capacity. Estimates of channel capacity show that every matrix has an upper limit on the paleontological information it can transmit, indicating that, because of noise, adding too many characters not only increases the burden of character scoring but may also decrease the quality of the matrix. Information entropy, which measures how informative a variable is, was tested for each character as a weighting criterion in parsimony-based systematic analyses; the results are highly consistent with existing knowledge and offer both good resolution and interpretability.
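As a rough illustration of the quantities named above, the sketch below computes the Shannon entropy (in bits) of each character's observed states across OTUs, and the mutual information between two characters. It is a minimal sketch under stated assumptions, not the study's implementation: the toy matrix and every function name are hypothetical, '?' and '-' are simply treated as unscored, polymorphic codings are not handled, and using the raw entropy rounded to two decimals as the character weight is an assumed reading of the weighting scheme.

```python
from collections import Counter
from math import log2

# Assumption: '?' (missing) and '-' (inapplicable) are treated as unscored.
UNSCORED = {"?", "-"}


def entropy(states):
    """Shannon entropy, in bits, of a sequence of observed states."""
    counts = Counter(states)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    # H = sum over states of p * log2(1/p), with p = c/n
    return sum((c / n) * log2(n / c) for c in counts.values())


def scored_column(matrix, j):
    """Map each OTU to its state for character j, skipping unscored cells."""
    return {otu: row[j] for otu, row in matrix.items() if row[j] not in UNSCORED}


def character_entropy(matrix, j):
    return entropy(list(scored_column(matrix, j).values()))


def mutual_information(matrix, i, j):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), over OTUs scored for both characters."""
    ci, cj = scored_column(matrix, i), scored_column(matrix, j)
    shared = [otu for otu in ci if otu in cj]
    xs = [ci[otu] for otu in shared]
    ys = [cj[otu] for otu in shared]
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy matrix: 4 OTUs x 4 binary characters, one state per cell.
matrix = {
    "OTU_A": "0000",
    "OTU_B": "0110",
    "OTU_C": "1101",
    "OTU_D": "1?11",
}

n_chars = len(next(iter(matrix.values())))
# Assumed weighting: raw entropy rounded to two decimal places.
weights = {j: round(character_entropy(matrix, j), 2) for j in range(n_chars)}
print(weights)                            # {0: 1.0, 1: 0.92, 2: 1.0, 3: 1.0}
print(mutual_information(matrix, 0, 2))   # 0.0 (independent partitions)
print(mutual_information(matrix, 0, 3))   # 1.0 (identical partitions)
```

In this toy case character 1 scores lowest because its three scored OTUs split 1:2 rather than evenly, while characters 0 and 3 partition the OTUs identically, so each fully predicts the other (1 bit of mutual information).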

Methods

This dataset contains a .rar archive with 24 .nex files belonging to 6 vertebrate groups: Ornithischia (Han et al., 2017), Ceratopsia (Yu et al., 2020), Diplodocidae (Tschopp & Mateus, 2017), Multituberculata (Wang et al., 2019), Carnivoramorpha (Spaulding & Flynn, 2012), and lizards (Tschopp et al., 2018). The original character matrices, including the character definitions and coded states, can be found in these references.

Each group has 4 .nex files, named GroupName_equal_weighting, GroupName_implied_weight_3, GroupName_implied_weight_12, and GroupName_info_entropy, corresponding to parsimony-based phylogenetic analyses run under equal weighting, implied weighting (k = 3), implied weighting (k = 12), and the information entropy weighting proposed in this study. In each .nex file, the resulting trees are appended after the character matrix, with the strict consensus tree as the last tree. For information entropy weighting, the weights are given to two decimal places.
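To pair each file with its analysis, a script can key on the four suffixes listed above. The sketch below assumes the names are exactly GroupName plus suffix plus ".nex" (the actual spelling and capitalization inside the archive are assumptions, and the example file names are hypothetical). It also shows the standard implied-weighting fit function, k / (k + h), where h is a character's number of extra steps (homoplasy), which is why k = 3 down-weights homoplastic characters more strongly than k = 12. `parse_name` and `implied_fit` are illustrative helpers, not part of TNT.

```python
# Suffixes as described in the dataset notes; the exact spelling and
# capitalization of real file names in the archive are assumptions.
SCHEMES = {
    "equal_weighting": ("equal", None),
    "implied_weight_3": ("implied", 3),
    "implied_weight_12": ("implied", 12),
    "info_entropy": ("entropy", None),
}


def parse_name(filename):
    """Split e.g. 'Ceratopsia_implied_weight_3.nex' into (group, scheme, k)."""
    stem = filename.rsplit(".", 1)[0]
    for suffix, (scheme, k) in SCHEMES.items():
        if stem.endswith("_" + suffix):
            return stem[: -len(suffix) - 1], scheme, k
    raise ValueError(f"unrecognized file name: {filename}")


def implied_fit(extra_steps, k):
    """Implied-weighting fit of a character: k / (k + h), h = extra steps."""
    if extra_steps < 0 or k <= 0:
        raise ValueError("need extra_steps >= 0 and k > 0")
    return k / (k + extra_steps)


print(parse_name("Ceratopsia_implied_weight_12.nex"))  # ('Ceratopsia', 'implied', 12)
for k in (3, 12):
    # Fit for 0..3 extra steps; a smaller k falls off faster.
    print(k, [round(implied_fit(h, k), 3) for h in range(4)])
```

With two extra steps, for example, a character keeps a fit of 0.6 under k = 3 but about 0.857 under k = 12.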

Phylogenetic analyses were run in TNT 1.5 (Goloboff & Catalano, 2016).
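Because the strict consensus tree is stored as the last tree in each file, a script can recover it by taking the final tree statement. The sketch below assumes a standard NEXUS TREES block with one `tree name = newick;` statement per line; it strips bracket comments such as `[&U]` but does not resolve TRANSLATE tables or multi-line trees. The exact layout of the TNT-written tree blocks in this archive is an assumption, and the sample text and tree names are hypothetical.

```python
import re

COMMENT = re.compile(r"\[[^\]]*\]")  # NEXUS bracket comments, e.g. [&U]
TREE = re.compile(r"^\s*tree\s+([^=\s]+)\s*=\s*(.+?)\s*;\s*$", re.IGNORECASE)


def read_trees(nexus_text):
    """Return (name, newick) pairs from all TREES blocks, in file order."""
    trees, in_block = [], False
    for line in nexus_text.splitlines():
        low = line.strip().lower()
        if low.startswith("begin trees"):
            in_block = True
        elif in_block and low.startswith("end"):
            in_block = False
        elif in_block:
            m = TREE.match(COMMENT.sub("", line))
            if m:
                trees.append((m.group(1), m.group(2) + ";"))
    return trees


def strict_consensus(nexus_text):
    """Last tree in the file, which the dataset notes say is the consensus."""
    trees = read_trees(nexus_text)
    if not trees:
        raise ValueError("no tree statements found")
    return trees[-1]


# Hypothetical excerpt; real files also contain the character matrix.
sample = """#NEXUS
BEGIN TREES;
  TREE tnt_1 = [&U] (A,(B,C));
  TREE tnt_2 = [&U] (A,(B,(C,D)));
  TREE strict_consensus = [&U] (A,B,(C,D));
END;
"""
print(strict_consensus(sample))  # ('strict_consensus', '(A,B,(C,D));')
```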

Usage notes

See Methods above.