Alignments of chloroplast noncoding regions for context-dependency analysis


Morton, Brian (2022), Alignments of chloroplast noncoding regions for context-dependency analysis, Dryad, Dataset,


Substitutions between closely related noncoding chloroplast DNA sequences are studied with respect to the composition of the three bases on each side of the substitution, that is, the hexanucleotide context. Substitution rate varies about 100-fold among contexts, particularly for substitutions of A and T. Rate heterogeneity of transitions differs from that of transversions, resulting in a more than 200-fold variation in the transition:transversion bias. The data are consistent with a CpG effect, and it is shown that both the A+T content and the arrangement of purines and pyrimidines along the same DNA strand are correlated with rate variation. Expected equilibrium A+T content ranges from 36.4% to 82.8% across contexts, while G-C skew ranges from -77.4 to 72.2 and A-T skew from -63.9 to 68.2. The predicted equilibria are associated with specific features of the composition of the hexanucleotide context, and they also agree closely with the observed context-dependent compositions. Finally, by controlling for the composition of nucleotides closer to the substitution site, it is shown that both the third and fourth nucleotides removed on each side of the substitution directly influence substitution dynamics at that site. Overall, the results demonstrate that noncoding sites in different contexts are evolving along very different evolutionary trajectories and that substitution dynamics are far more complex than typically assumed. This has important implications for several types of sequence analysis, particularly analyses of natural selection, and the context-dependent substitution matrices developed here can be applied in future analyses.
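To make the terms in the description concrete, the following minimal Python sketch shows one way to extract the hexanucleotide context of a site and to compute A+T content and skew from base counts. The function names are hypothetical and not part of the dataset. The skew formulas, 100·(G−C)/(G+C) and 100·(A−T)/(A+T), are an assumption based on the common convention; the dataset description does not state the exact definition used.

```python
def hexanucleotide_context(seq, i):
    """Return the three bases on each side of position i (0-based).

    Returns None when the site is too close to either end of the
    sequence to have a full three-base flank on both sides.
    """
    if i < 3 or i + 3 >= len(seq):
        return None
    return seq[i - 3:i] + seq[i + 1:i + 4]


def composition_stats(counts):
    """Compute A+T content and skews (all as percentages) from base counts.

    Assumed definitions (not confirmed by the dataset description):
      G-C skew = 100 * (G - C) / (G + C)
      A-T skew = 100 * (A - T) / (A + T)
    Assumes G+C and A+T are both nonzero.
    """
    a, c, g, t = (counts[b] for b in "ACGT")
    total = a + c + g + t
    return {
        "at_content": 100 * (a + t) / total,
        "gc_skew": 100 * (g - c) / (g + c),
        "at_skew": 100 * (a - t) / (a + t),
    }


# Example: the site at index 3 ('C') in a short toy sequence.
context = hexanucleotide_context("AAACGTTT", 3)
stats = composition_stats({"A": 30, "C": 10, "G": 30, "T": 30})
```

Here `context` is the six flanking bases with the focal site itself left out, and `stats` holds the three composition measures for the toy counts.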