Data from: Accurate estimation of substitution rates with neighbour-dependent models in a phylogenetic context
Cite this dataset
Bérard, Jean; Guéguen, Laurent (2012). Data from: Accurate estimation of substitution rates with neighbour-dependent models in a phylogenetic context [Dataset]. Dryad. https://doi.org/10.5061/dryad.5vp21b10
Most models and algorithms developed to perform statistical inference from DNA data assume that the substitution processes affecting distinct nucleotide sites are stochastically independent. This assumption ensures both mathematical and computational tractability, but it disagrees with observed data in many situations; one well-known example is CpG dinucleotide hypermutability in mammalian genomes. In this paper, we consider the class of RN95+YpR substitution models, which accounts for neighbour-dependent effects, including CpG hypermutability, through transitions between pyrimidine-purine (YpR) dinucleotides. We show that inference methods originally developed under the assumption of independence between sites can be adapted to RN95+YpR models, within a mathematically rigorous framework provided by specific structural properties of this class of models. We assess how efficiently this approach infers the CpG hypermutability rate from aligned DNA sequences. The method is tested on simulated data and compared against several alternatives; the results suggest that it delivers a high degree of accuracy at a low computational cost. We then apply our method to an alignment of ten DNA sequences from primate species. Model comparisons within the RN95+YpR class show the importance of taking neighbour-dependent effects into account. An application of the method to the detection of hypomethylated islands is also discussed.