Data from: Accurate estimation of substitution rates with neighbour-dependent models in a phylogenetic context

Bérard, Jean; Guéguen, Laurent

Published Jan 27, 2012 on Dryad. https://doi.org/10.5061/dryad.5vp21b10

Data files

Jan 27, 2012 version files: 25.35 MB

appendix.pdf: 191.56 KB
ENm001_AR.fa: 4.66 MB
ENm001.fa: 20.49 MB

Abstract

Most models and algorithms developed to perform statistical inference from DNA data make the assumption that substitution processes affecting distinct nucleotide sites are stochastically independent. This assumption ensures both mathematical and computational tractability, but is in disagreement with observed data in many situations -- one well-known example being CpG dinucleotide hypermutability in mammalian genomes. In this paper, we consider the class of RN95+YpR substitution models, which allows neighbour-dependent effects -- including CpG hypermutability -- to be taken into account, through transitions between pyrimidine-purine dinucleotides. We show that it is possible to adapt inference methods originally developed under the assumption of independence between sites to RN95+YpR models, using a mathematically rigorous framework provided by specific structural properties of this class of models. We assess how efficient this approach is at inferring the CpG hypermutability rate from aligned DNA sequences. The method is tested on simulated data and compared against several alternatives; the results suggest that it delivers a high degree of accuracy at a low computational cost. We then apply our method to an alignment of ten DNA sequences from primate species. Model comparisons within the RN95+YpR class show the importance of taking into account neighbour-dependent effects. An application of the method to the detection of hypomethylated islands is discussed.
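As a rough illustration of the CpG depletion that hypermutability produces, the sketch below computes the observed/expected CpG ratio for each sequence in a FASTA alignment. It is a simple descriptive statistic, not the RN95+YpR inference method described in the abstract. The helper names `read_fasta` and `cpg_obs_exp` are hypothetical, and the sketch assumes the `.fa` files are standard FASTA with `-` as the gap character, which the dataset page does not state explicitly.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record name: sequence}.

    The record name is the first whitespace-delimited token after '>'.
    """
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Gap characters ('-') are removed first, so bases separated only by
    alignment gaps count as neighbours. Returns NaN when the sequence
    contains no C or no G.
    """
    ungapped = seq.upper().replace("-", "")
    n_c = ungapped.count("C")
    n_g = ungapped.count("G")
    if n_c == 0 or n_g == 0:
        return float("nan")
    n_cg = ungapped.count("CG")  # "CG" cannot overlap itself
    return n_cg * len(ungapped) / (n_c * n_g)


if __name__ == "__main__":
    # Tiny made-up alignment, for illustration only (not from the dataset).
    demo = [">seq1 example", "AC-GT", "CG--AA", ">seq2", "ATAT"]
    for rec_name, sequence in read_fasta(demo).items():
        print(rec_name, cpg_obs_exp(sequence))
```

To try this on the dataset, one could pass an open file handle, for example `read_fasta(open("ENm001.fa"))`, assuming the file has been downloaded to the working directory. Ratios well below 1 would be consistent with the CpG depletion expected in methylated mammalian DNA.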
