Dryad

Estimating accurate gene trees in the presence of intra-locus recombination: A simulation study

Abstract

Accurate gene trees are difficult to estimate with traditional methods because of the effects of recombination. New methods that co-estimate gene trees and recombination breakpoints work differently from the traditional maximum likelihood (ML) framework, and therefore have the potential to alleviate inaccuracies caused by recombination. However, the accuracy of gene trees produced by these methods has yet to be evaluated under a broad range of conditions. Using simulations, we studied gene tree accuracy in the presence of intra-locus recombination. Using a previously published model of human population history, we simulated the process of recombination along large sections of a genome to produce DNA sequence alignments. We varied three parameters that influence gene tree accuracy: recombination rate, population size, and substitution rate. We then compared the accuracy of gene trees estimated with different methods, including traditional maximum likelihood estimation of single and concatenated regions, as well as more sophisticated co-estimation methods. Unsurprisingly, we found that traditional approaches can produce accurate gene trees only in narrow regions of parameter space: as the number of sites used to estimate a gene tree increases, the alignment spans more recombination breakpoints, so recombination becomes increasingly problematic. Some, but not all, of the co-estimation methods successfully circumvent this tradeoff and have the potential to produce accurate gene trees in broader regions of parameter space. These results indicate that by adopting co-estimation methods, systematists may be able to improve gene tree accuracy.
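
The tradeoff between the number of sites and recombination can be illustrated with a back-of-the-envelope calculation. The sketch below is not the simulation pipeline used in the study. It assumes a single, constant-size, neutrally evolving diploid population (not the human demographic model mentioned above) and uses standard coalescent expectations: the expected number of segregating sites is theta * a_n and the expected number of recombination events on the local genealogies is approximately rho * a_n, where theta = 4*N*mu*L, rho = 4*N*r*L, and a_n is the sum of 1/i for i from 1 to n-1. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
def harmonic(n_samples):
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n_samples))


def expected_counts(n_samples, pop_size, mut_rate, rec_rate, n_sites):
    """Expected segregating sites and recombination events for a locus.

    Assumes a constant-size diploid population under the neutral
    coalescent; rates are per site per generation (illustrative only).
    """
    a_n = harmonic(n_samples)
    theta = 4 * pop_size * mut_rate * n_sites  # population mutation rate for the locus
    rho = 4 * pop_size * rec_rate * n_sites    # population recombination rate for the locus
    return theta * a_n, rho * a_n


if __name__ == "__main__":
    # Hypothetical values, not taken from the study.
    for n_sites in (100, 1_000, 10_000):
        seg, rec = expected_counts(
            n_samples=10, pop_size=10_000,
            mut_rate=1e-8, rec_rate=1e-8, n_sites=n_sites,
        )
        print(f"{n_sites:>6} sites: E[segregating sites]={seg:.3f}, "
              f"E[recombination events]={rec:.3f}")
```

Under these assumptions both quantities grow linearly with the number of sites, so lengthening a locus to gain informative variation also increases the expected number of breakpoints it spans in the same proportion.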