
Data from: The effect of gene flow on coalescent-based species-tree inference


Long, Colby; Kubatko, Laura (2018), Data from: The effect of gene flow on coalescent-based species-tree inference, Dryad, Dataset,


Most current methods for inferring species-level phylogenies under the coalescent model assume that no gene flow occurs following speciation. Several studies have examined the impact of gene flow (e.g., Eckert and Carstens (2008); Chung and Ané (2011); Leaché et al. (2014); Solís-Lemus et al. (2016)) and of ancestral population structure (DeGiorgio and Rosenberg, 2016) on the performance of species-level phylogenetic inference, and analytic results have been proven for network models of gene flow (e.g., Solís-Lemus et al. (2016); Zhu et al. (2016)). However, there are few analytic results for a continuous model of gene flow following speciation, despite the development of mathematical tools that could facilitate such study (e.g., Hobolth et al. (2011); Andersen et al. (2014); Tian and Kubatko (2016)). In this paper, we consider a three-taxon isolation-with-migration model that allows gene flow between sister taxa for a brief period following speciation, as well as variation in the effective population sizes across the species tree. We derive the probabilities of each of the three gene tree topologies under this model, and show that for certain choices of the gene flow and effective population size parameters, anomalous gene trees (i.e., gene trees that are discordant with the species tree but have higher probability than the gene tree concordant with the species tree) exist. We characterize the region of parameter space producing anomalous trees, and show that the probability of the gene tree that is concordant with the species tree can be arbitrarily small. We then show that there is theoretical support for using SVDQuartets with an outgroup to infer the rooted three-taxon species tree in a model of gene flow between sister taxa. We study the performance of SVDQuartets on simulated data and compare it to three other commonly used methods for species tree inference: ASTRAL, MP-EST, and concatenation.
The simulations show that ASTRAL, MP-EST, and concatenation can be statistically inconsistent when gene flow is present, while SVDQuartets performs well, though large sample sizes may be required for certain parameter choices.
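For context on what "anomalous" means here, the no-gene-flow baseline can be written down explicitly. Under the standard multispecies coalescent for the rooted three-taxon species tree ((A,B),C) with internal branch length t in coalescent units, the concordant gene tree has probability 1 - (2/3)e^(-t) and each of the two discordant trees has probability (1/3)e^(-t). The concordant tree is therefore never less probable than either discordant tree, so no anomalous gene trees exist for three taxa without gene flow. The Python sketch below encodes only this baseline; the taxon labels and the function names `msc_gene_tree_probs` and `anomalous_topologies` are illustrative assumptions, and it does not implement the isolation-with-migration probabilities derived in the paper, which are what allow a discordant tree to overtake the concordant one.

```python
import math
from typing import Dict, List

# Rooted three-taxon gene tree topologies, assuming (for illustration)
# that the species tree is ((A,B),C).
CONCORDANT = "((A,B),C)"
DISCORDANT = ("((A,C),B)", "((B,C),A)")


def msc_gene_tree_probs(t: float) -> Dict[str, float]:
    """Gene tree topology probabilities under the standard multispecies
    coalescent with NO gene flow (baseline only, not the paper's model).

    t is the internal branch length of the species tree in coalescent
    units. The A and B lineages fail to coalesce on that branch with
    probability exp(-t); in that case, all three rooted topologies are
    equally likely in the ancestral population.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    p_discordant = math.exp(-t) / 3.0
    probs = {CONCORDANT: 1.0 - 2.0 * p_discordant}
    for topo in DISCORDANT:
        probs[topo] = p_discordant
    return probs


def anomalous_topologies(probs: Dict[str, float]) -> List[str]:
    """Return the discordant topologies that are strictly more probable
    than the concordant one, i.e., the anomalous gene trees."""
    return [topo for topo in DISCORDANT if probs[topo] > probs[CONCORDANT]]


if __name__ == "__main__":
    for t in (0.0, 0.1, 1.0):
        probs = msc_gene_tree_probs(t)
        # Without gene flow the anomalous list is always empty.
        print(t, round(probs[CONCORDANT], 4), anomalous_topologies(probs))
```

Passing any other probability assignment (for example, one computed from a model with gene flow) to `anomalous_topologies` reports which discordant trees, if any, are anomalous under it.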



National Science Foundation, Award: DMS-1440386