Skip to main content
Dryad

Data from: Using genomic location and coalescent simulation to investigate gene tree discordance in Medicago L.

Cite this dataset

Sousa, Filipe et al. (2017). Data from: Using genomic location and coalescent simulation to investigate gene tree discordance in Medicago L. [Dataset]. Dryad. https://doi.org/10.5061/dryad.85c0s

Abstract

Several well-documented evolutionary processes are known to cause conflict between species-level phylogenies and gene-level phylogenies. Three of the most challenging processes for species tree inference are incomplete lineage sorting, hybridization and gene duplication, which may result in unwarranted comparisons of paralogous genes. Several existing methods have dealt with these processes but none has yet been able to untangle all three at once. Here, we propose a stepwise method by which these processes can be discerned using information on genomic location coupled with coalescent simulations. In the first step, highly discordant genes within genomic blocks (putative paralogs) are identified and excluded from the data set and, in the second step, blocks of linked genes are grouped according to their hybrid history. Existing multispecies coalescent software can then be applied to recover the principal tree(s) that make up the species tree/network without violating the underlying model. The potential of the approach is evaluated on simulated data derived from a species network composed of nine species, of which one is of hybrid origin, and displaying a single-gene duplication that leads to paralogous comparisons. We apply our method to an empirical set of 12 genes from 7 species sampled in the plant genus Medicago that display phylogenetic discordance. We identify the causes of the discordance and demonstrate that the Medicago orbicularis lineage experienced an episode of ancient hybridization. Our results show promise as a new way to explore phylogenetic sequence data that can significantly improve species tree inference in presence of hybridization and undetected paralogy or other causes leading to extremely discordant gene trees.

Usage notes