Data from: The multilocus multispecies coalescent: a flexible new model of gene family evolution

Citation

Li, Qiuyi; Scornavacca, Celine; Galtier, Nicolas; Chan, Yao-Ban (2021), Data from: The multilocus multispecies coalescent: a flexible new model of gene family evolution, Dryad, Dataset, https://doi.org/10.5061/dryad.dz08kprvq

Abstract

Incomplete lineage sorting (ILS), the interaction between coalescence and speciation, can generate incongruence between gene trees and species trees, as can gene duplication (D), transfer (T) and loss (L). These processes are usually modelled independently, but in reality, ILS can affect gene copy number polymorphism, i.e., interfere with DTL. This has been previously recognised, but not treated in a satisfactory way, mainly because DTL events are naturally modelled forward-in-time, while ILS is naturally modelled backwards-in-time with the coalescent. Here we consider the joint action of ILS and DTL on the gene tree/species tree problem in all its complexity. In particular, we show that the interaction between ILS and duplications/transfers (without losses) can result in patterns usually interpreted as resulting from gene loss, and that the realised rate of D, T and L becomes non-homogeneous in time when ILS is taken into account. We introduce algorithmic solutions to these problems. Our new model, the multilocus multispecies coalescent (MLMSC), which also accounts for any level of linkage between loci, generalises the multispecies coalescent model and offers a versatile, powerful framework for proper simulation and inference of gene family evolution.

Methods

We use the MLMSC simulator to simulate gene trees on the fungal species tree using three duplication and loss rates (10^-10, 5x10^-10 and 10x10^-10; duplication and loss rates are assumed to be equal), three effective population sizes (10^7, 5x10^7 and 10x10^7), and a generation time of 0.9 years. For each set of parameters, we run 500 simulations and calculate the proportion of unilocus trees for which the haplotype forest contains more than one haplotype tree (i.e., copy number hemiplasy occurs).
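
As a rough illustration of the design described above, the following Python sketch enumerates the nine parameter combinations and computes the reported proportion. It does not call the MLMSC simulator; the names `parameter_grid` and `hemiplasy_proportion` are hypothetical, and the sketch assumes that the number of haplotype trees in each haplotype forest has already been extracted from the simulator output.

```python
from itertools import product

# Values taken from the Methods text above.
DUP_LOSS_RATES = [1e-10, 5e-10, 10e-10]  # duplication rate = loss rate
POPULATION_SIZES = [1e7, 5e7, 10e7]      # effective population sizes
YEARS_PER_GENERATION = 0.9
N_REPLICATES = 500                       # simulations per parameter set


def parameter_grid():
    """Return one dict per (rate, population size) combination.

    Hypothetical helper: the key names are illustrative and are not
    the simulator's actual option names.
    """
    return [
        {
            "dup_rate": rate,
            "loss_rate": rate,  # assumed equal to the duplication rate
            "pop_size": size,
            "years_per_generation": YEARS_PER_GENERATION,
        }
        for rate, size in product(DUP_LOSS_RATES, POPULATION_SIZES)
    ]


def hemiplasy_proportion(haplotype_tree_counts):
    """Fraction of unilocus trees whose haplotype forest has > 1 tree.

    `haplotype_tree_counts` holds one integer per unilocus tree: the
    number of haplotype trees in its haplotype forest (assumed to have
    been extracted from the simulation output beforehand).
    """
    if not haplotype_tree_counts:
        raise ValueError("need at least one unilocus tree")
    hits = sum(1 for count in haplotype_tree_counts if count > 1)
    return hits / len(haplotype_tree_counts)
```

For example, `len(parameter_grid())` gives 9 parameter sets, and `hemiplasy_proportion([1, 2, 1, 3])` gives 0.5, since two of the four forests contain more than one haplotype tree.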

We also include the full model specification in a separate PDF file.