Data from: Assignment of homoeologues to parental genomes in allopolyploids for species tree inference, with an example from Fumaria (Papaveraceae)
Data files
Jan 27, 2015 version files 3.37 MB
- Chloroplast_alignment_Suppl_10.nex
- Chloroplast_sequences_Fig4.nex
- Chloroplast_sequences_Fig5.nex
- Chloroplast_sequences_Fig6.nex
- Nuclear_sequences_Fig3.nex
- Nuclear_sequences_Fig5_6_Suppl_10.nex
- Supplementary_material.pdf
Abstract
There is a rising awareness that species trees are best inferred from multiple loci while taking into account processes affecting individual gene trees, such as substitution model error (failure of the model to account for the complexity of the data) and coalescent stochasticity (presence of incomplete lineage sorting). Although most studies have been carried out in the context of dichotomous species trees, these processes also operate in more complex evolutionary histories involving multiple hybridizations and polyploidy. Recently, methods have been developed that accurately handle incomplete lineage sorting in allopolyploids, but they are thus far restricted to networks of diploids and tetraploids. We propose a workflow that overcomes this limitation by assigning homoeologues to hypothetical diploid ancestral genomes prior to genome tree construction. Conflicting assignment hypotheses are evaluated against substitution model error and coalescent stochasticity. Incongruence that cannot be explained by these stochastic mechanisms must be attributed to other processes (e.g., homoploid hybridization or paralogy). The data can then be filtered to build multilabeled genome phylogenies using inference methods that can recover species trees, either in the face of substitution model error and coalescent stochasticity alone, or while simultaneously accounting for hybridization. Methods are already available for folding the resulting multilabeled genome phylogeny into a network. We apply the workflow to the reconstruction of the reticulate phylogeny of the plant genus Fumaria (Papaveraceae), with ploidal levels ranging from 2x to 14x. We describe the challenges in recovering nuclear NRPB2 homoeologues in high-ploidy species while combining in vivo cloning and direct sequencing techniques.
Using parametric bootstrapping simulations, we assign nuclear homoeologues and chloroplast sequences (four concatenated loci) to their common hypothetical diploid ancestral genomes. As these assignments hinge on effective population size assumptions, we investigate how varying these assumptions impacts the recovered multilabeled genome phylogeny.
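
The abstract does not spell out how the parametric bootstrap decides between assignment hypotheses, so the following is only a generic sketch of the final comparison step, not the authors' implementation. It assumes that, for each hypothesis, a test statistic has already been computed on the observed data and on replicate datasets simulated under that hypothesis (for example under a given effective population size). The function names, the hypothesis labels, the choice of statistic, and the 0.05 threshold are all hypothetical.

```python
from typing import Dict, List, Sequence


def parametric_bootstrap_p_value(observed: float, simulated: Sequence[float]) -> float:
    """One-sided p-value: share of simulated statistics >= the observed one.

    The +1 in numerator and denominator counts the observed value as one
    draw from the null distribution, so the p-value is never exactly zero.
    """
    if len(simulated) == 0:
        raise ValueError("at least one simulated statistic is required")
    n_as_extreme = sum(1 for s in simulated if s >= observed)
    return (n_as_extreme + 1) / (len(simulated) + 1)


def rejected_hypotheses(
    observed: Dict[str, float],
    simulated: Dict[str, Sequence[float]],
    alpha: float = 0.05,  # assumed significance threshold
) -> List[str]:
    """Return the hypotheses whose observed statistic is too extreme to be
    explained by the simulated (stochastic) variation alone."""
    return [
        name
        for name, obs in observed.items()
        if parametric_bootstrap_p_value(obs, simulated[name]) < alpha
    ]


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not taken from the dataset.
    obs = {"assignment_A": 2.0, "assignment_B": 50.0}
    sim = {
        "assignment_A": [float(i) for i in range(99)],
        "assignment_B": [float(i % 10) for i in range(99)],
    }
    print(rejected_hypotheses(obs, sim))
```

Under this sketch, a rejected hypothesis is one whose incongruence is not explained by the simulated stochastic processes. Rerunning the comparison with replicates simulated under different effective population sizes would show how sensitive the assignments are to that assumption.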