There is growing awareness that species trees are best inferred from multiple loci while taking into account processes affecting individual gene trees, such as substitution model error (failure of the model to account for the complexity of the data) and coalescent stochasticity (presence of incomplete lineage sorting). Although most studies have been carried out in the context of dichotomous species trees, these processes also operate in more complex evolutionary histories involving multiple hybridizations and polyploidy. Recently, methods have been developed that accurately handle incomplete lineage sorting in allopolyploids, but they are thus far restricted to networks of diploids and tetraploids. We propose a workflow that overcomes this limitation by assigning homoeologues to hypothetical diploid ancestral genomes prior to genome tree construction. Conflicting assignment hypotheses are evaluated against substitution model error and coalescent stochasticity. Incongruence that cannot be explained by these stochastic mechanisms must be attributed to other processes (e.g., homoploid hybridization or paralogy). The data can then be filtered to build multilabeled genome phylogenies using inference methods that can recover species trees, either in the face of substitution model error and coalescent stochasticity alone, or while simultaneously accounting for hybridization. Methods are already available for folding the resulting multilabeled genome phylogeny into a network. We apply the workflow to the reconstruction of the reticulate phylogeny of the plant genus Fumaria (Papaveraceae), with ploidal levels ranging from 2x to 14x. We describe the challenges in recovering nuclear NRPB2 homoeologues in high-ploidy species while combining in vivo cloning and direct sequencing techniques.
Using parametric bootstrapping simulations, we assign nuclear homoeologues and chloroplast sequences (four concatenated loci) to their common hypothetical diploid ancestral genomes. As these assignments hinge on effective population size assumptions, we investigate how varying these assumptions affects the recovered multilabeled genome phylogeny.
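As a rough illustration of the parametric bootstrapping idea mentioned above, the sketch below computes a one-sided p-value by comparing an observed test statistic with values simulated under a null model. The function name `bootstrap_p_value`, the assumption that larger values are more extreme, and the add-one correction are illustrative choices, not details taken from the study; the actual statistics and simulation settings are described in the supplementary material.

```python
from typing import Sequence


def bootstrap_p_value(observed: float, simulated: Sequence[float]) -> float:
    """One-sided parametric-bootstrap p-value (hypothetical helper).

    `observed` is the statistic computed on the real data; `simulated`
    holds the same statistic computed on data sets simulated under the
    null model. Larger values are assumed to be more extreme.
    """
    if len(simulated) == 0:
        raise ValueError("at least one simulated value is required")
    # Count simulated statistics at least as extreme as the observed one.
    as_extreme = sum(1 for value in simulated if value >= observed)
    # Add-one correction: the observed value counts as one draw from the
    # null, so the p-value can never be exactly zero.
    return (as_extreme + 1) / (len(simulated) + 1)


if __name__ == "__main__":
    null_values = [1.0, 2.0, 3.0, 4.0]  # toy numbers, not study data
    print(bootstrap_p_value(5.0, null_values))
```

A hypothesis would be rejected when this value falls below a chosen significance threshold; repeating the simulation under different effective population sizes shows how sensitive that decision is to the Ne assumption.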
Nuclear alignment Figure 3.
Nuclear alignment used for constructing MrBayes phylogeny in Figure 3.
Nuclear_sequences_Fig3.nex
Chloroplast alignment Figure 4.
Chloroplast alignment used for constructing MrBayes phylogeny in Figure 4.
Chloroplast_sequences_Fig4.nex
Nuclear alignment for Figures 5, 6 and Supplementary Figure 10
Nuclear alignment used for Figures 5, 6 (STEM analyses) and Supplementary Figure 10 (BEAST analysis).
Nuclear_sequences_Fig5_6_Suppl_10.nex
Chloroplast alignment Figure 5.
Chloroplast alignment used for constructing STEM species tree in Figure 5.
Chloroplast_sequences_Fig5.nex
Chloroplast alignment Figure 6.
Chloroplast alignment used for constructing STEM species tree in Figure 6.
Chloroplast_sequences_Fig6.nex
Chloroplast alignment Supplementary Figure 10.
Chloroplast alignment used for constructing BEAST phylogeny in Supplementary Figure 10.
Chloroplast_alignment_Suppl_10.nex
Supplementary Material
SUPPLEMENTARY 1, 2. Plant material and GenBank accessions for nuclear and chloroplast sequences, respectively.
SUPPLEMENTARY 3. Material and methods for sequencing the plant material, with a description of all primers designed for this study.
SUPPLEMENTARY 4. Illustration of the clone filtering procedure for identifying putative PCR-recombinant sequences and cross-taxon contamination. All clones for the octoploid F. mirabilis (specimen 13673) were used to construct a NeighborNet (uncorrected p-distances) in SplitsTree4. Given the ploidal level, four distinct sequences are expected. Each group highlighted in green was composed of several nearly identical sequences and was retained for phylogenetic analysis. Red sequences, connected to the main frame of the network with zero-length terminal edges, were considered to be putative PCR recombinants and were therefore discarded. Too few clones were recovered for the blue group, which was further investigated with clade-specific primers.
SUPPLEMENTARY 5. Description of the phylogenetic analyses deployed for inferring gene trees.
SUPPLEMENTARY 6. Details of the BEAST analysis dating the Dicentra-Corydalis split.
SUPPLEMENTARY 7. Description and results of the simulation that assessed the performance of the coalescent stochasticity test.
SUPPLEMENTARY 8. Table reporting the results of the coalescent stochasticity test for the sets of sequences derived from the individual small-Ne and large-Ne analyses. Here, all the sequences in a set are analyzed together and the inference Ne is increased until the full set passes the coalescent stochasticity test.
SUPPLEMENTARY 9. Tanglegram inferred with RAxML from sequences of the primary set. Correspondences between the nuclear (a) and chloroplast (b) topologies were visualized in Dendroscope 3 (Huson and Scornavacca 2012).
SUPPLEMENTARY 10. Genome tree inferred with BEAST. The maximum clade credibility chronogram was built from all nuclear homoeologues (except sequences from F. bastardii), together with chloroplast haplotypes from the primary set plus those that passed the substitution model error test. Posterior probabilities (PP) ≥ 0.70 are indicated below branches. Icons indicate the chloroplast sequences that were added to the analysis at the different steps of genome tree reconstruction.
Supplementary_material.pdf