Data from: Ancestral gene flow and parallel organellar genome capture result in extreme phylogenomic discord in a lineage of angiosperms
Folk, Ryan A.
Mandel, Jennifer R.
Freudenstein, John V.
Published Sep 08, 2016 on Dryad.
Cite this dataset
Folk, Ryan A.; Mandel, Jennifer R.; Freudenstein, John V. (2016). Data from: Ancestral gene flow and parallel organellar genome capture result in extreme phylogenomic discord in a lineage of angiosperms [Dataset]. Dryad. https://doi.org/10.5061/dryad.cd546
While hybridization has recently received a resurgence of attention from systematists and evolutionary biologists, there remains a dearth of case studies on ancient, diversified hybrid lineages (clades of organisms that originated through reticulation). Studies on these groups are valuable in that they would speak to the long-term phylogenetic success of lineages following gene flow between species. We present a phylogenomic view of Heuchera, long known for frequent hybridization, incorporating all three independent genomes: targeted nuclear (~400,000 bp), plastid (~160,000 bp), and mitochondrial (~470,000 bp) data. We analyze these data using multiple concatenation and coalescence strategies. The nuclear phylogeny is consistent with previous work and with morphology, confidently suggesting a monophyletic Heuchera. By contrast, analyses of both organellar genomes recover a grossly polyphyletic Heuchera, consisting of three primary clades with relationships extensively rearranged within these as well. A minority of nuclear loci also exhibit phylogenetic discord; yet these topologies remarkably never resemble the pattern of organellar loci and largely present low levels of discord inter alia. Two independent estimates of the coalescent branch length of the ancestor of Heuchera using nuclear data suggest rare or nonexistent incomplete lineage sorting with related clades, inconsistent with the observed gross polyphyly of organellar genomes (confirmed by simulation of gene trees under the coalescent). These observations, in combination with previous work, strongly suggest hybridization as the cause of this phylogenetic discord.
Concatenated nuclear alignment, 70 taxa
Concatenated nuclear tree, unpartitioned
Concatenated nuclear tree, partitioned by gene
Concatenated nuclear tree, partitioned by splice site
Chloroplast alignment, 70 taxa
Chloroplast tree, unpartitioned
Chloroplast tree, partitioned
Chloroplast alignment of Sanger loci
Chloroplast tree of Sanger loci
Mitochondrial alignment, 66 taxa
Mitochondrial tree, unpartitioned
Mitochondrial tree, partitioned
Mitochondrial tree, upper 50% of taxa in terms of coverage
RAxML nuclear partition file, by gene
RAxML nuclear partition file, by splice site
RAxML chloroplast partition file, coding vs. non-coding
RAxML chloroplast partition file, Sanger loci, by gene
RAxML mitochondrial partition file, coding vs. non-coding
ASTRAL optimal gene trees (in one file) and bootstraps (separate files in subdirectory)
BUCKy gene MCMC distributions, with individual gene alignments
Online Appendix 1
Table of voucher information, collection localities, SRA accession numbers, and coverage values for nuclear, plastid, and mitochondrial contigs, as well as individual low-copy nuclear genes. A translate table for phylogenetic files in the online appendices is also provided.
Appendix1_Coverage statistics and voucher information.xlsx
Online Appendix 2
Sanger phylogeny of three plastid regions (trnL-F, rpl32-trnL, and rps16-trnK), included in order to increase outgroup sampling (n = 15; added samples summarized in Methods). Bootstrap proportions are plotted on all branches of non-zero length; branch coloration follows Fig. 1.
Online Appendix 3
All 277 analyzed gene trees as inferred in RAxML. The loci are labeled and arranged by descending order of locus length. All clade bootstraps ≥ 50 are plotted on branches.
Online Appendix 4
All 277 analyzed Bayesian MCMC distributions from MrBayes, summarized as 50% majority consensus trees. The loci are labeled and arranged by decreasing length. All clade posterior probabilities ≥ 50 are plotted on branches.
Online Appendix 5
Species-genotype association file in ASTRAL-II format, also used for SVDquartets and MP-EST.
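As a rough illustration of what a species-genotype association file contains, the sketch below groups individual sequence labels under putative species and writes one line per species. The `species:individual_1,individual_2` layout is assumed here from the ASTRAL documentation; the sample and species labels are hypothetical and are not taken from the deposited file, which may use a different layout.

```python
from collections import defaultdict


def build_astral_mapping(sample_to_species):
    """Return mapping lines of the assumed form 'species:ind1,ind2,...'."""
    grouped = defaultdict(list)
    for sample, species in sample_to_species.items():
        grouped[species].append(sample)
    # Sort for a reproducible file; the order carries no meaning.
    return [
        f"{species}:{','.join(sorted(samples))}"
        for species, samples in sorted(grouped.items())
    ]


# Hypothetical labels, for illustration only.
example = {
    "sample_A1": "species_A",
    "sample_A2": "species_A",
    "sample_B1": "species_B",
}
mapping_lines = build_astral_mapping(example)
print("\n".join(mapping_lines))
```

Treating individual sequences as species, as in online Appendix 6, would correspond to a mapping in which every species has exactly one individual.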
Online Appendix 6
ASTRAL-II analysis performed with individual sequences treated as species to assess potential issues with uncertain species limits. Note that these results strongly resemble the concatenation topology and are highly congruent with the ASTRAL-II topology assigning multiple sequences to putative species.
Online Appendix 7
Table of results for the φ tests for recombination.
Online Appendix 8
NeighborNet split network inferred using the concatenated nuclear data, showing conflict among nuclear loci. The currently recognized sections are indicated by colored circles: orange = sect. Holochloa; blue = sect. Heuchera; green = sect. Bracteatae; pink = sect. Rhodoheuchera.
Online Appendix 9
Comparison of coalescent branch length estimates in BUCKy (with individual sequences treated as species, using all 277 loci, subsampled MCMC, and α = 1) and MP-EST (with multiple sequences assigned to putative species). Only internal branches are shown since infraspecific sampling is required to reliably estimate external branch lengths, causing these to misleadingly take on the maximum possible values when there is only one individual per species. The outgroup branch is not to scale.
Online Appendix 10
Table of thermocycler programs and PCR primers for PCR and Sanger sequencing.
Online Appendix 11
Summary of taxa included in the analysis of Sanger-sequenced loci, including voucher information and GenBank accession numbers.
Appendix11_SangerGenbank and voucherinformation.xlsx
Online Appendix 12
Gene trees simulated under the coalescent using the MP-EST tree.
Online Appendix 13
Summary of bipartitions in the gene trees of online Appendix 12, plotted on the MP-EST species tree estimate.
Comparison of BUCKy concordance factors for a completely sampled MCMC analysis (last 90% of samples) and a subsampled analysis (last 10% of samples), using the longest 50 low-copy nuclear loci. Branch labels are concordance factors; all factors ≥ 0.1 are in bold. Only internal branches are shown with concordance factors, as in online Appendix 4.
Online Appendix 17
Comparison of BUCKy concordance factors for different alpha values in the subsampled MCMC analyses. Branch labels are concordance factors; all factors ≥ 0.1 are in bold. External branch concordance factors are not shown; all external branches (those leading to single OTUs) must have 100% concordance since every group containing one OTU must be present in all trees, and hence these numbers may be misleading.
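The reason external branches always show full concordance can be illustrated with a simplified calculation. The sketch below uses the plain proportion of trees containing a group as a stand-in for a concordance factor; this is an assumption made for illustration and is not how BUCKy estimates concordance factors. The trees and taxon names are hypothetical.

```python
def clade_frequency(clade, trees):
    """Fraction of trees (each a set of frozenset clades) containing `clade`."""
    return sum(clade in tree for tree in trees) / len(trees)


def with_tips(taxa, internal_clades):
    """Clade set of a tree: its internal clades plus one single-taxon clade per tip."""
    return {frozenset(c) for c in internal_clades} | {frozenset([t]) for t in taxa}


taxa = ["A", "B", "C", "D"]
# Three hypothetical gene trees on the same four taxa.
trees = [
    with_tips(taxa, [("A", "B"), ("A", "B", "C")]),
    with_tips(taxa, [("A", "B"), ("C", "D")]),
    with_tips(taxa, [("A", "C"), ("A", "B", "C")]),
]

ab = clade_frequency(frozenset(["A", "B"]), trees)  # internal group, varies by tree
tip = clade_frequency(frozenset(["A"]), trees)      # single-OTU group, in every tree
print(ab, tip)
```

Every tree on the same taxon set contains each single-taxon group, so the value for an external branch carries no information about agreement among loci.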
Online Appendix 18
Analysis of the low-copy nuclear concatenated dataset with no partitioning scheme.
Online Appendix 19
Analysis of the low-copy nuclear concatenated dataset with each genetic locus treated as a separate partition.
Online Appendix 20
Concatenated low-copy nuclear phylogeny inferred in RAxML without partitioning, with four rogue taxa pruned as suggested by RogueNaRok. Branch support in sect. Rhodoheuchera (magenta) is dramatically improved, with only the position of H. rubescens var. versicolor remaining uncertain. Labeling and coloring of branches follow Fig. 1.
Online Appendix 21
Analysis of the plastid genome dataset with coding and noncoding regions treated as separate partitions and the second copy of the inverted repeat region (IR) deleted. The outgroup branch is not to scale; otherwise, branches are shown proportional to ML branch lengths; branch coloration follows Fig. 1.
Online Appendix 22
Analysis of the plastid genome dataset with no partitioning scheme and the second copy of the inverted repeat region (IR) deleted. The outgroup branch is not to scale; otherwise, branches are shown proportional to ML branch lengths; branch coloration follows Fig. 1.
Online Appendix 23
Analysis of the mitochondrial genome dataset with coding and noncoding regions treated as separate partitions. The outgroup branch is not to scale; otherwise, branches are shown proportional to ML branch lengths; branch coloration follows Fig. 1.
Online Appendix 24
Analysis of the mitochondrial genome dataset with no partitioning scheme. The outgroup branch is not to scale; otherwise, branches are shown proportional to ML branch lengths; branch coloration follows Fig. 1.
Online Appendix 25
Mitochondrial phylogeny including only samples in upper 50th percentile in terms of coverage (unpartitioned). The topology is largely congruent with the tree containing 66 taxa, indicating that the topology is not sensitive to low-coverage samples. The outgroup branch is not to scale; otherwise, branches are shown proportional to ML branch lengths; branch coloration follows Fig. 1.
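A coverage-based subset of this kind can be sketched as a median cutoff. The coverage values and taxon labels below are hypothetical, and keeping samples at or above the median is an assumption; the exact rule used to define the upper 50th percentile is not stated here.

```python
from statistics import median


def upper_half_by_coverage(coverage):
    """Return taxa whose coverage is at or above the median (assumed cutoff rule)."""
    cutoff = median(coverage.values())
    return sorted(taxon for taxon, value in coverage.items() if value >= cutoff)


# Hypothetical mean coverage per taxon, for illustration only.
coverage = {"taxon_1": 12.0, "taxon_2": 85.5, "taxon_3": 40.2, "taxon_4": 7.9}
kept = upper_half_by_coverage(coverage)
print(kept)
```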
Online Appendix 26
Cluster network depicting conflict between plastid and nuclear topologies; blue branches represent reticulation, while black branches show tree-like relationships held in common between the two phylogenies. The topological conflict is so extreme and involves such distantly related taxa that almost no clades are held in common between the two datasets. In the two input trees, all clades with < 70% bootstrap support were collapsed.
Online Appendix 27
Tanglegram comparing mitochondrial and concatenated nuclear phylogenies, optimized in Dendroscope to minimize line crossings. Again, the concatenation tree is used to represent the nuclear phylogeny in order to maintain comparability and to show divergent placement of subspecific samples in the mitochondrial tree. In order to minimize spurious disagreement between phylogenies due to estimation error, all clades with < 70% bootstrap support have been collapsed.
Online Appendix 28
Proposed introgression events (from Table 2) mapped onto the ASTRAL topology as a visual representation of hypothesized gene flow. Solid red arrows represent proposed hybridization ancestral to a clade; dotted lines represent hybridization among species or populations only; asterisks mark taxa where there was evidence for introgression but the source of the organellar DNA was unresolved.
Online Appendix 29
The mitochondrial tree (see Fig. 3), pruned to only mitochondrial clade A, and rerooted with Mitella pentandra to match the plastid tree. This figure demonstrates that the mitochondrial tree differs from the plastid tree in ways that cannot be explained simply by subtree misrooting.
Online Appendix 30
Four mitochondrial regions that were conserved, easily aligned among taxa, and generally represented by complete sequences in the main alignment (positions used, respectively: 103,060-129,653; 132,480-142,279; 416,224-428,828; 452,332-467,783).
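For readers who want to recover these regions from the main mitochondrial alignment, a minimal sketch follows. It assumes the listed positions are 1-based, inclusive alignment columns, with the third range read as 416,224-428,828; neither assumption is confirmed by the description, and the function name is hypothetical.

```python
# Column ranges as listed above, assumed to be 1-based and inclusive.
REGIONS = [
    (103_060, 129_653),
    (132_480, 142_279),
    (416_224, 428_828),
    (452_332, 467_783),
]


def extract_regions(sequence, regions):
    """Concatenate 1-based, inclusive column ranges of one aligned sequence."""
    return "".join(sequence[start - 1:end] for start, end in regions)


# Length of each region in alignment columns under the assumptions above.
region_lengths = [end - start + 1 for start, end in REGIONS]

# Toy check on a short string rather than the real alignment.
toy_subset = extract_regions("ACGTACGTAC", [(2, 4), (8, 10)])
print(region_lengths, toy_subset)
```

Applying `extract_regions` to every row of the alignment with `REGIONS` would give a reduced matrix of the four regions joined end to end.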
Online Appendix 31
ASTRAL-II tree, including only those loci for which φ tests could not detect recombination. Methods were otherwise identical to the main ASTRAL-II analysis.