
Data from: From gene trees to a dated allopolyploid network: insights from the angiosperm genus Viola (Violaceae)

Cite this dataset

Marcussen, Thomas et al. (2014). Data from: From gene trees to a dated allopolyploid network: insights from the angiosperm genus Viola (Violaceae) [Dataset]. Dryad.


Allopolyploidisation accounts for a significant fraction of speciation events in many eukaryotic lineages. However, existing phylogenetic and dating methods require tree-like topologies and are unable to handle the network-like phylogenetic relationships of lineages containing allopolyploids. No explicit framework has so far been established for evaluating competing network topologies, and few attempts have been made to date phylogenetic networks. We used a four-step approach to generate a dated polyploid species network for the cosmopolitan angiosperm genus Viola L. (Violaceae Batch.). The genus contains ca. 600 species and both recent (neo-) and more ancient (meso-) polyploid lineages distributed over 16 sections. First, we obtained DNA sequences of three low-copy nuclear genes and one chloroplast region, from 42 species representing all 16 sections. Second, we obtained fossil-calibrated chronograms for each nuclear gene marker. Third, we determined the most parsimonious multilabelled genome tree and its corresponding network, resolved at the section (not the species) level. Reconstructing the ‘correct’ network for a set of polyploids depends on recovering all homoeologs, i.e. all subgenomes, in these polyploids. Assuming the presence of Viola subgenome lineages that were not detected by the nuclear gene phylogenies (‘ghost subgenome lineages’) significantly reduced the number of inferred polyploidisation events. We identified the most parsimonious network topology from a set of five competing scenarios differing in the interpretation of homoeolog extinctions and lineage sorting, based on (1) fewest possible ghost subgenome lineages, (2) fewest possible polyploidisation events, and (3) least possible deviation from expected ploidy as inferred from available chromosome counts of the involved polyploid taxa. Finally, we estimated the homoploid and polyploid speciation times of the most parsimonious network.
Homoploid speciation times were estimated by coalescent analysis of gene tree node ages. Polyploid speciation times were estimated by comparing branch lengths and speciation rates of lineages with and without ploidy shifts. Our analyses recognise Viola as an old genus (crown age 31 Ma) whose evolutionary history has been profoundly affected by allopolyploidy. Between 16 and 21 allopolyploidisations are necessary to explain the diversification of the 16 major lineages (sections) of Viola, suggesting that allopolyploidy has accounted for a high percentage – between 67% and 88% – of the speciation events at this level. The theoretical and methodological approaches presented here for (1) constructing networks and (2) dating speciation events within a network have general applicability for phylogenetic studies of groups where allopolyploidisation has occurred. They make explicit use of a hitherto underexplored source of ploidy information from chromosome counts to help resolve phylogenetic cases where incomplete sequence data hamper network inference. Importantly, the coalescent-based method used herein circumvents the assumption of tree-like evolution required by most techniques for dating speciation events.
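
The three parsimony criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the criteria are applied in the order listed (ghost subgenome lineages first, then polyploidisation events, then ploidy deviation), which the abstract does not state, and every name and number in it (`Scenario`, `most_parsimonious`, the scenario labels and scores) is a hypothetical placeholder rather than a value from the dataset.

```python
from typing import NamedTuple, Sequence


class Scenario(NamedTuple):
    """One competing network scenario (all fields are illustrative)."""
    name: str
    ghost_lineages: int      # criterion 1: undetected subgenome lineages assumed
    polyploidisations: int   # criterion 2: inferred polyploidisation events
    ploidy_deviation: int    # criterion 3: mismatch with expected ploidy


def most_parsimonious(scenarios: Sequence[Scenario]) -> Scenario:
    """Return the scenario that minimises the three criteria.

    Assumption: criteria are compared lexicographically in the order
    they are listed in the abstract. Python compares tuples that way.
    """
    return min(
        scenarios,
        key=lambda s: (s.ghost_lineages, s.polyploidisations, s.ploidy_deviation),
    )


# Made-up scores for five hypothetical scenarios; NOT data from the study.
scenarios = [
    Scenario("A", ghost_lineages=2, polyploidisations=4, ploidy_deviation=3),
    Scenario("B", ghost_lineages=0, polyploidisations=6, ploidy_deviation=1),
    Scenario("C", ghost_lineages=0, polyploidisations=6, ploidy_deviation=4),
    Scenario("D", ghost_lineages=1, polyploidisations=5, ploidy_deviation=0),
    Scenario("E", ghost_lineages=3, polyploidisations=4, ploidy_deviation=2),
]

best = most_parsimonious(scenarios)
print(best.name)
```

With these placeholder scores, scenarios B and C tie on the first two criteria, so the third criterion decides between them. A different weighting or ordering of the criteria could select a different scenario.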

Usage notes


South America
Northern Hemisphere
South Africa