
Disentangling sources of gene tree discordance in phylogenomic datasets: testing ancient hybridizations in Amaranthaceae s.l.

Cite this dataset

Morales-Briones, Diego F. et al. (2020). Disentangling sources of gene tree discordance in phylogenomic datasets: testing ancient hybridizations in Amaranthaceae s.l. [Dataset]. Dryad.


Gene tree discordance in large genomic datasets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as by model violation and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The dataset included seven genomes and 88 transcriptomes, 17 of which were generated for this study. We examined gene tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be the product of an ancient and rapid lineage diversification, and that its backbone remains, and probably will remain, unresolved. This work highlights potential identifiability problems in attributing gene tree discordance to its sources, particularly when using phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations.

Usage notes

- The file 'Supplementa_Methods_and_Materials.tar.gz' contains the supplemental methods, figures, and tables referenced in the main text

- The file 'Homologs.tar.gz' contains the 14584 homolog trees:

    raw_homologs.tar.gz - trees without any filtering or pruning

    final_homologs.tar.gz - trees after masking monophyletic clades and paraphyletic grades of tips from the same species, pruning deep paralogs, and removing spurious tips
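Because 'Homologs.tar.gz' wraps two further .tar.gz archives, unpacking it takes two passes. The sketch below is one way to do that with Python's standard-library `tarfile` module; it assumes only the nesting described in these notes (the internal directory layout of each archive is not documented here, so inner archives are located by searching for any `*.tar.gz`). The function names `safe_extract` and `extract_nested` are hypothetical helpers, not part of the dataset.

```python
import shutil
import tarfile
from pathlib import Path


def safe_extract(archive_path, dest):
    """Extract a .tar.gz into dest, copying only regular files and directories.

    Members whose paths would land outside dest raise ValueError; links and
    other special member types are skipped.
    """
    dest = Path(dest).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            if member.isdir():
                target.mkdir(parents=True, exist_ok=True)
            elif member.isfile():
                target.parent.mkdir(parents=True, exist_ok=True)
                src = tar.extractfile(member)
                with src, open(target, "wb") as out:
                    shutil.copyfileobj(src, out)
    return dest


def extract_nested(outer_archive, dest):
    """Extract outer_archive, then each .tar.gz found inside it (one level deep).

    Each inner archive X.tar.gz is unpacked into a sibling directory named X.
    Returns the list of those directories.
    """
    dest = safe_extract(outer_archive, dest)
    extracted = []
    for archive in sorted(dest.rglob("*.tar.gz")):
        subdir = archive.with_name(archive.name[: -len(".tar.gz")])
        safe_extract(archive, subdir)
        extracted.append(subdir)
    return extracted
```

For example, `extract_nested("Homologs.tar.gz", "homologs")` would unpack the outer archive and then the raw and final homolog archives. If an inner archive already contains its own top-level folder, the result is one directory level deeper than strictly necessary; that trade-off keeps each inner archive's contents separate.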

- The file 'Analyses_data.tar.gz' contains the data (alignments and individual gene trees) used for each dataset:

    filtered_transcriptomes.tar.gz - 88 filtered transcriptomes
    all_13025_orthologs_cln_aln.tar.gz - all the 13025 'monophyletic outgroup' orthologs
    105-taxon.tar.gz - 936 alignments and trees of the full 105-taxon analysis
    41-taxon.tar.gz - 1242 alignments and trees of the 41-taxon cloudogram
    11-taxon-net.tar.gz - 4138 alignments and trees of the 11-taxon(net) used for network analyses
    4-taxon.tar.gz - alignments and trees for each of the ten 4-taxon quartets (between 7,756 and 8,793 per quartet)
    11-taxon-tree.tar.gz - 5936 alignments and trees of the 11-taxon(tree) analyses
    chloroplast.tar.gz - 11-taxon alignment and tree, and 76 individual CDS alignments and trees of the plastid analyses
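To get a quick inventory of 'Analyses_data.tar.gz' without unpacking everything to disk, the inner archives can be opened directly from the outer one. This is a minimal sketch using only the standard-library `tarfile` module; `count_inner_members` is a hypothetical helper, and because the file extensions and per-gene file layout are not documented here, it simply counts regular files. Those counts need not match the figures listed above exactly (for instance, an alignment and its tree may be stored as two files).

```python
import tarfile


def count_inner_members(outer_archive):
    """Map each inner .tar.gz inside outer_archive to its number of regular files.

    Inner archives are read straight from the outer archive; nothing is
    written to disk.
    """
    counts = {}
    with tarfile.open(outer_archive, "r:gz") as outer:
        for member in outer.getmembers():
            if not (member.isfile() and member.name.endswith(".tar.gz")):
                continue
            fileobj = outer.extractfile(member)
            with tarfile.open(fileobj=fileobj, mode="r:gz") as inner:
                counts[member.name] = sum(1 for m in inner.getmembers() if m.isfile())
    return counts
```

Calling `count_inner_members("Analyses_data.tar.gz")` would return a dictionary keyed by inner archive path (for example, the entry for '105-taxon.tar.gz'). Reading the nested archive in memory is slower than a one-time extraction but avoids writing the large transcriptome files just to check contents.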


Funding

University of Minnesota System

University of Michigan–Ann Arbor

National Science Foundation, Award: DEB 1354048

Department of Energy, Office of Science, Genomic Science Program, Award: DE-SC0008834