Data for: Ancient rapid radiation explains most conflicts among gene trees and well-supported phylogenomic trees of nostocalean cyanobacteria

Cite this dataset

Pardo-De la Hoz, Carlos José et al. (2023). Data for: Ancient rapid radiation explains most conflicts among gene trees and well-supported phylogenomic trees of nostocalean cyanobacteria [Dataset]. Dryad. https://doi.org/10.5061/dryad.tht76hf1p

Abstract

Prokaryotic genomes are often considered to be mosaics of genes that do not necessarily share the same evolutionary history due to widespread Horizontal Gene Transfers (HGTs). Consequently, representing evolutionary relationships of prokaryotes as bifurcating trees has long been controversial. However, studies reporting conflicts among gene trees derived from phylogenomic datasets have shown that these conflicts can be the result of artifacts or evolutionary processes other than HGT, such as incomplete lineage sorting, low phylogenetic signal, and systematic errors due to substitution model misspecification. Here, we present the results of an extensive exploration of phylogenetic conflicts in the cyanobacterial order Nostocales, for which previous studies have inferred strongly supported conflicting relationships when using different concatenated phylogenomic datasets. We found that most of these conflicts are concentrated in deep clusters of short internodes of the Nostocales phylogeny, where the great majority of individual genes have low resolving power. We then inferred phylogenetic networks to detect HGT events while also accounting for incomplete lineage sorting. Our results indicate that most conflicts among gene trees are likely due to incomplete lineage sorting linked to an ancient rapid radiation, rather than to HGTs. Moreover, the short internodes of this radiation fit the expectations of the anomaly zone, i.e., a region of the tree parameter space where a species tree is discordant with its most likely gene tree. We demonstrated that concatenation of different sets of loci can recover up to 17 distinct and well-supported relationships within the putative anomaly zone of Nostocales, corresponding to the observed conflicts among well-supported trees based on concatenated datasets from previous studies. 
Our findings highlight the important role of rapid radiations as a potential cause of strongly conflicting phylogenetic relationships when using phylogenomic datasets of bacteria. We propose that polytomies may be the most appropriate phylogenetic representation of these rapid radiations that are part of anomaly zones, especially when all possible genomic markers have been considered to infer these phylogenies.
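To make the anomaly zone definition concrete, the sketch below checks the standard four-taxon condition from the coalescent literature (Degnan and Rosenberg 2006). It is only an illustration and is not taken from the dataset's code. For an asymmetric species tree with consecutive internal branches of lengths x (deeper) and y (shallower), both in coalescent units, the tree falls in the anomaly zone when y < a(x). The function names are hypothetical.

```python
import math


def anomaly_zone_limit(x: float) -> float:
    """Return a(x), the four-taxon anomaly zone boundary.

    x is the length of the deeper internal branch in coalescent units.
    Formula: a(x) = log(2/3 + (3e^{2x} - 2) / (18(e^{3x} - e^{2x}))).
    """
    if x <= 0:
        raise ValueError("branch length x must be positive")
    ratio = (3 * math.exp(2 * x) - 2) / (18 * (math.exp(3 * x) - math.exp(2 * x)))
    return math.log(2 / 3 + ratio)


def in_anomaly_zone(x: float, y: float) -> bool:
    """True if the pair of internal branches (x, y) satisfies y < a(x)."""
    return y < anomaly_zone_limit(x)


if __name__ == "__main__":
    # Two short consecutive internodes, as expected in a rapid radiation.
    print(in_anomaly_zone(0.05, 0.1))
    # A longer deep internode, where a(x) is negative and no y qualifies.
    print(in_anomaly_zone(0.5, 0.1))
```

The boundary a(x) grows without limit as x approaches 0 and becomes negative once x exceeds roughly 0.27 coalescent units, so only pairs of short consecutive internodes can satisfy the condition.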

Methods

This dataset includes the draft genome assemblies of 220 cyanobacterial strains, 215 of which were previously published and retrieved from GenBank (see Table S1 in the associated manuscript) and 5 of which were generated as part of this study. It also includes all the data, code, and output files generated in the analyses reported in the paper. See the README.md for more information. See also the project's GitHub repository (https://github.com/cjpardodelahoz/nostocales), which contains the version history of all the code, as well as detailed workflows for the analyses conducted in the study.

Funding

National Science Foundation, Award: BEE 1929994