Phylogenomics, the use of large-scale data matrices in phylogenetic analyses, has been viewed as the ultimate solution to the problem of resolving difficult nodes in the tree of life. However, it has become clear that analyses of these large genomic data sets can also result in conflicting estimates of phylogeny. Here, we use the early divergences in Neoaves, the largest clade of extant birds, as a “model system” to understand the basis for incongruence among phylogenomic trees. We were motivated by the observation that trees from two recent avian phylogenomic studies exhibit conflicts. Those studies used different strategies: 1) collecting many characters [42 megabase pairs (Mbp) of sequence data] from 48 birds, sometimes including only one taxon for each major clade; and 2) collecting fewer characters (0.4 Mbp) from 198 birds, selected to subdivide long branches. However, the studies also used different data types: the taxon-poor data matrix comprised 68% non-coding sequences, whereas coding exons dominated the taxon-rich data matrix. This difference raises the question of whether the primary reason for incongruence is the number of sites, the number of taxa, or the data type. To test among these alternative hypotheses, we assembled a novel, large-scale data matrix comprising 90% non-coding sequences from 235 bird species. Although increased taxon sampling appeared to have a positive impact on phylogenetic analyses, the most important variable was data type. Indeed, by analyzing different subsets of the taxa in our data matrix, we found that increased taxon sampling actually resulted in increased congruence with the tree from the previous taxon-poor study (which had a majority of non-coding data) instead of the taxon-rich study (which largely used coding data).
We suggest that the observed differences in the estimates of topology for these studies reflect data-type effects due to violations of the models used in phylogenetic analyses, some of which may be difficult to detect. If incongruence among trees estimated using phylogenomic methods largely reflects problems with model fit, developing more “biologically realistic” models is likely to be critical for efforts to reconstruct the tree of life.
Reddy_sup_fileS3
List of taxa with voucher information and taxonomic details. An xlsx file with three sheets ("README", "EB2 species list", and "Full taxonomy"). The "Full taxonomy" sheet compares the clades proposed in Jarvis et al. 2014 and those proposed in Prum et al. 2015.
Reddy_sup_fileS4
List of EB2 loci along with information about the partitions within loci. An xlsx file with two sheets ("README" and "EB2 locus information").
Reddy_sup_fileS6_raxml_trees_ALLtaxset.tar
tar.gz file with 56 newick format trees generated by RAxML analyses of the "ALL" taxon set (defined in the manuscript). Folder also contains a README with additional information.
Reddy_sup_fileS8_ASTRAL_nextrees
Nexus treefile with the results of ASTRAL-RAxML and ASTRAL-IQ-TREE analyses for each taxon set. Additional information about the trees can be found as comments in the nexus treefile.
Reddy_sup_figS1_fileS1_indicator_clades
Description of the "magnificent seven" superordinal clades and the "indicator clades" that we used to determine whether an estimate of avian phylogeny has a "Jarvis-like" or "Prum-like" topology. Contains an illustration (Fig. S1) of the relevant clades and text describing the probability of recovering the indicator clades by chance.
Reddy_sup_figS2_Metaves
This figure shows the circumscription of "Metaves" in four previous studies and the topology of the EB2noJAR tree (the tree based on the Early Bird II data excluding overlaps with Jarvis et al. 2014).
Reddy_sup_figS3_Prum_noJAR
This figure shows the topology of the Prum_noJAR tree (the partitioned RAxML tree for the Prum et al. 2015 dataset after excluding the data that overlap with Jarvis et al. 2014).
Reddy_sup_figS4_genetic_code
Structure of the genetic code showing the potential for the physicochemical characteristics of amino acids to lead to site-specific biases in nucleotide frequencies.
Reddy_sup_figS5_GCboxplot
Boxplot showing GC-content variation for each locus in the EB2, Prum, and Jarvis datasets. Only the parsimony informative sites were analyzed, as in Fig. 6 of the manuscript. However, this figure shows the variation when all taxa for each dataset are included in the analysis.
Reddy_sup_fileS2_PrumLoci
Information about the loci used by Prum et al. 2015. An xlsx file with two sheets ("README" and "Prum locus information").
Reddy_sup_fileS5_alignments.tar
tar.gz file with four nexus format alignments. Folder also contains a README with additional information about the sequence alignments.
Reddy_sup_fileS7_ML_treefiles.tar
tar.gz file of four nexus format treefiles. Each contains the RAxML, IQ-TREE, MrBayes, or ExaBayes trees along with additional information about the trees (this information is stored as comments in each of the nexus treefiles).
Reddy_sup_fileS9_EB2noJAR_nextrees
Nexus treefile with the results of RAxML and IQ-TREE analyses of the EB2noJAR and EB2noJARnoFGB datasets using the EB2 taxon set. The data matrices used to generate these trees correspond to the EB2 data after excluding the sites overlapping with the Jarvis et al. (2014) data. The data matrix used to generate the noJARnoFGB trees also excludes the gene encoding beta-fibrinogen (FGB). Additional information about the trees can be found as comments in the nexus treefile.
Reddy_sup_fileS10_Prum_noJAR.tar
tar.gz file with three files: 1) a relaxed phylip format alignment of the Prum_noJAR data (concatenated alignments of the Prum et al. 2015 loci after excluding the loci that overlap with Jarvis et al. 2014); 2) a RAxML format partitioning file for the Prum_noJAR data; and 3) a Nexus format treefile with the estimates of phylogeny for Prum_noJAR generated by RAxML and IQ-TREE analyses. Folder also contains a README with additional information about the files.
Reddy_sup_fileS11_backbone_trees
Nexus file containing: 1) the "backbone trees" (trees reduced to ordinal lineages) used for the tree clustering analysis; and 2) symmetric distances among the trees. The file contains embedded comments. If the file is executed in PAUP* it will echo this information to the screen, print the backbone trees, and conduct the tree clustering analysis.
Reddy_sup_fileS12_squangles_treefiles.tar
tar.gz file of two nexus format treefiles. Each contains squangles trees generated using different quartet amalgamation methods. One treefile contains the squangles trees for all loci (concatenated) and the other contains the same analyses for the BDNF locus. The BDNF treefile also includes the ML tree for that locus. Additional information about the trees can be found as comments in the nexus treefiles.
Reddy_sup_tableS1_evolutionary_rates
Table showing relative rates of sequence evolution for coding regions, introns, UCEs, and a mixed alignment that includes multiple types of genomic data. The data were obtained from the trees described by Jarvis et al. (2014).
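
The description of Reddy_sup_fileS11_backbone_trees mentions symmetric distances among backbone trees. As a rough illustration of what such a distance measures, the sketch below counts the clades that appear in one tree but not the other. It is not the PAUP* procedure in the supplementary file: it assumes rooted trees written as nested Python tuples, uses a clade-based variant of the symmetric difference, and all tip labels and function names are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of tip labels.

    A tree is a nested tuple of tip labels, e.g. (("A", "B"), ("C", "D")).
    This representation is an assumption made for illustration only.
    """
    found = set()

    def tips(node):
        # A string is a tip; a tuple is an internal node with children.
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    all_tips = tips(tree)
    found.discard(all_tips)  # the root clade is shared by every tree on these tips
    return found


def symmetric_distance(tree_a, tree_b):
    """Count the clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two hypothetical five-tip trees that differ only in how A, B, and C are arranged.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))

print(symmetric_distance(t1, t2))  # clades {A, B} and {A, C} are each unique to one tree
```

A matrix of such pairwise distances is the kind of input a tree clustering analysis would work from. Software such as PAUP* typically computes the distance on unrooted bipartitions rather than rooted clades, so values from this sketch need not match those stored in the file.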