Genomics is narrowing uncertainty in the phylogenetic structure of many amniote groups. For one of the most diverse and species-rich groups, the squamate reptiles (lizards, snakes, and amphisbaenians), an inverse correlation between the number of taxa and the number of loci sampled still persists across all publications using DNA sequence data, and reaching a consensus on the relationships among squamates has been highly problematic. Here, we use high-throughput sequence data from 289 samples covering 75 families of squamates to address phylogenetic affinities, estimate divergence times, and characterize residual topological uncertainty in the presence of genome-scale data. Importantly, we address genomic support for the traditional taxonomic groupings Scleroglossa and Macrostomata using novel machine-learning techniques. We interrogate genes using various metrics inherent to these loci, including parsimony-informative sites, phylogenetic informativeness, length, gaps, number of substitutions, and site concordance, to understand why certain loci fail to recover previously well-supported molecular clades and how they fail to support species-tree estimates. We show that both incomplete lineage sorting and poor gene-tree estimation (due to a few undesirable gene properties, such as an insufficient number of parsimony-informative sites) may account for most gene-tree and species-tree discordance. We find overwhelming signal for Toxicofera, and also show that none of the loci included in this study supports Scleroglossa or Macrostomata. We comment on the origins and diversification of Squamata throughout the Mesozoic and underscore remaining uncertainties that persist both in deeper parts of the tree (e.g., relationships among Dibamia, Gekkota, and the remaining squamates, and among the three toxicoferan clades Iguania, Serpentes, and Anguiformes) and within specific clades (e.g., affinities among gekkotan, pleurodont iguanian, and colubroid families).
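The abstract names the number of parsimony-informative sites as one gene property linked to poor gene-tree estimation. As a minimal sketch (not the code distributed in Data_S12), the count can be taken from an alignment by treating a column as informative when at least two character states each occur in at least two sequences. Treating `-`, `?`, `N`, and `X` as missing data is an assumption, and the toy alignment is hypothetical.

```python
from collections import Counter

# Characters treated as missing data (an assumption; conventions vary by program).
MISSING = set("-?NnXx")

def is_parsimony_informative(column):
    """True if at least two states each appear in at least two sequences."""
    counts = Counter(c.upper() for c in column if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def count_pi_sites(alignment):
    """Count parsimony-informative columns in a list of equal-length sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        is_parsimony_informative([seq[i] for seq in alignment])
        for i in range(length)
    )

# Hypothetical toy alignment, not taken from the dataset.
toy = [
    "ACGTA-",
    "ACGTT-",
    "ATGCA-",
    "ATGCT-",
]
print(count_pi_sites(toy))  # columns 2, 4 and 5 are informative
```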
Data_S1_Aligned_AHA genomic_data_for tree_order
Anchored hybrid enrichment (AHE) genomic-scale data for Squamata
Data_S2_Concatenated_Squam_Concatenated
Concatenated loci
Data_S3_Concatenated_partitioned_models
Partitions and models for the concatenated data
Data_S4_Tree_1_Squamate_ASTRAL_IQ_BS_Tree
ASTRAL-III phylogeny using IQ-TREE gene trees, with bootstrap support
Data_S5_Tree_2_SQ_IQ_ASTRAL_Full_Annotations
ASTRAL-III phylogeny using IQ-TREE gene trees, with full ASTRAL annotations on quadripartitions
Data_S6_Tree_3_Squamata_IQ_ASTRAL_ASTRAL_Support
ASTRAL-III phylogeny with ASTRAL quadripartition support
Data_S7_Tree_4_Squam_IQ_Concat_Part_BS_SH
Concatenated, partitioned IQ-TREE phylogeny with bootstrap and SH support (BS/SH)
Data_S8_Dated_ASTRAL_IQ_Point_Tree
Dated phylogeny in which the concatenated sequence data were fit to the ASTRAL-III topology and divergence times were estimated using treePL
Data_S9_Dated_IQ_UF_Bootstraps_with_error
Dated phylogeny with error, based on 1000 IQ-TREE trees from the ultrafast bootstrap (UFBoot) function, with dates estimated using treePL
Data_S10_Dated_Combined_Slow_bootstraps_RAXML_with_error
Dated phylogeny with error, based on 100 RAxML bootstrap replicates (slow bootstraps) with phylogenies estimated using IQ-TREE, and dates estimated using treePL
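One common way to express dating error from a set of bootstrap chronograms is an equal-tailed percentile interval of each node's age across replicates. The sketch below assumes node ages have already been extracted from the treePL outputs; the function names, the 95% level, the nearest-rank rule, and the ages are illustrative assumptions, not necessarily how these files were summarized.

```python
import math
import statistics

def percentile_interval(ages, level=0.95):
    """Equal-tailed, nearest-rank percentile interval of one node's ages."""
    if not ages:
        raise ValueError("need at least one age")
    xs = sorted(ages)
    n = len(xs)
    alpha = (1.0 - level) / 2.0
    lo = xs[max(0, math.ceil(alpha * n) - 1)]
    hi = xs[min(n - 1, math.ceil((1.0 - alpha) * n) - 1)]
    return lo, hi

def summarize_node(ages, level=0.95):
    """Median age plus percentile interval for one node across replicates."""
    return statistics.median(ages), percentile_interval(ages, level)

# Hypothetical ages (Ma) for one node across 100 bootstrap chronograms.
bootstrap_ages = list(range(150, 250))
median, (lo, hi) = summarize_node(bootstrap_ages)
print(median, lo, hi)
```

The percentile interval makes no normality assumption about the age distribution, which suits skewed bootstrap samples; a highest-density interval would be an alternative choice.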
Data_S11_Tree_PL_control
Control file for running treePL
Data_S12_All_Code_Squamate_Monophyly-Multicolinearity_STATS
Code for running the neural network (NN) for gene interrogation, and other functions
Data_S13_gene_support_species_tree_by_date
Dated nodes from the species tree, showing the number of genes that do or do not support each node
Data_S14_Support_by_node_table
Nodes from the species tree, showing whether each locus supports (1) or does not support (0) each node
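A 1/0 table like this one can be built by asking whether each gene tree contains the bipartition that defines a species-tree node. The sketch assumes trees are given as nested Python tuples rather than Newick strings, and that a clade is pruned to the taxa actually sampled in each locus; both are simplifying assumptions, and the taxon names and trees are hypothetical. Summing a column gives per-node counts of supporting genes.

```python
def leaves(tree):
    """Leaf names of a tree written as nested tuples of taxon-name strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def clusters(tree):
    """Leaf set below every internal node of a nested-tuple tree."""
    out = []
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        out.append(below)
        return below
    walk(tree)
    return out

def supports(gene_tree, clade):
    """1 if the unrooted gene tree contains the split clade | rest, else 0.

    The clade is pruned to taxa present in the gene tree (an assumption);
    None means the pruned split is trivial and cannot be tested."""
    taxa = frozenset(leaves(gene_tree))
    target = frozenset(clade) & taxa
    rest = taxa - target
    if len(target) < 2 or len(rest) < 2:
        return None
    # A rooted cluster equal to either side of the split means the split exists.
    return int(any(c == target or c == rest for c in clusters(gene_tree)))

def support_table(gene_trees, nodes):
    """Rows are loci, columns are named nodes; cells are 1, 0, or None."""
    return {locus: {name: supports(tree, clade) for name, clade in nodes.items()}
            for locus, tree in gene_trees.items()}

def genes_supporting(table, node):
    """Number of loci scored 1 for a node."""
    return sum(1 for row in table.values() if row[node] == 1)

# Hypothetical toy data.
nodes = {"Toxicofera": {"Iguana", "Python", "Anguis"}}
gene_trees = {
    "locus1": ((("Iguana", "Python"), "Anguis"), ("Gekko", "Scincus")),
    "locus2": ((("Iguana", "Scincus"), "Anguis"), ("Gekko", "Python")),
    "locus3": (("Iguana", "Python"), ("Gekko", "Scincus")),  # Anguis missing
}
table = support_table(gene_trees, nodes)
print(table, genes_supporting(table, "Toxicofera"))
```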
Data_S15_NN_Table_All_Stats
Table of all statistics for each locus, used with the NN code (Data_S12) to test regression predictions of RFgt-st, the Robinson-Foulds distance between each gene tree and the species tree.
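RFgt-st counts the bipartitions found in one tree but not the other. A minimal sketch for unrooted trees, again using hypothetical nested-tuple trees and restricting both trees to their shared taxa (an assumption for loci with missing taxa); the optional normalization divides by the maximum, 2(n − 3), for binary trees with n taxa. This is an illustration, not the implementation used for Data_S15.

```python
def leaves(tree):
    """Leaf names of a tree written as nested tuples of taxon-name strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def splits(tree, taxa):
    """Nontrivial bipartitions of the unrooted tree, restricted to `taxa`.

    Each split is stored as the side that excludes a fixed reference taxon,
    so the same bipartition always gets the same representation."""
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node]) & taxa  # drops taxa not in `taxa`
        below = frozenset().union(*(walk(child) for child in node))
        side = below if ref not in below else taxa - below
        if 2 <= len(side) <= len(taxa) - 2:  # ignore trivial splits
            found.add(side)
        return below

    walk(tree)
    return found

def rf_distance(gene_tree, species_tree, normalize=False):
    """Robinson-Foulds distance over the taxa shared by both trees."""
    taxa = frozenset(leaves(gene_tree)) & frozenset(leaves(species_tree))
    if len(taxa) < 4:
        return 0.0 if normalize else 0  # no nontrivial splits possible
    diff = len(splits(gene_tree, taxa) ^ splits(species_tree, taxa))
    return diff / (2 * (len(taxa) - 3)) if normalize else diff

# Hypothetical trees: the gene tree swaps B and C relative to the species tree.
species = (((("A", "B"), "C"), "D"), ("E", "F"))
gene = (((("A", "C"), "B"), "D"), ("E", "F"))
print(rf_distance(gene, species), rf_distance(gene, species, normalize=True))
```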
Data_S16_Gene_Trees_Dib_Gekkota_not_mono_ASTRAL_tree
ASTRAL-III tree generated from IQ-TREE gene trees that do not recover Dibamidae and Gekkota as monophyletic
Data_S17_Gene_Trees_Not_Supporting_Anilius_ASTRAL_tree
ASTRAL-III tree generated from IQ-TREE gene trees that do not support the clade containing Anilius (Amerophidia)
Data_S18_Gene_Trees_Not_Supporting_Toxicofera_ASTRAL_Tree
ASTRAL-III tree generated from IQ-TREE gene trees that do not recover Toxicofera as monophyletic
Data_S19_NN_Test_Data_Anil_Dib_Tox
Data table for use with the NN code (Data_S12) to make classification predictions of why genes do or do not recover Toxicofera, Amerophidia, and Dibamia/Gekkota.
Fig_S1_Phylogenetic Informativeness
Phylogenetic informativeness for each locus, ranked by the length of time over which the locus is informative. The colored interval for each locus marks the period over which it has strong phylogenetic informativeness; loci at the top of the graph are informative over a longer span of time than those at the bottom.
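Profiles like these are commonly computed with Townsend's (2007) formulation, in which a site evolving at rate λ has informativeness 16 λ² T exp(−4λT) at time T, peaking at T = 1/(4λ); a locus profile sums this over its sites. The sketch assumes per-site rates are already estimated. The rates, the time grid, and the half-maximum cutoff used for the "informative span" are hypothetical and need not match the cutoff used to color the figure.

```python
import math

def site_pi(rate, t):
    """Townsend (2007) informativeness of one site with rate `rate` at time `t`."""
    return 16.0 * rate ** 2 * t * math.exp(-4.0 * rate * t)

def locus_profile(site_rates, times):
    """Sum of per-site informativeness at each time point."""
    return [sum(site_pi(r, t) for r in site_rates) for t in times]

def peak_time(rate):
    """Time at which a single site's informativeness is maximal: 1 / (4 * rate)."""
    return 1.0 / (4.0 * rate)

def informative_span(profile, times, frac=0.5):
    """Earliest and latest times where the profile is >= frac * its maximum.

    The cutoff is hypothetical; for a multimodal profile these are outer bounds."""
    cutoff = frac * max(profile)
    kept = [t for t, v in zip(times, profile) if v >= cutoff]
    return kept[0], kept[-1]

times = list(range(0, 301))  # Myr before present (hypothetical grid)
rates = [0.01]               # substitutions/site/Myr (hypothetical)
profile = locus_profile(rates, times)
print(peak_time(0.01), informative_span(profile, times))
```

Faster sites peak closer to the present and slower sites deeper in time, which is why loci mixing many rates can stay informative over a longer span.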
Fig_S2_NN_Model
An artificial neural network (NN) model showing the input neurons characterizing each AHE locus, the single hidden layer (H1), and the output variable predicting the Robinson-Foulds (RF) distance between each gene tree and the species tree (RFgt-st). The diagram shows both intercept (bias) terms, B1 and B2, which increase NN efficiency, and the synaptic weights, indicated by the thickness and boldness of the lines connecting the neurons.
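The network in Fig_S2 can be read as a function from locus features to a predicted RFgt-st. A minimal forward-pass sketch, assuming a logistic activation in the hidden layer and a linear output (common choices for regression, not confirmed by the source); the function names are hypothetical, the weights would normally be fit to the data in Data_S15, and the values below are placeholders.

```python
import math

def logistic(z):
    """Numerically stable logistic activation."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_rf(x, w_in, b1, w_out, b2):
    """Forward pass of a one-hidden-layer network.

    x     : locus features (e.g. PI sites, length, gaps), assumed pre-scaled
    w_in  : one list of input weights per hidden neuron
    b1    : one bias per hidden neuron (B1 in the diagram)
    w_out : one output weight per hidden neuron
    b2    : output bias (B2 in the diagram)
    """
    if any(len(ws) != len(x) for ws in w_in):
        raise ValueError("each hidden neuron needs one weight per input")
    hidden = [logistic(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(w_in, b1)]
    return b2 + sum(w * h for w, h in zip(w_out, hidden))

# One hidden neuron (H1) with placeholder weights.
print(predict_rf([0.3, 0.7], [[0.0, 0.0]], [0.0], [2.0], 1.0))
```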
Table_S1_SquamateAssemblySummary
Excel list of the squamate species sequenced, with corresponding summary statistics of capture and sequencing success.
Table_S2_Calibration_points_for_Squamata_phylogeny
Calibration points for Squamata phylogeny