Character set and phylogenetic analyses of the living and fossil egerniine scincids of Australia

2. Author Information

A. Principal Investigator Contact Information
Dr Kailah Thorn
EdCC Earth Science Museum, University of Western Australia
35 Stirling Highway
CRAWLEY WA 6009
kailah.thorn@uwa.edu.au

B. Co-investigator Contact Information - for enquiries about 3D scans used
Dr Mark Hutchinson
South Australian Museum
North Terrace, ADELAIDE SA 6000
mark.hutchinson@samuseum.sa.gov.au

C. Co-investigator Contact Information - for enquiries about phylogenetic methods
Prof Mike Lee
College of Science and Engineering, Flinders University
Sturt Road, BEDFORD PARK SA 5049
mike.lee@flinders.edu.au

3. Date of data collection: March 2017 - November 2019

4. Geographic location of data collection: Australia (various locations)

5. Information about funding sources that supported the collection of the data: KMT was supported by an Australian Postgraduate Research Training Stipend. The Mark Mitchell Foundation funded part of the fieldwork component of this project, and Scanning Electron Microscope time.

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: Please cite this data set (see below) and the primary publication Thorn et al. 2021 (see below) if you use any of the files or data within.

Links to publications that cite or use the data:
Thorn, K.M., Hutchinson, M. N., Lee, M. S. Y., Brown, N., Camens, A. B., and Worthy, T.H. 2021. A new species of Proegernia from the Namba Formation in South Australia and the early evolution and environment of Australian egerniine skinks. Royal Society Open Science.

Recommended citation for this dataset:
Thorn, K.M., Hutchinson, M. N., Lee, M. S. Y., Camens, A. B., and Worthy, T.H. 2021. Character set and phylogenetic analyses of the living and fossil egerniine scincids of Australia.
Dryad, Dataset, https://doi.org/10.5061/dryad.3n5tb2rg7

DATA & FILE OVERVIEW

File List:

Bayesian files
Namba_4521_37_100milruns.xml
Namba_4521_37_Cont48UCLN_DisgammaULCN_NoOzCal_Run1.log
Namba_4521_37_Cont48UCLN_DisgammaULCN_NoOzCal_Run2.log
Namba_4521_37_Cont48UCLN_DisgammaULCN_NoOzCal_Run3.log
Namba_4521_37_Cont48UCLN_DisgammaULCN_NoOzCal_Run4.log
Namba_4521_37_Run1.trees
Namba_4521_37_Run2.trees
Namba_4521_37_Run3.trees
Namba_4521_37_Run4.trees
Namba_4521_37_Final_CONSENSUS.trees
Namba_4521_37_Final_CONSENSUS.tree
Namba_4521_37_Final_CONSENSUS.svg

Maximum Parsimony files
Namba_4521_46.txt
Namba_4521_46.log
Namba_4521_46.tre
Namba_4521_46.tre.svg
Namba_4521_46_Bootstrap.txt
blockboot.run
bootblock.tre
Namba_4521_46-2.log - bootstrap log file
Namba_Bootstrap_tree.PNG
Namba_4521_46_branchlengths.tre
Namba_4521_46_Branchlengths.svg
Parsimony_tree.png

Matrices (raw data and view files)
Namba_matrices_for_Mesquite.txt
Namba_continuous_data.xlsx

Additional related data collected that was not included in the current data package: All Micro-CT scans are archived in the digital repository of the South Australian Museum; contact Dr Mark Hutchinson to enquire about access.

METHODOLOGICAL INFORMATION

To better understand the timing of the Australian colonisation by the Egerniinae, both molecular and morphological data (including fossils) are required to generate tip-dated phylogenies. Undated parsimony and tip-dated Bayesian analyses infer, respectively, the phylogeny with the least homoplasy and the most probable dated phylogeny.

Morphological characters
Morphological characters used in the following analyses consisted of 102 discrete and 48 continuous traits, forming an expanded matrix from Thorn et al. (2019). The expanded character list is included. Continuous characters, derived from measurements of individual bones or teeth from the dentaries and maxillae, were taken from either Micro-CT scan data in Avizo Lite (v.
9.0) or SEM at Flinders Microscopy, to the nearest micrometre, or with digital callipers to the nearest ten micrometres. All measurements were converted to ratios of either dentary or maxilla length to standardise for size. Continuous character states were then linearly scaled to span 0–2, matching the range of a discrete character with three states (the mean number of states among the discrete characters), so that they did not carry disproportionate weight in the analyses in either TNT (Goloboff and Catalano 2016) or BEAST 1.8.3 (Drummond, Suchard et al. 2012).

Molecular partitions
Molecular data sourced from Tonini et al. (2016) and Gardner et al. (2008) were analysed using PartitionFinder 2 (Lanfear, Frandsen et al. 2016) to find optimal partitions and substitution models. The same six molecular (gene) partitions, 12s (412 base pairs [bps]), 16s (681 bps), ND4 (693 bps), BDNF (699 bps), CMOS (835 bps) and B-fibrinogen (1051 alignable bps), and the substitution models chosen in that study (Thorn, Hutchinson et al. 2019) are used again here.

Maximum Parsimony
The parsimony analyses for the combined discrete morphological, continuous morphological, and molecular data were performed using TNT v.1.5 (Goloboff and Catalano 2016). Eutropis multifasciata was set as the most distant outgroup following the phylogenetic interpretations of Gardner et al. (2008) and Thorn et al. (2019). The most parsimonious tree (MPT) for the combined data was found using 1000 replicates of tree-bisection-reconnection (TBR) with up to 1,000,000 trees held. To assess clade support, 200 partitioned bootstrap replicates (with discrete characters, continuous characters, and each gene locus treated as a separate resampling partition) were performed in TNT using new search methods (XMULT) with 1000 replicates and 1,000,000 trees held.
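
The partitioned resampling scheme described above can be sketched in Python. This is a minimal illustration of the idea, not the TNT script used for the analyses: characters are drawn with replacement within each partition, so every partition keeps its original number of characters in each pseudoreplicate. The function name partitioned_bootstrap and the column ordering are assumptions made for this example; the partition sizes are those given in this README.

```python
import random


def partitioned_bootstrap(partitions, rng):
    """Return one bootstrap pseudoreplicate as a list of character indices.

    `partitions` maps a partition name to the list of character (column)
    indices it contains. Characters are sampled with replacement within
    each partition, so partition sizes are preserved.
    """
    resampled = []
    for columns in partitions.values():
        resampled.extend(rng.choice(columns) for _ in columns)
    return resampled


# Partition sizes as listed in this README. The order of the columns in the
# matrix is an assumption made for illustration only.
sizes = [("discrete", 102), ("continuous", 48), ("12s", 412), ("16s", 681),
         ("ND4", 693), ("BDNF", 699), ("CMOS", 835), ("B-fibrinogen", 1051)]

partitions = {}
start = 0
for name, n_characters in sizes:
    partitions[name] = list(range(start, start + n_characters))
    start += n_characters

# One pseudoreplicate; the seed is fixed only to make the example repeatable.
replicate = partitioned_bootstrap(partitions, random.Random(1))
```

The eight partition sizes sum to 4521 characters, which matches the character count used in the file names of this data package.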
The MPT and bootstrap trees from TNT were exported in nexus format, and continuous and discrete characters were traced in Mesquite (Maddison and Maddison 2017). The executable files for finding the most parsimonious tree, and for performing 200 replicates of partitioned bootstrap resampling, can be found in the SI data files Namba_Egerniines_Topology.tnt (MPT file) and Namba_Egerniines_PartitionedBootstrap.tnt.

Bayesian analysis
The discrete and continuous morphological data and the molecular data were analysed simultaneously in BEAST v1.8.4 using tip-dated Bayesian approaches (Drummond, Suchard et al. 2012). Eutropis multifasciata was again set as the furthest outgroup. Polymorphic discrete morphological data were treated exactly as coded rather than as unknown; i.e. a character coded as states (0,1) was treated as 0 or 1, but not 2. The discrete character set was analysed using the Mkv model with correction for non-sampling of constant characters (Lewis 2001, Alekseyenko, Lee et al. 2008). Despite recent disputes over the effectiveness of this model (Goloboff, Pittman et al. 2018), it is well tested (Wright and Hillis 2014, O'Reilly, Puttick et al. 2016) and is still widely accepted and applied to morphological data (Harmon 2019). Continuous characters, transformed to span values between 0 and 2, were analysed with the Brownian motion model. Bayes factors were used to test the need to accommodate among-character rate variability (i.e. a gamma parameter) for both discrete and continuous morphological characters. The stratigraphic data used for tip-dating analyses were derived from fossil taxa and their associated stratigraphy noted in Table 1. No node age constraints were imposed in this analysis; all dates are retrieved from the morphology and stratigraphic age ranges of the noted fossil taxa (tips). The most appropriate available model in BEAST v.1.8.4, birth-death serial sampling (Stadler 2010), was applied. An uncorrelated relaxed clock (Drummond, Ho et al.
2006) was separately applied to the molecular and morphological data. Each Bayesian analysis was run for 100,000,000 generations with a burn-in of 20%. The analysis was conducted four times to confirm stationarity. The post-burn-in samples of all four runs were examined in Tracer 1.7.1 (Rambaut, Drummond et al. 2018) to ensure convergence was achieved. All four runs were combined in LogCombiner, and the consensus tree was produced with TreeAnnotator (Drummond, Suchard et al. 2012). The executable .xml file for BEAST, all output log files, and the final consensus tree file (.tree) are available as supplementary information.

References

Alekseyenko, A. V., C. J. Lee and M. A. Suchard (2008). "Wagner and Dollo: a stochastic duet by composing two parsimonious solos." Systematic Biology 57(5): 772–784.

Drummond, A. J., S. Y. W. Ho, M. J. Phillips and A. Rambaut (2006). "Relaxed phylogenetics and dating with confidence." PLOS Biology 4(5): e88.

Drummond, A. J., M. A. Suchard, D. Xie and A. Rambaut (2012). "Bayesian phylogenetics with BEAUti and the BEAST 1.7." Molecular Biology and Evolution 29(8): 1969–1973.

Gardner, M. G., A. F. Hugall, S. C. Donnellan, M. N. Hutchinson and R. Foster (2008). "Molecular systematics of social skinks: phylogeny and taxonomy of the Egernia group (Reptilia: Scincidae)." Zoological Journal of the Linnean Society 154(4): 781–794.

Goloboff, P. A. and S. A. Catalano (2016). "TNT version 1.5, including a full implementation of phylogenetic morphometrics." Cladistics 32(3): 221–238.

Goloboff, P. A., M. Pittman, D. Pol and X. Xu (2018). "Morphological data sets fit a common mechanism much more poorly than DNA sequences and call into question the Mkv model." Systematic Biology 68(3): 494–504.

Harmon, L. J. (2019). Phylogenetic comparative methods. Published Online, Luke Harmon.

Lanfear, R., P. B. Frandsen, A. M. Wright, T. Senfeld and B. Calcott (2016). "PartitionFinder 2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses." Molecular Biology and Evolution 34(3): 772–773.

Lewis, P. O. (2001). "A likelihood approach to estimating phylogeny from discrete morphological character data." Systematic Biology 50(6): 913–925.

Maddison, W. and D. Maddison (2017). Mesquite: a modular system for evolutionary analysis. Version 3.2.

O'Reilly, J. E., M. N. Puttick, L. Parry, A. R. Tanner, J. E. Tarver, J. Fleming, D. Pisani and P. C. Donoghue (2016). "Bayesian methods outperform parsimony but at the expense of precision in the estimation of phylogeny from discrete morphological data." Biology Letters 12(4): 20160081.

Rambaut, A., A. J. Drummond, D. Xie, G. Baele and M. A. Suchard (2018). "Posterior summarisation in Bayesian phylogenetics using Tracer 1.7." Systematic Biology 67(5): 901–904.

Stadler, T. (2010). "Sampling-through-time in birth–death trees." Journal of Theoretical Biology 267(3): 396–404.

Thorn, K. M., M. N. Hutchinson, M. Archer and M. S. Y. Lee (2019). "A new scincid lizard from the Miocene of Northern Australia, and the evolutionary history of social skinks (Scincidae: Egerniinae)." Journal of Vertebrate Paleontology 39(1): e1577873.

Tonini, J. F. R., K. H. Beard, R. B. Ferreira, W. Jetz and R. A. Pyron (2016). "Fully-sampled phylogenies of squamates reveal evolutionary patterns in threat status." Biological Conservation 204, Part A: 23–31.

Wright, A. M. and D. M. Hillis (2014). "Bayesian analysis using a simple likelihood model outperforms parsimony for estimation of phylogeny from discrete morphological data." PLoS One 9(10): e109210.

FILE NAME TRANSLATIONS:

.svg files: Resulting trees formatted for export as vector graphics for the purpose of making figures for publication.
Run files (either .xml for Bayesian or .txt for MP or Bootstrap): Project_numberofcharacters_numberoftaxa_numberofruns.file

File containing all trees used to make consensus tree: Project_numberofcharacters_numberoftaxa_Run#.trees

Final consensus tree: Project_numberofcharacters_numberoftaxa_CONSENSUS.tree

Log files: Project_numberofcharacters_numberoftaxa_clockmodels_calibrations.log (UCLN = uncorrelated lognormal)

The file 'Namba_matrices_for_Mesquite.txt' can be opened in the freely available Mesquite software (Maddison and Maddison 2017), and allows the morphological characters to be traced across the MP tree.

The Excel spreadsheet containing the continuous data (Namba_continuous_data.xlsx) includes the xml codes required to convert the morphological data set for import into the xml file for analyses in BEAST.

Blockboot files: Must be in the same folder as the bootstrap run file to execute analyses. Open the file in a text editor for related information.

DATA-SPECIFIC INFORMATION FOR PHYLO RUN FILES:

Morphological characters are illustrated in the supplementary information file to Thorn et al. 2021 in Royal Society Open Science. Mesquite (Maddison and Maddison 2017) was used to record character states for all analyses and to trace characters across the final maximum parsimony tree. Files for maximum parsimony analyses were exported from Mesquite as .txt files, and their formatting follows the guidelines provided for TNT (Goloboff and Catalano 2016). Files for import into BEAST to run the Bayesian analyses are formatted as .xml files; they were constructed in Notepad++ and can be read in any text editor. Character matrices were extracted from Mesquite, and Microsoft Excel was used to format them appropriately for the xml file. See the Excel spreadsheet for those methods.
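
The naming convention given under FILE NAME TRANSLATIONS can also be unpacked programmatically. The sketch below is a hypothetical helper, not part of the data package: the names NAME_PATTERN and parse_name are invented for this example. It assumes file names of the form Project_numberofcharacters_numberoftaxa, optionally followed by further underscore-separated fields, and returns None for files that do not follow that convention (for example Namba_matrices_for_Mesquite.txt).

```python
import re

# Project_numberofcharacters_numberoftaxa[_further_fields].extension
# The extension may itself contain a dot (e.g. "tre.svg").
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z]+)_(?P<characters>\d+)_(?P<taxa>\d+)"
    r"(?:_(?P<rest>.+?))?\.(?P<extension>[A-Za-z.]+)$"
)


def parse_name(filename):
    """Split a file name into the fields of the naming convention.

    Returns a dict with the keys project, characters, taxa, rest and
    extension, or None if the name does not follow the convention.
    """
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["characters"] = int(info["characters"])
    info["taxa"] = int(info["taxa"])
    return info


# Example with one of the Bayesian log files from the file list.
info = parse_name("Namba_4521_37_Cont48UCLN_DisgammaULCN_NoOzCal_Run1.log")
print(info["characters"], info["taxa"], info["rest"])
```

The remaining fields (clock models, calibrations, run number) are returned together in "rest" rather than split further, because their number differs between file types.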