Perianth evolution and implications for generic delimitation in the Eucalypts (Myrtaceae): DNA sequences, morphological data
Crisp, Michael et al. (2023), Perianth evolution and implications for generic delimitation in the Eucalypts (Myrtaceae): DNA sequences, morphological data, Dryad, Dataset, https://doi.org/10.5061/dryad.7sqv9s4wq
Eucalyptus was traditionally defined by the operculate perianth, hence the generic name (from Greek, meaning "well covered"). However, after a previous phylogenetic analysis placed Angophora, which has free sepals and petals, as sister to the bloodwood eucalypts, the latter were segregated into a new genus, Corymbia. We obtained sequences of 101 low-copy nuclear exons by target capture from 392 samples representing 329 species-level taxa. The phylogeny was estimated using maximum likelihood (IQtree and RAxML) and the multispecies coalescent (Astral). Using Shimodaira's Approximately Unbiased (AU) test, we tested alternative relationships among four genera within Eucalypteae (Arillastrum, Angophora, Eucalyptus, Corymbia) at each of two nodes critical to generic delimitation. Monophyly of Arillastrum + (Corymbia + Angophora) relative to Eucalyptus sensu stricto was supported, whereas monophyly of Corymbia relative to Angophora was decisively rejected. These results indicate that either Eucalyptus should be expanded to include all four genera or Corymbia should be split into two. All of the alternative relationships among the four currently recognised genera imply homoplasy in perianth evolution, specifically in the origins of the bud cap (operculum or calyptra), which has traditionally been used to define Eucalyptus. Inferred evolutionary transitions in perianth traits are generally congruent with divergences between major clades, with a single exception: the separate sepals and petals of Angophora, which is nested within the operculate genus Corymbia, appear prima facie to be a reversal to the plesiomorphic perianth structure. Strictly, this is not a reversal, because the petals of Angophora and Corymbia have a novel compound keel-and-limb structure that is absent in the outgroups. This structure is evident early in development, irrespective of whether the petals remain free or later become part of an operculum.
Many of the currently recognised infrageneric taxa down to sectional level (and below in some cases) are well supported by the sequence data and definable by morphological traits. Inclusion of Angophora within Eucalyptus was formally proposed two decades ago but did not gain acceptance. Instead, we here formally raise Corymbia subg. Blakella to genus rank and make the relevant new combinations.
This is an aligned dataset of 101 low-copy nuclear loci sequenced from 392 samples (eucalypts plus outgroups), representing the phylogenetic diversity of Myrtaceae tribe Eucalypteae.
Hereafter, the term "eucalypts s.l." refers collectively to Angophora, Corymbia and Eucalyptus. Sampling represented all genera within Myrtaceae tribe Eucalypteae sensu Wilson et al. (2005) except Allosyncarpia, Eucalyptopsis and Stockwellia. Within the eucalypts, both subgenera of Corymbia (Parra-O et al. 2009) and five of the seven subgenera of Eucalyptus were sampled; the two unsampled subgenera, Acerosae (E. curtisii) and Alveolata (E. microcorys), are both monotypic. The final dataset comprised 392 samples, including eleven outgroups, which were selected based on whole-of-family phylogenies by Wilson et al. (2005) and Thornhill et al. (2012; 2015) and included species of Osbornia and Melaleuca (Melaleuceae), Backhousia (Backhousieae), Tristaniopsis (Kanieae), Syncarpia (Syncarpieae) and Arillastrum (Eucalypteae). The tree was rooted between Melaleuceae and the remaining taxa, based on earlier studies (Wilson et al. 2005; Thornhill et al. 2015).
The majority of samples were field-collected leaf tissue with vouchers lodged in the Australian National Herbarium (CANB), where the identifications were verified by co-author Slee. These were supplemented by leaf samples taken directly from CANB herbarium sheets, with permission. Samples from Currency Creek Arboretum were taken with permission from vouchered living trees (details in Thornhill et al. 2015). All taxa and accessions sampled are listed in Supplementary Table S1, and nomenclature follows Brooker (2000), as updated by Slee et al. (2020).
Target Capture and Sequencing
We used a target-capture approach aimed at identifying and sequencing up to 200 orthologous low-copy loci from the nuclear genome with the potential to resolve species-level relationships across the large family Myrtaceae, following Choi et al. (2019; Dryad dataset "Data from: Identifying genetic markers for a range of phylogenetic utility–from species to family level", https://doi.org/10.5061/dryad.p20km22).
In plates of 48 samples, the pooled DNA library for each specimen was hybridised to the target probes using the SeqCap EZ Developer Library (NimbleGen, Madison, USA) following the manufacturer's instructions, with minor modifications detailed in Choi et al. (2019). Recovery and washing of hybridised samples were carried out using the SeqCap Hybridisation and Wash Kit (NimbleGen, Mannheim, Germany) following the manufacturer's instructions. After indexing PCR and purification, the captured libraries were sequenced on the Illumina MiSeq (one pool of 48 samples) and HiSeq 2000 (all other pools) platforms (100 bp paired-end read protocol) at the Bio-molecular Research Facilities at The Australian National University.
Data Handling and Mapping of Reads
The quality of the raw reads was assessed using FastQC (Andrews 2010) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). BBDuk within BBTools was used to remove Illumina adapters, low-quality sequence and short reads with the parameters trimq=30, minlen=40, ktrim=r, hdist=1, tpe and tbo (http://jgi.gov/data-and-tools/bb-tools/). The cleaned reads were rechecked using FastQC. After read cleaning, the axe demultiplexer was used with default parameters to sort reads by barcode (https://manpages.debian.org/testing/axe-demultiplexer/axe-demux.1.en.html). The reads were mapped against the E. grandis target sequences using BWA-MEM (Li et al. 2009a). SAM files were converted to BAM, sorted and indexed using SAMtools v1.3.1 (Li et al. 2009b). Picard (http://broadinstitute.github.io/picard/) was used to remove duplicate reads. Finally, variants were called with Platypus using default parameters (Rimmer et al. 2014).
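The read-processing steps above can be sketched as a per-sample list of shell commands. This is a minimal Python sketch, not the authors' script: the helper `build_commands`, all file names (including the adapter reference `adapters.fa`, the targets FASTA, `picard.jar` and `Platypus.py`) and the Picard/Platypus invocations are assumptions. Only the BBDuk options are taken from the text. The axe demultiplexing step is omitted because its options are not given.

```python
import shlex

# BBDuk options exactly as listed in the text; all file names below are hypothetical.
BBDUK_OPTIONS = ["trimq=30", "minlen=40", "ktrim=r", "hdist=1", "tpe", "tbo"]


def build_commands(sample, r1, r2, targets="Egrandis_targets.fa",
                   adapters="adapters.fa", picard_jar="picard.jar"):
    """Return shell command strings for one sample, in pipeline order.

    Hypothetical helper: assumes `bwa index` and `samtools faidx` have already
    been run on the targets FASTA, and omits the axe demultiplexing step.
    """
    q = shlex.join
    c1, c2 = f"{sample}_clean_R1.fq.gz", f"{sample}_clean_R2.fq.gz"
    sam, bam = f"{sample}.sam", f"{sample}.bam"
    sorted_bam, dedup_bam = f"{sample}.sorted.bam", f"{sample}.dedup.bam"
    return [
        q(["fastqc", r1, r2]),                                  # raw read QC
        q(["bbduk.sh", f"in={r1}", f"in2={r2}", f"out={c1}", f"out2={c2}",
           f"ref={adapters}", *BBDUK_OPTIONS]),                 # adapter/quality trimming
        q(["fastqc", c1, c2]),                                  # re-check cleaned reads
        q(["bwa", "mem", targets, c1, c2]) + " > " + shlex.quote(sam),  # map to targets
        q(["samtools", "view", "-b", "-o", bam, sam]),          # SAM -> BAM
        q(["samtools", "sort", "-o", sorted_bam, bam]),         # coordinate sort
        q(["samtools", "index", sorted_bam]),
        q(["java", "-jar", picard_jar, "MarkDuplicates", f"I={sorted_bam}",
           f"O={dedup_bam}", f"M={sample}.dup_metrics.txt",
           "REMOVE_DUPLICATES=true"]),                          # drop duplicate reads
        q(["samtools", "index", dedup_bam]),                    # Platypus needs an indexed BAM
        q(["python", "Platypus.py", "callVariants", f"--bamFiles={dedup_bam}",
           f"--refFile={targets}", f"--output={sample}.vcf"]),  # variant calling
    ]


if __name__ == "__main__":
    print("\n".join(build_commands("S001", "S001_R1.fq.gz", "S001_R2.fq.gz")))
```

The helper returns strings rather than running them, so the generated script can be inspected or submitted to a cluster scheduler before anything is executed.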
Sequence Alignment and Editing
Sequences were imported into Geneious Prime ver. 2020.1.2 (Biomatters Ltd) for assembly, alignment and editing. Each locus was first aligned separately across all samples using MAFFT ver. 1.4.0 (Biomatters Ltd). After trimming, alignments were adjusted by eye, which included deleting sites with > 95% missing data. A neighbour-joining tree was generated for each locus and inspected for anomalies such as likely chimeric sequences, indicated by long, often misplaced branches. Every locus was assessed for paralogy (multiple gene copies), as indicated by systematic sharing of polymorphisms among distantly related taxa, and such loci were excluded. Randomly scattered (unshared) polymorphic base calls were assumed to indicate allelic variation, and loci showing only this pattern were retained. Ninety-nine of the 200 targeted genes were discarded, leaving 101 putatively single-copy genes, which were concatenated in Geneious. All samples with > 60% of the concatenated sequence missing were culled, leaving 392 of the original 521 eucalypt and outgroup sequences in the final alignment. This alignment totals 129,354 base pairs, comprising 27,100 parsimony-informative sites, 14,807 singleton sites and 87,447 constant sites. The final 101 loci are listed in Supplementary Table S2, identified by their labels in the annotated Eucalyptus grandis genome (Myburg et al. 2014).
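The missing-data filters and the three site categories reported above can be illustrated with a short Python sketch. It assumes that any character other than A, C, G or T (gaps, N, ambiguity codes) counts as missing. A site is treated as parsimony-informative if at least two states each occur in at least two sequences, as a singleton site if it is variable but not informative, and as constant otherwise. The function names are hypothetical. This is not how Geneious or IQtree compute these values internally.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")  # anything else (gap, N, IUPAC ambiguity) counts as missing


def missing_fraction(chars):
    """Fraction of characters that are not A, C, G or T."""
    return sum(c.upper() not in NUCLEOTIDES for c in chars) / len(chars)


def columns(aln):
    """Yield alignment columns (one character per sample) from a name -> sequence dict."""
    return zip(*aln.values())


def drop_sparse_sites(aln, max_missing=0.95):
    """Delete sites with more than max_missing missing data (the > 95% rule)."""
    keep = [i for i, col in enumerate(columns(aln)) if missing_fraction(col) <= max_missing]
    return {name: "".join(seq[i] for i in keep) for name, seq in aln.items()}


def cull_sparse_samples(aln, max_missing=0.60):
    """Remove samples with more than max_missing of their sequence missing (the > 60% rule)."""
    return {name: seq for name, seq in aln.items() if missing_fraction(seq) <= max_missing}


def classify_site(col):
    """Classify one column as parsimony-informative, singleton or constant."""
    counts = Counter(c.upper() for c in col if c.upper() in NUCLEOTIDES)
    if sum(1 for n in counts.values() if n >= 2) >= 2:
        return "parsimony-informative"  # >= 2 states, each in >= 2 sequences
    if len(counts) >= 2:
        return "singleton"              # variable, but not informative
    return "constant"                   # one state (or none) after ignoring missing data


def site_summary(aln):
    """Count sites in each category across the whole alignment."""
    return Counter(classify_site(col) for col in columns(aln))
```

For the toy alignment `{"a": "AAC-T", "b": "AAC-T", "c": "AGT-T", "d": "GGT-N"}`, `drop_sparse_sites` removes the all-gap fourth column. `site_summary` then reports two parsimony-informative sites, one singleton and one constant site. Applied to a full alignment, the three counts sum to the alignment length, as the reported figures do (27,100 + 14,807 + 87,447 = 129,354).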
Phylogenies were first estimated from the concatenated sequences of all 101 nuclear loci, initially treated as a single partition, using maximum likelihood (ML) as implemented in RAxML ver. 8.2.12 (Stamatakis 2014) on the CIPRES Science Gateway (Miller et al. 2010) with a GTR+G model. Additional ML analyses were run using IQtree ver. 1.6.10 (Nguyen et al. 2015), first with a single partition and then with the alignment divided into 101 partitions (one per locus), each with its own substitution model (Chernomor et al. 2016) selected using ModelFinder (Kalyaanamoorthy et al. 2017). Node support was estimated using ultrafast bootstrap (UFB) with 1000 replicates (Minh et al. 2013; Hoang et al. 2018), as well as site (sCF) and gene (gCF) concordance factors (Minh et al. 2020).
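The ML analyses described above correspond roughly to the command lines generated by the small Python helper below. This is a sketch under stated assumptions, not a record of the commands actually run. The default file names are those of the dataset files described in the usage notes. The RAxML binary name, run names, random seed, the `-m MFP` choice for the single-partition IQtree run, the `-spp` (edge-proportional) partition option and the per-locus tree file `loci.treefile` are all assumptions. The concordance-factor step is written with IQ-TREE 2 option names (`--gcf`, `--scf`); the text does not state which version produced the concordance factors.

```python
import shlex

# Dataset file names (see the usage notes); override as needed.
ALIGNMENT = "101 loci concatenation incl half plate 29 May.phy"
PARTITIONS = "101concat-29May.nex.run1.best_scheme.nex"


def ml_commands(aln=ALIGNMENT, parts=PARTITIONS,
                loci_trees="loci.treefile", seed=12345):
    """Return argument lists for the ML runs (hypothetical helper)."""
    return {
        # RAxML 8: single partition, GTR+G (GTRGAMMA); binary name and seed are assumptions
        "raxml": ["raxmlHPC", "-s", aln, "-n", "concat_gtrg",
                  "-m", "GTRGAMMA", "-p", str(seed)],
        # IQ-TREE 1.6: single partition with 1000 ultrafast bootstraps;
        # "-m MFP" (ModelFinder) is an assumption for this run
        "iqtree_single": ["iqtree", "-s", aln, "-m", "MFP",
                          "-bb", "1000", "-pre", "single"],
        # IQ-TREE 1.6: 101 partitions with models read from the NEXUS file;
        # edge-proportional branch lengths (-spp) are an assumption
        "iqtree_partitioned": ["iqtree", "-s", aln, "-spp", parts,
                               "-bb", "1000", "-pre", "partitioned"],
        # IQ-TREE 2 syntax: gene (gCF) and site (sCF) concordance factors on the
        # partitioned tree, given per-locus trees in a hypothetical loci.treefile
        "concordance": ["iqtree2", "-t", "partitioned.treefile",
                        "--gcf", loci_trees, "-s", aln,
                        "--scf", "100", "--prefix", "concord"],
    }


if __name__ == "__main__":
    for name, cmd in ml_commands().items():
        # shlex.join quotes the alignment file name, which contains spaces
        print(f"# {name}\n{shlex.join(cmd)}")
```

Keeping each command as an argument list means it can be passed directly to `subprocess.run(cmd, check=True)` without shell quoting problems from the space-containing file name.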
Mapping of Perianth Traits
The IQtree trees, with branch lengths, were imported into Mesquite ver. 3.61 (Maddison et al. 2019) for trait mapping and hypothesis testing. Relevant trait data from Euclid edition 4 (Slee et al. 2020) were also imported into Mesquite, where we defined morphological characters for testing hypotheses about perianth evolution.
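Mesquite offers both parsimony and likelihood reconstruction of ancestral states, and the text does not say which was used here. The Python sketch below illustrates what mapping a binary perianth character onto a tree involves. It applies Fitch parsimony to count the minimum number of state changes. The tree, taxon labels and character coding are hypothetical and do not reproduce the published analysis.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch parsimony downpass: return (state set at this node, minimum number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # the children can agree on a state, so no extra change is needed here
        return shared, left_changes + right_changes
    # disjoint state sets: one extra change is required somewhere below this node
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical coding: 1 = bud cap present, 0 = free sepals and petals.
tree = ("out1", (("tA", "tB"), ("tC", ("tD", "tE"))))
states = {"out1": 0, "tA": 1, "tB": 1, "tC": 1, "tD": 0, "tE": 1}
print(fitch(tree, states))  # → ({0, 1}, 2)
```

In this toy example the minimum is two changes: one origin of the bud cap and one apparent loss in `tD`, nested among capped taxa. A change count higher than the number of states minus one is what signals homoplasy in a character.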
The DNA sequence data file <101 loci concatenation incl half plate 29 May.phy> is in PHYLIP format for input to IQtree. The file <101concat-29May.nex.run1.best_scheme.nex> defines the partition boundaries (101 individual loci) and specifies a separate model for each locus; it is required for a multi-partition analysis in IQtree.
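Before a partitioned run, it may be worth checking that the partition file covers the alignment. The Python sketch below reads `charset` lines of the form `charset NAME = START-END;` from a NEXUS sets block and checks that the ranges tile the alignment with no gaps or overlaps. We assume the file contains only simple contiguous ranges. Charsets with codon steps (such as `\3`) or several ranges are silently skipped by this sketch. The function names and the example text in the test are hypothetical, not copied from the dataset file.

```python
import re
from typing import List, Tuple

# Matches simple contiguous charsets such as "charset locus1 = 1-1200;"
CHARSET = re.compile(r"^\s*charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;",
                     re.IGNORECASE | re.MULTILINE)


def read_charsets(nexus_text: str) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for each simple charset, 1-based and inclusive."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in CHARSET.finditer(nexus_text)]


def check_tiling(charsets: List[Tuple[str, int, int]], n_sites: int) -> List[str]:
    """Return a list of problems; an empty list means 1..n_sites is covered exactly once."""
    problems = []
    expected_start = 1
    for name, start, end in sorted(charsets, key=lambda c: c[1]):
        if start != expected_start:
            problems.append(f"{name}: starts at {start}, expected {expected_start}")
        expected_start = max(expected_start, end + 1)
    if expected_start != n_sites + 1:
        problems.append(f"ranges end at {expected_start - 1}, alignment has {n_sites} sites")
    return problems
```

For the files described above, one would expect `read_charsets` to return 101 entries and `check_tiling(charsets, 129354)` to return an empty list, matching the 101 loci and the 129,354-bp alignment reported in the methods.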
Morphological data were combined with the sequence data in a separate NEXUS file, used to explore perianth evolution and the question of generic delimitation in the eucalypts. This file should be opened with Mesquite.
Australian Research Council, Award: Discovery grant DP130101141
Australian Research Council, Award: Discovery grant DP200103151
Chan Zuckerberg Initiative, Award: EOSS4-0000000312