Molecular systematics of the tribe Physarieae (Brassicaceae) based on the nuclear ITS, Luminidependens, and chloroplast ndhF: Sequence alignments, trees, and supplemental figures
Fuentes-Soriano, Sara; Kellogg, Elizabeth (2021), Molecular systematics of the tribe Physarieae (Brassicaceae) based on the nuclear ITS, Luminidependens, and chloroplast ndhF: Sequence alignments, trees, and supplemental figures, Dryad, Dataset, https://doi.org/10.5061/dryad.w0vt4b8pc
Physarieae is a small tribe of herbaceous annual and woody perennial mustards that are mostly endemic to North America and whose members show considerable floral, fruit, and chromosomal variation. Building on a previous study of Physarieae based on morphology and ndhF plastid DNA, we reconstructed the evolutionary history of the tribe using new sequence data from two nuclear markers, and compared the new topologies against previously published cpDNA-based phylogenetic hypotheses. The novel analyses included ca. 420 new sequences of ITS and LUMINIDEPENDENS (LD) markers for 39 and 47 species, respectively, with sampling accounting for all seven genera of Physarieae, including nomenclatural type species, and 11 outgroup taxa. Maximum parsimony, maximum likelihood, and Bayesian analyses showed that these additional markers were largely consistent with the previous ndhF data that supported the monophyly of Physarieae and resolved two major clades within the tribe, i.e. DDNLS (Dithyrea, Dimorphocarpa, Nerisyrenia, Lyrocarpa, and Synthlipsis) and PP (Paysonia and Physaria). New analyses also increased internal resolution for some closely related species and lineages within both clades. The monophyly of Dithyrea and the sister relationship of Paysonia to Physaria were consistent in all trees. The sister relationship of Nerisyrenia to Lyrocarpa was supported by ndhF and ITS, whereas the positions of Dimorphocarpa and Synthlipsis shifted within the DDNLS Clade depending on the data set employed. Finally, using the strong, new phylogenetic framework of combined cpDNA + nDNA data, we discussed standing hypotheses of trichome evolution in the tribe suggested by ndhF.
Taxon Sampling—The study included a total of 47 putatively diploid species of Physarieae, each represented by two different accessions, selected to capture a broad range of morphological, geographic and cytogenetic variation. Total sampling consisted of 110 accessions for the tribe and 11 outgroup species chosen based on family-level analyses and belonging to tribes Alysseae, Boechereae, Camelineae, Crucihimalayeae, Descurainieae, Halimolobeae, and Lepidieae (see Appendix 1 for details; Bailey et al. 2006; Beilstein et al. 2006, 2008; Warwick et al. 2008; Couvreur et al. 2010; Warwick et al. 2010; Huang et al. 2015; Nicolov et al. 2019). Sequences of seven outgroup taxa were downloaded from GenBank including Arabidopsis thaliana (L.) Heynh., Capsella bursa-pastoris (L.) Medik., Descurainia sophia (L.) Webb ex Prantl, Exhalimolobos berlandieri (E.Fourn.) Al-Shehbaz & C.D.Bailey, Neslia paniculata (L.) Desv., Pennellia longifolia (Benth.) Rollins and Transberingia bursifolia (DC.) Al-Shehbaz & O'Kane.
DNA Isolation—DNA was extracted either from silica-gel dried leaves, following the standard CTAB protocol (Doyle and Doyle 1987) with purification in cesium chloride-ethidium bromide gradients by ultracentrifugation, or from herbarium material following Stefanović et al. (2002).
DNA Amplification—The ITS nuclear data set comprised the 5.8S gene flanked by the internal transcribed spacers ITS1 and ITS2, and was amplified with the primers ITS4 (White et al. 1990) and ITS18S (Howarth et al. 2003). The LD sequences extend from intron 4 to exon 7 and were amplified with LD-D1F and LD-XC4R primers (Slotte et al. 2006).
ITS and LD PCR reactions were performed in 25 µL total volume including 5 µL of 5× reaction buffer, 2 µL of 2.5 mM MgCl2, 2 µL (ITS) or 3 µL (LD) of 2.5 mM dNTPs, 3 µL of each primer (10 µM), 0.5 µL of Taq polymerase (5 units/µL) (Promega, Madison, Wisconsin), and 0.5 µL of DMSO. Cycling conditions were an initial 4 minutes at 95°C (ITS) or 2 minutes at 94°C (LD); 34 (ITS) or 35 (LD) cycles of 30 seconds at 94°C, 1 minute at 55°C (ITS) or 1.5 minutes at 57°C (LD), and 1.5 (ITS) or 2 (LD) minutes at 72°C; and a final 7 (ITS) or 9 (LD) minutes at 72°C. The PCR products of both nDNA regions were purified with a QIAquick gel extraction kit (Qiagen Inc., Redwood City, California).
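As a rough check on the reaction recipe above, the final concentration of each component follows from the dilution relation C1·V1 = C2·V2. The sketch below applies it to the stock concentrations and volumes listed in the text. The function and dictionary names are illustrative only, and the source does not say what makes up the rest of the 25 µL (presumably template and water), so that volume is reported as "unlisted" rather than assigned to anything.

```python
TOTAL_UL = 25.0  # total reaction volume stated in the text

# component -> (stock concentration, volume added in µL), copied from the text.
# Units of the stock concentration are carried in the key name only.
ITS_MIX = {
    "MgCl2 (mM)": (2.5, 2.0),
    "dNTPs (mM)": (2.5, 2.0),
    "each primer (µM)": (10.0, 3.0),
}
# LD differs from ITS only in the dNTP volume (3 µL instead of 2 µL).
LD_MIX = dict(ITS_MIX, **{"dNTPs (mM)": (2.5, 3.0)})


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """C2 = C1 * V1 / V2 (simple dilution)."""
    return stock * volume_ul / total_ul


def unlisted_volume(dntp_ul, total_ul=TOTAL_UL):
    """Volume not accounted for by the listed components.

    Listed: 5 µL buffer, 2 µL MgCl2, dNTPs, 2 x 3 µL primers,
    0.5 µL Taq, 0.5 µL DMSO.
    """
    listed = 5.0 + 2.0 + dntp_ul + 2 * 3.0 + 0.5 + 0.5
    return total_ul - listed


if __name__ == "__main__":
    for label, mix in (("ITS", ITS_MIX), ("LD", LD_MIX)):
        print(label)
        for name, (stock, vol) in mix.items():
            print(f"  {name}: {final_concentration(stock, vol):.2f}")
```

Under these figures the listed components leave 9 µL (ITS) or 8 µL (LD) of the reaction unspecified.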
Cloning and Sequencing—In a pilot study, 20 accessions representing ten species of Physarieae and exhibiting a wide range of morphological variation were investigated to identify locus copy number, orthologous regions and variation in nuclear loci. After this initial assessment a minimum of two clones was screened for all 103 accessions representing the remaining species. PCR fragments from at least two separate PCR reactions were cloned to eliminate labeling and pipetting errors and sequenced following Sambrook et al. (1989) and Mathews et al. (2000). Sequencing reactions used the fluorescent ABI Prism Big Dye 3.1 (Applied Biosystems, Foster City, California) to label the DNA for analysis in an ABI 3100 (Applied Biosystems) sequencer at the University of Missouri-St. Louis or at the Pennsylvania State University Nucleic Acid Facility (State College, Pennsylvania). Universal primers T7 and SP6 were used for sequencing both ITS and LD.
Sequence Editing and Alignment—SeqMan version 4 (DNASTAR, Madison, Wisconsin) and GENEIOUS version 4.0.2 (Drummond et al. 2009) were used for editing and contig assembly. Only double-stranded sequences with at least 85% overlap and Phred scores above 20 as estimated by PhredPhrap (Ewing and Green 1998; Ewing et al. 1998) and 4peaks version 1.7 (Griekspoor and Groothuis 2005) were considered good quality sequences and accepted for analysis. Base pairs with scores below 20 were eliminated from the analysis except when they matched the complementary strand with Phred scores above 20.
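The quality rule above can be illustrated with a short sketch. A Phred score Q corresponds to an estimated base-calling error probability P = 10^(-Q/10), so Q = 20 means roughly one error in 100 calls. The masking function below is not the authors' pipeline (that work was done in PhredPhrap and 4peaks); it is a hypothetical illustration that assumes the reverse read has already been reverse-complemented and aligned to the forward read, and it treats a score of exactly 20 as passing, a case the text leaves open.

```python
def phred_error_probability(q):
    """Phred definition: Q = -10 * log10(P), hence P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mask_low_quality(fwd_bases, fwd_quals, rev_bases, rev_quals, threshold=20):
    """Return the forward read with unsupported low-quality bases set to 'N'.

    A base is kept when its own score passes the threshold, or when the
    complementary-strand call at that position passes the threshold and
    agrees with it. Everything else is masked.
    Assumption: rev_* are already reverse-complemented and aligned to fwd_*.
    """
    kept = []
    for fb, fq, rb, rq in zip(fwd_bases, fwd_quals, rev_bases, rev_quals):
        if fq >= threshold:
            kept.append(fb)
        elif rq >= threshold and rb == fb:
            kept.append(fb)
        else:
            kept.append("N")
    return "".join(kept)


if __name__ == "__main__":
    # Toy reads and scores, invented for illustration.
    print(mask_low_quality("ACGT", [30, 10, 10, 10], "ACTT", [30, 30, 30, 10]))
```

In the toy call, the second base is rescued by a matching high-quality call on the other strand, while the third (mismatch) and fourth (both strands low quality) are masked.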
Sequence identities were confirmed by comparing them with sequences of Brassicaceae accessions deposited in GenBank. Nucleotide sequences of both ITS and LD were initially aligned in MUSCLE (Edgar 2004), followed by manual alignment using MacClade version 4 (Maddison and Maddison 2005). Alignment of LD exons was guided by identification of open reading frames, exon positions and stop codons in MacClade and protein alignment using Arabidopsis thaliana as a reference species in MUSCLE and GENEIOUS.
Phylogenetic Analyses—The g1 statistic was obtained for each data set to distinguish phylogenetic signal from random noise (Hillis and Huelsenbeck 1992), and the test was performed with 10,000 replicates as implemented in PAUP version 4.04b. Data sets were examined individually and combined. Pairwise comparisons of data sets included only those taxa in common for the combined partitions. Conflict among data sets was evaluated before merging them using a partition homogeneity test, also known as the incongruence length difference (ILD) test (Farris et al. 1994). The ILD tests were conducted in PAUP version 4.04b with all invariant characters removed (Cunningham 1997), simple addition sequence, TBR branch swapping, and MAXTREES set to 500. For each of the pairwise data partitions, 500 random partitions were analyzed as recommended by Johnson and Soltis (1998).
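For readers unfamiliar with the g1 statistic: it is the skewness of the distribution of tree lengths across a sample of random trees, and a strongly left-skewed (negative) value is taken as evidence of phylogenetic signal. The sketch below computes the moment-based skewness g1 = m3 / m2^(3/2) for a list of tree lengths. The tree lengths are invented, the function name is hypothetical, and the significance thresholds come from the critical values tabulated by Hillis and Huelsenbeck (1992), which are not reproduced here.

```python
def g1_skewness(lengths):
    """Moment-based sample skewness: g1 = m3 / m2 ** 1.5,
    where m_k is the k-th central moment of the values."""
    n = len(lengths)
    if n < 2:
        raise ValueError("need at least two tree lengths")
    mean = sum(lengths) / n
    m2 = sum((x - mean) ** 2 for x in lengths) / n
    m3 = sum((x - mean) ** 3 for x in lengths) / n
    if m2 == 0:
        raise ValueError("all tree lengths are identical; skewness undefined")
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Hypothetical lengths of randomly sampled trees (steps).
    random_tree_lengths = [812, 840, 845, 851, 853, 855, 856, 858]
    print(f"g1 = {g1_skewness(random_tree_lengths):.3f}")
```

A few unusually short trees pulling out a long left tail, as in the toy list, produce a negative g1.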
Phylogenies were constructed using maximum parsimony (MP) with all characters equally weighted, maximum likelihood (ML), and Bayesian inference (BI), with indels coded as missing data in all analyses. Analyses were run on the Cyberinfrastructure for Phylogenetic Research (CIPRES) cluster computer housed at the San Diego Supercomputer Center, University of California (http://www.phylo.org) and on the Beowulf computer cluster at the University of Missouri-St. Louis. Parsimony ratchet searches (Nixon 1999) were conducted using PAUPMacRat (Sikes and Lewis 2001) implemented in PAUP version 4.04b 10 for UNIX (Swofford 2002). Searches consisted of 20 independent replicates of 200 iterations, each with 15% of the characters re-weighted per iteration, and the strict consensus of the resulting trees was generated in PAUP. Bootstrap analysis was used to evaluate the support of specific branches and clades (Felsenstein 1985). Bootstrap values were calculated with 1000 full heuristic bootstrap replicates, one random sequence addition, tree-bisection-reconnection (TBR) branch swapping, and MULTREES = yes options.
For each individual gene data set, ML analyses used the best-fitting evolutionary model selected by Modeltest version 3.6 (Posada and Crandall 1998) according to the Akaike Information Criterion (AIC): TrN + G for ITS and HKY85 + I + G for LD. Likelihood replicates (1000) were run on CIPRES using RAxML bootstrapping (Stamatakis et al. 2008). The ML analyses of combined data sets were estimated as a single partition under the GTR + G model of evolution in RAxML, and as partitioned data sets following the method proposed by Meerow et al. (2009) using models of evolution and Treefinder scripts generated in KAKUSAN4 version 2 (Tanabe 2007). Scripts were implemented in Treefinder to run ML analyses (Jobb 2008). The latter strategy allowed parameters to be optimized independently among different genes included in a combined data set.
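Model selection by AIC, as used above, ranks candidate models by AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The sketch below shows only that ranking step. The log-likelihoods and parameter counts are made up for illustration and are not values from this study, and the function names are hypothetical rather than part of Modeltest.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates):
    """Return the name of the candidate with the lowest AIC.

    candidates maps model name -> (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


if __name__ == "__main__":
    # Invented (ln L, k) pairs; a real run would take these from the
    # likelihood scores of each model on the alignment.
    candidates = {
        "HKY85": (-1000.0, 4),
        "TrN+G": (-995.0, 6),
        "GTR+G": (-994.5, 9),
    }
    for name, (lnl, k) in candidates.items():
        print(f"{name}: AIC = {aic(lnl, k):.1f}")
    print("preferred:", best_model(candidates))
```

The toy numbers show the trade-off the criterion encodes: the most parameter-rich model has the best likelihood but is not preferred, because the gain does not offset the penalty for its extra parameters.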
In the BI analyses, MrModeltest 2.2 (Nylander 2004) selected models GTR + G for ITS and HKY + I + G for LD, and those models were implemented for single gene and combined data sets in MrBayes version 3.1 (Huelsenbeck and Ronquist 2001). The BI analyses were conducted with two independent runs of four chains for 5,000,000 generations per run (sampling every 1000 generations). Convergence across runs was evaluated by plotting the -log likelihood against the number of generations. The data reached convergence within the first 100,000 generations, but the first 200,000 generations of each run were conservatively discarded as the burn-in. Bayesian posterior probabilities were obtained from the majority-rule consensus trees generated in PAUP.
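The sampling figures above imply the following number of retained trees; the short sketch just does the arithmetic. The function name is illustrative, and the count ignores the starting state, which some programs also write to the tree file.

```python
def mcmc_sample_counts(generations, sample_every, burnin_generations, n_runs):
    """Count sampled, discarded, and retained trees for an MCMC analysis."""
    per_run = generations // sample_every
    discarded = burnin_generations // sample_every
    kept = per_run - discarded
    return {
        "sampled_per_run": per_run,
        "discarded_per_run": discarded,
        "kept_per_run": kept,
        "kept_total": kept * n_runs,
        "burnin_fraction": burnin_generations / generations,
    }


if __name__ == "__main__":
    # Settings as stated in the text: 2 runs, 5,000,000 generations each,
    # sampled every 1000 generations, first 200,000 generations discarded.
    counts = mcmc_sample_counts(5_000_000, 1000, 200_000, n_runs=2)
    for key, value in counts.items():
        print(f"{key}: {value}")
```

With these settings each run contributes 4800 post-burn-in trees (9600 across both runs), and the burn-in amounts to 4% of each run.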
Initial phylogenetic analyses included sequences of all clones (ITS: 224 clones and LD: 219 clones). To minimize computational effort and reduce redundancy, clones were pruned from the original data sets according to the following four rules: (1) a single clone was chosen at random from a well-resolved and monophyletic species clade to represent a species if its sequences differed by no more than three base pairs (bp); (2) the clone with the shortest branch length in a well-supported and resolved species clade was chosen to represent the species; (3) if the accessions representing a species were found to be paraphyletic, one clone for each accession was chosen to represent the species; (4) if the accessions representing a species were unresolved in a polytomy with at least one other species, one clone per species was chosen at random to represent each species within that clade.
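The sequence-distance part of rule (1) can be sketched as follows. This is a hypothetical illustration, not the authors' procedure: it checks only that all aligned clones of a species differ pairwise by at most three positions and then draws one at random. It does not check that the clones form a well-resolved monophyletic clade, which the rule also requires and which needs the tree. Ignoring gap and ambiguity characters when counting differences is an assumption, chosen to mirror the treatment of indels as missing data.

```python
import itertools
import random

MISSING = set("-?N")  # characters treated as missing data (assumption)


def pairwise_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned sequences,
    skipping positions where either sequence has missing data."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in MISSING and b not in MISSING
    )


def pick_representative(clones, max_diff=3, rng=None):
    """Return one clone name at random if every pair of clones differs by
    at most max_diff positions; otherwise return None.

    clones maps clone name -> aligned sequence.
    """
    rng = rng or random.Random()
    for seq_a, seq_b in itertools.combinations(clones.values(), 2):
        if pairwise_differences(seq_a, seq_b) > max_diff:
            return None
    return rng.choice(sorted(clones))


if __name__ == "__main__":
    # Invented clone names and sequences.
    clones = {"sp1_c1": "ACGTACGTAC", "sp1_c2": "ACGTACGTAA", "sp1_c3": "ACG-ACGTAC"}
    print(pick_representative(clones, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the random choice repeatable, which helps when the pruned data set needs to be regenerated.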
When clones from the same accession failed to form a monophyletic group, a careful check was made to identify potential errors due to contamination or labeling mistakes, and to detect conflicting phylogenetic signal using splits graphs in SplitsTree version 4.3 (Huson and Bryant 2006). Sequences with conflicting phylogenetic signals were included in the initial phylogenetic analyses to investigate their effects on the tree topologies.
Likelihood Topology Tests—Topological congruence and evolutionary hypotheses were evaluated using the Shimodaira-Hasegawa test (S-H test, Shimodaira and Hasegawa 1999) in PAUP version 4.04b. To obtain full-taxon compatibility among the tree topologies, we reduced the nuclear and chloroplast data sets to 48 taxa and re-ran the phylogenetic analyses. The S-H test included comparison between the optimal trees (unconstrained) from the maximum likelihood analyses of ITS, LD, combined ITS + LD, and chloroplast ndhF. We also created less-resolved constraint trees that included only well-supported nodes either with > 50% or > 70% bootstrap support. Poorly supported nodes were defined as having less than 50% bootstrap support and were considered ambiguous polytomies.
Additional S-H tests were run using the combined data sets to test support for particular relationships and estimates of character evolution suggested by previous studies. Putative sister relationships were tested for Dithyrea + Dimorphocarpa as suggested by Rollins (1979) and supported by earlier ndhF data (Fuentes-Soriano and Al-Shehbaz 2013), Synthlipsis with all other members of the DDNLS Clade as suggested by the ndhF data (Fuentes-Soriano and Al-Shehbaz 2013), and the alliance of Synthlipsis, Nerisyrenia, and Lyrocarpa as suggested by Bacon (1978). Hypotheses of trichome evolution in the tribe suggested by ndhF data (Fuentes-Soriano 2010) were also tested. Constraint trees were created in MacClade 4.08 (Maddison and Maddison 2005) to force the tested group to be monophyletic, while the rest of the taxa were placed at the base of a completely unresolved tree. Constraint trees were compared to the ML unconstrained trees. If likelihood values for the topologies being compared were not significantly different, each topology was considered an equally likely phylogenetic hypothesis. The S-H test was run with full optimization and 1000 bootstrap replicates.