Molecular phylogenetic analyses reveal multiple long-distance dispersal events and extensive cryptic speciation in Nervilia (Orchidaceae), an isolated basal Epidendroid genus
Data files
Nov 20, 2024 version files (551.57 KB in total)
- Combined_matrix_for_calibrated_tree.nex (201.88 KB)
- Nervilia_ITS81.fasta (63.63 KB)
- Nervilia_matK.fasta (172.80 KB)
- Nervilia_trnLF90.fasta (109.64 KB)
- README.md (3.62 KB)
Abstract
The terrestrial orchid genus Nervilia is diagnosed by its hysteranthous pattern of emergence but is nested among leafless myco-heterotrophic lineages in the lower Epidendroideae. Comprising at least 80 species distributed across Africa, Asia, and Oceania, the genus remains poorly known and plagued by vague and overlapping species circumscriptions, especially within each of a series of taxonomically intractable species complexes. Prior small-scale, exploratory molecular phylogenetic analyses have revealed the existence of cryptic species, but little is otherwise understood of its origin, the scale and timing of its biogeographic spread, or the palaeoclimatic factors that have shaped its ecology and given rise to contemporary patterns of occurrence. Here, we sample widely throughout the generic range, including material of equivocal identity and probable undescribed status, as well as multiple accessions referable to several widespread ‘macrospecies’, enabling an evaluation of taxonomic boundaries at both species and sectional levels. Our dated ancestral area analysis supports an origin in Africa in the Early Oligocene, with spread eastwards to Asia occurring in the Late Miocene, plausibly via the Gomphotherium land bridge at a time when it supported woodland and savanna ecosystems. An ancestral association with seasonally arid landscapes is inferred. Enormous taxonomic radiation in Asia within the last 8 million years ties in with the dramatic Himalayan-Tibetan Plateau uplift and associated intensification of the Asian monsoon. Multiple long-range migrations appear to have occurred thereafter, as the genus colonised Malesia and Oceania from the Pliocene onwards, undergoing niche differentiation to occupy the understorey of closed-canopy forest in the process.
The bulk of contemporary species diversity is thus relatively recent, potentially explaining the ubiquity of cryptic speciation, which still leaves numerous species overlooked and unnamed, though widespread disjunct species pairs also hint at high mobility across continents, extinction, and a history of climate-induced vicariance. Persistent taxonomic challenges are highlighted.
https://doi.org/10.5061/dryad.tb2rbp0bn
Description of the files
Note that some of the files listed below are found in the Related Works section of this dataset.
Files for phylogenetic alignment matrix
Alignments were constructed using the MAFFT multiple alignment plugin in Geneious v11.1.4 (Kearse et al., 2012), with subsequent adjustment by eye.
- Nervilia_trnLF90.fasta
- Nervilia_matK.fasta
- Nervilia_ITS81.fasta
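A quick sanity check on the three alignment files can be sketched as follows. This is a minimal illustration, not part of the original workflow: it assumes the files are standard FASTA with one aligned sequence per record, and the helper names `read_fasta` and `alignment_summary` are hypothetical. In a valid alignment, every sequence in a file should have the same length once gaps are included.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a dict mapping record name -> sequence string.

    Records with identical names overwrite one another (assumed not to occur).
    """
    records = {}
    name = None
    chunks = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def alignment_summary(records):
    """Return (number of sequences, set of distinct sequence lengths)."""
    lengths = {len(seq) for seq in records.values()}
    return len(records), lengths


if __name__ == "__main__":
    # File names as listed above; each is skipped if not present locally.
    for fname in ("Nervilia_trnLF90.fasta", "Nervilia_matK.fasta", "Nervilia_ITS81.fasta"):
        if Path(fname).exists():
            n, lengths = alignment_summary(read_fasta(fname))
            status = "aligned" if len(lengths) == 1 else "unequal lengths"
            print(f"{fname}: {n} sequences, lengths {sorted(lengths)} ({status})")
```

A file reporting more than one distinct length would suggest that it holds unaligned sequences or was truncated in transfer.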
Files for phylogenetic tree results
Phylogenetic analysis of individual and multilocus alignments was carried out using MP in PAUP* v4.0b10 and BI in MrBayes v3.2 (Huelsenbeck & Ronquist, 2003). For MP analyses, heuristic searches were conducted with 1,000 random addition replicates followed by TBR branch swapping. All characters were unordered and equally weighted, with gaps (including unavailable sequences) treated as missing data. Topological robustness was assessed using 1,000 bootstrap replicates. For BI analyses, each DNA region was assigned its own model of nucleotide substitution, as determined by the Akaike information criterion (AIC) in Modeltest v3.06 (Posada & Crandall, 1998). Four simultaneous Markov chain Monte Carlo (MCMC) chains were run for 30,000,000 generations, starting from a randomly generated tree and sampling one tree every 1,000 generations. Majority rule (>50%) consensus trees were constructed after removing the first 25% of sampled trees as burn-in.
- combined_BI_tree.nwk
- Combined_MP_tree.nwk
- ITS_BI_tree.nwk
- ITS_MP_tree.tre
- cpDNA_BI_tree.nwk
- CpDNA_MP_tree.tre
- Combined_tree_MP_BI.jpg
- ITS_and_cpDNA_trees_MP_BI.jpg
Files for Ancestral area reconstruction (biogeographical analysis)
This group comprises the alignment matrix, the calibrated tree and results, and Nervilia.dat. Four areas of endemism were defined for biogeographic analysis, reflecting the extant distribution of Nervilia demarcated by Pridgeon et al. (2005) as well as the climatic zones discernible within this range based primarily on seasonality, which is presumed to be of importance for the hysteranthous habit (Gale et al., 2021): (area 1) tropical Africa & Madagascar, (area 2) seasonal (monsoonal) tropical Asia, (area 3) aseasonal, moist tropical Asia, and (area 4) Oceania (encompassing Australasia, Micronesia, Melanesia and Polynesia). Ancestral area reconstruction was then performed using the package BioGeoBEARS (Matzke, 2016) in R 4.3.2 (R Core Team, 2023), applying the dispersal–extinction–cladogenesis (DEC) model (Ree & Smith, 2008), a maximum-likelihood (ML) version of Dispersal Vicariance Analysis (DIVALIKE; Ronquist, 1997) and a Bayesian biogeographical inference model (BAYAREALIKE; Landis et al., 2013), with the maximum range-size parameter set to three. We tested each of these models with and without founder-event speciation, which was incorporated via the J parameter modelling jump dispersal (Matzke, 2016). All six permutations were compared in BioGeoBEARS using likelihood values and the Akaike information criterion (AIC), based on the maximum clade credibility tree from the BEAST analyses described under "Ancestral area reconstruction" below. The best-fit model was taken to be the one with the lowest corrected Akaike information criterion (AICc) value and the largest Akaike weight (wAICc), which represents relative support for each model (Burnham & Anderson, 2002).
- Combined_matrix_for_calibrated_tree.nex
- Calibrated_tree.nwk
- Nervilia.dat
- Spatio-temporal_reconstruction_of_Nervilia.jpg
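To give a sense of what the four areas and a maximum range size of three imply, the candidate geographic ranges can be enumerated as below. This is an illustrative sketch only: the single-letter area codes are hypothetical (the actual coding is whatever Nervilia.dat uses), and whether an additional empty ("null") range is counted as a state depends on the software settings.

```python
from itertools import combinations

# Hypothetical one-letter codes standing in for areas 1-4 described above.
AREAS = ["A", "B", "C", "D"]
MAX_RANGE_SIZE = 3  # maximum number of areas a lineage may occupy at once


def candidate_ranges(areas, max_size):
    """List every non-empty combination of areas up to max_size areas."""
    ranges = []
    for k in range(1, max_size + 1):
        ranges.extend("".join(combo) for combo in combinations(areas, k))
    return ranges


ranges = candidate_ranges(AREAS, MAX_RANGE_SIZE)
print(len(ranges))  # 14: four single areas, six pairs, four triples
print(ranges)
```

Capping the range size at three excludes only the single range spanning all four areas.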
Phylogenetic analysis
Alignments were constructed using the MAFFT multiple alignment plugin in Geneious v11.1.4 (Kearse et al., 2012), with subsequent adjustment by eye. We excluded two poly-A regions comprising 41 and 61 positions in the trnL–F and matK genes, respectively (Supplementary File S1). An incongruence length difference (ILD) test (Farris et al., 1995) was performed in PAUP* v4.0b10 (Swofford, 2003) to assess whether the individual matK and trnL–F data sets, and the ITS and combined cpDNA data sets (Supplementary File S1), reflect similar potential phylogenies; 1,000 replicates, each with 1,000 random addition sequence replicates and tree bisection-reconnection (TBR) branch swapping, were performed in each test, and a P value of <0.05 was considered significant (Sullivan, 1996; Darlu & Lecointre, 2002). A “hard” incongruence test was also performed by directly comparing respective topologies, as well as resolution, for each clade generated in the separate analyses, with bootstrap percentages (BP) of ≥85% (Chase et al., 2000) and posterior probabilities (PP) of ≥0.95 (Martínez-Azorín et al., 2011) being taken as evidence of strong support.
Both the homogeneity test for the matK and trnL–F data sets (P = 0.881) and visual node-by-node comparisons of trees generated for each region individually revealed no major topological disparities for nodes of BP ≥85% and PP ≥0.95, and so the two cpDNA regions were combined. Tree topologies generated for the individual ITS and cpDNA data sets using Bayesian inference (BI) were also largely congruent with those using maximum parsimony (MP; Supplementary File S2). However, the ILD test indicated significant incongruence between the ITS and cpDNA data sets (P = 0.001). Even so, a visual comparison of the trees generated from the two data sets uncovered no topological disparities at nodes of BP ≥85% and PP ≥0.95, except for the position of a single clade containing four samples representing three species [N. bicarinata (Blume) Schltr., N. kotschyi and N. shirensis; Supplementary File S2]. Since Cunningham (1997) and Yoder et al. (2001) have argued that combined data sets improve phylogenetic accuracy regardless of incongruence, and numerous phylogenetic studies have found that trees generated from combined data sets with or without samples responsible for topological disparities remain highly consistent (e.g. Li et al., 2011; Kumar et al., 2022), we concatenated the ITS and cpDNA data sets and interpreted the resulting combined phylograms.
Phylogenetic analysis of individual and multilocus alignments was carried out using MP in PAUP* v4.0b10 and BI in MrBayes v3.2 (Huelsenbeck & Ronquist, 2003). For MP analyses, heuristic searches were conducted with 1,000 random addition replicates followed by TBR branch swapping. All characters were unordered and equally weighted, with gaps (including unavailable sequences) treated as missing data. Topological robustness was assessed using 1,000 bootstrap replicates. For BI analyses, each DNA region was assigned its own model of nucleotide substitution, as determined by the Akaike information criterion (AIC) in Modeltest v3.06 (Posada & Crandall, 1998). Four simultaneous Markov chain Monte Carlo (MCMC) chains were run for 30,000,000 generations, starting from a randomly generated tree and sampling one tree every 1,000 generations. Majority rule (>50%) consensus trees were constructed after removing the first 25% of sampled trees as burn-in.
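The sampling scheme for the BI analyses translates into tree counts as sketched below. The function name `mcmc_sample_counts` is hypothetical, and the arithmetic assumes exactly one sample per 1,000 generations in a single run; some programs also store the starting state, which would add one sample.

```python
def mcmc_sample_counts(generations, sample_every, burnin_fraction):
    """Return (sampled, discarded, retained) tree counts for one run.

    Assumes generations // sample_every samples in total; a stored
    starting state, if any, is not counted here.
    """
    sampled = generations // sample_every
    discarded = int(sampled * burnin_fraction)
    return sampled, discarded, sampled - discarded


# Settings stated in the text: 30,000,000 generations, one tree per 1,000
# generations, first 25% removed as burn-in.
sampled, discarded, retained = mcmc_sample_counts(30_000_000, 1_000, 0.25)
print(sampled, discarded, retained)  # 30000 7500 22500
```

Under these assumptions, each consensus tree summarises roughly 22,500 post-burn-in trees per run.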
Ancestral area reconstruction
In constructing a dated phylogenetic tree, a single accession was selectively retained for each taxon represented by more than one sample (Supplementary File S3). Divergence times were estimated using a Bayesian uncorrelated relaxed-clock model implemented in BEAST 2.7.6 (Bouckaert et al., 2019), with priors placed on the node for tribes Nervilieae and Gastrodieae (offset 34.93 Mya, mean: 1, sigma: 1) and the node for subfamilies Epidendroideae and Orchidoideae (offset 64 Mya, mean: 1, sigma: 1), based on results presented by Givnish et al. (2015), Li et al. (2019) and Li et al. (2022). MCMC searches were run for 50,000,000 generations and sampled every 5,000 generations, with convergence being monitored using Tracer 2.7.6 (Bouckaert et al., 2019). The effective sample sizes (ESSs) of all parameters exceeded 200, and the maximum clade credibility tree was computed using TreeAnnotator 2.7.6 (Bouckaert et al., 2019).
Four areas of endemism were defined for biogeographic analysis, reflecting the extant distribution of Nervilia demarcated by Pridgeon et al. (2005) as well as the climatic zones discernible within this range based primarily on seasonality, which is presumed to be of importance for the hysteranthous habit (Gale et al., 2021): (area 1) tropical Africa & Madagascar, (area 2) seasonal (monsoonal) tropical Asia, (area 3) aseasonal, moist tropical Asia, and (area 4) Oceania (encompassing Australasia, Micronesia, Melanesia and Polynesia). Ancestral area reconstruction was then performed using the package BioGeoBEARS (Matzke, 2016) in R 4.3.2 (R Core Team, 2023), applying the dispersal–extinction–cladogenesis (DEC) model (Ree & Smith, 2008), a maximum-likelihood (ML) version of Dispersal Vicariance Analysis (DIVALIKE; Ronquist, 1997) and a Bayesian biogeographical inference model (BAYAREALIKE; Landis et al., 2013), with the maximum range-size parameter set to three. We tested each of these models with and without founder-event speciation, which was incorporated via the J parameter modelling jump dispersal (Matzke, 2016). All six permutations were compared in BioGeoBEARS using likelihood values and the Akaike information criterion (AIC), based on the maximum clade credibility tree from the BEAST analyses described above. The best-fit model was taken to be the one with the lowest corrected Akaike information criterion (AICc) value and the largest Akaike weight (wAICc), which represents relative support for each model (Burnham & Anderson, 2002).
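The model comparison step can be illustrated with the standard AICc and Akaike weight formulas (Burnham & Anderson, 2002). The sketch below is not the BioGeoBEARS code and uses no results from this dataset: the log-likelihoods are invented placeholders, the sample size `N_SAMPLES` is a hypothetical value, and the parameter counts assume two free parameters for each base model plus one for J.

```python
import math


def aicc(lnl, k, n):
    """Corrected AIC for log-likelihood lnl, k free parameters, sample size n."""
    return -2.0 * lnl + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a dict of model -> AICc into a dict of model -> Akaike weight."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# PLACEHOLDER inputs, not results from this dataset:
# model -> (log-likelihood, number of free parameters).
# ASSUMPTION: base models have 2 free parameters; +J adds 1.
FITS = {
    "DEC": (-100.0, 2),
    "DEC+J": (-95.0, 3),
    "DIVALIKE": (-102.0, 2),
    "DIVALIKE+J": (-96.5, 3),
    "BAYAREALIKE": (-110.0, 2),
    "BAYAREALIKE+J": (-97.0, 3),
}
N_SAMPLES = 50  # ASSUMPTION: hypothetical sample size used in the correction term

scores = {m: aicc(lnl, k, N_SAMPLES) for m, (lnl, k) in FITS.items()}
weights = akaike_weights(scores)
best_model = min(scores, key=scores.get)

# Print models from best (lowest AICc) to worst.
for m in sorted(scores, key=scores.get):
    print(f"{m:14s} AICc={scores[m]:8.2f}  wAICc={weights[m]:.3f}")
```

The model with the lowest AICc always carries the largest weight, so the two selection criteria named in the text agree by construction; the weights add information on how decisively the best model is favoured.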
