Skip to main content

Global distribution and evolutionary transitions of angiosperm sexual systems

Cite this dataset

Wang, Yunyun et al. (2021). Global distribution and evolutionary transitions of angiosperm sexual systems [Dataset]. Dryad.


Angiosperm sexual systems are fundamental to the evolution and distribution of plant diversity, yet spatiotemporal patterns in angiosperm sexual systems and their drivers remain poorly known. Using data on sexual systems and distributions of 68453 angiosperm species, we present the first global maps of sexual system frequencies and evaluate sexual system evolution during the Cenozoic. Frequencies of dioecy and monoecy increase with latitude, while hermaphrodites are more frequent in warm and arid regions. Transitions to dioecy from other states were higher than to hermaphroditism, but transitions away from dioecy increased since the Cenozoic, suggesting that dioecy is not an evolutionary end-point. Transitions between hermaphroditism and dioecy increased, while transitions to monoecy decreased with paleo-temperature when paleo-temperature > 0 °C. Our study demonstrates the biogeography of angiosperm sexual systems from a macroecological perspective, and enhances our understanding of plant diversity patterns and their response to climate change.


Sexual systems of angiosperms

A global dataset of angiosperm sexual systems was compiled from published floras and trait databases, including efloras (, Flora of China (Wu et al. 1994-2013), Tree of Sex (Ashman et al. 2014), Plant Trait Database (TRY 2012), Botanical Information and Ecology Network (BIEN, Maitner et al. 2018), Flora Republicae Popularis Sinicae (126 issues of 80 volumes), Seeds of Woody Plants in China and others. We also compiled information from recent publications (Machado et al. 2006; Sabath et al. 2016; Goldberg et al. 2017; Perini et al. 2019). Species with conflicting records of their sexual systems in different sources were double-checked and corrected. The sexual systems of a few species likely vary (e.g., Schoen et al. 2017) in response to local biotic and abiotic conditions (e.g., climate variables or pollinator densities; Barrett & Harder 2017). To eliminate the potential influences of these species, we excluded them from the following analyses. In total, our dataset contains sexual system information for 68 453 angiosperm species from 5 550 genera and 355 families (Table S1).

We divided species into three categories based on their sexual systems according to Cardoso et al. (2018): dioecy (i.e. species with separate male and female individuals), monoecy (i.e. species with separate male and female flowers on the same plant), and hermaphroditism (i.e. species with both functional pistils and stamens within the same flower). Dioecy includes androdioecious, gynodioecious, and polygamodioecious species; similarly, monoecy includes all monoecious, andromonoecious, and gynomonoecious species. Monoecy has been widely included in comparative analyses on angiosperm sexual systems (Renner 2014). We therefore included monoecy as a separate type of sexual system in our analyses.

We also compiled information on growth forms of the studied species from published floras, online databases and peer-reviewed journal articles (see Table S2). We classified species into “woody” and “herbaceous” growth forms. Woody species included those recorded as trees, shrubs and woody lianas, while herbaceous species included herbs, herbaceous lianas and subshrubs.


Geographical patterns in the frequencies of sexual systems

To document the geographical patterns in the frequencies of sexual systems, we compiled the global distributions of the angiosperm species from published floras, checklists, online databases and peer-reviewed papers (see Table S3 for the complete list of data sources) at a spatial resolution of ca. 270 000 km2 (ca. 4 longitude × 4 latitude). The species names from different data sources were standardized following the Catalogue of Life (, accessed in May 2018), which provides accepted Latin names and synonyms for vascular plants and bryophytes.

The boundaries of geographical units used for the compilation of species distributions were taken from the Global Administrative Areas database ( To reduce the variation in the sizes of the geographical units, we used geopolitical boundaries at different levels (e.g. countries, counties, states, and provinces) for different regions. Small adjacent pollical regions were merged into larger geographical units to make the sizes of geographical unit relatively homogenous across the world. Excluding the Antarctic, we divided the entire land area of the world into 484 geographical units, and the average size of these units was ca. 270 000 km2. This approach to defining geographical units has been used in several previous studies on patterns of angiosperm diversity (i.e. Xu et al. 2019; Shrestha et al. 2018).

In order to ensure the quality of the data, the distribution maps of all species included in this study were carefully examined. Introduced distributions were removed from the database following Plants of the World Online ( The final distribution database included 942 162 occurrence records for 68 453 angiosperms. Of these, information on sexual systems, growth forms and distributions was available for 66913 species, including 27748 woody and 39165 herbaceous species (Table S1).

We estimated the proportions of species with each sexual system for each geographic unit. There are well-recognized associations between sexual system and growth forms (Vamosi et al. 2003), as well differences in functional adaptations to environmental conditions between woody and herbaceous growth forms (Petit & Hampe 2006). Consequently. we estimated the proportions of sexual systems for all species combined, as well as for woody and herbaceous species separately.


Current Climate

Previous studies have found that climate influences the phenology and resource use of sexual organs during plant reproduction (Tognetti 2012; Hultine et al. 2016). We selected several variables to represent climate in our analyses. These were: mean annual temperature (MAT), mean annual precipitation (MAP), temperature seasonality (TSN, the coefficient of variation of mean monthly temperature), precipitation seasonality (PSN, the coefficient of variation of mean monthly precipitation). These variables have been used in previous studies on sexual systems (Wang et al. 2020a).

We used the anomaly of mean annual temperature and mean annual precipitation since the Last Glacial Maximum (LGM, ca 18 000–22 000 yr. BP) (MATano and MAPano, respectively) to evaluate the effects of Quaternary climate change on the distribution of angiosperm sexual systems (Araújo et al. 2008). MAT, MAP, TSN, and PSN with a spatial resolution of 1 × 1 km (Hijmans et al. 2005) for the period 1970–2000 were downloaded from the WorldClim website ( The climate variables for each geographical unit (ca. 270 000 km2) were estimated as the average of all 1 × 1 km cells within it. MATano and MAPano were calculated as the difference in MAT and MAP between the present and the LGM, and were used to represent the change in mean annual temperature and mean annual precipitation since the LGM respectively.


Paleo-temperature data

Most extant angiosperm species diversified during the Cenozoic (from 64 Million years ago [Mya] to the present), a period that experienced dramatic global climate and tectonic changes (Zachos et al. 2001). Climate change has been found to affect gender-specific resource demand and allocation, and may have further led to shifts among sexual systems (Etterson & Mazer 2016). To evaluate the effects of paleo-temperature fluctuations on the rate of transition between sexual systems during the Cenozoic, we used the global mean temperature (i.e. the global mean temperature over ice-free oceans per Mya estimated from oxygen isotopic abundances in ocean sediment cores since 64 Mya until the present, Zachos et al. 2001) as a measure of long-term global temperature change. This dataset of global mean temperature has been widely used in biogeographical and paleoclimate studies (Li et al. 2014; Turk et al. 2020).


Angiosperm phylogenies

We used the dated mega phylogeny of angiosperm species (353 185 tips) constructed by Smith & Brown (2018). The backbone of this phylogeny was constructed using molecular data from GenBank on 79 881 taxa. Species lacking sequence data were inserted into the phylogeny as polytomies following the resolution provided by the Open Tree of Life (Smith & Brown 2018). This phylogeny has been widely used in biogeographic and macroecological studies (Weigelt et al. 2020). To reduce the possible influences of polytomies on the estimation of phylogenetic analyses, we resolved the polytomies along the tips of the phylogeny using a Yule bifurcation process (Kuhn et al. 2011; Roquet et al. 2013). After matching the species names with sexual system information with the phylogeny a total of 61 230 species were retained (Table S1).


Statistical analyses

We first used beta regression (Cribari-Neto & Zeileis 2010) to assess the effects of each predictor on the global patterns of sexual system proportions per geographic unit for all species combined, as well as for woody and herbaceous species separately. We used modified t-tests that could account for the effect of spatial autocorrelations to test the significance of the predictors (Clifford et al. 1989).

To examine potential biases in estimates of the proportions of each sexual system per geographic unit caused by unequal sampling effort across regions, we first calculated the sampling proportion as the ratio between the richness of species with sexual system data and the total species richness within each geographic unit. We then used beta regression to examine the relationship between the proportion of each sexual system per geographic unit and the proportion of sampled species. A modified t-test indicated that these two variables were not correlated with each other (Fig. S1 & S2). This suggests that uneven sampling effort across space did not affect the estimated geographic pattens in proportions of sexual systems.

We used the rayDISC function of the R package corHMM (Beaulieu et al. 2013) to reconstruct the ancestral states. The rayDISC function fits a model for the evolution of multi-state categorical traits, allowing for polymorphisms and incompletely resolved trees. For the reconstruction, we fitted three different models that assumed different evolutionary scenarios. The ER model assumes that all transition rates are equal, the SYM model assumes that forward and reverse transitions share the same parameter, and the ARD model assumes that all transition rates are different.

It has been suggested that sexual systems may have influence on speciation in angiosperms (e.g. Heilbuth 2000; but see Goldberg et al. 2017). Therefore, we also estimated ancestral sexual system states using state-dependent speciation and extinction (SSE) models. Specifically, we used stochastic character mapping and HiSSE models (with both three and two hidden states separately, see Table S4 for the details on the priors) in RevBayes (Höhna et al. 2016). The HiSSE model accounts for the impact of possible state-dependent (both the observed and hidden states) diversification rates on ancestral-state reconstructions, does not assume homogenous transition rates across the phylogeny (Beaulieu & O'Meara 2016) and takes into account incomplete taxon sampling. An additional advantage of HiSSE is that it does not suffer from the high sensitivity to model misspecification reported for SSE models that do not consider hidden states (Beaulieu & O'Meara 2016). Each HiSSE analysis consisted of two independent runs each generating 2500 stochastic maps, with the first 100 generations used to tune parameters. The results were examined for convergence and effective sample size after discarding 25% of the samples from the posterior as burn-in.

Additionally, to assess the proportion of significant character associations that might be recovered by chance (Type I error) based on the number of character states and tips in our tree, we simulated stochastic character histories using the sim.history function of the phytools package in R (Revell 2012). We ran simulations for 1000 generations under the ER and the ARD models using equal and FitzJohn (FitzJohn et al., 2009) priors for root state frequencies.

Based on the ancestral state reconstructions, we counted the proportion of branches reconstructed with each sexual system in every one-million-year time interval, and estimated temporal changes. We estimated the temporal changes in the transition rates between different sexual systems. The transitions between the three sexual systems were grouped into three categories: 1) from dioecy to monoecy or to hermaphroditism (D®M and D®H, respectively); 2) from hermaphroditism to dioecy or to monoecy (H®D and H®M, respectively); and 3) from monoecy to hermaphroditism or to dioecy (M®H and M®D, respectively). We further evaluated the effect of paleo-temperature on the temporal changes in the frequency of each sexual system and the frequency of transitions between sexual systems using beta regressions.

The ER, SYM, and ARD models yielded consistent results on the temporal changes in the frequency of sexual systems and transition rates among sexual systems. The ARD model had the lowest Akaike information criterion (AIC) value (AIC values were 18442, 17621, and 17000 for ER, SYM, and ARD models, respectively). Stochastics maps built using HiSSE models with either two or three hidden states also yielded estimates of the transition rates among sexual systems consistent with the rayDISC ARD model. Simulations based only on root character state prior (either equal state probability or FitzJohn), number of tips and topology produced significantly different patterns compared with the analyses based on the actual character dataset (Fig. S3 & S4), which indicates that our results are not an analytical artifact. Therefore, we show the results from the ARD model in the main text. For reference, results from all other models were shown in the supplementary information (Fig. S5 & S6).

Our full dataset contained 61 230 species, which represent about 25% of the 261 750 total species accepted in the Angiosperm Phylogeny Website (Stevens, 2001 onward). In order to assess the reliability of transition estimates given the large fraction of missing taxa, we randomly generated 100 subsamples with the same proportion (i.e. 25%) of the species in our full dataset (n = 15 308) but balanced the proportion of sexual systems (i.e. 77-80% for hermaphroditism and 6-7% for dioecy) following Igea & Tanentzap (2020). We re-ran the transition analyses for each of the 100 subsamples, then calculated the mean results and the 95% confidence intervals. By comparing the estimates obtained from our full dataset with those generated by this random sampling procedure, we found that the results from both datasets were highly consistent (Fig. S7).

All analyses were conducted in R 3.5.3 (The R Core Team, 2019).



Strategic Priority Research Program of Chinese Academy of Sciences, Award: XDB31000000

Ministry of Science and Technology of the People's Republic of China, Award: 2017YFA0605101

National Natural Science Foundation of China, Award: 31901216,31988102

National Natural Science Foundation of China, Award: 2020JJ5977

Norwegian Metacenter for Computational Science, Award: NN9601K

Strategic Priority Research Program of Chinese Academy of Sciences, Award: XDB31000000

Natural Science Foundation of Hunan Province, Award: 2020JJ5977

Norwegian Metacenter for Computational Science, Award: NN9601K