The evolutionary dynamics of plant mating systems: how bias for studying ‘interesting’ plant reproductive systems could backfire
Data files
May 23, 2025 version files 904.13 KB
- Analysis_MatingSyst.R 15.03 KB
- Input_Data_MS.xlsx 684.49 KB
- Output_Data_MS.csv 200.58 KB
- README.md 4.03 KB
Feb 02, 2026 version files 904.47 KB
- Analysis_MatingSyst.R 15.03 KB
- Input_Data_MS.xlsx 684.83 KB
- Output_Data_MS.csv 200.58 KB
- README.md 4.03 KB
Abstract
Background and Aims
An “abominable mystery”: angiosperm sexual systems have been a source of both interest and frustration for the botanical community since Darwin. The evolutionary stability, overall frequency, and distribution of self-fertilization and mixed-mating systems have been explored in a variety of studies. However, there has been no recent study which directly addresses our knowledge of mating systems across families, the adequacy of existing data, or the potential for biases.
Scope
Here we present an updated dataset of mating systems across flowering plants covering 6,781 species and 212 families, based on a synthesis of existing reviews and an original literature review using Web of Science. We assess the adequacy of this data by testing for bias in the form of enrichment of certain families or sexual systems.
Key Results
We find that the vast majority of our data on mating systems comes from a small number of disproportionately sampled families, and that families with significant proportions of dioecious or monoecious species are much more likely to be undersampled.
Conclusions
Our results show that the frequency of selfing in angiosperms is overestimated, possibly due to increased research interest in selfing and mixed-mating systems. This suggests that systematic study bias may mean we know less about this vital facet of plant life than we think.
Data files hosted by Dryad
- Data_Dryad.zip 70.76 MB
README: The evolutionary dynamics of plant mating systems: how bias for studying ‘interesting’ plant reproductive systems could backfire
Dataset DOI: 10.5061/dryad.cc2fqz6hr
Description of the data and file structure
We conducted a literature review to synthesize efforts to quantify mating systems across angiosperms. This data allowed us to examine patterns of potential bias in the existing data based on factors such as family or sexual system, as well as how the data available on plant mating systems has changed over time. Broadly, we revisited the question of the underlying distribution of selfing and the adequacy of our existing data to unravel the complex dynamics of plant mating system evolution.
Files and variables
Supplemental files hosted on Zenodo:
Files: "Data_Dryad.zip"
Description: This zip file contains a folder named “WFO_Data” with three .xlsx files: 1) “PlantsWFOdatabasepart1.xlsx”, 2) “PlantsWFOdatabasepart2.xlsx”, and 3) “PlantsWFOdatabasepart3.xlsx”. These datasets originate from World Flora Online and are used with the R package “U.Taxonstand” (Zhang et al. 2022, https://doi.org/10.1016/j.pld.2022.09.001).
- Variables are: "ID", the unique identifier from World Flora Online; "NAME", the genus and species name; "AUTHOR", the naming author for the species; "RANK", the taxonomic rank as a numerical value (where a binomial name would be 2, a trinomial 3, etc.); "ACCEPTED_ID", the WFO record number of the accepted name; and "FAMILY", the family name. This is the standard format for taxonomic databases supplied as input to the U.Taxonstand package.
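As a minimal sketch of how the three parts might be combined into a single reference table, the base R snippet below stacks the parts and checks that the six columns described above are present. The helper name `combine_wfo_parts` and the toy rows are illustrative assumptions, not part of the analysis script; in practice each part would be read from its .xlsx file, for example with `readxl::read_excel()`.

```r
# Columns described in this README for the WFO reference tables
wfo_columns <- c("ID", "NAME", "AUTHOR", "RANK", "ACCEPTED_ID", "FAMILY")

# Hypothetical helper: stack several WFO parts after checking their columns
combine_wfo_parts <- function(parts) {
  for (i in seq_along(parts)) {
    missing_cols <- setdiff(wfo_columns, names(parts[[i]]))
    if (length(missing_cols) > 0) {
      stop("Part ", i, " is missing column(s): ",
           paste(missing_cols, collapse = ", "))
    }
  }
  # Keep only the expected columns, in a fixed order, then row-bind
  do.call(rbind, lapply(parts, function(p) as.data.frame(p)[, wfo_columns]))
}

# Toy stand-ins for the real files (all values are made up for illustration)
part1 <- data.frame(ID = "wfo-A", NAME = "Genus one", AUTHOR = "L.",
                    RANK = 2, ACCEPTED_ID = "wfo-A", FAMILY = "Familyaceae",
                    stringsAsFactors = FALSE)
part2 <- data.frame(ID = "wfo-B", NAME = "Genus two", AUTHOR = "Mill.",
                    RANK = 2, ACCEPTED_ID = "wfo-A", FAMILY = "Familyaceae",
                    stringsAsFactors = FALSE)

wfo_db <- combine_wfo_parts(list(part1, part2))
```

A combined table of this shape could then be supplied as the reference database to U.Taxonstand (to our understanding, via the `spSource` argument of `nameMatch()`; check the package documentation for the exact call).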
Files hosted by Data Dryad:
File: “Analysis_MatingSyst.R”
An annotated R script titled “Analysis_MatingSyst.R”, which was used to analyze the input data provided here in the file "Input_Data_MS.xlsx".
File: “Input_Data_MS.xlsx”
An Excel workbook with three tabs: 1) “Notes + metadata”, 2) “Data 2017-2022”, and 3) “Data from large reviews.” Here, missing values are encoded as “NA”. Variables (column names) are described in Tab 1.
File: “Output_Data_MS.csv”
The finished output produced by the R script. Here, missing values are encoded as “na”.
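Because the input workbook encodes missing values as “NA” while the output file uses “na”, both spellings need to be treated as missing when the data are read back into R. A minimal sketch, using an inline toy CSV rather than the real file (the column names `species` and `selfing_rate` are hypothetical):

```r
# Toy CSV text standing in for "Output_Data_MS.csv"; columns are hypothetical
csv_text <- "species,selfing_rate
Genus one,0.25
Genus two,na
Genus three,NA"

# Treat both "na" (output file) and "NA" (input workbook) as missing
out <- read.csv(text = csv_text, na.strings = c("na", "NA"),
                stringsAsFactors = FALSE)

# Count missing values per column
colSums(is.na(out))
```

For the real output file, `text = csv_text` would be replaced by the file path. When reading the workbook, `readxl::read_excel()` has an `na` argument that serves the same purpose.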
Code/software
The script attached (“Analysis_MatingSyst.R”) was written using:
- RStudio 2023.06.2+561 “Mountain Hydrangea” Release (de44a3118f7963972e24a78b7a1ad48b4be8a217, 2023-08-25) for macOS
- R version 4.3.0 (2023-04-21) “Already Tomorrow”
The following packages were used:
- readxl - Wickham H, Bryan J (2025). readxl: Read Excel Files. R package version 1.4.5, https://github.com/tidyverse/readxl, https://readxl.tidyverse.org.
- devtools - Wickham H, Hester J, Chang W, Bryan J (2022). devtools: Tools to Make Developing R Packages Easier. https://devtools.r-lib.org/, https://github.com/r-lib/devtools.
- dplyr - Wickham H, François R, Henry L, Müller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.4, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org.
- U.Taxonstand - Zhang, J. & Qian, H. (2023). U.Taxonstand: An R package for standardizing scientific names of plants and animals. Plant Diversity, 45(1): 1-5. DOI: 10.1016/j.pld.2022.09.001
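Before running the script, it may help to confirm that these packages are available. The snippet below is a sketch only, not part of “Analysis_MatingSyst.R”; it reports which of the listed packages are not yet installed without loading any of them.

```r
# Packages listed in this README
required_pkgs <- c("readxl", "devtools", "dplyr", "U.Taxonstand")

# A package is considered installed if R can locate its installation directory
is_installed <- vapply(required_pkgs,
                       function(p) nzchar(system.file(package = p)),
                       logical(1))
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}

# Assumption to verify before use: readxl, devtools and dplyr install from
# CRAN with install.packages(); U.Taxonstand is, to our knowledge, installed
# from its GitHub repository with devtools::install_github().
```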
Access information
Other publicly accessible locations of the data:
- Data is also available on my GitHub: https://github.com/elenacicada/interesting_flowers/tree/main
Changes after May 23, 2025:
File "Input_Data_MS.xlsx" was updated to correct a misalignment of the author column relative to the data for entries originating from Prior and Busch (2021), index numbers 1268-2039. This change does not affect downstream analysis, but should ensure that the references to the original sources of selfing information are accurate.
