Data from: Macroevolution of floral scent chemistry across radiations of male euglossine bee-pollinated plants
Data files
Oct 24, 2023 version files (943.87 KB)
- README.md
- Scripts_and_data.zip
Abstract
Floral volatiles play key roles as signaling agents that mediate interactions between plants and animals. Despite their importance, few studies have investigated broad patterns of volatile variation across groups of plants that share pollinators, particularly in a phylogenetic context. The “perfume flowers”, Neotropical plant species exhibiting exclusive pollination by male euglossine bees in search of chemical rewards, present an intriguing system to investigate these patterns due to the unique function of their chemical phenotypes as both signaling agents and rewards. We leverage recently-developed phylogenies and knowledge of biosynthesis along with decades of chemical ecology research to characterize axes of variation in the chemistry of perfume flowers, as well as understand their evolution at finer taxonomic scales. We detect pervasive chemical convergence, with many species across families exhibiting similar volatile phenotypes. Scent profiles of most species are dominated by compounds of either the phenylpropanoid or terpenoid biosynthesis pathways, while terpenoid compounds drive more subtle axes of variation. We find recapitulation of these patterns within two independent radiations of perfume flower orchids, in which we further detect evidence for rapid evolution of divergent floral chemistries, consistent with the putative importance of scent in the process of adaptation and speciation.
README
Perfume flower dataset
Scent data from male euglossine bee-pollinated plants, together with pollinator information. The zipped folder contains 5 folders. Apart from the "Raw_data" folder, each folder contains all the data and source code required to run the primary R script.
Analyses were run in R version 4.2.1 with the following packages:
dplyr v. 1.1.2
tidyr v. 1.3.0
ggplot2 v. 3.4.2
compositions v. 2.0.6
ggpubr v. 0.6.0
pez v. 1.2.4
dendextend v. 1.17.1
ecodist v. 2.0.9
ape v. 5.7.1
geomorph v. 4.0.5
phytools v. 1.5.1
geiger v. 2.0.11
phylogram v. 2.1.0
picante v. 1.8.2
chemodiv v. 0.2.0
webchem v. 1.2.0
ChemmineR v. 3.48.0
fmcsR v. 1.38.0
fBasics v. 4022.94
ade4 v. 1.7.22
Cells containing NA mark cases where a value could not be calculated because of an inherent property of the data (e.g., diversity metrics cannot be calculated when only 1 compound is present).
These values are removed from analyses when diversity metrics are used as predictors.
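As a minimal illustration of the NA handling described above, the sketch below drops rows whose diversity metric is NA before that metric is used as a predictor. The repository's own scripts are written in R; this Python version is independent of them. The sample rows, their values, and the "species" column name are made up for illustration; only the "fmcs_funchill" column name comes from this README.

```python
import csv
import io

# Made-up sample rows; only the "fmcs_funchill" column name comes from the README.
sample = """species,fmcs_funchill
Plant_a,2.31
Plant_b,NA
Plant_c,1.05
"""

def rows_with_value(handle, column):
    """Return the rows whose `column` holds a value (not NA and not empty)."""
    return [row for row in csv.DictReader(handle) if row[column] not in ("", "NA")]

usable = rows_with_value(io.StringIO(sample), "fmcs_funchill")
print([row["species"] for row in usable])
```

Filtering per metric, rather than dropping every row with any NA, keeps a species available for analyses that do not use the missing metric.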
Description of scripts and datasets within each folder
Raw_data: raw data used for the analyses prior to filtering
- Plant_pollinator.csv - csv of plant species with pollinator info and scent data; each row corresponds to a plant species - pollinator combination (so if a given plant has multiple pollinators, it occupies as many rows as its total number of pollinators).
  - "plantspecies" corresponds to plant species in a plant-pollinator combination
  - "plantfamily" corresponds to the family the plant is in
  - "plantgenus" corresponds to the genus the plant is in
  - "beespecies" corresponds to a bee pollinator of the plant
  - "beegenus" corresponds to the bee genus ("Eg" = Euglossa, "Ag" = Aglae, "Ex" = Exaerete, "El" = Eulaema, "Ef" = Eufriesea)
  - all other columns correspond to compounds present in the plant's perfume, with values ranging from 0 to 100 corresponding to the percentage of the compound in the total blend
- plant_VOC_averages_all.csv - csv of all plant species with scent data; each row corresponds to a species while columns correspond to compounds; values range from 0 to 100, corresponding to the percentage of the compounds in the blend
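Because Plant_pollinator.csv stores one row per plant-pollinator combination, a plant with several pollinators appears on several rows. The sketch below shows one way to collapse that layout into a list of pollinators per plant and to expand the genus codes listed above. It is written in Python, whereas the repository's scripts are in R. The sample rows and the plant and bee names in them are hypothetical; the column names and genus codes are those given in this README.

```python
import csv
import io
from collections import defaultdict

# Genus codes as documented for the "beegenus" column.
BEE_GENUS = {"Eg": "Euglossa", "Ag": "Aglae", "Ex": "Exaerete",
             "El": "Eulaema", "Ef": "Eufriesea"}

# Hypothetical rows in the documented layout (one row per plant-pollinator pair).
sample = """plantspecies,plantfamily,plantgenus,beespecies,beegenus,eucalyptol
Plant_a,Orchidaceae,Genus_a,Bee_1,Eg,55
Plant_a,Orchidaceae,Genus_a,Bee_2,El,55
Plant_b,Araceae,Genus_b,Bee_1,Eg,12
"""

def pollinators_per_plant(handle):
    """Map each plant species to a list of (bee species, bee genus) pairs."""
    out = defaultdict(list)
    for row in csv.DictReader(handle):
        genus = BEE_GENUS.get(row["beegenus"], row["beegenus"])
        out[row["plantspecies"]].append((row["beespecies"], genus))
    return dict(out)

visitors = pollinators_per_plant(io.StringIO(sample))
print(visitors)
```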
Chemodiv_analyses: data and script to generate the chemodiv distance matrices used for downstream analyses
Files in this folder include:
- chemodivscript.R - Primary R script to generate chemical distance matrices used in downstream analyses in other subfolders.
- cleaneddataset_with_features_FINAL.csv - csv of plant species with their chemical traits after filtering out species with less than 70% of their total scent profile resolved and standardizing such that values correspond to proportions, in addition to some other features for exploratory plotting
  - "Family" corresponds to the family the plant is in
  - "SuborFam" corresponds to the subfamily the plant is in, if it is an orchid, or the family if not
  - "richness" corresponds to the total number of compounds
  - "aroprop" corresponds to the proportion of the perfume comprised of aromatic compounds
  - "monoprop" corresponds to the proportion of the perfume comprised of monoterpenoid compounds
  - "terpprop" corresponds to the proportion of the perfume comprised of all terpenoid compounds
  - "sesprop" corresponds to the proportion of the perfume comprised of sesquiterpenoid compounds
  - "faprop" corresponds to the proportion of the perfume comprised of fatty acid derivative compounds
  - "carprop" corresponds to the proportion of the perfume comprised of carotenoid compounds
  - "broadclass" corresponds to the broad chemical class of the plant ("T" = terpenoid-dominated, "AT" = mix of aromatic and terpenoid, "A" = aromatic-dominated, "F" = fatty acid derivative-dominated)
  - "most_abundant" corresponds to the most abundant compound in the blend
  - "ses_jac" corresponds to standardized effect size calculated using Jaccard distances
  - "ses_bray" corresponds to standardized effect size calculated using Bray-Curtis distances
- R Compound Properties - csv of compounds present in the dataset with their InChiKey and SMILES code for chemodiv
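To make the summary columns above concrete, the following sketch derives a richness count, the most abundant compound, and per-class proportions from a single blend. It is a Python illustration under stated assumptions, not the authors' R code: the example blend and the compound-to-class lookup are illustrative, and the function name `summarize_blend` is hypothetical.

```python
def summarize_blend(blend, compound_class):
    """Summarize one species' blend.

    blend: dict mapping compound name -> relative amount (any non-negative scale).
    compound_class: dict mapping compound name -> chemical class label.
    """
    present = {c: p for c, p in blend.items() if p > 0}
    if not present:
        return {"richness": 0, "most_abundant": None, "class_prop": {}}
    total = sum(present.values())
    class_prop = {}
    for compound, amount in present.items():
        cls = compound_class.get(compound, "unclassified")
        class_prop[cls] = class_prop.get(cls, 0.0) + amount / total
    return {
        "richness": len(present),                        # number of compounds present
        "most_abundant": max(present, key=present.get),  # compound with the largest share
        "class_prop": class_prop,                        # share of the blend per class
    }

# Illustrative blend and class lookup (not taken from the dataset).
blend = {"eucalyptol": 0.5, "methyl salicylate": 0.3, "limonene": 0.2, "benzyl acetate": 0.0}
classes = {"eucalyptol": "monoterpenoid", "limonene": "monoterpenoid",
           "methyl salicylate": "aromatic", "benzyl acetate": "aromatic"}
summary = summarize_blend(blend, classes)
print(summary)
```

In this sketch the "monoterpenoid" share plays the role of a column such as "monoprop", and the "aromatic" share the role of "aroprop".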
Analyses_with_full_dataset: data and scripts to perform analyses across all species
Files in this folder include:
- Broad_patterns_Final.R - Primary R script used to generate biosynthetic distances using the method of Junker 2018, Chemoecology, generate ordinations, and perform correlative analyses of axes of variation with specific chemical traits.
- SI1_sourceFunctions_BioSynDist_copy.R - Source code from Junker 2018, Chemoecology, to generate biosynthetic distances among species
- Coded.csv - csv containing taxonomic information
  - "Family" corresponds to the family the plant is in
  - "SuborFam" corresponds to the subfamily the plant is in, if it is an orchid, or the family if not
- compoundxproperty_sorted_filtered - csv where rows correspond to compounds present in dataset and columns correspond to biosynthetic pathways and functional group information. Entries are coded 1 or 0 based on presence / absence
- master_VOC_averages_no_poll_cleaned.csv - csv where rows correspond to species present in dataset while columns correspond to different compounds present after data filtering, and values correspond to percentage (0 to 100)
- diversity_metrics_chemodiv.csv - csv of chemical diversity metrics generated from the chemodivscript
  - "finger_funchill" corresponds to functional Hill diversity of compounds calculated using the "fingerprint" scheme
  - "finger_hill" corresponds to Hill diversity of compounds calculated using the "fingerprint" scheme
  - "fmcs_funchill" corresponds to functional Hill diversity of compounds calculated using the "fMCS" scheme
  - "fmcs_hill" corresponds to Hill diversity of compounds calculated using the "fMCS" scheme
- fingerprintdisdata_full.csv - csv of distance matrix from "fingerprints" scheme in chemodiv
- fmcsdisdata_full.csv - csv of distance matrix from "fMCS" scheme in chemodiv
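For readers unfamiliar with Hill diversity, the sketch below computes a plain Hill number of order q from a vector of compound proportions. It is a generic Python illustration, not the chemodiv implementation used by chemodivscript.R, and it does not cover the functional variant, which also uses pairwise compound dissimilarities. The diversity order used by the authors is not stated in this README, so the default of q = 1 here is an assumption.

```python
import math

def hill_number(proportions, q=1.0):
    """Hill number of order q for a list of non-negative abundances.

    q = 0 gives richness, q = 1 the exponential of Shannon entropy,
    q = 2 the inverse Simpson index.
    """
    p = [x for x in proportions if x > 0]
    if not p:
        raise ValueError("at least one positive abundance is required")
    total = sum(p)
    p = [x / total for x in p]  # re-standardize so the proportions sum to 1
    if abs(q - 1.0) < 1e-12:
        # The general formula is undefined at q = 1; use its limit instead.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

even_blend = [0.25, 0.25, 0.25, 0.25]
skewed_blend = [0.85, 0.05, 0.05, 0.05]
print(hill_number(even_blend), hill_number(skewed_blend))
```

A perfectly even blend of four compounds has an effective number of four compounds at every order, while a blend dominated by one compound scores lower at q = 1 despite having the same richness.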
Pollinator_analyses: data and scripts for performing analyses involving pollinators
Files in this folder include:
- pollinator_analyses_FINAL.R - Primary R script to perform correlative tests between chemical traits and pollinators using chemical distance matrices generated earlier.
- dataset_with_everything.csv - csv containing filtered chemical traits, PCO scores, and chemodiv metrics generated from "Analyses_with_full_dataset"
- master_VOC_regions_20220224_edited - csv where rows correspond to plant species and columns correspond to pollinator species.
  - "plantspecies" corresponds to plant species in a plant-pollinator combination
  - "plantfamily" corresponds to the family the plant is in
  - "plantgenus" corresponds to the genus the plant is in
  - "beespecies" corresponds to a bee pollinator of the plant
  - "beegenus" corresponds to the bee genus ("Eg" = Euglossa, "Ag" = Aglae, "Ex" = Exaerete, "El" = Eulaema, "Ef" = Eufriesea)
  - all other columns correspond to compounds present in the plant's perfume, with values ranging from 0 to 100 corresponding to the percentage of the compound in the total blend
- complete_AE_Plant.tre - most general phylogeny of plants within the dataset
- euglossine_tree.tree - euglossine bee phylogeny (Ramirez et al. 2011).
- fingerprintdisdata_full.csv - csv of distance matrix from "fingerprints" scheme in chemodiv
- fmcsdisdata_full.csv - csv of distance matrix from "fMCS" scheme in chemodiv
- myscheme_full.csv - csv of distance matrix from "simple" scheme using Junker 2018 method produced in "Analyses_with_full_dataset"
Phylogenetic_analyses: data and scripts for performing phylogenetic comparative methods
Files in this folder include:
- Phylogenetic_Comparative_FINAL.R - Primary R script for calculating phylogenetic signal and disparity through time in two separate clades.
- Adams_function.r - R source code to calculate phylogenetic signal
- complete_AE_Plant.tre - most general phylogeny of plants within the dataset
- euglossine_tree.tree - euglossine bee phylogeny (Ramirez et al. 2011).
- cattreeforanalyses.tre - complete_AE_Plant.tre pruned to just the Catasetinae
- stantreeforanalyses.tre - complete_AE_Plant.tre pruned to just the Stanhopeinae
- bigtreechemodataset_new.csv - csv containing scent data for all species in the dataset with phylogenetic information. Rows correspond to species while the first 167 columns correspond to compounds, with values representing relative proportion of that compound in the species' blend. Other columns:
  - "Family" corresponds to the family the plant is in
  - "SuborFam" corresponds to the subfamily the plant is in, if it is an orchid, or the family if not
  - "richness" corresponds to the total number of compounds
  - "aroprop" corresponds to the proportion of the perfume comprised of aromatic compounds
  - "monoprop" corresponds to the proportion of the perfume comprised of monoterpenoid compounds
  - "terpprop" corresponds to the proportion of the perfume comprised of all terpenoid compounds
  - "sesprop" corresponds to the proportion of the perfume comprised of sesquiterpenoid compounds
  - "faprop" corresponds to the proportion of the perfume comprised of fatty acid derivative compounds
  - "carprop" corresponds to the proportion of the perfume comprised of carotenoid compounds
  - "linearmono" corresponds to the proportion of the perfume comprised of linear monoterpenoid compounds
  - "ringedmono" corresponds to the proportion of the perfume comprised of ringed monoterpenoid compounds
  - "cineolecasette" corresponds to the proportion of the perfume comprised of cineole cassette compounds
  - "carvones" corresponds to the proportion of the perfume comprised of carvone compounds
  - "broadclass" corresponds to the broad chemical class of the plant ("T" = terpenoid-dominated, "AT" = mix of aromatic and terpenoid, "A" = aromatic-dominated, "F" = fatty acid derivative-dominated)
  - "most_abundant" corresponds to the most abundant compound in the blend
  - "ses_jac" corresponds to standardized effect size calculated using Jaccard distances
  - "ses_bray" corresponds to standardized effect size calculated using Bray-Curtis distances
  - "finger_funchill" corresponds to functional Hill diversity of compounds calculated using the "fingerprint" scheme
  - "finger_hill" corresponds to Hill diversity of compounds calculated using the "fingerprint" scheme
  - "fmcs_funchill" corresponds to functional Hill diversity of compounds calculated using the "fMCS" scheme
  - "fmcs_hill" corresponds to Hill diversity of compounds calculated using the "fMCS" scheme
  - "pcoa1_full" corresponds to values of PCo 1 for the species calculated using the "simple" scheme
  - "pcoa2_full" corresponds to values of PCo 2 for the species calculated using the "simple" scheme
  - "pcoa3_full" corresponds to values of PCo 3 for the species calculated using the "simple" scheme
  - "pcoa4_full" corresponds to values of PCo 4 for the species calculated using the "simple" scheme
  - "pcoa1_finger" corresponds to values of PCo 1 for the species calculated using the "fingerprint" scheme
  - "pcoa2_finger" corresponds to values of PCo 2 for the species calculated using the "fingerprint" scheme
  - "pcoa3_finger" corresponds to values of PCo 3 for the species calculated using the "fingerprint" scheme
  - "pcoa4_finger" corresponds to values of PCo 4 for the species calculated using the "fingerprint" scheme
  - "pcoa1_fmcs" corresponds to values of PCo 1 for the species calculated using the "fMCS" scheme
  - "pcoa2_fmcs" corresponds to values of PCo 2 for the species calculated using the "fMCS" scheme
  - "pcoa3_fmcs" corresponds to values of PCo 3 for the species calculated using the "fMCS" scheme
  - "pcoa4_fmcs" corresponds to values of PCo 4 for the species calculated using the "fMCS" scheme
- Catchemodataset_new.csv - csv containing filtered scent data for the Catasetinae. See bigtreechemodataset description for column meanings
- Stanchemodataset_new.csv - csv containing filtered scent data for the Stanhopeinae. See bigtreechemodataset description for column meanings
Methods
We built a database of floral perfume chemical composition, as well as pollinator identity, for any angiosperm pollinated by perfume-collecting male euglossine bees. Both published and unpublished data, from the literature and from collaborators, were used to build this database. For the published data, we conducted a literature search in ISI Web of Science and Scopus using the following search terms: (“scent plant” OR “perfume plant” OR “VOC” OR “volatile” OR “scent reward* plant” OR “perfume reward* flower” OR “scent reward* flower” OR “perfume reward* flower”) AND (eugloss* OR “orchid bee”). In addition to this search, we screened the reference lists of all obtained articles for works that the literature search had missed. Only studies using headspace analyses were included. After compiling the perfume data in a single database, we searched for the CAS number of each floral perfume compound in the NIST Chemistry WebBook online database (https://webbook.nist.gov/chemistry/cas-ser.html). Based on the number, we checked for possible synonyms. Compounds included more than once were then merged.
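The synonym-merging step can be sketched as follows. This is a hedged Python illustration rather than the authors' procedure: it assumes that entries sharing a CAS number are merged by summing their percentages, which the text above does not specify, and the function name `merge_synonyms` and the example blend are hypothetical.

```python
from collections import defaultdict

def merge_synonyms(blend, cas_lookup):
    """Merge compounds that share a CAS number by summing their percentages.

    Assumption: summing is the merge rule; the Methods text only says that
    compounds included more than once were merged.
    """
    merged = defaultdict(float)
    for name, pct in blend.items():
        key = cas_lookup.get(name, name)  # keep the original name when no CAS number is known
        merged[key] += pct
    return dict(merged)

# Eucalyptol and 1,8-cineole are two names for the same compound (CAS 470-82-6).
cas_lookup = {"eucalyptol": "470-82-6", "1,8-cineole": "470-82-6"}
blend = {"eucalyptol": 30.0, "1,8-cineole": 10.0, "unknown sesquiterpene": 5.0}
merged = merge_synonyms(blend, cas_lookup)
print(merged)
```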
Following data curation, we excluded compounds present in relative proportions below 1% of each species’ perfume, to avoid bias toward rare compounds that only the more sensitive technology of recent years can detect. This also allowed us to compare studies that elected to characterize compounds below 1% as “trace” without providing further quantitative information. Species with less than 70% of their total perfume blends resolved were then excluded from the dataset. The resulting chemical matrix was then re-standardized such that the sum of relative proportions within each species was 1.
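The three filtering steps above can be sketched as a small function. This is a minimal Python illustration, not the authors' R code. It assumes that the "resolved" fraction of a blend is the sum of the percentages retained after the 1% cut, which the text does not state explicitly; the function name, the species names and the example values are hypothetical.

```python
def filter_and_standardize(profiles, min_compound_pct=1.0, min_resolved_pct=70.0):
    """Apply the three filtering steps to a set of scent profiles.

    profiles: dict mapping species -> dict of compound -> percentage (0 to 100).
    Returns a dict mapping species -> dict of compound -> proportion (sums to 1).
    """
    cleaned = {}
    for species, blend in profiles.items():
        # Step 1: drop compounds below the 1% threshold.
        kept = {c: p for c, p in blend.items() if p >= min_compound_pct}
        # Step 2: drop species with less than 70% of the blend resolved
        # (assumption: resolved = sum of the retained percentages).
        resolved = sum(kept.values())
        if resolved < min_resolved_pct:
            continue
        # Step 3: re-standardize so the proportions sum to 1.
        cleaned[species] = {c: p / resolved for c, p in kept.items()}
    return cleaned

# Hypothetical example profiles.
example = {
    "Species_A": {"eucalyptol": 60.0, "methyl salicylate": 30.0, "limonene": 0.5},
    "Species_B": {"eucalyptol": 40.0, "benzyl acetate": 20.0},
}
result = filter_and_standardize(example)
print(result)
```

In this example Species_A loses its trace compound and is rescaled, while Species_B is excluded because only 60% of its blend is resolved.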