Investigating historical drivers of latitudinal gradients in polyploid plant biogeography: A multi-clade perspective
Data files
May 30, 2024 version files 88.13 MB
Abstract
Premise of the Study
The proportion of polyploid plants in a community increases with latitude, and different hypotheses have been proposed about which factors drive this pattern. Here, we aim to understand the historical causes of the latitudinal polyploidy gradient using a combination of ancestral state reconstruction methods. Specifically, we assess whether (1) polyploidization enables movement to higher latitudes (i.e., polyploidization precedes occurrences in higher latitudes) or (2) higher latitudes facilitate polyploidization (i.e., occurrence in higher latitudes precedes polyploidization).
Methods
We reconstruct the ploidy states and ancestral niches of 1,032 angiosperm species at four paleoclimatic time slices ranging from 3.3 million years ago to the present, comprising taxa from four well-represented clades: Onagraceae, Primulaceae, Solanum (Solanaceae), and Pooideae (Poaceae). We use ancestral niche reconstruction models alongside a customized discrete character evolution model to allow reconstruction of states at specific time slices. Patterns of latitudinal movement are reconstructed and compared in relation to inferred ploidy shifts.
Key Results
We find that no single hypothesis applies equally well across all analyzed clades. While significant differences in median latitudinal occurrence were detected in the largest clade, Poaceae, no significant differences were detected in latitudinal movement in any clade.
Conclusions
Our preliminary study is the first to attempt to connect ploidy changes to continuous latitudinal movement, but we cannot favor one hypothesis over another. Given that patterns seem to be clade-specific, a larger number of clades must be analyzed in future studies for generalities to be drawn.
README
Contained within the
The dataset also contains six code files, meant to be run in order using inputs provided in this repository (order indicated by the beginnings of R file names, 01 through 06). 00_utility_functions.R contains functions necessary to execute the code in other files. The other code files are as follows:
- 01_assembling_ploidy_data.R: Assembles clean ploidy data from the raw files (included in this dataset).
- 02_organizing_files.R: Assembles the occurrence-point and ploidy datasets for each of the four clades included in our study; also prunes each clade's phylogeny to the taxa that have both ploidy data and sufficient occurrence data.
- 03_present_day_sprich.R: Assembles a species richness raster map from occurrence points.
- 04_running_machuruku.R: Runs machuruku range reconstructions to paleoclimatic time slices based on the climatic variables that characterize the distribution of each lineage included in the input phylogeny.
- 05_running_machuruku_part2.R: Runs modified corHMM reconstructions of ploidy to each paleoclimatic time slice; also runs machuruku reconstructions for individual taxa such that the median latitude and longitude can be extracted from each individual reconstructed range.
- 06_making_plots.R: Code to create the plots included in our manuscript.
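The run order can be read off the file-name prefixes. The following is a minimal sketch in base R and is not part of the dataset: `ordered_scripts` is a hypothetical helper, and sourcing every script in one session from the repository root is an assumption about how the files are meant to be used.

```r
# Hypothetical helper: keep files named NN_<something>.R and sort them by prefix.
ordered_scripts <- function(files) {
  numbered <- files[grepl("^[0-9]{2}_.*\\.R$", basename(files))]
  numbered[order(basename(numbered))]
}

# File names as listed in this README, deliberately shuffled.
scripts <- ordered_scripts(c(
  "06_making_plots.R", "00_utility_functions.R", "03_present_day_sprich.R",
  "01_assembling_ploidy_data.R", "05_running_machuruku_part2.R",
  "02_organizing_files.R", "04_running_machuruku.R"
))
print(scripts)

# Assumption: running from the repository root, in one R session.
# Left commented out because the scripts need the data folders and packages.
# for (s in scripts) source(s)
```

Sorting by prefix places 00_utility_functions.R first, so its functions would be loaded before scripts 01 through 06 run.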
The folder “CE_out” contains 113 raw genus-level ploidy data CSV files, with one file per genus (e.g., AegilopsPloidy.csv, AgropyronPloidy.csv, etc.). Each genus falls within one of the four clades of interest in our manuscript. Ploidy inferences used in our manuscript come from the "Ploidy inference" column.
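As an illustration of how the per-genus files could be combined, here is a hedged base R sketch. Only the "Ploidy inference" column name and the `<Genus>Ploidy.csv` naming pattern come from this README; the `Taxon` column, the placeholder values, the `read_genus_ploidy` helper, and the temporary demo folder are assumptions made so the example runs on its own.

```r
# Build a tiny stand-in for CE_out in a temporary folder (placeholder values).
demo_dir <- file.path(tempdir(), "CE_out_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (genus in c("Aegilops", "Agropyron")) {
  demo <- data.frame(
    Taxon = paste(genus, c("sp1", "sp2")),  # hypothetical column
    `Ploidy inference` = c(0, 1),           # column name from the README
    check.names = FALSE
  )
  write.csv(demo, file.path(demo_dir, paste0(genus, "Ploidy.csv")),
            row.names = FALSE)
}

# Hypothetical helper: stack all genus files and record the genus of each row.
read_genus_ploidy <- function(dir) {
  files <- list.files(dir, pattern = "Ploidy\\.csv$", full.names = TRUE)
  tables <- lapply(files, function(f) {
    # check.names = FALSE keeps the space in "Ploidy inference"
    tab <- read.csv(f, check.names = FALSE, stringsAsFactors = FALSE)
    tab$genus <- sub("Ploidy\\.csv$", "", basename(f))
    tab[, c("genus", "Taxon", "Ploidy inference")]
  })
  do.call(rbind, tables)
}

ploidy <- read_genus_ploidy(demo_dir)
print(ploidy)
```

Pointing `read_genus_ploidy` at the real "CE_out" folder would require replacing `Taxon` with whatever the species column is actually called in those files.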
The folder "gbif_points" contains four CSV files (one for each of our four clades of interest) containing occurrence points downloaded from the Global Biodiversity Information Facility (GBIF). Each row represents a single occurrence point, with column entries for latitude and longitude.
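Since the study summarizes species by median latitude, a per-species summary of the occurrence points might look like the sketch below. The column names `species`, `latitude`, and `longitude` and all values are assumptions for illustration; the actual headers in the GBIF-derived CSV files may differ, and the coordinate filter shown here is a generic sanity check, not the cleaning protocol used for the dataset.

```r
# Toy occurrence table with hypothetical column names and made-up points.
occ <- data.frame(
  species   = c("sp_a", "sp_a", "sp_a", "sp_b", "sp_b", "sp_b", "sp_a"),
  latitude  = c(10, 20, 60, -5, -15, NA, 95),
  longitude = c(0, 5, 10, 30, 40, 35, 12),
  stringsAsFactors = FALSE
)

# Drop rows with missing or impossible coordinates.
valid <- !is.na(occ$latitude) & !is.na(occ$longitude) &
  abs(occ$latitude) <= 90 & abs(occ$longitude) <= 180
occ_clean <- occ[valid, ]

# Median latitude per species.
med_lat <- aggregate(latitude ~ species, data = occ_clean, FUN = median)
print(med_lat)
```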
The folder "Paleoclim" contains files needed for reconstructing climatic data at various time slices. There is one folder for each time slice, as follows: 1_cur_CHELSA_V1_2B_r10m (present-day), 2_LGM_chelsa_v1_2B_r10m (the Last Glacial Maximum c. 21 thousand years ago [ka]), 3_LIG_v1_10m (Last Interglacial c. 130 ka), 4_787ka_MIS19_v1_r10m (Marine Isotope Stage 19 c. 787 ka), 5_3.205Ma_mPWP_v1_r10m (mid-Pliocene Warm Period c. 3.205 Ma), and 6_3.3Ma_M2_v1_r10m (Marine Isotope Stage M2 c. 3.3 Ma).
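For scripting against these folders, the time slices can be held in a small lookup table. This is a sketch only: the folder names and ages are copied from the description above, while the column names (`folder`, `label`, `age_ka`, `age_Ma`, `found`) and the relative path `"Paleoclim"` are assumptions.

```r
# Lookup table of time-slice folders and their approximate ages.
time_slices <- data.frame(
  folder = c("1_cur_CHELSA_V1_2B_r10m", "2_LGM_chelsa_v1_2B_r10m",
             "3_LIG_v1_10m", "4_787ka_MIS19_v1_r10m",
             "5_3.205Ma_mPWP_v1_r10m", "6_3.3Ma_M2_v1_r10m"),
  label  = c("present", "LGM", "LIG", "MIS19", "mPWP", "M2"),
  age_ka = c(0, 21, 130, 787, 3205, 3300),
  stringsAsFactors = FALSE
)
time_slices$age_Ma <- time_slices$age_ka / 1000

# Assumption: the working directory is the repository root.
paleoclim_dir <- "Paleoclim"
time_slices$found <- dir.exists(file.path(paleoclim_dir, time_slices$folder))
print(time_slices)
```

The `found` column simply reports whether each folder is present where the sketch expects it, which can catch an incomplete download before a long reconstruction run.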
The folder "ploidy" contains four CSV files (one for each of our four clades of interest) containing ploidy inferences for individual species. These are cleaned-up versions of the single-genus files contained in the folder "CE_out."
The folder "Trees" contains four .tre files (one for each of our four clades of interest) containing molecular phylogenies of each clade.
Data derived from other sources are listed below:
- Ploidy data (Rice et al. 2019; “The global biogeography of polyploid plants”)
- GBIF data: GBIF.org (23 July 2021) GBIF Occurrence Downloads (https://doi.org/10.15468/dl.pw2qns; https://doi.org/10.15468/dl.yesy2v; https://doi.org/10.15468/dl.vqm9q3; https://doi.org/10.15468/dl.gqu424; https://doi.org/10.15468/dl.3ucjgk; https://doi.org/10.15468/dl.78shpr; https://doi.org/10.15468/dl.f9pq57)
- Onagraceae phylogeny (Freyman and Höhna 2019; “Stochastic character mapping of state-dependent diversification reveals the tempo of evolutionary decline in self-compatible Onagraceae lineages”)
- Primulaceae phylogeny (De Vos et al. 2014; “Small and ugly? Phylogenetic analyses of the ‘selfing syndrome’ reveal complex evolutionary fates of monomorphic primrose flowers”)
- Solanaceae phylogeny (Särkinen et al. 2013; “A phylogenetic framework for evolutionary study of the nightshades (Solanaceae): a dated 1000-tip tree”)
- Poaceae phylogeny (Spriggs et al. 2014; “C4 Photosynthesis Promoted Species Diversification during the Miocene Grassland Expansion”)
- PaleoClim data from the LIG (Last Interglacial, c. 130 ka), MIS19 (Marine Isotope Stage 19, c. 787 ka), mPWP (mid-Pliocene Warm Period, c. 3.205 Ma), and M2 (Marine Isotope Stage M2, c. 3.3 Ma), all using the spatial resolution of 10 arc-minutes (Brown et al. 2018; “PaleoClim, high spatial resolution paleoclimate surfaces for global land areas”)
To run the code files, you will need the following packages (versions used during the production of this dataset are also provided):
1. raster (3.6.26)
2. ape (5.6.2)
3. taxize (0.9.100)
4. rgbif (3.7.3)
5. maptools (1.1.4)
6. sp (1.5.0)
7. rgeos (0.5.9)
8. rworldmap (1.3.6)
9. data.table (1.14.2)
10. terra (1.7.55)
11. dismo (1.3.9)
12. usdm (1.1.18)
13. phytools (1.2.0)
14. stringr (1.5.0)
15. stringi (1.7.8)
16. machuruku (1.8.3)
17. corHMM (2.8)
18. geiger (2.0.10)
19. parallel (4.2.1)
20. MASS (7.3.58.1)
21. Peacock.test (1.0)
22. dispRity (1.7.0)
23. paleotree (3.4.5)
24. dplyr (1.1.3)
25. castor (1.7.3)
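Before running the scripts, the installed versions can be compared against the list above. The sketch below uses only base R; `check_versions` is a hypothetical helper, and the version strings are copied from this README in the dotted form that `packageVersion()` prints.

```r
# Package versions as recorded in this README.
required <- c(
  raster = "3.6.26", ape = "5.6.2", taxize = "0.9.100", rgbif = "3.7.3",
  maptools = "1.1.4", sp = "1.5.0", rgeos = "0.5.9", rworldmap = "1.3.6",
  data.table = "1.14.2", terra = "1.7.55", dismo = "1.3.9", usdm = "1.1.18",
  phytools = "1.2.0", stringr = "1.5.0", stringi = "1.7.8",
  machuruku = "1.8.3", corHMM = "2.8", geiger = "2.0.10",
  parallel = "4.2.1", MASS = "7.3.58.1", Peacock.test = "1.0",
  dispRity = "1.7.0", paleotree = "3.4.5", dplyr = "1.1.3", castor = "1.7.3"
)

# Hypothetical helper: report the installed version of each package
# (NA when the package is missing) next to the recorded one.
check_versions <- function(required) {
  installed <- rownames(installed.packages())
  data.frame(
    package   = names(required),
    recorded  = unname(required),
    installed = vapply(names(required), function(p) {
      if (p %in% installed) as.character(packageVersion(p)) else NA_character_
    }, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

report <- check_versions(required)
print(report)
```

A newer installed version is not necessarily a problem; the table only makes differences from the recorded environment visible.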
Methods
Phylogenies were collected from the following sources: Onagraceae (Freyman and Höhna 2019), Primulaceae (De Vos et al. 2014), Solanaceae (Särkinen et al. 2013), and Poaceae (Spriggs et al. 2014). The phylogenies of Solanaceae and Poaceae were pruned to retain only the species in Solanum and Pooideae, respectively. Occurrence points for species in each clade were downloaded from GBIF and cleaned of inaccurate records following protocols similar to those in Boyko et al. (2023). Ploidy data were gathered from the supplementary data of Rice et al. (2019).
The ancestral ranges of taxa in each clade were reconstructed using the program machuruku (Guillory and Brown 2021) to time periods available from PaleoClim data (Brown et al. 2018): the Last Interglacial (LIG, c. 130 ka), Marine Isotope Stage 19 (MIS19, c. 787 ka), the mid-Pliocene Warm Period (mPWP, c. 3.205 Ma), and Marine Isotope Stage M2 (M2, c. 3.3 Ma). Ploidy was reconstructed using corHMM (Beaulieu et al. 2013; Boyko and Beaulieu 2021) with modified functions that allow for ancestral state reconstruction at specific time slices rather than at nodes.
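To illustrate what reconstructing at a time slice rather than at nodes involves, the sketch below finds the branches of a toy dated tree that are cut by a given slice. It is not the modified corHMM code from this dataset: the edge table, its column names, and the `branches_at_slice` helper are all hypothetical, and ages are assumed to be measured in Ma before present, with each parent node older than its child.

```r
# Toy dated tree as an edge table: ((A, B) n1, C) root.
edges <- data.frame(
  parent     = c("root", "root", "n1", "n1"),
  child      = c("n1", "C", "A", "B"),
  parent_age = c(5.0, 5.0, 2.0, 2.0),  # age of the older end of each branch
  child_age  = c(2.0, 0.0, 0.0, 0.0),  # age of the younger end (tips are 0)
  stringsAsFactors = FALSE
)

# A branch is cut by the slice when it starts before the slice age
# and ends at or after it.
branches_at_slice <- function(edges, slice_age) {
  edges[edges$parent_age > slice_age & edges$child_age <= slice_age, ]
}

cut_M2  <- branches_at_slice(edges, 3.3)   # slice at c. 3.3 Ma
cut_LIG <- branches_at_slice(edges, 0.13)  # slice at c. 130 ka
print(cut_M2$child)
print(cut_LIG$child)
```

In this toy tree the 3.3 Ma slice cuts two branches (those leading to n1 and C), whereas the 130 ka slice cuts all three terminal branches. A slice-based reconstruction would estimate a state at each of those cut points instead of at the nodes.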
Finally, the data gathered from these analyses were examined using a variety of statistical techniques.