Large and non-spherical seeds are less likely to form a persistent soil seed bank
Data files
May 13, 2024 version files (429.17 KB total)
- newtree.tre (56.61 KB)
- README.md (4.54 KB)
- wangxj_seedbank_site.csv (5.11 KB)
- wangxj_seedbank_species_mean_order.csv (121.05 KB)
- wangxj_seedbank_species.csv (241.86 KB)
Abstract
There is some evidence that seed traits can affect the long-term persistence of seeds in the soil. However, findings on this topic have differed between systems. Here, we brought together a worldwide database of seed persistence data for 1474 species to test the generality of seed mass-shape-persistence relationships. We found a significant trend for low seed persistence to be associated with larger and less spherical seeds. However, the relationship varied across different clades, growth forms and species' ecological preferences. Specifically, relationships of seed mass-shape-persistence were more pronounced in Poales than in other order clades. Herbaceous species that tend to be found in sites with low soil sand content and precipitation have stronger relationships between seed shape and persistence than those in sites with higher soil sand content and precipitation. For woody plants, the relationship between persistence and seed morphology was stronger in sites with high soil sand content and low precipitation than in sites with low soil sand content and higher precipitation. Improving the ability to predict the soil seed bank formation process, including burial and persistence, could benefit the utilization of seed morphology-persistence relationships in management strategies for vegetation restoration and controlling species invasion across diverse vegetation types and environments.
README: Large and non-spherical seeds are less likely to form a persistent soil seed bank
https://doi.org/10.5061/dryad.jwstqjqhm
Description of the data and file structure
The attachment contains the original data together with the R code used for data analysis and plotting. The code can be run directly, provided the relevant data files are available to it.
CSV File Name: wangxj_seedbank_species.csv
Description of file:
Variables:
- species: plant species name
- ID_site: collection site number
- site: collection site
- vegetation: vegetation type (Grassland, Forest, Scrubland, sand scrubland, Alpine grassland and dry grassland)
- division: phylogeny division
- class/order/family/genus: phylogeny class/order/family/genus
- growth_type: plant growth type (Herbaceous and woody species)
- persistence: seed persistence, grouped into two classes (transient vs persistent). Transient seeds existed in the soil for less than one or two years, while persistent seeds existed in the soil for at least one or two years.
- mass: seed mass (unit: mg)
- shape: seed shape; a perfectly spherical seed has a value of zero, and the maximum value for a non-spherical (elongated or disc-shaped) seed is 0.333.
CSV File Name: wangxj_seedbank_site.csv
Description of file:
Variables:
- ID_site: collection site number
- reference: collection reference
- longitude: longitude of collection site
- latitude: latitude of collection site
- elevation: elevation of collection site (unit: m)
- species_richness: seed bank species number of collection site
- Method: experimental method used to determine seed persistence. Soil seedling germination experiments were of two types: seed abundance across soil depths relative to the aboveground species (Seedling_depth), and presence in the seed bank through time after controlling for seed rain (Seedling_time).
- Vegetation: Vegetation types at the collection sites
- pH: Soil pH at the collection site
- soil_bulk_density: Soil bulk density at the collection site
- soil_gravel: Soil gravel content at the collection site (unit: %)
- soil_sand: Soil sand content at the collection site (unit: %)
- soil_silt: Soil silt content at the collection site (unit: %)
- soil_clay: Soil clay content at the collection site (unit: %)
- MAT: Annual Mean Temperature (unit: °C)
- MAP: Annual Precipitation (unit: mm)
- bio2: Mean Diurnal Range (Mean of monthly (max temp - min temp)) (unit: °C)
- bio3: Isothermality (BIO2/BIO7 × 100)
- bio4: Temperature Seasonality (standard deviation × 100)
- bio5: Max Temperature of Warmest Month (unit: °C)
- bio6: Min Temperature of Coldest Month (unit: °C)
- bio7: Temperature Annual Range (BIO5 - BIO6) (unit: °C)
- bio8: Mean Temperature of Wettest Quarter (unit: °C)
- bio9: Mean Temperature of Driest Quarter (unit: °C)
- bio10: Mean Temperature of Warmest Quarter (unit: °C)
- bio11: Mean Temperature of Coldest Quarter (unit: °C)
- bio13: Precipitation of Wettest Month (unit: mm)
- bio14: Precipitation of Driest Month (unit: mm)
- bio15: Precipitation Seasonality (Coefficient of Variation)
- bio16: Precipitation of Wettest Quarter (unit: mm)
- bio17: Precipitation of Driest Quarter (unit: mm)
- bio18: Precipitation of Warmest Quarter (unit: mm)
- bio19: Precipitation of Coldest Quarter (unit: mm)
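The two tables described above share the ID_site column, so site-level variables such as MAP and soil_sand can be attached to each species record. The original analysis was run in R; what follows is only a minimal Python sketch of that join. The helper names (`read_rows`, `attach_site_data`) are hypothetical, the small in-memory tables are made-up stand-ins for the real files, and it assumes ID_site is coded identically in both files.

```python
import csv
import io


def read_rows(handle):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def attach_site_data(species_rows, site_rows, columns=("MAP", "soil_sand")):
    """Add the chosen site columns to each species record via ID_site."""
    sites = {row["ID_site"]: row for row in site_rows}
    merged = []
    for row in species_rows:
        site = sites.get(row["ID_site"], {})
        # Records whose site is missing get empty strings for the site columns.
        extra = {col: site.get(col, "") for col in columns}
        merged.append({**row, **extra})
    return merged


# Made-up rows standing in for the real files (values are illustrative only).
species_csv = io.StringIO(
    "species,ID_site,persistence,mass,shape\n"
    "Species a,1,persistent,0.5,0.05\n"
    "Species b,2,transient,12.0,0.20\n"
)
site_csv = io.StringIO(
    "ID_site,MAP,soil_sand\n"
    "1,450,60\n"
    "2,900,35\n"
)

merged = attach_site_data(read_rows(species_csv), read_rows(site_csv))
for row in merged:
    print(row["species"], row["persistence"], row["MAP"], row["soil_sand"])
```

With the real data, the `io.StringIO` objects would be replaced by `open("wangxj_seedbank_species.csv", newline="")` and `open("wangxj_seedbank_site.csv", newline="")`. All values are read as strings and need converting to numbers before analysis.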
CSV File Name: wangxj_seedbank_species_mean_order.csv
Description of file:
Variables:
- species/genus/family/order: taxonomic species, genus, family and order
- mass: seed mass (unit: mg)
- shape: seed shape; a perfectly spherical seed has a value of zero, and the maximum value for a non-spherical (elongated or disc-shaped) seed is 0.333.
- persistence: seed persistence, grouped into two classes (transient vs persistent). Transient seeds existed in the soil for less than one or two years, while persistent seeds existed in the soil for at least one or two years.
- growth_type: plant growth type (Herbaceous and woody species)
Sharing/Access information
The results and figures after running the code are mainly shown in the main text and supplementary results. Link to supplementary results: https://doi.org/10.6084/m9.figshare.25434607.v3
Code/Software
n/a
Methods
(a) Worldwide data compilation
We searched the ISI Web of Science and Google Scholar for papers in English containing the words “seed shape”, “seed bank” and “seed mass or seed size” in the title or abstract (Figure S1). Our selection criteria for research sites required that plant data come primarily from English-language literature on local natural ecosystems; single-species studies and agricultural systems were excluded (Table S1, Figure S1). This search yielded data from 23 geographical locations with information on seed shape, seed mass, and seed bank type. Together these provided integrated seed morphology and seed persistence information for 1772 records of 1474 species from 650 genera and 112 families. Nomenclature at the species, genus, and family levels was revised using http://www.catalogueoflife.org and the ‘Taxonstand’ R package.
Seed shape was calculated using the three-dimensional variance equation from Thompson et al. [1] and Bekker et al. [53]:
$$\text{Seed shape} = \frac{\sum_{i=1}^{3} (x_i - \bar{x})^2}{n - 1}, \quad n = 3$$
where $x_i$ is a seed dimension normalized by seed length (seed length/seed length, seed width/seed length, and seed height/seed length) and $\bar{x}$ is the average of the three. With the sample-variance denominator $n - 1$, a perfectly spherical seed has a value of zero and the maximum value for a non-spherical (elongated or disc-shaped) seed is 0.333. Seed shape values from study sites that used a different algorithm were transformed to this scale. The diaspore, including the seed and any indehiscent single-seeded fruit, was broadly defined as a ‘seed’. Seed mass was defined as the average dry weight of the seeds with seed coat for each species. The final dataset covered a wide range of seed mass, spanning about six orders of magnitude, from 0.001 mg (Pyrola minor) to 1081 mg (Beilschmiedia tawa). Seed shape ranged from 0.0001 (Brassica tournefortii) to 0.33 (Stipa pennata).
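The variance calculation described above can be sketched in a few lines. This is an illustrative Python version, not the authors' R code. The function name `seed_shape` is hypothetical, and the sample-variance denominator (n - 1) is an assumption, chosen because it reproduces the stated maximum of 0.333.

```python
from statistics import variance


def seed_shape(length, width, height):
    """Variance of the three seed dimensions after dividing each by seed length."""
    if length <= 0:
        raise ValueError("length must be positive")
    scaled = [length / length, width / length, height / length]
    # statistics.variance is the sample variance (denominator n - 1);
    # this choice is an assumption, see the lead-in above.
    return variance(scaled)


sphere = seed_shape(2.0, 2.0, 2.0)    # all three dimensions equal
needle = seed_shape(10.0, 0.0, 0.0)   # idealized limit of an elongated seed
disc = seed_shape(5.0, 5.0, 0.0)      # idealized limit of a flat seed

print(round(sphere, 3), round(needle, 3), round(disc, 3))
```

The needle and disc cases are limiting shapes, not real seeds. They show why both elongated and flattened seeds approach the same upper bound.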
To unify the seed persistence classifications of the original studies, we grouped species into two classes [1,2]: transient vs persistent. Transient seeds existed in the soil for less than one or two years, while persistent seeds existed in the soil for at least one or two years (details in Table S1). Twenty study sites measured seed persistence using soil seedling emergence experiments, two sites used seed burial experiments, and one site combined soil seedling emergence and seed burial experiments (see Table S1 for details). Soil seedling emergence experiments were of two types: seed abundance across soil depths relative to the aboveground species (seventeen study sites), and presence in the seed bank through time after controlling for seed rain (three study sites; Table S1). The seed burial experiments involved mixing seeds with soil and burying them in a natural environment, with persistence assessed from the seeds' survival rate after one year.
(b) Phylogenetic tree and growth type
The phylogenetic tree was constructed with branch lengths for the 1,474 species from the megatree based on Zanne et al. [54] and Smith & Brown [55], using the ‘phylo.maker’ function in the V.PhyloMaker R package [56].
Growth type (herbaceous vs woody) of species in the soil persistence dataset was collected from the native floristic databases of the study areas (http://rian.inta.gov.ar/, https://www.nzpcn.org.nz/, http://www.floralibrary.com/, https://www.infoflora.ch/de/, https://plantnet.rbgsyd.nsw.gov.au/, https://vicflora.rbg.vic.gov.au/, https://eol.org, https://gobotany.nativeplanttrust.org/, www.iplant.cn/frps and https://plants.usda.gov). Species of shrubs and trees were classified as woody.
(c) Climate and soil data
To explore the factors affecting the prediction of seed bank persistence across research sites, we used soil sand content and mean annual precipitation (MAP) to represent soil and climate factors. MAP values were taken from the original literature sources where available, and missing values were extracted from WorldClim using the R package ‘raster’ v.3.3-13 [57]. Soil sand content was extracted from the Harmonized World Soil Database v 1.2 (http://www.fao.org/soils-portal/soil-survey/soil-maps-and-databases/harmonized-world-soil-database-v12/en/) using ArcGIS (v.10.8, Esri). If a study site had specific latitude and longitude coordinates, climate and soil data were extracted for that point. If the study site was an area rather than a single point, five points were placed evenly within the study area, and the data from these five points were averaged to represent the climate and soil characteristics of that location.
Data analysis for R
All analyses were conducted in R v. 4.3.0. To assess how well seed traits predict seed bank persistence at the species level, we fitted generalized linear models (family = ‘binomial’) with seed bank persistence (transient vs. persistent) as the response variable and seed mass and shape as predictors. In this analysis, a species with multiple records was classified as persistent if at least one of its records was persistent (following Gioria et al.). Average values of seed mass and shape were used in the generalized linear model, and seed mass values were log10-transformed to approximate a normal distribution. We obtained the optimal probability node (cutoff) of the seed traits for predicting persistent seed banks at the species level using the ‘pROC’ package [59]. Trait values below the optimal probability node indicate that a species is more likely to form a persistent seed bank. The same analysis was repeated separately for each growth type (herbaceous and woody) and each phylogenetic order. The accuracy of using seed traits to predict seed bank persistence was tested across multiple species and sites with the ‘pROC’ package. Differences in seed mass and seed shape between herbaceous and woody species were tested by analysis of variance using the ‘aov’ function of the ‘stats’ package.
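The species-level aggregation and the search for a trait cutoff can be illustrated with a short sketch. It is written in Python for illustration only; the original analysis used R and ‘pROC’. Several points are assumptions and not statements about the authors' code: the helper names are hypothetical, the records are made up, log10 is applied to the mean mass (the text does not say whether the mean or the log comes first), and the cutoff is chosen by maximizing Youden's J directly on the trait, not on fitted model probabilities.

```python
import math
from collections import defaultdict
from statistics import mean


def species_level(records):
    """Collapse (species, persistence, mass_mg, shape) records to one row per species."""
    grouped = defaultdict(list)
    for species, persistence, mass_mg, shape in records:
        grouped[species].append((persistence, mass_mg, shape))
    table = {}
    for species, rows in grouped.items():
        table[species] = {
            # Persistent if at least one record is persistent.
            "persistent": any(p == "persistent" for p, _, _ in rows),
            # Assumption: average the raw masses first, then take log10.
            "log10_mass": math.log10(mean(m for _, m, _ in rows)),
            "shape": mean(s for _, _, s in rows),
        }
    return table


def youden_cutoff(values, persistent):
    """Cutoff c maximizing Youden's J for the rule 'value < c -> persistent'.

    Requires at least one persistent and one transient species.
    """
    n_pos = sum(persistent)
    n_neg = len(persistent) - n_pos
    best_c, best_j = None, -1.0
    for c in sorted(set(values)):
        tp = sum(1 for v, p in zip(values, persistent) if p and v < c)
        tn = sum(1 for v, p in zip(values, persistent) if not p and v >= c)
        j = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


# Made-up records: (species, persistence, mass in mg, shape).
records = [
    ("Species a", "transient", 0.1, 0.02),
    ("Species a", "persistent", 0.3, 0.04),
    ("Species b", "transient", 100.0, 0.20),
    ("Species c", "persistent", 1.0, 0.05),
    ("Species d", "transient", 10.0, 0.15),
]

table = species_level(records)
masses = [row["log10_mass"] for row in table.values()]
labels = [row["persistent"] for row in table.values()]
cutoff, j = youden_cutoff(masses, labels)
print(cutoff, j)
```

The direction of the rule (smaller values predict persistence) follows the text above. A cutoff from real data would be far less clean than in this toy example.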
Whether species phylogeny needs to be included in the models depends on whether the traits are phylogenetically conserved. The phylogenetic signal in the two continuous variables (seed mass and shape) for the 1,474 unique species was estimated with Pagel's λ, using 10,000 randomization simulations, with the ‘phylosig’ function in the R package ‘phytools’ v.0.6-99. The phylogenetic signal for the binary variable (seed bank persistence) was estimated with the D statistic, using 10,000 randomization simulations, with ‘phylo.d’ in the R package ‘caper’. We presented the distribution of traits and seed bank persistence for the 1,474 species, and the relationships between seed traits and seed bank persistence within phylogenetic order clades, using the R packages ‘ggtree’ and ‘ggtreeExtra’. Multiple comparisons of seed mass and seed shape among phylogenetic clades were performed using the ‘LSD.test’ function in the R package ‘agricolae’.
Pagel's λ is interpreted as follows. When λ = 0, related taxa are no more similar than expected by chance. When λ = 1, the trait evolves following a constant-variance random walk (Brownian motion model). Intermediate values of λ indicate a phylogenetic correlation in trait evolution that does not fully follow a Brownian motion model.
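One way to picture these values of λ is through the phylogenetic covariance matrix. Under Brownian motion the covariance between two species is proportional to their shared branch length, and λ rescales those off-diagonal entries. The Python sketch below illustrates that rescaling only. The function name and the three-species matrix are hypothetical, and the sketch does not estimate λ as ‘phylosig’ does.

```python
def lambda_transform(vcv, lam):
    """Scale the off-diagonal entries of a phylogenetic covariance matrix by lambda."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda is expected to lie in [0, 1] in this sketch")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical three-species matrix: A and B share most of their history,
# C diverged early. Diagonal entries are the root-to-tip distances.
vcv = [
    [1.0, 0.7, 0.2],
    [0.7, 1.0, 0.2],
    [0.2, 0.2, 1.0],
]

star = lambda_transform(vcv, 0.0)      # no phylogenetic covariance left
brownian = lambda_transform(vcv, 1.0)  # matrix unchanged
partial = lambda_transform(vcv, 0.5)   # covariances halved

print(star[0][1], brownian[0][1], partial[0][1])
```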
To test the effect of seed mass and shape on seed bank persistence while accounting for shared phylogeny, we fitted a threshold model to the 1,772 records using generalized linear mixed-effects models with Bayesian estimation (Markov Chain Monte Carlo generalized linear mixed models, MCMCglmms) as implemented in the R package MCMCglmm. Seed bank persistence (transient coded as 1, persistent as 0) was the response variable. Each record kept its original persistence class, even when a species was recorded as persistent at some study sites and transient at others. Fixed effects included (i) seed mass; (ii) seed shape; (iii) the interaction between seed mass and shape; (iv) growth type (herbaceous and woody) and its interactions with seed mass and shape; and (v) two environmental variables from the study sites (mean annual precipitation and soil sand content) and their interactions with seed mass and shape. The predictors were scaled so that their effect sizes could be compared. Random effects included “sites” and “species”, the latter with the reconstructed phylogenetic tree. We used weakly informative priors, with parameter-expanded priors for the random effects. Each model was run for 500,000 MCMC steps, with an initial burn-in phase of 5,000 and a thinning interval of 500 [67], resulting, on average, in 9,000 posterior samples; the exception was the woody model, which was run for one billion MCMC steps with an initial burn-in phase of 50,000 and a thinning interval of 5,000 to increase the effective sample size. We assessed the significance of model parameters by examining 95% credible intervals (CIs), considering parameters whose CIs overlapped zero as non-significant. Pagel's lambda (λ) was estimated simultaneously with the MCMCglmms by calculating the mean of the posterior distribution and the 95% CI of λ, as indicated by De Villemereuil and Nakagawa.
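Two steps in this paragraph, scaling the predictors and reading significance from a 95% credible interval, can be sketched without fitting any model. The Python code below is illustrative only. The function names are hypothetical, and the "posterior samples" are made-up evenly spaced numbers, not real MCMC output. It uses an equal-tailed quantile interval, which is an assumption and may differ from a highest-posterior-density interval.

```python
from statistics import mean, quantiles, stdev


def scale(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ci95(samples):
    """Equal-tailed 95% interval: the 2.5% and 97.5% quantiles of the samples."""
    cuts = quantiles(samples, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def overlaps_zero(interval):
    """True if zero lies inside the interval, i.e. the effect is read as non-significant."""
    low, high = interval
    return low <= 0.0 <= high


# Scaling a made-up predictor (log10 seed mass for five species).
scaled_mass = scale([-0.7, 0.0, 0.4, 1.0, 2.0])

# Made-up stand-ins for posterior samples of two coefficients.
mass_effect = [0.2 + 0.001 * i for i in range(1000)]         # entirely above zero
interaction_effect = [-0.5 + 0.001 * i for i in range(1000)]  # straddles zero

print(ci95(mass_effect), overlaps_zero(ci95(mass_effect)))
print(ci95(interaction_effect), overlaps_zero(ci95(interaction_effect)))
```

Because the response is coded with transient as 1, a positive coefficient whose interval excludes zero would indicate a higher probability of a transient seed bank as that predictor increases.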