Data from: Seed production and 22 years of climatic changes in an everwet Neotropical forest

Garwood, Nancy C.; Zambrano, Milton; Metz, Margaret R.; Vleminckx, Jason

Published Oct 24, 2024 on Dryad. https://doi.org/10.5061/dryad.j9kd51ckz

Data files

Oct 24, 2024 version files 22.07 MB

Abstract

Examining the cues and drivers influencing seed production is crucial to better understand forest resilience to climate change. We explored the effects of five climatic variables on seed production over 22 years in an everwet Amazonian forest, by analysing the direct effects of these variables but also how they indirectly influence seed output through their effects on flower production. We observed a decline in seed production over the study period, which was primarily explained by direct climate effects, notably rising nighttime temperatures and decreasing average water vapor pressure deficit. Conversely, higher daytime temperatures were positively related to seed output. Rainfall effects on seed production were more nuanced, showing either positive or negative relationships depending on the seasonal timing of rains. This study contributes to a deeper understanding of how changing climatic conditions may impact tree reproductive output and thereby influence future dynamics and composition of tropical forests.

Some files are stored on Dryad, and others are provided as Supplementary Information on Zenodo.

Appendix S1. Taxonomic and life form information on fruiting species.

Appendix S2. Mean species-specific seed dry mass (SDM; in g) and the number of seed(s) per fruit (SPF).

Appendix S3. Description of the procedure used to define the starting day of the phenological year of a species and to impute missing flower and seed production values for the 16 census gaps in 2018, 2020 and 2022.

Appendix S4. R code to reproduce figures 3 and 4, and the wavelet coherence plots.

Appendix S5. Climate and phenological data used to reproduce figures 3 and 4 based on the R code (Appendix S4). YR = Year; MON = Month; YRMON = continuous time variable combining information on the year and month; SRM = Seed Rain Mass (= SP in the article, i.e. the total annual seed production at the community level; in g); PHENOYR = phenological year (0 to 21); PHENOMON = phenological month (1 to 12); PHENOYRMON = continuous time variable combining information on the phenological year and month; IRRA = Irradiance (W/m²); RAIN = Rainfall (mm); TMIN = minimum temperature (°C); TAVE = average temperature (°C); TMAX = maximum temperature (°C); RHMIN = minimum relative humidity (%); RHAVE = average relative humidity (%); VPDAVE = average vapor pressure deficit (kPa); VPDMAX = daytime vapor pressure deficit (kPa).

Appendix S6. Flower observation data. DATE = census date (month/day/year); CENSUS = number of the census; TRAP = number of the trap; CODIGO = species acronym (see Appendix S1 for correspondence); CODE8 = original species acronym (for reference, only; not used); PART = type of reproductive part (0 = buds, not used in analyses | 6 = female or hermaphroditic flowers | 9 = male flowers); QUANTITY = number of reproductive parts (approximate number on a log scale: 0 (1-9), 1 (10-99), 2 (100-999), 3 (1000-9999)); YR = Year; MON = Month; DY = continuous variable combining information on the year and the day of the year; guild = life form of the species (CW = woody climber; CS = shrub-like woody climber; CU = understory woody climber-liana; CH = herbaceous climber-vine; C? = climber with unknown habit; EH = herbaceous epiphyte; EW = woody epiphyte; EP = parasitic epiphyte; HH = herbaceous hemi-epiphyte; HV = vine-like herbaceous hemi-epiphyte; HW = woody hemi-epiphyte; HT and HL = large woody hemi-epiphyte, reaching the upper canopy; XT = terrestrial herb; SL = large shrub (≥ 1 cm dbh); SS = small shrub (< 1 cm dbh); S? = shrub of unknown size; TC = canopy tree; TE = emergent tree; TM = medium-size tree; TU = understory tree; T? = tree of unknown size); gen = genus; fam = family.

Appendix S7. Fruit observation data. DATE = census date (month/day/year); CENSUS = number of the census; TRAP = number of the trap; CODIGO = species acronym (see Appendix S1 for correspondence); CODE8 = original species acronym (for reference, only; not used); PART = type of reproductive part (0 = buds, not used in analyses | 1 = mature fruits | 2 = seeds | 3 = fruit capsules | 4 = fruit fragments | 5 = immature fruits | 7 = damaged seeds | 8 = aborted fruits | 10 = fruits eaten by animals, with good seeds | 11 = damaged fruits or seeds); QUANTITY = number of reproductive parts (absolute number); YR = Year; MON = Month; DY = continuous variable combining information on the year and the day of the year; guild = life form of the species (CW = woody climber; CS = shrub-like woody climber; CU = understory woody climber-liana; CH = herbaceous climber-vine; C? = climber with unknown habit; EH = herbaceous epiphyte; EW = woody epiphyte; EP = parasitic epiphyte; HH = herbaceous hemi-epiphyte; HV = vine-like herbaceous hemi-epiphyte; HW = woody hemi-epiphyte; HT and HL = large woody hemi-epiphyte, reaching the upper canopy; XT = terrestrial herb; SL = large shrub (≥ 1 cm dbh); SS = small shrub (< 1 cm dbh); S? = shrub of unknown size; TC = canopy tree; TE = emergent tree; TM = medium-size tree; TU = understory tree; T? = tree of unknown size); gen = genus; fam = family.

Appendix S8. Wavelet coherence analyses of the relationship between census-level seed production and each climate variable across time and frequency domains.

Fig. S1. Percentage of missing daily values for each climate variable over the 2000-2021 period.

Fig. S2. Percentage of missing values for each day of the month and each climate variable.

Fig. S3. Percentage of days with 0 to 11 missing values across climate variables, calculated over the 2000-2021 period.

Fig. S4. Percentage of days with two missing values for each pair of climate variables.

Fig. S5. Map showing the geographical location of the Yasuní Forest Dynamic Plot (YFDP) and the Nuevo Rocafuerte weather station from which we obtained daily data for three climate variables.

Fig. S6. Pearson correlations among observed daily values of normalized and standardized (z-score) variables, before imputation.

Fig. S7. Post hoc analysis evaluating the reliability of the BHPMF imputations using the average RMSE (Root Mean Square Error).

Fig. S8. Post hoc analyses evaluating the reliability of the BHPMF imputations by comparing correlation values among climate variables before and after imputation.

Fig. S9. Observed and imputed mean monthly values of each climate variable across the 2000-2021 study period at Yasuní.

Fig. S10. Results of the regression models examining the effects (total, direct, flower-mediated) of the average relative humidity on seed production.

Fig. S11. Results of the regression models examining the effects (total, direct, flower-mediated) of daytime vapor pressure deficit on seed production.

Fig. S12. Comparison of the R² and slope coefficient values among the best OLS models quantifying the effect of each climate variable on seed production.

Fig. S13. Comparison of regression lines and values of the seed production plotted against the average relative humidity and against the average vapor pressure deficit, during two different periods (2000-2021 and 2008-2017).

Fig. S14. Model checking from ARIMA imputation.

Fig. S15. Observed (red) and imputed (blue) seed production across the study period (264 months).

Fig. S16. Wavelet coherence scores between seed production and each climate variable, across time and frequency domains.

Study site

The Yasuní Research Station (ECY, 0°41'S, 76°24'W) is affiliated with the Pontificia Universidad Católica del Ecuador and is located in Yasuní National Park, in Ecuador. The park is situated in one of the Amazon basin's wettest areas (Funatsu et al. 2021). From 2000 to 2022, the park received annual rainfall averaging 3165 mm, with monthly averages ranging from 194 mm to 392 mm. The highest rainfall occurs in May-June, with a secondary peak in October–November (Pitman 2000; Valencia et al. 2004a). Given this rainfall regime, the park's climate is described as aseasonal (Walter et al. 1975), or everwet (McGregor & Niewoldt 1998) because rainfall nearly always exceeds 100 mm month^-1. During the same period (2000-2022), the recorded monthly minimum, mean, and maximum temperatures were within the ranges of 20.9 to 22.0°C, 24.1 to 25.6°C, and 29.5 to 32.1°C, respectively.

The landscape is characterized by evergreen, terra firme moist forests at an elevation of approximately 200 meters above sea level, across terrain marked by moderate undulations and slopes (Berdugo et al. 2022). Forest canopy height ranges from 15 to 30 meters, with emergent trees up to 50 meters tall (Valencia et al. 2004a). The soils of the region predominantly consist of Andean erosional materials (Malo & Arguello 1984) with sediment deposits from lake formations and Miocene marine incursions (Hoorn et al. 2010). This geological history has fostered speciation and led to the biological richness of the Andean Amazon (Baraloto et al. 2021).

Phenological and climatic data were gathered from the 50-ha Yasuní Forest Dynamics Plot (YFDP), initiated in 1995 for the purpose of mapping, identifying, measuring, and tagging all trees with a diameter ≥1 cm at 1.3 meters above the ground (Valencia et al. 2004a,b). This plot contains an exceptionally diverse flora, with 1104 species of trees and shrubs recorded in a 25-hectare area during the first survey (Valencia et al. 2004b). Notably, it includes 40 species of Miconia (Melastomataceae), 40 species of Inga (Fabaceae), and 16 species of Myristicaceae (Valencia et al. 2004b; Queenborough et al. 2007).

Climate data

Hourly measurements of solar irradiance (W m^-2), rainfall (mm), and temperature (°C) were collected at ECY from May 2000 through February 2012. Solar irradiance was measured using two LI-COR LI-200S pyranometers (tuned to the daylight spectrum of 400 to 1100 nm). Air temperature was measured using an LI-1400-102. Rainfall was measured using an LI-1400-106 tipping bucket and an LI-1400 data logger (all instruments from LI-COR Inc., Lincoln, NE, USA) (Garwood et al. 2023). Beginning in January 2008, relative humidity was monitored using a LI-COR 1400-04 sensor. In 2012, the monitoring system was upgraded to a CR1000 data logger (Campbell Scientific, Logan, UT, USA), along with two Vaisala HMP45C thermometers, a Rotronic HC2-S3 hygrometer, a Hydrological Services TB4 rain gauge, and two LI-COR LI-200X pyranometers (also calibrated for the daylight spectrum of 400–1100 nm). This upgraded equipment provided irradiance and temperature readings every 5 minutes from 2012 to August 2021, while a manual gauge collected daily rainfall data throughout and until October 2021. Challenges with equipment maintenance generated gaps in meteorological data. The percentage of missing daily data from 2000 to 2021 was 23% for rainfall, 33% for temperature, 37% for irradiance, and as much as 68% for relative humidity (Fig. S1, S2). No meteorological data were available for 10.5% of the days, and all meteorological variables were available for 22% of the days (Fig. S3, S4).
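The gap statistics reported above (share of missing days per variable, share of days with no data at all, share of complete days) can be tallied as in the following minimal Python sketch. The record layout, the variable names and the four example days are hypothetical and are not taken from the dataset; the article's own summaries are shown in Fig. S1-S4.

```python
# Hypothetical daily records: one dict per day, None marks a missing value.
VARIABLES = ["rain", "temp", "irradiance", "rh"]

def missing_summary(days, variables=VARIABLES):
    """Return (% missing per variable, % of days with no data, % of complete days)."""
    n = len(days)
    per_variable = {
        v: 100.0 * sum(day[v] is None for day in days) / n for v in variables
    }
    # Number of missing variables on each day
    n_missing = [sum(day[v] is None for v in variables) for day in days]
    pct_no_data = 100.0 * sum(k == len(variables) for k in n_missing) / n
    pct_complete = 100.0 * sum(k == 0 for k in n_missing) / n
    return per_variable, pct_no_data, pct_complete

# Illustrative values only
days = [
    {"rain": 3.2, "temp": 24.8, "irradiance": 180.0, "rh": 88.0},
    {"rain": None, "temp": None, "irradiance": None, "rh": None},
    {"rain": 0.0, "temp": 25.1, "irradiance": None, "rh": None},
    {"rain": 12.4, "temp": None, "irradiance": 150.0, "rh": None},
]
per_variable, pct_no_data, pct_complete = missing_summary(days)
```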

We imputed missing climate data with Bayesian hierarchical probabilistic matrix factorisation (BHPMF, Schrodt et al. 2015). Each climate variable was first standardized (z-score transformation) then normalized (Box-Cox transformation). BHPMF is a machine-learning algorithm that exploits hierarchical information from structured data and uses its correlation structure to impute missing entries (Schrodt et al. 2015). We used days and months to structure the data imputation. Imputations were enhanced by integrating daily records of maximum and minimum temperature, along with average relative humidity, obtained from the Nuevo Rocafuerte meteorological station situated approximately 115 km east of the YFDP and at a similar elevation (Fig. S5). This information was sourced from the Visual Crossing weather data platform (www.visualcrossing.com). The selection of these three variables from the Nuevo Rocafuerte station was based on their relatively strong positive correlations (Pearson r > 0.6) with the same variables measured at YFDP. Post hoc analyses supported the effectiveness of our imputation estimates (average of 50,000 imputed values): the average standard deviation of imputed values never exceeded 0.98 on days with missing climate data, while we found strong consistency in variable correlations pre- and post-imputation (Fig. S6-S8). A visual comparison of imputed and observed values across time is shown in the supplementary information (Fig. S9).

Considering our predictions concerning the impact of nocturnal vs. diurnal temperatures on seed production, we analysed minimum daily temperature (hereafter, T_MIN) and maximum daily temperature (T_MAX) independently. We used average relative humidity (RH_AVE) with average temperature (T_AVE) to calculate the average water Vapor Pressure Deficit (VPD_AVE), and minimum relative humidity (RH_MIN) with T_MAX to calculate the daytime water Vapor Pressure Deficit (VPD_DAY). Observed maximum relative humidity (RH_MAX) values, reflecting nighttime air moisture saturation, were close to 100% and showed a highly skewed distribution. This, combined with the large proportion of missing values (68%), led the imputations to produce unrealistic values (i.e., higher than 100%) for nearly 26% of imputed RH_MAX values (not shown), which was not the case with RH_AVE and RH_MIN. Rodwell et al. (2014) pointed out similar technical difficulties when making imputations for variables approaching their theoretical maximums. Additionally, because stomata are closed at night, we had no prediction regarding how nighttime vapor pressure deficit would influence reproduction. Thus, we chose not to examine the effect of VPD at night on seed output.

VPD_AVE and VPD_DAY were calculated in three steps. First, we calculated the average and daytime Vapor Pressure at Saturation (VPS_AVE and VPS_DAY, respectively), based on the Tetens equation (Tetens 1930):

VPS_AVE = 0.6108 * e^{17.27 * T_AVE / (T_AVE + 237.3)} (Eq. 1)

VPS_DAY = 0.6108 * e^{17.27 * T_MAX / (T_MAX + 237.3)} (Eq. 2)

where T_AVE and T_MAX are in degrees Celsius and VPS is in kPa. We then calculated the average and daytime Observed Vapor Pressure (OVP_AVE and OVP_DAY, respectively), in kPa, as follows:

OVP_AVE = VPS_AVE*(RH_AVE/100) (Eq. 3)

OVP_DAY = VPS_DAY*(RH_MIN/100) (Eq. 4)

Finally, VPD_AVE and VPD_DAY corresponded to the following differences:

VPD_AVE = VPS_AVE - OVP_AVE (Eq. 5)

VPD_DAY = VPS_DAY - OVP_DAY (Eq. 6)
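The three steps of Eq. 1-6 can be sketched in Python as follows. The function names and the input values are illustrative assumptions and are not taken from the authors' R code (Appendix S4).

```python
import math

def saturation_vapor_pressure(temp_c):
    """Tetens saturation vapor pressure in kPa for a temperature in deg C (Eq. 1-2)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))

def vapor_pressure_deficit(temp_c, rh_percent):
    """Vapor pressure deficit in kPa from temperature (deg C) and relative humidity (%)."""
    vps = saturation_vapor_pressure(temp_c)   # Eq. 1-2
    ovp = vps * (rh_percent / 100.0)          # Eq. 3-4
    return vps - ovp                          # Eq. 5-6

# Illustrative inputs only (not values from the dataset)
vpd_ave = vapor_pressure_deficit(temp_c=25.0, rh_percent=85.0)  # T_AVE with RH_AVE
vpd_day = vapor_pressure_deficit(temp_c=31.0, rh_percent=60.0)  # T_MAX with RH_MIN
```

Pairing T_MAX with RH_MIN reflects the fact that the warmest and driest conditions of the day tend to coincide, so VPD_DAY is larger than VPD_AVE for inputs like these.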

Seed production data

Following the protocol of Wright & Calderon (1995), 200 permanent traps, each measuring 0.75 × 0.75 m (0.57 m²) and constructed from 1-mm fiberglass wire mesh, were installed across the YFDP at a height of 0.75 m (Garwood et al. 2023). Plant reproductive parts were collected from each trap twice a month from February 2000 until March 2023 (totalling 541 completed censuses). Another 16 censuses did not take place because of political disruptions and COVID-19 (two in 2018, 12 in 2020, and two in 2022). Seeds and fruits were counted and identified to species whenever possible. If species identification was impossible, the material was collected, a unique morphospecies was assigned, and attempts were made to identify the sample against local reproductive adults and a permanent voucher collection (Valencia et al. 2004a; Garwood et al. 2023). Based on Wright & Calderon (2006), our analysis focused on fruiting species detected in a minimum of 10 of the 200 traps in at least one year. This approach yielded a dataset of 203 species, spanning 125 genera and 52 families. The most highly represented families included Fabaceae (19 species), Bignoniaceae (13 species), Moraceae (11 species), and Clusiaceae (10 species). The 203 species included 69 species of emergent or canopy trees, 83 climbers, 42 understory trees or shrubs, and nine epiphytes. Appendix S1 lists the 203 species, along with their family and life form.
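The species inclusion rule (detected in at least 10 of the 200 traps in at least one year) can be sketched in Python as follows. The tuple layout is a hypothetical simplification of the fruit records, and whether "year" means calendar or phenological year is not stated here, so the sketch simply uses whatever year label is supplied.

```python
from collections import defaultdict

def select_species(records, min_traps=10):
    """Keep species seen in at least `min_traps` distinct traps within a single year.

    records: iterable of (species, year, trap) tuples (hypothetical layout).
    """
    traps_seen = defaultdict(set)  # (species, year) -> distinct trap ids
    for species, year, trap in records:
        traps_seen[(species, year)].add(trap)
    return sorted({sp for (sp, _), traps in traps_seen.items() if len(traps) >= min_traps})

# Species "A": 10 traps in one year. Species "B": 12 traps overall, never 10 in one year.
records = [("A", 2005, t) for t in range(1, 11)]
records += [("B", 2005, t) for t in range(1, 7)]
records += [("B", 2006, t) for t in range(7, 13)]
selected = select_species(records)
```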

The total seed mass collected within each trap was calculated by multiplying the count of whole fruits and fruit fragments by their species-specific seed dry mass and the average seed count per fruit, based on a database of measurements of these two traits for species found in YFDP (Appendix S2). For individual seed items (complete seeds or fragments), we multiplied their counts by species-specific seed dry mass. These calculations were aggregated across all traps to provide an estimate of the total seed mass for each census period.
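A minimal Python sketch of this conversion is given below. The "fruit"/"seed" labels, the trait table and all numbers are hypothetical; which PART codes of Appendix S7 count as fruit items or as seed items is not specified here, so the sketch does not attempt that mapping.

```python
def item_seed_mass(count, kind, sdm, spf):
    """Seed dry mass (g) represented by `count` items of one species.

    kind "fruit": count * SDM * SPF; kind "seed": count * SDM.
    """
    if kind == "fruit":
        return count * sdm * spf
    if kind == "seed":
        return count * sdm
    raise ValueError(f"unknown item kind: {kind!r}")

def census_seed_mass(items, traits):
    """Sum seed mass over all items (all traps) of one census.

    items: iterable of (species, kind, count); traits: {species: (SDM in g, SPF)}.
    """
    total = 0.0
    for species, kind, count in items:
        sdm, spf = traits[species]
        total += item_seed_mass(count, kind, sdm, spf)
    return total

# Hypothetical trait values in the spirit of Appendix S2
traits = {"sp1": (0.5, 4.0), "sp2": (0.02, 1.0)}
items = [("sp1", "fruit", 3), ("sp1", "seed", 5), ("sp2", "seed", 100)]
total_mass = census_seed_mass(items, traits)  # 3*0.5*4 + 5*0.5 + 100*0.02
```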

Data analysis

To calculate an estimate of total annual seed production at the community level (hereafter, SP), we first standardised census-level seed mass values for each species separately, by dividing these values by the sum of seed mass values over all censuses. Standardisation was done separately for each species so that all species were on the same zero to one scale. For each census, we summed the standardised values across species to produce census-specific, community-level seed production values. Subsequently, we identified the start of the community-level fruiting phenological year, a crucial step since relying on calendar years might split the annual peak of community-level seed production across two years. To do so, we identified the day of the year that minimizes the variance of linearized dates weighted by census-specific seed production values, following the methodology described in Vleminckx et al. (2023) and explained in detail in Appendix S3. SP was then calculated as the sum of community-level seed production values across all censuses for each phenological year. Seed production values for the 16 missing censuses were imputed using an automatically fitted ARIMA model selected by AIC (Appendix S3). A Ljung-Box test (Ljung and Box 1978) yielded a p-value of 0.84, indicating that the residuals were not significantly autocorrelated and that the model fitted the data well.
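The standardisation and the search for the start of the phenological year can be sketched in Python as follows. This is one plausible reading of the description above, not the authors' implementation (see Appendix S3 and Vleminckx et al. 2023 for the actual procedure); the function names, the 365-day year and the example values are assumptions.

```python
def community_production(mass):
    """mass: {species: [seed mass per census]} -> community-level value per census.

    Each species is divided by its own total over all censuses (assumed non-zero),
    then the standardised values are summed across species.
    """
    standardised = {sp: [v / sum(vals) for v in vals] for sp, vals in mass.items()}
    n_census = len(next(iter(standardised.values())))
    return [sum(standardised[sp][i] for sp in standardised) for i in range(n_census)]

def phenological_year_start(days_of_year, weights, year_length=365):
    """Start day (1..year_length) minimising the weighted variance of linearized dates.

    A date is linearized as the number of days elapsed since the candidate start day.
    """
    w_sum = sum(weights)
    best_start, best_var = None, float("inf")
    for start in range(1, year_length + 1):
        lin = [(d - start) % year_length for d in days_of_year]
        mean = sum(w * x for w, x in zip(weights, lin)) / w_sum
        var = sum(w * (x - mean) ** 2 for w, x in zip(weights, lin)) / w_sum
        if var < best_var:  # ties keep the earliest candidate
            best_start, best_var = start, var
    return best_start

# Two species, two censuses (illustrative values)
community = community_production({"a": [1.0, 3.0], "b": [2.0, 2.0]})

# A production peak straddling the calendar new year (days 350 to 15)
start = phenological_year_start([350, 360, 5, 15], [1, 1, 1, 1])
```

In the toy example, any start day that does not cut through the peak gives the same variance, so the sketch only guarantees that the chosen start keeps the peak within one phenological year. With censuses spread over the whole year, the weights make the minimum much better defined.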

We used ordinary least squares (OLS) linear models to evaluate the relationships between SP and the five climate variables (solar irradiance, rainfall, minimum and maximum temperature, and average relative humidity). For each climate variable (C), we calculated three OLS slope coefficients (Fig. 2), quantifying: (1) the total effect of C on SP; (2) the partial direct effect of C when adding flower production in the model; and (3) the effect of C on SP mediated through flower production (Fig. 2a). To facilitate interpretation, we refer to the second and third coefficients as the “direct” and “flower-mediated effect”, respectively. The flower-mediated effect was quantified with the total flower production observed during each entire flowering phenological year, using the flower production data from Vleminckx et al. (2023).
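The first two coefficients can be sketched in Python as follows: the total effect is the slope of a simple regression of SP on C, and the direct effect is the partial slope of C once flower production is added as a second predictor. The function names and the toy data are assumptions. The flower-mediated coefficient is not computed in this sketch because its exact formula is not given in this text.

```python
def _centered(values):
    """Subtract the mean from each value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def total_slope(c, sp):
    """OLS slope of SP regressed on C alone (total effect)."""
    x, y = _centered(c), _centered(sp)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def direct_slope(c, flowers, sp):
    """Partial OLS slope of C when flower production is also in the model."""
    x1, x2, y = _centered(c), _centered(flowers), _centered(sp)
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * b for a, b in zip(x1, y))
    s2y = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if C and flowers are perfectly collinear
    return (s22 * s1y - s12 * s2y) / det

# Toy data built so that SP = 2*C + 3*flowers exactly
c = [1, 2, 3, 4, 5]
flowers = [2, 1, 4, 3, 6]
sp = [2 * ci + 3 * fi for ci, fi in zip(c, flowers)]
total = total_slope(c, sp)             # includes the part of the effect carried by flowers
direct = direct_slope(c, flowers, sp)  # effect of C with flowers held constant
```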

To examine the period of the fruiting phenological year when the mean condition of each climate variable (C) best explains interannual variation in SP, each of the three regression slope coefficients was calculated using different seasonal time frames for C (Fig. 2b). More specifically, C was calculated as the mean monthly value of a climate variable within all 222 possible ranges of one to twelve consecutive months over a 24-month window, starting 12 months before and ending with the current fruiting phenological year. Thus, both lagged and concurrent effects of C on SP were evaluated. Likewise, to calculate the flower-mediated effect, the 24-month window started 12 months before and ended with the current flowering phenological year.
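The count of 222 follows from the window arithmetic: a range of k consecutive months can start at 24 - k + 1 positions in a 24-month window, and summing over k = 1 to 12 gives 24 + 23 + ... + 13 = 222. The Python sketch below enumerates these ranges; the function names and the zero-based month indexing are assumptions.

```python
def seasonal_windows(window_months=24, max_length=12):
    """All (start, length) ranges of 1..max_length consecutive months in the window.

    `start` is a zero-based month index within the 24-month window.
    """
    return [
        (start, length)
        for length in range(1, max_length + 1)
        for start in range(window_months - length + 1)
    ]

def window_mean(monthly_values, start, length):
    """Mean monthly value of a climate variable over one range."""
    return sum(monthly_values[start:start + length]) / length

windows = seasonal_windows()
print(len(windows))  # 222
```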

The statistical significance of each regression slope coefficient was tested by comparing its value against a distribution of 4999 null values generated by Moran spectral randomisations (MSR, Wagner & Dray 2015). MSR is a spatially or temporally constrained randomisation method that produces randomised values with the same autocorrelation structure as the original values. These can then be used to control type I error rate inflation when testing the association between two spatially or temporally autocorrelated variables (Bauman et al. 2019).

Following our predictions, we expected the slope coefficients of each of the three effects (total, direct, flower-mediated) to be negative for T_MIN and positive for irradiance, T_MAX and VPD, with no clear prediction for rainfall. Consequently, we used one-tailed tests for the first four climate variables and a two-tailed test for rainfall, with a statistical significance threshold of 5% for all tests. Slope values were considered significantly negative for T_MIN (or positive for irradiance, T_MAX, and VPD) if they were negative and less than (or positive and greater than) 95% of null coefficient values. Slope coefficients calculated for rainfall were considered significantly negative or positive when they fell below the 2.5th or above the 97.5th percentile of null values, respectively.
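The decision rule can be sketched in Python as follows, taking the null slope values as given (generating them with MSR is outside the scope of this sketch). The function name, the `expectation` labels and the toy null distribution are assumptions.

```python
def is_significant(observed, null_values, expectation, alpha=0.05):
    """Compare an observed slope with null slopes.

    expectation: "negative" or "positive" (one-tailed), or "either" (two-tailed).
    """
    n = len(null_values)
    frac_above = sum(v > observed for v in null_values) / n  # nulls larger than observed
    frac_below = sum(v < observed for v in null_values) / n  # nulls smaller than observed
    if expectation == "negative":
        return observed < 0 and frac_above >= 1 - alpha
    if expectation == "positive":
        return observed > 0 and frac_below >= 1 - alpha
    if expectation == "either":
        return frac_above >= 1 - alpha / 2 or frac_below >= 1 - alpha / 2
    raise ValueError(f"unknown expectation: {expectation!r}")

# Toy null distribution of 100 slopes from -0.50 to 0.49 (the article uses 4999 MSR values)
null_values = [i / 100 for i in range(-50, 50)]
one_tailed = is_significant(-0.47, null_values, "negative")  # below 96% of nulls
two_tailed = is_significant(-0.47, null_values, "either")    # not beyond the 2.5th percentile
```

The same observed slope can pass the one-tailed rule and fail the two-tailed rule, which is why the direction of each prediction has to be fixed before testing.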

All analyses were performed in the R statistical environment v.4.4.0 (R Core Team 2024). We provide the climate and phenological data, and R code to reproduce our analyses (including citations for R packages used), in Appendices S4-S7.

README: Metadata describing the supporting information of the article “Seed production and 22 years of climatic changes in an everwet Neotropical forest” (ELE-00319-2024).
