Centennial recovery of recent human-disturbed forests
Data files
Jul 30, 2024 version files 757.40 KB
- code_forest_recovery_metaanalysis.R (38.99 KB)
- dataset_forest_recovery_metaanalysis.csv (714.98 KB)
- README.md (3.43 KB)
Oct 27, 2025 version files 1.42 MB
- 02_analysis_resub_clean.R (203.16 KB)
- dataset_forest_recovery_metaanalysis_posneg.csv (605.87 KB)
- dataset_forest_recovery_metaanalysis.csv (604.09 KB)
- README.md (3.74 KB)
May 20, 2026 version files 1.68 MB
- 02_analysis_resub_clean.R (210.01 KB)
- 03_exploratory_results.R (18.25 KB)
- dataset_forest_recovery_metaanalysis_posneg.csv (714.48 KB)
- dataset_forest_recovery_metaanalysis_similarity.csv (15.31 KB)
- dataset_forest_recovery_metaanalysis.csv (713.35 KB)
- README.md (6 KB)
Abstract
International initiatives to restore degraded forests would benefit from assessments of recovery timescales and trajectories of forest attributes to inform restoration strategies. We combine 125 chronosequences mostly from naturally regenerating forests to reconstruct past and model future trajectories of forests recovering from agriculture and logging impacts. While metrics like species diversity or carbon cycling showed relevant levels of recovery, others differed from undisturbed ones after at least 150 years. Nitrogen stocks or species similarity were projected to differ from references for 218 (38-745) or 494 (92-2,039) years, respectively. These conservative recovery metrics, however, fail to capture the complexity of forests, suggesting longer recovery timescales. Global restoration initiatives should engage in planning for a restored world incorporating ecologically meaningful (>100 years) implementation timescales and monitoring frameworks.
https://doi.org/10.5061/dryad.rv15dv4h8
This dataset includes the information compiled in a meta-analysis of global long-term recovery trajectories of forest ecosystems, in terms of their biodiversity (i.e., organism abundance, species diversity, and Morisita-Horn species similarity) and biogeochemical functions (i.e., cycling of carbon, nitrogen stock, and phosphorus stock), together with the response ratios computed to estimate the recovery completeness of each of these metrics.
Description of the data and file structure
Data are provided in three .csv data files:
- dataset_forest_recovery_metaanalysis.csv: dataset needed to run all the previous and new analyses of this study, updated after the review process.
- dataset_forest_recovery_metaanalysis_posneg.csv: dataset needed to run a sensitivity analysis that models separately the recovery of trajectories with starting points over 100% recovery completeness and those with starting points below 100%.
These two files have the same column names, which are defined below:
Columns related to each selected article in the meta-analysis:
code: code given to each selected article in the meta-analysis
citation: an abbreviated version of each selected article's citation
Columns related to each forest chronosequence:
chrono: number to identify each chronosequence in each article (this value is typically "1", as there was usually only one chronosequence per article)
latitude: latitude (decimal degrees)
longitude: longitude (decimal degrees)
climate: main group in Köppen climate classification
koppen: complete Köppen climate classification category
anualp: mean annual precipitation (mm)
meant: mean annual temperature (°C)
arid: Lang aridity index
disturb_category: main anthropogenic disturbance occurring prior to forest recovery
disturb_specific: specific anthropogenic disturbance occurring prior to forest recovery
restoration_activity: restoration strategy followed (i.e., natural regeneration vs. active restoration)
reference: “yes” if the reference value was from nearby old-growth forests or the same forest ecosystem in the pre-disturbance state and “no” if a secondary long-term recovery stage (last available data-point beyond 95 years on the trajectory) was used as a reference.
no_reference_sites: number of study sites in each reference forest
chrono_duration: time since recovery started
study_area: size of the study area
Columns related to each recovery trajectory inside each chronosequence:
no_points: number of data-points along each trajectory, defined as the value of the ecosystem metric at different times since the end of the disturbance
no_plots: number of plots sampled per trajectory
n: number of subplots (i.e., replicates within forest plots) measured in each trajectory. It was used to estimate the study precision.
subplot_size: subplot area (m²). It was used to estimate the study precision.
response_variable: the specific variable measured along a recovery trajectory (as it was literally called in the primary article)
n_variable: code used to differentiate each response variable from the same chronosequence
metric_type: recovery metric to which the response variable belongs (organism abundance, species diversity, Morisita-Horn species similarity, cycling of carbon, nitrogen stock or phosphorus stock)
life_form: the type of life form characterized with each response variable related to organism abundance, species diversity and Morisita-Horn species similarity.
Columns related to each data-point inside each recovery trajectory:
age: time since forest recovery started (years)
y: the value of the response ratio, computed as ln(Xres/Xref), where Xres is the value of each response variable at each age and Xref is the value of the same response variable in the reference forest.
age1_log: natural logarithm of [time since forest recovery started (years) + 1]
age_sqrt: square root of [time since forest recovery started (years)]
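As a minimal illustration of how the y, age1_log and age_sqrt columns relate to the raw measurements, the sketch below recomputes them for one made-up data-point. It is written in Python with the standard library only and is not taken from the authors' R scripts; the function names and the example values are assumptions. The back-transformation to percent of the reference value follows directly from the definition of the log ratio.

```python
import math


def response_ratio(x_res, x_ref):
    """Log response ratio y = ln(Xres / Xref); both values must be positive."""
    if x_res <= 0 or x_ref <= 0:
        raise ValueError("the log response ratio needs positive values")
    return math.log(x_res / x_ref)


def derived_columns(age, x_res, x_ref):
    """Return the y, age1_log and age_sqrt values for one data-point."""
    return {
        "age": age,
        "y": response_ratio(x_res, x_ref),
        "age1_log": math.log(age + 1),  # ln(age + 1)
        "age_sqrt": math.sqrt(age),     # square root of age
    }


def completeness_percent(y):
    """Back-transform a response ratio into percent of the reference value."""
    return 100.0 * math.exp(y)


# Hypothetical data-point: 60 years of recovery, 45 units measured
# against 90 units in the reference forest.
row = derived_columns(age=60, x_res=45.0, x_ref=90.0)
print(row)
print(completeness_percent(row["y"]))
```

A response ratio of 0 therefore corresponds to 100% of the reference value, negative values to incomplete recovery, and positive values to trajectories above the reference.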
- dataset_forest_recovery_metaanalysis_similarity.csv: dataset needed to run a Pearson correlation test among the three similarity indices calculated (i.e., Morisita-Horn, Bray-Curtis and Jaccard). Its columns are defined below:
citation: an abbreviated version of each selected article's citation
study: type of organism (defined as “life form”) studied to compute each similarity index in each data-point of each trajectory
age: time since forest recovery started (years)
Bray-Curtis: the value of the response ratio, computed as ln(Simres/Simref), where Simres is the value of Bray-Curtis similarity index at a certain recovery time and Simref is the average value of Bray-Curtis similarity index of all reference data-points in that chronosequence.
Jaccard: the value of the response ratio, computed as ln(Simres/Simref), where Simres is the value of Jaccard similarity index at a certain recovery time and Simref is the average value of Jaccard similarity index of all reference data-points in that chronosequence.
Morisita-Horn: the value of the response ratio, computed as ln(Simres/Simref), where Simres is the value of Morisita-Horn similarity index at a certain recovery time and Simref is the average value of Morisita-Horn similarity index of all reference data-points in that chronosequence.
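The pairwise comparison of the three index columns can be sketched as follows. This is a standard-library Python illustration, not the authors' R code: it computes only the Pearson correlation coefficient (no p-value), and the four rows are invented values used purely to show the column layout.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical rows using the three index columns of the similarity file.
rows = [
    {"Morisita-Horn": -1.20, "Bray-Curtis": -0.95, "Jaccard": -1.40},
    {"Morisita-Horn": -0.80, "Bray-Curtis": -0.70, "Jaccard": -1.05},
    {"Morisita-Horn": -0.45, "Bray-Curtis": -0.50, "Jaccard": -0.60},
    {"Morisita-Horn": -0.10, "Bray-Curtis": -0.20, "Jaccard": -0.35},
]

indices = ["Morisita-Horn", "Bray-Curtis", "Jaccard"]
correlations = {
    (a, b): pearson_r([r[a] for r in rows], [r[b] for r in rows])
    for a, b in combinations(indices, 2)
}
for pair, r in correlations.items():
    print(pair, round(r, 3))
```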
Code/Software
- 02_analysis_resub_clean.R: This R code fits the meta-analytic models, predicts the time forests need to recover their biodiversity (i.e., organism abundance, species diversity and Morisita-Horn species similarity) and functions (i.e., cycling of carbon, nitrogen stock and phosphorus stock), and assesses the effect of aridity, disturbance category, restoration strategy and life form on their recovery.
- 03_exploratory_results.R: This R code runs all exploratory data analyses.
Database construction
We collected data from 16,882 plots from 125 chronosequences of forest ecosystems recovering for 50 to 295 years in 110 published primary studies. From these 125 chronosequences, we extracted 635 recovery trajectories of quantitative measures of ecosystem attributes over time, related to the six most widely included recovery metrics with enough representation to be statistically meaningful. These included biodiversity metrics (organism abundance, species diversity, and species similarity) and biogeochemical functioning metrics (carbon cycling, nitrogen stock, and phosphorus stock). We also extracted factors describing the context in which recovery and restoration took place: the restoration strategy (passive and active), the disturbance type [agriculture (including land recovering from cultivation, grazing or combinations of both), logging and mining], the latitude, and the climatic condition (i.e., aridity index).
The trajectories related to organism abundance mainly contained biomass and density measurements. Species diversity included measurements of species richness and diversity indices. Species similarity trajectories contained information about species composition along the recovery trajectory, which were used to calculate pairwise compositional similarity at specific recovery times compared to a reference value. We used the Morisita-Horn similarity index, which accounts for species relative abundance. Abundance, diversity, and composition trajectories included five life forms: plants (including trajectories of woody plants, non-woody plants, and all plants combined), invertebrates, microorganisms other than fungi, fungi, and birds. Carbon cycling included pools and fluxes in soil, plants, litter, and microorganisms, whereas nitrogen and phosphorus stocks included bioavailable pools in soil, plants, and litter. From each trajectory, we collected all available recovery measures and compared them with a reference value.
Statistical analysis
To estimate the trajectory of forest recovery over time, we fitted a separate linear mixed model (LMM) for the response ratio (RR) of each recovery metric. We included the recovery time as a fixed factor and as a random slope, and the trajectory identity as a random intercept, allowing a different slope and intercept for each trajectory. As the recovery process may follow a wide range of trajectories, from linear to more saturating shapes, we considered three functions of the recovery time variable: one linear and two decelerating trends [ln(recovery time + 1) and √recovery time]. For each recovery metric, we then selected the function that best fit the data according to the minimum AICc. Their absolute values were square-root transformed to meet the assumptions of general linear models and then multiplied by -1 to facilitate interpretation.
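The choice among the three time functions can be illustrated with a deliberately simplified sketch. It assumes an ordinary least-squares fit to a single made-up trajectory instead of the mixed models with random slopes and intercepts described above, and it is written in standard-library Python rather than the authors' R. All names and data are hypothetical; only the AICc comparison logic is the point.

```python
import math


def fit_ols(xs, ys):
    """Simple least-squares line; returns (intercept, slope, residual sum of squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, rss


def aicc(rss, n, k):
    """Small-sample corrected AIC for a Gaussian least-squares fit with k parameters."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# The three candidate functions of recovery time named in the text.
TIME_FUNCTIONS = {
    "linear": lambda t: t,
    "log": lambda t: math.log(t + 1),
    "sqrt": math.sqrt,
}


def select_time_function(ages, rr):
    """Fit each candidate and return (name with minimum AICc, all scores)."""
    scores = {}
    for name, f in TIME_FUNCTIONS.items():
        _, _, rss = fit_ols([f(t) for t in ages], rr)
        # k = 3: intercept, slope and residual variance
        scores[name] = aicc(rss, len(ages), k=3)
    return min(scores, key=scores.get), scores


# Hypothetical trajectory: a saturating (logarithmic) recovery with small noise.
ages = [1, 5, 10, 20, 40, 60, 80, 100, 150, 200]
rr = [-2 + 0.4 * math.log(t + 1) + (0.01 if i % 2 == 0 else -0.01)
      for i, t in enumerate(ages)]

best, scores = select_time_function(ages, rr)
print(best, scores)
```

Because all three candidates are fitted to the same response values, their AICc scores are directly comparable.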
Using the resulting LMMs, we predicted the RR after 73, 146, and 219 years of recovery [i.e., one, two, and three times the global life expectancy in 2019, 73 years]. We then predicted the time needed for forest ecosystems to recover to 90% of reference values for each trajectory and recovery metric and calculated the median by metric.
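Since the RR is a log ratio, 90% of the reference value corresponds to RR = ln(0.9). The sketch below inverts a fitted curve of the form RR = intercept + slope × f(time) to obtain that time, for each of the three time functions. It is a standard-library Python illustration, not the authors' R code: the coefficients are invented, and it assumes trajectories that approach the reference from below.

```python
import math
import statistics

# Inverse of each time function f, so that t = INVERSE[name](f(t)).
INVERSE = {
    "linear": lambda v: v,
    "log": lambda v: math.exp(v) - 1,  # f(t) = ln(t + 1)
    "sqrt": lambda v: v ** 2,          # f(t) = sqrt(t)
}


def years_to_target(intercept, slope, time_function, target=0.9):
    """Years until intercept + slope * f(t) reaches ln(target).

    Returns 0.0 if the trajectory already starts at or above the target and
    math.inf if it never gets there (non-positive slope).
    """
    goal = math.log(target)
    if intercept >= goal:
        return 0.0
    if slope <= 0:
        return math.inf
    return INVERSE[time_function]((goal - intercept) / slope)


# Hypothetical (intercept, slope) pairs for three trajectories of one metric,
# all fitted with the logarithmic time function.
trajectories = [(-2.0, 0.40), (-1.5, 0.35), (-1.0, 0.20)]
times = [years_to_target(a, b, "log") for a, b in trajectories]
print(times)
print(statistics.median(times))
```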
Also using the resulting LMMs, we predicted the RR after 50 and 100 years of recovery for each metric and trajectory (1) to determine whether recovery completeness depends on the metric and (2) to identify the main explanatory variables underlying the recovery process for each metric. Predictions were made with the predict() function from the stats package. We fitted linear models (LM) to analyse the difference in the RR after 50 and after 100 years of recovery among recovery metrics.
We then fitted a separate LM for the effect of each explanatory variable studied (i.e., aridity, disturbance type or life form) on the RR predictions after 50 and 100 years, first for all recovery metrics together and then for each recovery metric individually. In all cases, the intercept of the LMMs for each trajectory was also included as a fixed factor to account for the effect of the initial state of degradation when recovery started. For the models fitted for the disturbance type and the life form, we excluded the categories with <1% of the values (i.e., “mining” for disturbance and “bird” for life form) and those mixing information from other categories (i.e., “cultivation and grazing and logging” for disturbance and “woody and non-woody” for life form). We could not test the effects of the restoration strategy on the RR predictions as 88% of the recovery trajectories belonged to passively restored forests.
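The category exclusion described above can be sketched as a simple filter. This standard-library Python example is an illustration only, not the authors' R code: the function name is hypothetical and the row counts are invented so that one category falls below the 1% threshold.

```python
from collections import Counter


def drop_categories(rows, column, min_share=0.01, also_drop=()):
    """Keep rows whose category holds at least min_share of all rows
    and is not listed in also_drop (e.g. mixed categories)."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    keep = {
        category
        for category, count in counts.items()
        if count / total >= min_share and category not in also_drop
    }
    return [row for row in rows if row[column] in keep]


# Hypothetical counts per disturbance category (200 rows in total).
rows = (
    [{"disturb_category": "agriculture"}] * 120
    + [{"disturb_category": "logging"}] * 76
    + [{"disturb_category": "cultivation and grazing and logging"}] * 3
    + [{"disturb_category": "mining"}] * 1
)

filtered = drop_categories(
    rows,
    "disturb_category",
    also_drop=("cultivation and grazing and logging",),
)
print(Counter(row["disturb_category"] for row in filtered))
```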
To check the effect of the uncertainty in the recovery estimations, we first predicted the level of recovery after 50 and 100 years using 999 random coefficients within the CI given by the models from the first stage, assuming a normal distribution. We then compared the original coefficients from the second-stage models with the average coefficient calculated from the 999 models fitted to the randomizations. We made this comparison for the effect of the recovery metric and of the recovery predictors (i.e., disturbance, aridity, and life form) after 50 and 100 years.
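The first step of this check, drawing random coefficients and predicting recovery at 50 and 100 years, can be sketched as below. This is one possible reading, written in standard-library Python rather than the authors' R, with several assumptions: a 95% CI, draws from a normal distribution centred on the estimate and redrawn until they fall inside the CI, independent draws for intercept and slope, a logarithmic time function, and invented coefficient values. Refitting the second-stage models on each randomization is not shown.

```python
import math
import random

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def draw_within_ci(estimate, ci_low, ci_high, rng):
    """Draw from a normal centred on the estimate, redrawing until inside the CI."""
    sd = (ci_high - ci_low) / (2 * Z95)  # sd implied by a 95% CI
    while True:
        value = rng.gauss(estimate, sd)
        if ci_low <= value <= ci_high:
            return value


def randomized_predictions(intercept, slope, years, n_draws=999, seed=1):
    """Predict RR at the given years for n_draws random coefficient pairs.

    intercept and slope are (estimate, ci_low, ci_high) tuples.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        a = draw_within_ci(*intercept, rng)
        b = draw_within_ci(*slope, rng)
        draws.append({t: a + b * math.log(t + 1) for t in years})
    return draws


# Hypothetical first-stage coefficients: (estimate, CI lower, CI upper).
intercept = (-2.0, -2.4, -1.6)
slope = (0.40, 0.30, 0.50)

draws = randomized_predictions(intercept, slope, years=(50, 100))
mean_rr = {t: sum(d[t] for d in draws) / len(draws) for t in (50, 100)}
point_rr = {t: intercept[0] + slope[0] * math.log(t + 1) for t in (50, 100)}
print(mean_rr)
print(point_rr)
```

Comparing the averaged randomized predictions with the point predictions gives a first impression of how much the coefficient uncertainty shifts the estimated level of recovery.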
Changes after Jul 30, 2024:
16-oct-2025: Changes made to address the suggestions from the review process.
- Updated "dataset_forest_recovery_metaanalysis.csv": corrected values in columns "y", "age", "age1_log" and "age_sqrt" for the category "Morisita-Horn" of the variable "metric_type".
- Updated "02_analysis_resub_clean.R": script updated to include new analyses suggested by the reviewers, mainly the following:
a) a sensitivity analysis to separately model the recovery of trajectories that had starting points over 100% recovery completeness from those with starting points below 100%
b) another sensitivity analysis to model all the recovery estimations after removing 34% of studies in which the reference forest was the last point in the chronosequence over 100 years, rather than an old-growth forest
c) new models to test the effect of latitude on all the recovery estimations
- New file added "dataset_forest_recovery_metaanalysis_posneg.csv": dataset needed to run a new sensitivity analysis to separately model the recovery of trajectories that had starting points over 100% recovery completeness from those with starting points below 100%
Changes after Oct 27, 2025:
06-may-2026: Changes made to address the second round of suggestions from the review process.
- Updated "dataset_forest_recovery_metaanalysis.csv": new columns included that were needed for the exploratory analyses.
- Updated "dataset_forest_recovery_metaanalysis_posneg.csv": new columns included to match "dataset_forest_recovery_metaanalysis.csv".
- Updated "02_analysis_resub_clean.R": script updated to include new analyses suggested by the reviewers, mainly the following:
a) new models included to estimate the effect of disturbance together with aridity or latitude on the recovery completeness (globally and for each metric)
b) improvement of existing figures
- New script added "03_exploratory_results.R": exploratory data analyses have been removed from script "02_analysis_resub_clean.R" and included in this new script.
- New file added "dataset_forest_recovery_metaanalysis_similarity.csv": dataset needed to run a Pearson correlation test among the three similarity indices calculated (i.e., Morisita-Horn, Bray-Curtis and Jaccard).
