
Data from: Analysing the dynamics and relative influence of variables affecting ecosystem responses using functional PCA and boosted trees: a seagrass case study

Citation

Wu, Paul P-Y et al. (2019), Data from: Analysing the dynamics and relative influence of variables affecting ecosystem responses using functional PCA and boosted trees: a seagrass case study, Dryad, Dataset, https://doi.org/10.5061/dryad.j943d68

Abstract

1. Understanding the relative influence of variables on ecosystem responses, and the dynamics of their effects, is necessary for effective ecosystem monitoring and management. We develop an approach to this problem, also known as causal pathways analysis, using functional Principal Components Analysis (fPCA) and machine learning within a scenario analysis framework.

2. fPCA is used to identify the most influential variables for the correlated, non-homogeneous and non-linear time series data characteristic of complex ecosystems. Hierarchical clustering of fPCA scores reveals groups of more homogeneous scenarios and similarly influential variables. The resultant subset of variables helps to overcome model identifiability problems when analysing time-lagged effects using Boosted Regression Trees (BRT).

3. We use simulated data generated by a Dynamic Bayesian Network (DBN) of ecological windows for seagrass ecosystems given dredging stressors; 3024 scenarios with 75 state variables are analysed. The BRT demonstrated a high level of fit (R^2 ≈ 0.97, MSE ≈ 0.16), supporting the validity of the influential variables identified by fPCA. Influential variables identified included genus, location type, light, growth and seed. Six consecutive months of positive growth and adequate light were important for predicting states of high or moderate population.

4. Compared to traditional scenario analysis and sensitivity analysis approaches, our approach captured n-way interactions while simultaneously accounting for time correlations. Although some variables and their dynamics agreed with existing knowledge, new variables and/or time lags of their effects were identified, corresponding to opportunities for further investigation as well as informing monitoring and management. Although our method was demonstrated on state variables with DBN simulated data, it is equally applicable to general time series data.
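
As a rough illustration of the fPCA scoring step described in the abstract, the sketch below computes principal component scores for a set of time series sampled on a common, regular time grid. It is not the authors' implementation: it assumes that discretised PCA via an SVD is an acceptable stand-in for fPCA on regularly sampled curves, the function name `fpca_scores` is hypothetical, and the input is synthetic data rather than the Dryad dataset.

```python
import numpy as np


def fpca_scores(curves, n_components=2):
    """Discretised fPCA sketch: PCA over curves sampled on a shared time grid.

    curves: array-like of shape (n_scenarios, n_timepoints).
    Returns (scores, components, explained_variance_ratio).
    """
    curves = np.asarray(curves, dtype=float)
    # Subtract the mean curve so components describe variation around it.
    centred = curves - curves.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Per-scenario scores on the leading components.
    scores = u[:, :n_components] * s[:n_components]
    # Principal component curves (one row per component).
    components = vt[:n_components]
    # Fraction of total variance carried by each retained component.
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, components, explained


# Synthetic example (assumption: NOT the Dryad data). Each scenario is a
# sine curve with a random amplitude plus a small amount of noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 24)
amplitude = rng.normal(size=50)
curves = amplitude[:, None] * np.sin(2 * np.pi * t)
curves = curves + 0.01 * rng.normal(size=(50, 24))

scores, components, explained = fpca_scores(curves)
```

In this synthetic setup the first score tracks the amplitude of each scenario, up to sign. Scores of this kind are what could then be passed to a hierarchical clustering routine or used to shortlist variables before fitting a boosted tree model.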

Usage Notes