Code and data for Bayesian joint species distribution model selection for community-level prediction
Data files
Nov 24, 2023 version files 1.13 MB

README.md

understory_comm_dat.csv
Abstract
Code and data for reproducing the analysis in the manuscript “Bayesian joint species distribution model selection for community-level prediction.” Provided data include percent cover observations for 39 modeled vascular plant species within boreal forest understory communities and environmental model covariates. R code is provided to generate model inputs, apply alternative models, generate out-of-sample predictions, and calculate associated community and species log scores and alternative model evaluation metrics. Further, R source code is provided to implement the multinomial joint species distribution model defined in the manuscript. Details on the data, its processing, and the alternative model definitions and structure can be found in the main text of the manuscript. Provided data are currently being used in ongoing analyses and coordination with authors may be warranted to avoid duplicate publication. Potential users are encouraged to consider collaboration with authors when useful and appropriate. Misinterpretation of data may occur if used outside the context of the original analysis. All data are made available in their current state. While significant efforts have been made to ensure data accuracy, complete accuracy cannot be guaranteed. Data may be updated periodically. It is the responsibility of the data user to check for updated versions of the data.
README
Code and data for Bayesian joint species distribution model selection for community-level prediction
Boreal forest understory community data and associated environmental variables for reproducing the analysis in the manuscript "Bayesian joint species distribution model selection for community-level prediction." Data include percent cover observations of 39 vascular plant species across 1,700 unique vegetation survey sites sampled in 1985-1986, 1995, and 2006 in conjunction with the 8th Finnish National Forest Inventory. Environmental variables characterizing the conditions for each site-year combination are provided. These are the same variables used to predict species relative abundances at each site within the associated manuscript.
Description of the data and file structure
Understory community data are provided in a single comma-separated text file named "understory_comm_dat.csv". The rows of the data file correspond to unique site-by-inventory year combinations. The columns provide percent cover observations for vascular plant species along with several identifier and environmental variables. Metadata for each variable is provided below.
Identifier variables
 Site: Unique numeric identifier for each vegetation survey site
 Year: The year in which the survey site was measured (1985, 1995, 2006)
 Site_year: Unique combinations of site-by-year
 BZ: Bioclimatic zone in which the site is located (SB = south boreal; MB = mid boreal; NB = north boreal) based on Ahti et al. (1968)
Environmental variables
 aveGDD74_84: Growing degree days over the 1974-1984 period estimated as the 10-year moving average of the total number of days with a daily mean temperature exceeding +5 deg. C per site over the reported decadal period based on 10 sq. km interpolated daily temperature values modeled by the Finnish Meteorological Institute (Venäläinen et al., 2005)
 aveGDD84_94: Growing degree days over the 1984-1994 decadal period (defined as under the aveGDD74_84 variable)
 aveGDD95_05: Growing degree days over the 1995-2005 decadal period (defined as under the aveGDD74_84 variable)
 fertility: Soil fertility based on site-level indicator vegetation observed during the inventory year and broken into six ordinal classes (1 indicates highest fertility; 6 indicates lowest fertility) based on Cajander (1949) and described in Tomppo et al. (2011)
 shrub_cover: Projected percent cover of shrubs and 0.5-1.5 m tall trees located within a 9.8 m radius circular plot centered on the vegetation survey site
 ba: The basal area reported in m^2 per ha of live overstory trees derived from measurements of stem diameter at 1.3 m collected during the inventory year (Tomppo et al., 2011)
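As a rough illustration of the aveGDD definition above, the following R sketch counts the days above +5 deg. C in each of ten simulated years and averages the counts. The temperatures, the seasonal curve, and all object names are made up for illustration; the values in the data file come from the Finnish Meteorological Institute interpolations and were not produced by this code.

```r
# Toy illustration of the aveGDD* definition: the average, over ten years,
# of the annual number of days with a daily mean temperature above +5 deg. C.
# All temperatures are simulated (assumption); they are NOT the values
# used to build the data file.
set.seed(1)
years <- 1:10  # ten hypothetical years
daily <- expand.grid(doy = 1:365, year = years)

# Simulated seasonal cycle (deg. C) plus day-to-day noise
daily$temp <- 3 - 12 * cos(2 * pi * (daily$doy - 15) / 365) +
  rnorm(nrow(daily), sd = 3)

# Annual count of days with mean temperature above the +5 deg. C threshold
days_above <- tapply(daily$temp > 5, daily$year, sum)

# Ten-year average, analogous to one site's aveGDD74_84 value
ave_gdd <- mean(days_above)
```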
Percent cover of vascular plants
Remaining columns in the data file report the mean percent cover of vascular plants across four 2 m^2 quadrats located 5 m apart within each vegetation survey site. Column names correspond to species abbreviations defined below.
 AGROCAPI: Agrostis capillaris
 BETUPUB3: Betula pubescens
 CALAARUN: Calamagrostis arundinacea
 CALLVULG: Calluna vulgaris
 CAREDIGI: Carex digitata
 CAREGLOB: Carex globularis
 CONVMAJA: Convallaria majalis
 DESCCESP: Deschampsia cespitosa
 DESCFLEX: Deschampsia flexuosa
 DRYOCART: Dryopteris carthusiana
 EMPENIGR: Empetrum nigrum
 EPILANGU: Epilobium angustifolium
 EQUISYLV: Equisetum sylvaticum
 FRAGVESC: Fragaria vesca
 GYMNDRYO: Gymnocarpium dryopteris
 JUNICOM3: Juniperus communis
 LEDUPALU: Ledum palustre
 LINNBORE: Linnaea borealis
 LUZUPILO: Luzula pilosa
 LYCOANNO: Lycopodium annotinum
 MAIABIFO: Maianthemum bifolium
 MELAPRAT: Melampyrum pratense
 MELASYLV: Melampyrum sylvaticum
 MELINUTA: Melica nutans
 ORTHSECU: Orthilia secunda
 OXALACET: Oxalis acetosella
 PICEABI3: Picea abies
 PINUSYL3: Pinus sylvestris
 POPUTRE3: Populus tremula
 PTERAQUI: Pteridium aquilinum
 RUBUIDA4: Rubus idaeus
 RUBUSAXA: Rubus saxatilis
 SOLIVIRG: Solidago virgaurea
 SORBAUC3: Sorbus aucuparia
 TRIEEURO: Lysimachia europaea
 VACCMYRT: Vaccinium myrtillus
 VACCULIG: Vaccinium uliginosum
 VACCVITI: Vaccinium vitis-idaea
 VIOLRIVI: Viola riviniana
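Given the column layout described above, one possible way to separate the identifier, environmental, and species columns in R is sketched below. The helper name split_understory and the toy rows (including the format of the Site_year values) are hypothetical and only mimic the documented column names; this is not part of the provided code.

```r
# Column names taken from the variable metadata above
id_vars  <- c("Site", "Year", "Site_year", "BZ")
env_vars <- c("aveGDD74_84", "aveGDD84_94", "aveGDD95_05",
              "fertility", "shrub_cover", "ba")

# Hypothetical helper: split a data frame laid out like the data file.
# Every column that is not an identifier or environmental variable is
# treated as a species percent cover column.
split_understory <- function(dat) {
  spp_vars <- setdiff(names(dat), c(id_vars, env_vars))
  list(id    = dat[, id_vars, drop = FALSE],
       env   = dat[, env_vars, drop = FALSE],
       cover = as.matrix(dat[, spp_vars, drop = FALSE]))
}

# Toy rows with made-up values and only two of the 39 species columns
toy <- data.frame(
  Site = c(1, 2), Year = c(1985, 1995),
  Site_year = c("1_1985", "2_1995"),  # made-up identifier format
  BZ = c("SB", "NB"),
  aveGDD74_84 = c(150, 120), aveGDD84_94 = c(152, 123),
  aveGDD95_05 = c(158, 127),
  fertility = c(2, 5), shrub_cover = c(10, 0), ba = c(22.5, 8.1),
  VACCMYRT = c(12.5, 3.0), VACCVITI = c(6.25, 9.0)
)
parts <- split_understory(toy)

# With the real file in the working directory (not run here):
# parts <- split_understory(read.csv("understory_comm_dat.csv"))
```

The species block is returned as a numeric matrix with rows as site-by-year combinations.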
Code for processing the data and reproducing the analysis in the corresponding manuscript is described under Code/Software below.
Sharing/Access information
Data and code were derived from the following sources.
 Ahti, T., Hämet-Ahti, L., and Jalas, J. (1968). Vegetation zones and their sections in northwestern Europe. Annales Botanici Fennici, 5(3):169–211.
 Venäläinen, A., Tuomenvirta, H., Pirinen, P., and Drebs, A. (2005). A basic Finnish climate data set 1961–2000 - description and illustrations. Finnish Meteorological Institute, Reports, 5:1–27.
 Cajander, A. K. (1949). Forest types and their significance. Acta For. Fenn., 56:1–71.
 Tomppo, E., Heikkinen, J., Henttonen, H. M., Ihalainen, A., Katila, M., Mäkelä, H., et al. (2011). Designing and conducting a forest inventory - case: 9th National Forest Inventory of Finland, volume 22. Springer Science & Business Media.
 Tikhonov, G., Ovaskainen, O., Oksanen, J., de Jonge, M., Opedal, O., and Dallas, T. (2021). Hmsc: Hierarchical Model of Species Communities. R package version 3.0-11.
Code/Software
One R script file and several R functions are provided to reproduce the analysis in the corresponding manuscript. All provided R files are described below.
"applied_model_selection.R": R script file that loads the data file described above, processes the data to generate inputs associated with the models described in the corresponding manuscript, demonstrates the implementation of each model including model postprocessing and outofsample prediction, and generates log scores and alternative scoring metrics for each applied model. All source files needed to run the R script are provided and described below.
"HmscUnderstory": Repository including R source code for a modified version of the Hierarchical Model of Species Communities R package (Tikhinov et al., 2021) used to implement the Poission approximation to the multinomial described in the corresponding manuscript. Calls to the modified source files are demonstrated in the "applied_model_selection.R" file.
"ls_approx_pois_multinom.R": R source code approximating the joint community log score for a multinomial data model fit using the Poisson approximation to the multinomial.
"ls_approx_poisson.R": R source code approximating the joint community log score for a log normal Poisson data model (equivalent to the independent community log score).
"ls_approx_spp_poisson.R": R source code approximating individual species log scores for models fit applying either a multinomial or log normal Poisson data model.
"spp_diversity_rmse.R": R source code estimating the total squared error for the Shannon true diversity index for models fit applying either a multinomial or log normal Poisson data model.
"jaccard_idx.R": R source code estimating the sum of the posterior mean Jaccard community dissimilarity index across all sample sites for models fit applying either a multinomial or log normal Poisson data model.
"spp_rmse.R": R source code estimating the total squared error for specieslevel predictions for models fit applying either a multinomial or log normal Poisson data model.
"spp_pred_var.R": R source code estimating the total posterior predictive variance for specieslevel predictions for models fit applying either a multinomial or log normal Poisson data model.
Methods
Data include percent cover observations of 39 vascular plant species across 1,700 unique vegetation survey sites sampled in 1985-1986, 1995, and 2006 in conjunction with the 8th Finnish National Forest Inventory. Values report the mean percent cover of vascular plants across four 2 m^2 quadrats located 5 m apart within each vegetation survey site. Environmental variables characterizing the conditions for each site-year combination are also provided. These are the same variables used to predict species relative abundances at each site within the associated manuscript.
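As a minimal sketch of how percent cover values relate to relative abundances, the R code below rescales each row of a toy cover matrix to sum to one and computes the Shannon true diversity, taken here as the exponential of Shannon entropy. The cover values and object names are made up, and the processing actually used in the analysis is the one in "applied_model_selection.R" and "spp_diversity_rmse.R", which may differ.

```r
# Toy percent cover matrix (rows = site-years, columns = species);
# values are made up for illustration
cover <- rbind(
  a = c(VACCMYRT = 20, VACCVITI = 20, LINNBORE = 0),
  b = c(VACCMYRT = 30, VACCVITI = 5,  LINNBORE = 5)
)

# Relative abundance: each species' share of the total cover in its row.
# Rows with zero total cover would give NaN and need separate handling.
rel_abund <- cover / rowSums(cover)

# Shannon true diversity: exponential of Shannon entropy
true_diversity <- function(p) {
  p <- p[p > 0]  # treat 0 * log(0) as 0
  exp(-sum(p * log(p)))
}
div <- apply(rel_abund, 1, true_diversity)
```

A row with two equally abundant species, such as row a, has a true diversity of 2.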
Usage notes
No special program or software is required to open the data. The R source code and the data processing script require the R statistical computing environment.