This readme file was generated on 2022-10-07 by Aaron Goodman GENERAL INFORMATION Title of Dataset: Dynamic distribution modeling of the Swamp Tigertail dragonfly Synthemis eustalacta (Odonata: Anisoptera: Synthemistidae) over a 20-year bushfire regime Author/Principal Investigator Information Name: Aaron Goodman ORCID: https://orcid.org/0000-0002-7051-8249 Institution: American Museum of Natural History Division of Invertebrate Zoology, NY, NY, 10024 Address: Division of Invertebrate Zoology, NY, NY, 10024 Email: agoodman@amnh.org Author/Associate or Co-investigator Information Name: Jamie M. Kass ORCID: https://orcid.org/0000-0002-9432-895X Institution: Okinawa Institute of Science and Technology Graduate University Address: 1919-1 Tancha, Onna, Kunigami District, Okinawa 904-0495, Japan Email: Jamie Kass Author/Associate or Co-investigator Information Name: Jessica Ware ORCID: https://orcid.org/0000-0002-4066-7681 Institution: American Museum of Natural History Address: Division of Invertebrate Zoology, NY, NY, 10024 Email: jware@amnh.org Date of data collection: September 2021 - August 2022 Geographic location of data collection: -We acquired occurrences of adult S. eustalacta from the Global Biodiversity Information Facility (GBIF, DOI: https://doi.org/10.15468/dl.c256kv) within Southeast Australia. -We generated yearly sets of environmental predictor variables for modeling that included bioclimatic variables, as well as vegetation cover and seasonal burned area for the years 2001-2020 for the continent of Australia. Information about funding sources that supported the collection of the data: -This research was funded by the by National Science Foundation GEODE Award #2002473, as well as the Japan Society for the Promotion of Science (JSPS) Postdoctoral Fellowship for Foreign Researchers. SHARING/ACCESS INFORMATION Licenses/restrictions placed on the data: None Links to publications that cite or use the data: -Jones, D.A., Wang, W., & Fawcett, R. (2009) High-quality spatial climate data-sets for Australia. Australian Meteorological and Oceanographic Journal, 58, 233. -Busetto, L. & Ranghetti, L. (2016) MODIStsp : An R package for automatic preprocessing of MODIS Land Products time series. Computers & Geosciences, 97, 40-48. -Hijmans, R.J., Phillips, S., Leathwick, L., & Elith, J. (2017) Package ‘dismo’. Circles, 9.1, 1-68. -The Biodiversity and Climate Change Virtual Laboratory (BCCVL, https://bccvl.org.au/) -Global terrestrial ecoregions from the World Wildlife Fund (https://www.worldwildlife.org). Links to other publicly accessible locations of the data: Occurrence data of termite species can be found using GBIF (https://www.gbif.org/) Links/relationships to ancillary data sets: None Was data derived from another source? If yes, list source(s): We acquired monthly minimum and maximum temperature and precipitation rasters for Australia at 2.5 arcminutes resolution (approx. 5 km at the equator), produced by the Australia Bureau of Meteorology (Jones et al., 2009). From these rasters, we created a set of 19 bioclimatic variables representing means, variabilities, and extremes for temperature and precipitation using the dismo package in R Recommended citation for this dataset: None DATA & FILE OVERVIEW This file contains the occurrence points for S. eustalacta (as a GBIF Link), all yearly environmental variables. LINK TO GBIF OCCURRENCES https://doi.org/10.15468/dl.c256kv -This link contains all occurrences of S. eustalacta from the years 2001 - 2020 with all metadata GBIF provides. -See Dynamic_Model.R, and Yearly_Model.R for subsequent processing of data including spatial thinning, and extraction of environmental values Yearly Environmental Data_2001_2020.zip -The Yearly Environmental Data_2001_2020 file contains yearly folders of all environmental data (including 19 bioclim variables, landcover variables, and categorical ecoregions and vegetative group variables) for the years 2001-2020 File Structure 2001 - file containing environmental variables for each year (in this case it is the year 2001) bio01.tif - Raster file for bioclim variable bio01 bio02.tif - Raster file for bioclim variable bio02 bio03.tif - Raster file for bioclim variable bio03 bio04.tif - Raster file for bioclim variable bio04 bio05.tif - Raster file for bioclim variable bio05 bio06.tif - Raster file for bioclim variable bio06 bio07.tif - Raster file for bioclim variable bio07 bio08.tif - Raster file for bioclim variable bio08 bio10.tif - Raster file for bioclim variable bio10 bio11.tif - Raster file for bioclim variable bio11 bio12.tif - Raster file for bioclim variable bio12 bio13.tif - Raster file for bioclim variable bio13 bio14.tif - Raster file for bioclim variable bio14 bio15.tif - Raster file for bioclim variable bio15 bio16.tif - Raster file for bioclim variable bio16 bio17.tif - Raster file for bioclim variable bio17 bio18.tif - Raster file for bioclim variable bio18 bio19.tif - Raster file for bioclim variable bio19 biome_01.tif - Raster file for global terrestrial ecoregions from the World Wildlife Fund evapotranspiration.tif - Raster file for Annual evapotranspiration derived from MODIS fires.tif - Raster file Annual burned area derived from MODIS NDVI.tif - Raster file for annual NDVI cover derived from MODIS nontreecover.tif - Raster file for Percent Non-tree Cover derived from MODIS nonvegcover.tif - Raster file for Percent Non-vegetated Cover derived from MODIS treecover.tif - Raster file for Percent Tree Cover derived from MODIS vegedata.tif - Raster file for major vegetation groups from The Biodiversity and Climate Change Virtual Laboratory Folders 2002 - 2020 follow the same format as above Dynamic_Model.R -The Dynamic_Model.R code contains R code for performing dynamic species distribution modeling for our occurrences and environmental variables for the years 2001-2020 Yearly_Model.R -The Yearly_Model.R code contains R code for performing yearly static species distribution models for our occurrences and environmental variables for the years 2001-2020 Raster_subtract.R -The Raster_subtract.R code contains R code for subtracting dynamic species distribution models from our static species distribution models -The Raster_subtract.R code also contains R code for generating graphs of permutation importance for both our dynamic and static models -The Raster_subtract.R code also contains R code for generating our density distributions of environmental variables of high permutation importance Relationship between files, if important: The Raster_subtract.R code uses csv files and environmental datasets generated from our Dynamic_Model.R code and our Yearly_Model.R code METHODOLOGICAL INFORMATION Before modeling, we processed our occurrences to account for sampling bias, delineated a study extent to sample background points, and omitted highly correlated environmental variables. We spatially thinned occurrences by 10 km to reduce the effects of sampling bias and artificial clustering (Veloz, 2009) using the spThin package (Aiello-Lammens et al., 2015), which resulted in more even sample sizes across years (n = 133 total; Table 1). Synthemis eustalacta is endemic to southern Australia and disperses roughly 500 m from streams in early adulthood, but upon reaching maturity returns to their site of emergence (Theischinger & Hawking, 2006). We thus chose a study extent to include potentially unsampled areas yet exclude large areas outside the species’ dispersal limitations (Peterson & Soberón, 2012), defined as a minimum convex polygon around all localities (2001-2020) buffered by 1 degree (approx. 111 km at equator). Within this extent, we randomly sampled 50,000 background points for modeling and extracted their yearly environmental values. We used these values to calculate correlations between variables using the ‘vifcor’ and ‘vifstep’ functions in the usdm package v 1.1-18 (Naimi, 2017) and filtered out variables by year with correlation coefficients higher than 0.9 and a VIF threshold of 10. Finally, we retained variables for analysis which were kept among all yearly environmental backgrounds. To model the distribution of S. eustalacta, we used the presence-background algorithm Maxent v3.4.4 (Phillips et al., 2017), which remains one of the top-performing models for fitting SDMs with background data (Valavi et al., 2021). To automate model building and evaluation with different complexity settings and reporting of results, we used the R package ENMeval 2.0.0 (Kass et al., 2021). We constructed both dynamic models that incorporated data across years and static models that used year-specific data (Fig. 1). For the dynamic models, we extracted the environmental predictor values for each year from the occurrence and background points for that year and assembled them into a single training dataset. To construct a single background point dataset for the dynamic models, we extracted yearly environmental values for the same set of background points, then averaged these values across years per background point (we used the mode for categorical variables; Fig. 1). We evaluated models using random k-fold cross-validation, in which occurrences are randomly partitioned into a specified number of groups (i.e., “folds”), then models are sequentially trained on all groups but one (training data) and evaluated on the withheld group (validation data) (Hastie et al., 2009)—we used four folds (k = 4) for our evaluations. As random partitioning can result in spatial autocorrelation due to clustering within folds, spatial block cross-validation techniques are often prescribed to address this (Roberts et al., 2017). However, as our occurrences varied not only in space but also in time, and as we lacked enough records per time bin to additionally separate by temporal block, we chose to use simpler random partitioning for evaluation. In contrast to the single dynamic model, we also constructed one static model per year that used only occurrence and background environmental values for that year. For this approach, we did not make models for years with fewer than five associated occurrences (Phillips et al., 2017; Phillips & Dudík, 2008). For static models, we partitioned our data using the ‘leave-one-out’ strategy (referred to as “jackknife” in ENMeval), whereby one occurrence record is withheld from each model during cross-validation (k = n, or the number of occurrences). This cross-validation technique is most appropriate for small sample sizes, as it allows for the largest training datasets possible for model-building during cross-validation (Pearson et al., 2006; Shcheglovitova & Anderson, 2013). All final models were fitted to the full datasets. As Maxent models can lead to very different results when complexity settings are changed (Radosavljevic & Anderson, 2014; Warren & Seifert, 2011), we compared dynamic and static models with settings representing “simple” and “complex”. Feature classes determine the shape of the model fit, while regularization multipliers control how much complexity is penalized—high regularization multipliers can result in predictor variable coefficients shrinking to zero and thus dropping out of the model (Phillips et al., 2017). Model tuning consists of varying complexity settings, running models with combinations of these settings, and selecting optimal settings based on performance metrics. For tuning, we chose a simple model with a single linear feature class (L) and a complex model with a combination of linear, quadratic, and hinge feature classes (LQH). Hinge features provide flexible linear fits that can change direction and can considerably inflate the number of model parameters when included (Phillips & Dudík, 2008). To both allow for flexibility in the growth of model complexity and simplify comparisons between models, we set the regularization multiplier to one (the Maxent software default) for each model. We assessed the dynamic and static models using averages of threshold-dependent (omission rate) and threshold-independent (AUC) discrimination performance metrics calculated on withheld validation data, but also checked the results against the Akaike Information Criterion with correction for small samples sizes (AICc; calculated on the full dataset) (Warren & Seifert, 2011). Omission rate measures the proportion of withheld occurrence localities (i.e., those in the validation dataset) that fall outside a designated model prediction threshold. We assessed the 10-percentile omission rate, which sets this threshold as the lowest suitability value for occurrences after removing the lowest 10% suitability values—this avoids choosing very low thresholds that may represent outliers (Kass et al., 2021; Radosavljevic & Anderson, 2014). Validation AUC is a measure of discrimination accuracy that can be used to make relative comparisons between SDMs with different settings fit on the same data (Radosavljevic & Anderson, 2014), but is problematic to use as an absolute performance measure for presence-background models (Lobo et al., 2008). To investigate model behavior, we examined predictor variable importance values and compared background density distributions of important variables for intense fire years between models. First, we recorded the permutation importance values for predictor variables reported by Maxent. These values are calculated by randomly permuting the values of all variables but one, building a new model, then calculating the difference between each model’s training AUC and that of the empirical model (Phillips, 2021). Next, as we attributed summaries of predictor variables to the background points for the dynamic models, it was possible that years with extreme values (i.e., intense fire years) may have environmental distributions that are very different than these summaries, which would indicate that dynamic models could have overly smooth predictions for such years. To determine if any key differences existed between the dynamic model’s averaged background and the static models’ yearly backgrounds, we plotted background density distributions of variables with high importance for the years 2009 and 2019, then compared the shapes of these distributions with pairwise Kolmogorov-Smirov goodness-of-fit (KS) tests. We chose these two years because as they had the largest and most intense bushfires in the past 20 years in southern Australia (Van Eeden et al., 2020), we expected them to have the most extreme environmental differences between yearly and averaged backgrounds. For these plots, we chose the top three variables with the highest permutation importance shared between the simple and complex settings separately for the dynamic and static models. We made habitat suitability predictions for S. eustalacta over the study extent for each year using both the simple and complex settings for the dynamic and static models. We generated predictions by transferring all models onto our environmental variables, creating an envelope of suitability across our study extent. Maxent raw predictions were transformed to a scale of 0-1 to approximate probability of occurrence using the ‘cloglog’ transformation (Phillips et al., 2017). We generated predictions for all years with the dynamic models, but only for years with sufficient (>4) occurrence records for our static models. As an additional check for differences between the two approaches, we mapped the difference in predictions of our simple and complex dynamic and static models for the intense fire years of 2009 and 2019. To better visualize raster subtractions, we generated a binary threshold prediction for our fire-prone years, calculated from the 10-percentile omission rate from our model evaluation.