Data and code from: A bias-robust framework for quantifying community responses to the climate change using the occurrence data
Data files
Version of Mar 19, 2026 (1.90 GB in total)
- 01_GeneratedDistributionData.csv (856.48 MB)
- 02_ExtractedBiasedOccurrenceData.zip (1 GB)
- Figure_3.eps (68.04 KB)
- Figure_3.tiff (10.71 MB)
- Figure_4.eps (55.29 KB)
- Figure_4.tiff (22.96 MB)
- README.md (8.94 KB)
- SensitivityAnalysis.RData (314.49 KB)
- SimulationCode.R (10.73 KB)
- SimulationCode(Sensitivity_Analysis).R (9.80 KB)
Abstract
This repository contains the simulation code and data for "A Bias-robust Framework for Quantifying Community Responses to the Climate Change Using the Occurrence Data." It includes two simulation scripts. In the first (SimulationCode.R), distribution data are generated for a pseudo-biological community whose range shifts under climate warming, and many rounds of biased sampling are drawn from that distribution. The Community Change Detection Model (CCDM) is then applied to the resulting biased occurrence data to estimate the rate of thermophilization. In the second (SimulationCode(Sensitivity_Analysis).R), species distributions are generated and sampled under different bias conditions to assess the robustness of CCDM across scenarios.
Dataset DOI: 10.5061/dryad.ksn02v7hj
Description of this repository
This repository houses the code and data for simulations that apply multiple regression analysis models to biased occurrence data to detect thermophilization.
For detailed methods and the mechanism of Community Change Detection Model (CCDM), please refer to the original paper (https://onlinelibrary.wiley.com/doi/10.1111/geb.70223).
Description of the data and file structure
File: SimulationCode.R
Description: R code to conduct the main simulation. A fictional species community is generated, and biased sampling is conducted for 2 bias types × 1,000 iterations. The sampling includes biases from spatiotemporal variation in observation effort and from the truncation effect. This code generates Figure_3.eps and Figure_3.tiff.
File: SimulationCode(Sensitivity_Analysis).R
Description: R code to conduct the sensitivity analysis. Sampling was conducted under 231 bias scenarios, combining 21 levels of spatiotemporal variation in observation effort with 11 levels of truncation effect. Each scenario was sampled 100 times (231 × 100 samplings). This code generates Figure_4.eps and Figure_4.tiff. Because the code takes a long time to run, we also provide its results as SensitivityAnalysis.RData, which can be loaded in the R console with load("Your Path/SensitivityAnalysis.RData").
File: 01_GeneratedDistributionData.csv
Description: Data for a fictional species community composed of 100 species over 100 years, with 100,000 individuals per year. Figure S4 is based on these data.
Variables
- IndID: Unique individual identification number
- SpeciesID: Unique identification number of the species to which the individual belongs.
- Year: Year in which the individual exists.
- LTI: Local Temperature Index (LTI) of the location where the individual occurred.
- SpeciesLTICenter: Central value (optimal LTI) of the species-specific LTI distribution in that Year.
- Prob.BiasToWarm: Sampling weight of the individual under the Bias toward Warmer scenario.
- Prob.BiasToCold: Sampling weight of the individual under the Bias toward Colder scenario.
File: 02_ExtractedBiasedOccurrenceData.zip
Description: A set of 2,000 biased occurrence datasets (2 bias types × 1,000 iterations).
Variables
- IndID: Unique identification number of the extracted individual.
- SpeciesID: Unique identification number of the species to which the individual belongs.
- Year: Year in which the individual was extracted.
- LTI: Local Temperature Index (LTI) of the location where the individual occurred.
- EstSTI: Species Temperature Index (STI) of the recorded species, estimated from the occurrence data.
- BiasType: Type of sampling bias.
- iter: Iteration identification number.
Reference
The simulation code uses the following packages.
- Aphalo P (2024). ggpmisc: Miscellaneous Extensions to 'ggplot2'. R package version 0.5.6, https://CRAN.R-project.org/package=ggpmisc.
- Aphalo P (2024). ggpp: Grammar Extensions to 'ggplot2'. R package version 0.5.7, https://CRAN.R-project.org/package=ggpp.
- Bache S, Wickham H (2022). magrittr: A Forward-Pipe Operator for R. R package version 2.0.3, https://CRAN.R-project.org/package=magrittr.
- Barrett T, Dowle M, Srinivasan A, Gorecki J, Chirico M, Hocking T, Schwendinger B, Krylov I (2025). data.table: Extension of data.frame. R package version 1.17.8, https://CRAN.R-project.org/package=data.table.
- Grolemund G, Wickham H (2011). “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40(3), 1-25. https://www.jstatsoft.org/v40/i03/.
- Izrailev S (2024). tictoc: Functions for Timing R Scripts, as Well as Implementations of "Stack" and "StackList" Structures. R package version 1.2.1, https://CRAN.R-project.org/package=tictoc.
- Kassambara A (2025). ggpubr: 'ggplot2' Based Publication Ready Plots. R package version 0.6.1, https://CRAN.R-project.org/package=ggpubr.
- Knaus J (2023). snowfall: Easier Cluster Computing (Based on 'snow'). R package version 1.84-6.3, https://CRAN.R-project.org/package=snowfall.
- Makowski D, Lüdecke D, Patil I, Thériault R, Ben-Shachar M, Wiernik B (2023). “Automated Results Reporting as a Practical Tool to Improve Reproducibility and Methodological Best Practices Adoption.” CRAN. https://easystats.github.io/report/.
- Müller K, Wickham H (2025). tibble: Simple Data Frames. R package version 3.3.0, https://CRAN.R-project.org/package=tibble.
- R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
- Ren K (2016). pipeR: Multi-Paradigm Pipeline Implementation. R package version 0.6.1.3, https://CRAN.R-project.org/package=pipeR.
- Ren K (2021). rlist: A Toolbox for Non-Tabular Data Manipulation. R package version 0.4.6.2, https://CRAN.R-project.org/package=rlist.
- Robinson D, Hayes A, Couch S (2024). broom: Convert Statistical Objects into Tidy Tibbles. R package version 1.0.6, https://CRAN.R-project.org/package=broom.
- Tierney L, Rossini AJ, Li N, Sevcikova H (2021). snow: Simple Network of Workstations. R package version 0.4-4, https://CRAN.R-project.org/package=snow.
- Torchiano M (2020). effsize: Efficient Effect Size Computation. doi:10.5281/zenodo.1480624 https://doi.org/10.5281/zenodo.1480624, R package version 0.8.1, https://CRAN.R-project.org/package=effsize.
- Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.
- Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. R package version 1.2.0, https://CRAN.R-project.org/package=conflicted.
- Wickham H (2023). forcats: Tools for Working with Categorical Variables (Factors). R package version 1.0.0, https://CRAN.R-project.org/package=forcats.
- Wickham H (2023). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.5.1, https://CRAN.R-project.org/package=stringr.
- Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.
- Wickham H, François R, Henry L, Müller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.4, https://CRAN.R-project.org/package=dplyr.
- Wickham H, Henry L (2023). purrr: Functional Programming Tools. R package version 1.0.2, https://CRAN.R-project.org/package=purrr.
- Wickham H, Hester J, Bryan J (2024). readr: Read Rectangular Text Data. R package version 2.1.5, https://CRAN.R-project.org/package=readr.
- Wickham H, Vaughan D, Girlich M (2024). tidyr: Tidy Messy Data. R package version 1.3.1, https://CRAN.R-project.org/package=tidyr.
Code/software
R 4.4.0
Access information
The empirical data used in the original paper were derived from the following sources:
- Occurrence data were retrieved from the Global Biodiversity Information Facility: GBIF.org (06 June 2025) GBIF Occurrence Download https://doi.org/10.15468/dl.t6dkcd
- Mean annual temperature of each collection site was retrieved from WorldClim: https://www.worldclim.org/
- Historical mean annual temperature in Japan was retrieved from the Japan Meteorological Agency: https://www.data.jma.go.jp/cpdinfo/temp/list/an_jpn.html (accessed 2025-07-10).
SimulationCode.R
- Generation of true community data: We constructed a fictional community of 100 species that shifted to cooler regions without delay. The optimal LTI of each species at Year 50, the midpoint of the simulation, was drawn from a uniform distribution ranging from −10 °C to 10 °C. The optimal LTI changes from year to year, whereas the STI (also referred to as the climatic niche) remains constant over time. Warming decreased each species' optimal LTI by 0.01 °C per year. Because the LTI is a representative temperature index of a site, it remains constant even under warming. A 0.01 °C/year decrease in the optimal LTI therefore represents a range shift toward colder regions at 0.01 °C/year, not a change in the climatic niche. Each year, 100,000 individuals were generated and randomly assigned to one of the 100 species. The LTI of each individual's location was drawn from a normal distribution with a mean equal to that species' optimal LTI for that year and a standard deviation of 3 °C. Thus, LTI is the only locational information carried by each individual. This produced a true community dataset of 10,000,000 individuals (100 years × 100,000 individuals/year).
- Generation of occurrence data with sampling bias: To simulate observations within boundaries that do not encompass species' entire distributions, we created “truncated community data” by excluding individuals in the top and bottom 5 % tails of the LTI (10 % of records in total) from the community distribution data. We sampled from the truncated community data under two scenarios of spatiotemporal variation in observation effort (STVOE): “Bias toward Colder”, in which the mean LTI of sampling locations shifted annually toward colder regions, and “Bias toward Warmer”, in which it shifted toward warmer regions. The centroid LTI shifted linearly from 1 °C to −1 °C (Bias toward Colder) or from −1 °C to 1 °C (Bias toward Warmer) over the 100 years. The sampling weight of each individual in the truncated community data was the probability density, at the individual's LTI, of a normal distribution with a mean equal to that year's centroid LTI and a standard deviation of 5 °C. A total of 10,000 records were then extracted by weighted sampling to form one occurrence dataset. This process was repeated 1,000 times for each scenario (2 scenarios × 1,000 datasets × 10,000 records/dataset).
- Regression analysis and evaluation: For each occurrence dataset, the STI of each species was estimated as the mean LTI of its records collected in the first 20 years. Multiple regression analysis (CCDM) was then applied to each dataset, and the estimate was corrected using the resulting coefficients. The difference between the estimated thermophilization rate and the true simulated warming rate (0.01 °C/year) was evaluated using Cohen's d effect size.
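The three steps above can be illustrated with a small, self-contained sketch. This is Python rather than the repository's R code (SimulationCode.R), and it is not the authors' implementation. Assumptions not stated in the text: years are numbered 1 to 100; species are assigned uniformly at random; truncation quantiles are taken over the pooled LTI of all years (a fixed study-area boundary); weighted sampling is done without replacement using Efraimidis-Spirakis keys (the text does not say whether the original samples with or without replacement); and Cohen's d is computed in a one-sample form against 0.01 °C/year, which may differ from the effsize computation. The CCDM regression and its coefficient-based correction are described in the original paper and are not reproduced here.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Parameters from the text; numbering years 1..100 is an assumption.
N_SPECIES, N_YEARS, MID_YEAR = 100, 100, 50
SHIFT, LTI_SD, SAMPLING_SD = 0.01, 3.0, 5.0  # °C/year, °C, °C


def generate_community(n_per_year, seed=1):
    """True community: each species' optimal LTI falls by SHIFT °C/year around Year 50."""
    rng = random.Random(seed)
    opt_mid = [rng.uniform(-10.0, 10.0) for _ in range(N_SPECIES)]
    records = []
    for year in range(1, N_YEARS + 1):
        for _ in range(n_per_year):
            sp = rng.randrange(N_SPECIES)  # uniform species assignment (assumption)
            center = opt_mid[sp] - SHIFT * (year - MID_YEAR)
            records.append({"IndID": len(records) + 1, "SpeciesID": sp + 1,
                            "Year": year, "LTI": rng.gauss(center, LTI_SD),
                            "SpeciesLTICenter": center})
    return records, opt_mid


def truncate(records, tail=0.05):
    """Drop the lowest and highest `tail` fraction of records by pooled LTI (assumption)."""
    k = int(round(tail * len(records)))
    ordered = sorted(records, key=lambda r: r["LTI"])
    return ordered[k:len(ordered) - k]


def centroid(year, start, end):
    """Centroid LTI of sampling effort, linear from `start` (Year 1) to `end` (Year 100)."""
    return start + (end - start) * (year - 1) / (N_YEARS - 1)


def biased_sample(records, start, end, n_records, seed=1):
    """Weighted sampling without replacement: keep the n largest keys log(u) / w."""
    rng = random.Random(seed)
    keyed = []
    for r in records:
        w = NormalDist(centroid(r["Year"], start, end), SAMPLING_SD).pdf(r["LTI"])
        if w > 0:  # guard against density underflow far from the centroid
            keyed.append((math.log(1.0 - rng.random()) / w, r["IndID"], r))
    keyed.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [r for _, _, r in keyed[:n_records]]


def estimate_sti(occurrences, n_first_years=20):
    """STI per species = mean LTI of its records in Years 1..n_first_years."""
    by_species = {}
    for r in occurrences:
        if r["Year"] <= n_first_years:
            by_species.setdefault(r["SpeciesID"], []).append(r["LTI"])
    return {sp: mean(ltis) for sp, ltis in by_species.items()}


def cohens_d_one_sample(estimates, true_rate=0.01):
    """(mean estimate - true rate) / SD of estimates (assumed one-sample form)."""
    return (mean(estimates) - true_rate) / stdev(estimates)
```

For example, `biased_sample(truncate(generate_community(200)[0]), 1.0, -1.0, 2_000)` draws one small Bias toward Colder dataset; passing `start=-1.0, end=1.0` gives Bias toward Warmer. Species with no records in the first 20 years receive no STI estimate. Matching the paper's sizes (100,000 individuals per year, 10,000 records per dataset, 1,000 repeats) is feasible in principle but slow and memory-heavy in pure Python.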
SimulationCode(Sensitivity_Analysis).R
To evaluate the validity of the CCDM under different bias conditions, we conducted sensitivity analyses by varying the magnitudes of STVOE and the truncation effect. STVOE was simulated by moving the centroid LTIs of sampling effort in the first and final years in 0.2 °C steps, so that their difference ranged from −4 °C (+2 °C to −2 °C) to +4 °C (−2 °C to +2 °C), giving 21 STVOE levels. When the initial and final centroid LTIs were equal, there was no STVOE; however, a spatial bias remained because observations were still concentrated around LTI = 0 °C. For each STVOE setting, the truncation effect was simulated by removing 0 %–7.5 % from each tail of the species distribution (i.e., 0 %–15 % in total) in 0.75 % increments, giving 11 truncation levels. In total, 231 bias scenarios (21 STVOE levels × 11 truncation levels) were generated, and each was sampled 100 times. CCDM was applied to every simulated occurrence dataset, and the differences between the estimated values, both uncorrected (c1) and bias-corrected (c1/c2), and the true warming rate (0.01 °C/year) were compared.
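The scenario grid described above can be enumerated as in the following Python sketch, which only builds the scenario settings and does not perform the sampling or CCDM fitting. It assumes that the final centroid mirrors the initial one around 0 °C (final = −initial), which matches the +2 °C/−2 °C endpoints given in the text; under that reading each centroid moves in 0.2 °C steps while their difference moves in 0.4 °C steps. Truncation is expressed as the fraction removed from each tail.

```python
def sensitivity_grid():
    """Return (start_centroid, end_centroid, tail_fraction) for every bias scenario."""
    # 21 STVOE levels: the initial centroid moves from +2.0 to -2.0 °C in 0.2 °C steps;
    # the final centroid mirrors it (assumption), so end - start runs from -4.0 to +4.0 °C.
    stvoe = []
    for i in range(21):
        start = round(2.0 - 0.2 * i, 10)
        stvoe.append((start, 0.0 - start))  # 0.0 - x avoids producing a signed -0.0
    # 11 truncation levels: 0 %-7.5 % removed from each tail in 0.75 % steps.
    tails = [round(0.0075 * j, 10) for j in range(11)]
    return [(s, e, t) for s, e in stvoe for t in tails]


grid = sensitivity_grid()
print(len(grid))  # 231
```

Listing the scenarios explicitly makes it easy to check the 21 × 11 = 231 count before launching the 231 × 100 samplings.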
