
Estimating abundance and phenology from transect count data with GLMs

Cite this dataset

Edwards, Collin; Crone, Elizabeth (2021). Estimating abundance and phenology from transect count data with GLMs [Dataset]. Dryad.


Estimating population abundance is central to population ecology. With increasing concern over declining insect populations, estimating trends in abundance has become even more urgent. At the same time, there is growing interest in quantifying phenological patterns, in part because phenological shifts are among the most conspicuous signs of climate change. Existing techniques for fitting activity curves (and thus estimating both abundance and phenology) to repeated transect counts of insects, a common form of data for these taxa, frequently fail for sparse data and often require advanced knowledge of statistical computing. These limitations hinder our understanding of both population trends and phenological shifts, especially in the at-risk species for which this understanding is most vital. Here we present a method to fit Gaussian activity curves to repeated transect count data using linear models, and show how robust abundance and phenological metrics can be obtained with standard regression tools. We then apply this method to nine years of Baltimore checkerspot data using generalized linear models (GLMs). This case study illustrates that our method can fit even years with only a few non-zero survey counts, and identifies a significant negative relationship between annual population size and growing degree days (GDD). We believe our method provides a key tool to unlock previously unusable data sets, and may offer a useful middle ground between ad hoc metrics of abundance and phenology and custom-coded mechanistic models.
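The identity that lets a GLM fit a Gaussian activity curve is that a Gaussian is a quadratic on the log scale: if log E[count] = b0 + b1·DOY + b2·DOY² with b2 < 0, the fitted curve is Gaussian with peak day μ = −b1/(2·b2), standard deviation σ = √(−1/(2·b2)), and peak height exp(b0 − b1²/(4·b2)). The Python sketch below shows only that back-transformation. The names gaussian_from_glm and GaussianCurve are hypothetical, and the coefficients are assumed to come from a log-link GLM (in R, for example, glm(count ~ DOY + I(DOY^2), family = poisson)); the exact model specification and metrics used in the paper should be taken from the publication. The area under the curve, H·σ·√(2π), is included as one possible abundance index, not necessarily the paper's.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianCurve:
    peak_doy: float  # day of year at peak activity (mu)
    sd_days: float   # standard deviation of the activity curve, in days (sigma)
    height: float    # expected count on the peak day
    area: float      # area under the curve (count-days)


def gaussian_from_glm(b0: float, b1: float, b2: float) -> GaussianCurve:
    """Back-transform log E[count] = b0 + b1*DOY + b2*DOY**2 into a Gaussian curve.

    The coefficients are assumed to come from a log-link GLM (e.g. Poisson)
    with DOY and DOY**2 as predictors.
    """
    if b2 >= 0:
        # A non-negative quadratic term has no interior peak, so no Gaussian exists.
        raise ValueError("b2 must be negative for a peaked activity curve")
    sd = math.sqrt(-1.0 / (2.0 * b2))
    peak = -b1 / (2.0 * b2)
    height = math.exp(b0 - b1 ** 2 / (4.0 * b2))
    area = height * sd * math.sqrt(2.0 * math.pi)  # integral of the Gaussian over all days
    return GaussianCurve(peak, sd, height, area)
```

Because the transformation is closed-form, uncertainty in μ, σ, or the area can be propagated from the GLM's coefficient covariance (for example by simulation), which is part of what makes standard regression output sufficient.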


Please see the methods description in our associated publication.

Usage notes

See the associated publication for an overview of the data and methods. Here we provide data on the Baltimore checkerspot butterfly, the code and results behind the analyses and figures in the associated publication, and a detailed tutorial for implementing the novel statistical approach presented there.

Scripts were last run in R version 4.0.2, and the two key scripts are R Markdown files (.Rmd). For those unfamiliar with the format, R Markdown combines R code with document formatting written in the Markdown language; many introductory resources are available online. R Markdown files are most easily viewed and edited in the free IDE RStudio.

Appendix S2 of the associated publication (a tutorial walking through our methods) was generated from "appendix S2 - methods_demo_v6.Rmd", and Appendix S3 (the analyses and figure generation for the main text) was generated from "appendix S3 – analysis_v4.Rmd". Their compiled HTML outputs, which duplicate the published Appendices S2 and S3, are also included in 3_scripts/.

Complete file structure:

gaussian_sharable_directory.Rproj - an RStudio project file for this directory, which ensures that the here() function resolves file paths correctly in the two key scripts.

README.txt - .txt version of this document

1_raw_data/ - directory for data files

BCBcounts v3.csv - data on the Baltimore checkerspot from 2012-2020. The "year" and "date" columns give the date of each survey, "count" contains the number of butterflies seen on that day, and "DOY" is the day of year (redundant with year and date, but it is the variable used in our analyses).

daily-measures.csv - daily weather readings from NOAA for nearby weather stations. We use station USC00190190, the closest one.

pop-trends.csv - population estimates from capture-recapture methods. Our analyses use the "Total" column, restricted to rows where Sex is "Both".

2_data_wrangling/ - directory for intermediary data files/objects

climate-covar.rds - R data object (.rds) of growing degree days (GDD) on July 1 of each year, calculated from daily-measures.csv by the script 3_scripts/climate-cleaner.R.

3_scripts/ - directory for scripts and compiled Rmarkdown documents.

appendix S2 - methods_demo.Rmd - Rmarkdown script that generated Appendix S2.

appendix S3 - analysis.Rmd - Rmarkdown script that generated Appendix S3.

appendix-S2.bib - bibliography file for the two appendix .Rmd files.

appendix-S2---methods_demo_v6.html - compiled appendix S2 (duplicate of published Appendix S2).

appendix-S3---analysis_v4.html - compiled appendix S3 (duplicate of published Appendix S3).

climate-cleaner.R - script for calculating Growing Degree Days (GDD) from NOAA weather file (daily-measures.csv). Saves results as 2_data_wrangling/climate-covar.rds.

4_res/ - directory for results files

analysis-output.csv - example of results saved by the Appendix S2 tutorial.

INCA-summary.csv - summarized results from applying the INCA software to the butterfly count data.

5_figs/ - directory for plots and figures generated in "appendix S3 – analysis_v4.Rmd". Each .jpg has an associated metadata file in .txt format.

"fig1.jpg", "fig2.jpg", and "fig3.jpg" are the three figures in the main text; "Example gaussian.jpg" is the supplementary figure.

INCA/ - directory for results from fitting the INCA software to the butterfly count data.
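Before fitting, a per-year summary of survey effort and non-zero counts shows how sparse each season in the count file is. The sketch below assumes the column names described for BCBcounts v3.csv (year, date, count, DOY) and, as a labeled assumption, ISO-formatted dates (YYYY-MM-DD). summarize_counts is a hypothetical helper, not part of the archived scripts, and it also checks that DOY really is redundant with the date.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def summarize_counts(csv_text, date_format="%Y-%m-%d"):
    """Per-year survey totals for a table with year, date, count and DOY columns.

    ASSUMPTION: dates are ISO strings; pass another date_format if the file
    uses a different layout.
    """
    summary = defaultdict(lambda: {"surveys": 0, "nonzero": 0, "total": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        year, doy, count = int(row["year"]), int(row["DOY"]), int(row["count"])
        # DOY should match the calendar date; stop on any inconsistent row.
        parsed = datetime.strptime(row["date"], date_format)
        if parsed.year != year or parsed.timetuple().tm_yday != doy:
            raise ValueError(f"DOY/date mismatch in row {row}")
        s = summary[year]
        s["surveys"] += 1
        s["nonzero"] += int(count > 0)  # surveys with at least one butterfly
        s["total"] += count
    return dict(summary)
```

With the archived file this would be called as summarize_counts(open("1_raw_data/BCBcounts v3.csv").read()), adjusting date_format if the dates use another layout. Years with only a handful of non-zero surveys are the cases the GLM approach is meant to handle.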
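Growing degree days are conventionally accumulated as the daily mean temperature in excess of a base threshold. The sketch below computes cumulative GDD up to July 1 with the simple-average method. The base temperature, the January 1 start, the inclusive end date, and skipping missing readings are all illustrative assumptions, and cumulative_gdd is a hypothetical helper; 3_scripts/climate-cleaner.R remains the authoritative calculation behind climate-covar.rds.

```python
from datetime import date


def cumulative_gdd(daily, year, base_temp, end_month=7, end_day=1):
    """Simple-average growing degree days accumulated from Jan 1 to the end date.

    daily: iterable of (datetime.date, tmax, tmin) tuples, all in one unit.
    ASSUMPTIONS (illustrative, not read from climate-cleaner.R): Jan 1 start,
    inclusive end date, the simple-average method, and the base temperature.
    """
    start, stop = date(year, 1, 1), date(year, end_month, end_day)
    total = 0.0
    for day, tmax, tmin in daily:
        if not (start <= day <= stop):
            continue  # outside the accumulation window
        if tmax is None or tmin is None:
            continue  # missing reading: skipped here; a real pipeline might interpolate
        total += max(0.0, (tmax + tmin) / 2.0 - base_temp)
    return total
```

Daily values below the base contribute zero rather than a negative amount, so cold days early in the year do not offset warm days later.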


National Science Foundation, Award: DEB 19-20834

United States Department of Defense, Award: SERDP RC-2700