
Data For: Herbarium specimens provide reliable estimates of phenological responsiveness to climate at unparalleled taxonomic and spatiotemporal scales

Cite this dataset

Ramirez-Parada, Tadeo; Park, Isaac; Mazer, Susan (2022). Data For: Herbarium specimens provide reliable estimates of phenological responsiveness to climate at unparalleled taxonomic and spatiotemporal scales [Dataset]. Dryad.


Understanding the effects of climate change on the phenological structure of plant communities will require measuring variation in sensitivity among thousands of co-occurring species across regions. Herbarium collections provide vast resources with which to do this, but may also exhibit biases as sources of phenological data. Despite general recognition of these caveats, validations of herbarium-based estimates of phenological sensitivity against estimates obtained from field observations remain rare and limited in scope. Here, we leveraged extensive datasets of herbarium specimens and of field observations from the USA National Phenology Network for 21 species in the United States and, for each species, compared herbarium- and field-based standardized estimates of peak flowering dates and of the sensitivity of peak flowering time to geographic and interannual variation in mean spring minimum temperatures (TMIN). We found strong agreement between herbarium- and field-based estimates of standardized peak flowering time (r = 0.91, p < 0.001) and of the direction and magnitude of sensitivity to both geographic TMIN variation (r = 0.88, p < 0.001) and interannual TMIN variation (r = 0.82, p < 0.001). This agreement was robust to substantial differences between datasets in 1) the long-term TMIN conditions observed among collection and phenological monitoring sites and 2) the interannual TMIN conditions observed in the time periods encompassed by the two datasets for most species. Our results show that herbarium-based sensitivity estimates are reliable among species spanning a wide diversity of life histories and biomes, demonstrating their utility in a broad range of ecological contexts and underscoring the potential of herbarium collections to enable phenoclimatic analysis at taxonomic and spatiotemporal scales not yet captured by observational data.
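The agreement statistics above are correlations, across the 21 species, between paired herbarium-based and field-based estimates. A minimal standard-library sketch of the Pearson coefficient for such paired estimates is shown below; the input values are hypothetical placeholders, not numbers from this dataset, and the sketch does not compute p-values.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between paired species-level estimates."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sensitivities (days per degree C) for four species; NOT values from the dataset.
herbarium_based = [-3.1, -1.2, -4.5, 0.4]
field_based = [-2.8, -1.5, -4.0, 0.1]
print(round(pearson_r(herbarium_based, field_based), 3))
```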



Phenological data

The dataset of field observations consisted of all records of flowering onset and termination available in the USA National Phenology Network database (NPNdb), an initial total of 1,105,764 phenological observations. To ensure data quality, we retained only observations whose flowering onset and termination dates could be determined to within a maximum error of 14 days (an arbitrary threshold). To do this, we retained only records in which the first observation of an individual with open flowers was preceded by an observation of the same individual without flowers no more than 14 days earlier, and in which the last observation with open flowers was followed by an observation of the same individual without flowers no more than 14 days later. After filtering, field observations had an average maximum error of 6.4 days for flowering onset and 6.6 days for flowering termination. The herbarium dataset was constructed from an initial 894,392 digital herbarium specimen records archived by 72 herbaria across North America. We excluded all specimens not explicitly recorded as being in flower, as well as all specimens lacking GPS coordinates or collection dates. We then filtered both datasets to retain only species present in both datasets and represented in the NPN dataset by observations at a minimum of 15 unique sites. For each species, to match the geographic ranges covered by the two datasets more closely, we retained only herbarium specimens falling within the range of latitudes and longitudes represented by that species' NPN field observations. Finally, we retained only species represented by 70 or more herbarium specimens, to ensure sample sizes sufficient for phenoclimatic modeling.
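The 14-day bracketing rule for field observations can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' R code: the input format (a list of `(day_of_year, has_open_flowers)` visits for one individual in one year) and the names `bracket_flowering` and `BracketedFlowering` are hypothetical, and the sketch treats each individual-year as a single flowering episode.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ERROR_DAYS = 14  # maximum bracketing gap stated in the text


@dataclass
class BracketedFlowering:
    onset: int              # DOY of the first observation with open flowers
    termination: int        # DOY of the last observation with open flowers
    onset_error: int        # days since the preceding no-flower observation
    termination_error: int  # days until the following no-flower observation


def bracket_flowering(
    obs: list[tuple[int, bool]], max_error: int = MAX_ERROR_DAYS
) -> Optional[BracketedFlowering]:
    """Keep an individual-year only if onset and termination are both bracketed
    by no-flower observations at most `max_error` days away; otherwise None.

    `obs` holds hypothetical (day_of_year, has_open_flowers) pairs for one individual in one year.
    """
    obs = sorted(obs)
    flowering = [i for i, (_, in_flower) in enumerate(obs) if in_flower]
    if not flowering:
        return None
    first, last = flowering[0], flowering[-1]
    # The visits just before `first` and just after `last` are, by construction,
    # no-flower visits; they must exist for the error to be bounded.
    if first == 0 or last == len(obs) - 1:
        return None
    onset_error = obs[first][0] - obs[first - 1][0]
    termination_error = obs[last + 1][0] - obs[last][0]
    if onset_error > max_error or termination_error > max_error:
        return None
    return BracketedFlowering(obs[first][0], obs[last][0], onset_error, termination_error)


# Hypothetical visits to one individual: no flowers on DOY 100, flowers on 108 and 130, none on 141.
record = bracket_flowering([(100, False), (108, True), (130, True), (141, False)])
print(record)
```

In this example onset is bounded to within 8 days and termination to within 11 days; a gap longer than 14 days on either side, or a missing no-flower visit before onset or after termination, returns `None`.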

This procedure identified a final set of 21 native species represented by 3,243 field observations across 1,406 unique site-year combinations, and a final sample of 5,405 herbarium specimens across 4,906 unique site-year combinations. For the herbarium dataset, sample sizes ranged from 69 unique sites and 74 specimens for Prosopis velutina to 1,323 unique sites containing 1,368 specimens for Achillea millefolium. Sample sizes in the NPN dataset ranged from 15 unique sites with 74 observations for Impatiens capensis to 108 unique sites with 321 observations for Cornus florida. These 21 species represented 15 families and 17 genera, spanning a diverse range of life-history strategies and growth forms, including evergreen and deciduous shrubs and trees (e.g., Quercus agrifolia and Tilia americana, respectively), herbaceous perennials (e.g., Achillea millefolium), and annuals (e.g., Impatiens capensis). Our focal species covered a wide variety of biomes and regions, including Western deserts (e.g., Fouquieria splendens), Mediterranean shrublands and oak woodlands (e.g., Baccharis pilularis, Quercus agrifolia), and Eastern deciduous forests (e.g., Quercus rubra, Tilia americana).

To estimate flowering dates in the herbarium dataset, we used the day of year of collection (henceforth ‘DOY’) of each specimen collected in flower as a proxy. Herbarium specimens in flower could have been collected at any point between the onset and termination of their flowering period, and for many species botanists may preferentially collect individuals at peak flowering. Herbarium collection dates are therefore more likely to reflect peak flowering dates than flowering onset dates. To maximize the phenological equivalence of the field and herbarium datasets, we used the median date between onset and termination of flowering (i.e., their midpoint) for each individual in each year of the NPN data as a proxy for peak flowering time. Because NPN onset and termination dates each had a maximum error of 14 days, median flowering dates also had a maximum error of 14 days, with an average maximum error among observations of 6.5 days. To account for the artificial DOY discontinuity between December 31st (DOY = 365, or 366 in a leap year) and January 1st (DOY = 1), we converted DOY in both datasets into a circular variable using an azimuthal correction.
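The peak-flowering proxy and the circular treatment of DOY can be illustrated with the sketch below. It assumes the azimuthal correction amounts to mapping DOY onto an angle (DOY / days-in-year × 2π) and averaging with sines and cosines; the function names are hypothetical, and the authors' exact transformation may differ. Converting the mean angle back uses a fixed 365-day year, a simplifying assumption.

```python
import calendar
import math


def peak_flowering_doy(onset: float, termination: float) -> float:
    """Midpoint (median) of onset and termination DOY: the field proxy for peak flowering."""
    return (onset + termination) / 2


def doy_to_angle(doy: float, year: int) -> float:
    """Map a day of year onto the circle (radians) so that DOY 365/366 lies next to DOY 1."""
    days_in_year = 366 if calendar.isleap(year) else 365
    return 2 * math.pi * doy / days_in_year


def circular_mean_doy(records: list[tuple[float, int]], days: int = 365) -> float:
    """Circular mean of (doy, year) records, mapped back to a `days`-long year in [0, days).

    Undefined (numerically unstable) if the records are spread evenly around the year.
    """
    s = sum(math.sin(doy_to_angle(doy, year)) for doy, year in records)
    c = sum(math.cos(doy_to_angle(doy, year)) for doy, year in records)
    mean_angle = math.atan2(s, c) % (2 * math.pi)
    return mean_angle * days / (2 * math.pi)


# Two hypothetical specimens collected on either side of New Year's Day.
winter = [(364, 2021), (2, 2021)]
print(circular_mean_doy(winter))                    # close to the year boundary
print(sum(doy for doy, _ in winter) / len(winter))  # naive mean: 183.0, mid-year
```

The naive arithmetic mean places these two winter collections in early July, whereas the circular mean keeps them at the turn of the year, which is the distortion the correction is meant to avoid.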

Climate data

Daily minimum temperatures mediate key developmental processes, including the break of dormancy, floral induction, and anthesis. We therefore used minimum surface temperatures averaged over the three months leading up to and including the mean flowering month of each species (hereafter ‘TMIN’) as the climatic correlate of flowering time in this study; consequently, the specific months over which temperatures were averaged varied among species. Calculating TMIN over other time periods instead (e.g., during spring for all species) did not qualitatively affect our results. We then partitioned variation in TMIN into spatial and temporal components, characterizing each observation by the long-term mean TMIN at its site of collection (henceforth ‘TMIN normals’) and by the deviation of TMIN in its year of collection (for the three-month window of interest) from that long-term mean (henceforth ‘TMIN anomalies’). For each site, we obtained a monthly TMIN time series from January 1901 to December 2016 using ClimateNA v6.30, a software package that interpolates 4 km² resolution climate data from the PRISM Climate Group at Oregon State University to generate elevation-adjusted climate estimates. To calculate TMIN normals, we averaged TMIN for the three-month window ending with each species' mean flowering month across all years between 1901 and 2016 at each site. TMIN anomalies were calculated by subtracting TMIN normals from observed TMIN in the year of collection. Positive and negative anomalies therefore reflect warmer-than-average and colder-than-average conditions, respectively, in a given year.
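The normal/anomaly decomposition for one site can be sketched as below. The input format (a dictionary from `(year, month)` to monthly TMIN in °C) and all function names are hypothetical; the sketch also assumes that, when the three-month window crosses January, the earlier months are taken from the previous calendar year, and that years whose window is not fully covered by the series are skipped when computing the normal. The source does not state how these cases were handled.

```python
from statistics import mean

MonthlySeries = dict[tuple[int, int], float]  # hypothetical: (year, month) -> monthly TMIN, deg C


def window_months(year: int, flower_month: int) -> list[tuple[int, int]]:
    """(year, month) keys for the three months ending with `flower_month`, inclusive.

    Assumption: when the window crosses January, earlier months come from the previous year.
    """
    keys = []
    for offset in (2, 1, 0):
        month, yr = flower_month - offset, year
        if month < 1:
            month, yr = month + 12, year - 1
        keys.append((yr, month))
    return keys


def tmin_window(series: MonthlySeries, year: int, flower_month: int) -> float:
    """TMIN for one year: the mean of its three-month window."""
    return mean(series[key] for key in window_months(year, flower_month))


def tmin_normal(series: MonthlySeries, flower_month: int,
                first_year: int = 1901, last_year: int = 2016) -> float:
    """Long-term mean window TMIN at one site, over years whose window is fully covered."""
    years = [y for y in range(first_year, last_year + 1)
             if all(key in series for key in window_months(y, flower_month))]
    return mean(tmin_window(series, y, flower_month) for y in years)


def tmin_anomaly(series: MonthlySeries, year: int, flower_month: int, normal: float) -> float:
    """Observed window TMIN minus the normal; positive means warmer than average."""
    return tmin_window(series, year, flower_month) - normal


# Hypothetical site: 10 deg C every month, except that 2010 was 1 deg C warmer throughout.
site = {(y, m): 10.0 + (1.0 if y == 2010 else 0.0)
        for y in range(1901, 2017) for m in range(1, 13)}
normal = tmin_normal(site, flower_month=5)
print(round(tmin_anomaly(site, 2010, 5, normal), 3))
```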


We also provide R code to reproduce all results presented in the main text and the supplemental materials of our study. This code includes 1) all steps necessary to merge herbarium and field data into a single dataset ready for analysis, 2) the formulation and specification of the varying-intercepts, varying-slopes Bayesian model used to generate herbarium- vs. field-based estimates of phenology and of its sensitivity to TMIN, 3) the steps required to process the output of the Bayesian model and to obtain all metrics required for the analyses in the paper, and 4) the code used to generate each figure.
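As a generic illustration only, and not the specification used in the provided R code (which may differ in likelihood, priors, and treatment of circular DOY), a varying-intercepts, varying-slopes model that yields a separate set of estimates for each species and data source could take the form below. Here s[i] indexes the species and d[i] the data source (herbarium or NPN) of observation i, and k = (s, d) indexes a species-by-source group. Under this assumed form, the slope on TMIN normals is the sensitivity to geographic TMIN variation and the slope on TMIN anomalies is the sensitivity to interannual TMIN variation (both in days per °C); if both predictors are centered, the intercept is a standardized peak flowering date.

```latex
\begin{aligned}
\mathrm{DOY}_i &\sim \mathcal{N}\!\left(\mu_i,\ \sigma\right)\\
\mu_i &= \alpha_{s[i],d[i]}
        + \beta^{\mathrm{N}}_{s[i],d[i]}\,\mathrm{TMIN}^{\mathrm{norm}}_i
        + \beta^{\mathrm{A}}_{s[i],d[i]}\,\mathrm{TMIN}^{\mathrm{anom}}_i\\
\left(\alpha_{k},\ \beta^{\mathrm{N}}_{k},\ \beta^{\mathrm{A}}_{k}\right)^{\top}
  &\sim \mathrm{MVN}\!\left(\left(\bar{\alpha},\ \bar{\beta}^{\mathrm{N}},\ \bar{\beta}^{\mathrm{A}}\right)^{\top},\ \Sigma\right)
\end{aligned}
```

Comparing the herbarium and NPN values of each of the three group-level parameters across species is what the correlations reported in the abstract summarize.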

Contributing Herbaria

Data used in this study were contributed by the Yale Peabody Museum of Natural History, the George Safford Torrey Herbarium at the University of Connecticut, the Acadia University Herbarium, the Chrysler Herbarium at Rutgers University, the University of Montreal Herbarium, the Harvard University Herbarium, the Albion Hodgdon Herbarium at the University of New Hampshire, the Academy of Natural Sciences of Drexel University, the Jepson Herbarium at the University of California-Berkeley, the University of California-Berkeley Sagehen Creek Field Station Herbarium, the California Polytechnic State University Herbarium, the University of Santa Cruz Herbarium, the Black Hills State University Herbarium, the Luther College Herbarium, the Minot State University Herbarium, the Tarleton State University Herbarium, the South Dakota State University Herbarium, the Pittsburg State University Herbarium, the Montana State University-Billings Herbarium, the Sul Ross University Herbarium, the Fort Hays State University Herbarium, the Utah State University Herbarium, the Brigham Young University Herbarium, the Eastern Nevada Landscape Coalition Herbarium, the University of Nevada Herbarium, the Natural History Museum of Utah, the Western Illinois University Herbarium, the Eastern Illinois University Herbarium, the Northern Illinois University Herbarium, the Morton Arboretum Herbarium, the Chicago Botanic Garden Herbarium, the Field Museum of Natural History, the University of Wisconsin-Madison Herbarium, the University of Michigan Herbarium, the Indiana University Herbarium, the Universidad de Sonora Herbarium, the Centro de Investigaciones Biológicas del Noroeste, S. C., the Instituto Politécnico Nacional, CIIDIR Unidad Durango, the University of California-Riverside Herbarium, the San Diego State University Herbarium, the Granite Mountains Desert Research Center, the University of South Carolina Herbarium, the Auburn University Museum of Natural History, the Clemson University Herbarium, the Eastern Kentucky University Herbarium, the College of William and Mary Herbarium, the Appalachian State University Herbarium, the University of North Carolina Herbarium, the University of Memphis Herbarium, the Mississippi State University Herbarium, the University of Mississippi Herbarium, the University of Southern Mississippi Herbarium, the Mississippi Museum of Natural Science, the Marshall University Herbarium, the Longwood University Herbarium, the Herbarium of Western Carolina University, the Northern Kentucky University Herbarium, the Salem College Herbarium, the Troy University Herbarium, the Arizona State University Herbarium, the University of Arizona Herbarium, the Desert Botanical Garden, the Deaver Herbarium, the Navajo Nation Department of Fish and Wildlife, the Grand Canyon National Park Herbarium, the University of New Mexico Herbarium, the Western New Mexico University Herbarium, the Museum of Northern Arizona, the Gila National Forest Herbarium, the Arizona Western College Herbarium, and the Natural History Institute.

Usage notes

The file README_Ramirez-Parada_Park_Mazer.txt contains relevant usage notes.


National Science Foundation, Award: DEB‐1556768