Data from: The leading edge matters too: fitness and the expression of adaptive differentiation are greatest at the high-elevation edge of a species range

Abstract

This dataset contains raw and processed data from two common garden experiments testing local adaptation and fitness variation across the elevational range of Erythranthe laciniata (Phrymaceae), an annual monkeyflower endemic to the Sierra Nevada, California, USA. Experiments were conducted in 2009 and 2021 at low-, mid-, and high-elevation field gardens spanning the species’ distribution. The dataset includes survival and lifetime fitness measurements (total flower number), population and maternal line identifiers, geographic coordinates, elevation, climate data (NASA Daymet), and associated metadata. These data support analyses presented in Shay & Pennington et al. (2025), The leading edge matters too: fitness and the expression of adaptive differentiation are greatest at the high-elevation edge of a species’ range (Ecology Letters).

https://doi.org/10.6071/M39T04

Description of the data and file structure

This README file was generated on 2025-12-18 by Jackie Shay

Data Description

Dataset Title

The leading edge matters too: fitness and the expression of adaptive differentiation are greatest at the high-elevation edge of a species range

Date of Data Collection

2008–2009 and 2020–2021

Contributors

Jackie E. Shay, Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, CA, USA
Lillie K. Pennington, Department of Genetics, University of Georgia, Athens, GA, USA
Daniel J. Toews, Environmental Systems Graduate Group, University of California, Merced, CA, USA
Elizabeth Green, Department of Life and Environmental Sciences, University of California, Merced, CA, USA
Jason P. Sexton, Department of Life and Environmental Sciences, University of California, Merced, CA, USA

Overview

This dataset contains lifetime fitness data collected from cut-leaf monkeyflower (Erythranthe laciniata) across 23 populations in 2008–2009 and 9 of those populations in 2020–2021. Fitness was measured using fruit count and flower count as proxies, which was later simplified to fruit count. Population and garden identifiers for the common garden experiments are also provided. We provide the R Markdown code for the analysis of these data using a generalized linear model, multiple regression analysis, and estimated marginal means. We also provide code for the climate analysis used to characterize the 2020–2021 growing season as a drought year.

Files and variables

Materials are divided into data files (DATA) and code files (CODE).

DATA

monkeys_V2.csv
- Raw data collected from the 2008–2009 field study, containing details and lifetime fitness measurements for 23 populations.
- Variables:
  - sample: numerical variable; sample number that serves as a unique identifier for each row of data
  - garden: categorical variable; denotes the common garden block: the low-elevation (HWY), central (GB), and high-elevation (HG) gardens
  - block: grouping variable; accounts for random effects among observations, in this case garden
  - pop: categorical code; seed source population (23 total)
  - total_flowers: numerical variable; total flowers per individual plant, used as the fitness measure (count)
  - fruit_mass_mg: numerical variable; mean fruit mass per individual, used as a supplemental fitness measure (milligrams)
  - clim.dist.low, clim.dist.center, clim.dist.high: numerical variables; PCA projections of 2008–2009 climate data with 30-year climate averages
  - Dist.low, Dist.center, Dist.high: numerical variables; pairwise linear geographic distance from each population to the low, central, and high gardens
  - Gen.dist.low, Gen.dist.center, Gen.dist.high: numerical variables; genetic distance calculated from multilocus data
  - surv: numerical variable; binary indicator of whether the plant survived to fruiting
  - elev: numerical variable; population elevation in meters
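As a minimal sketch of how the garden and surv columns can be summarized, the Python snippet below computes the percentage of plants surviving to fruiting in each garden. It assumes surv is coded 1 for survived and 0 for not survived (the coding is not stated in this README), and the rows are hypothetical stand-ins for records read from monkeys_V2.csv.

```python
from collections import defaultdict

def survival_percent_by_garden(rows):
    """Return {garden: percent of plants with surv == 1}.

    Assumes surv is coded 1 = survived to fruiting, 0 = did not (an assumption).
    """
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        totals[row["garden"]] += 1
        survived[row["garden"]] += int(row["surv"])
    return {garden: 100.0 * survived[garden] / totals[garden] for garden in totals}

# Hypothetical rows; real rows could come from csv.DictReader on monkeys_V2.csv.
rows = [
    {"garden": "HWY", "surv": "1"},
    {"garden": "HWY", "surv": "0"},
    {"garden": "HG", "surv": "1"},
    {"garden": "HG", "surv": "1"},
]
survival = survival_percent_by_garden(rows)
```

The same function applies to dfdata_V2.csv, which uses the same garden and surv column names.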
dfdata_V2.csv
- Raw data collected from the 2020–2021 field study, containing details and lifetime fitness measurements for 9 populations grown from seeds collected in 2014.
- Variables:
  - sample: numerical variable; sample number that serves as a unique identifier for each row of data
  - tray: categorical variable; denotes the physical tray plants were grown in
  - garden: categorical variable; denotes the common garden block: the low-elevation (HWY), central (TBD), and high-elevation (TL) gardens
  - garden_e: categorical-numerical variable; garden paired with population elevation in meters
  - pop: categorical code; seed source population (9 total)
  - pop_e: numerical variable; population elevation in meters
  - generation: plant generation code to track maternal lines
  - flower_count: numerical variable; total flowers per individual plant, used as the fitness measure
  - final_fruit: numerical variable; fruit count per individual plant, used as a secondary fitness measure
  - surv: numerical variable; binary indicator of whether the plant survived to fruiting
1) mf_lsm_low_30may.csv, mf_lsm_central_30may.csv, mf_lsm_high_30may.csv
2) mf_lsm_low_fc_may30.csv, mf_lsm_central_fc_may30.csv, mf_lsm_high_fc_may30.csv
- File set 1: metadata calculated from the emmeans of lifetime fitness (flower count) data.
- File set 2: metadata calculated from the emmeans of survival data.
- Variables:
  - pop: categorical code; seed source population (9 total)
  - elev: numerical variable; population elevation in meters
  - group: categorical code; population edge position: low-edge, central, or high-edge
  - garden: categorical variable; denotes the common garden block: the low-elevation (HWY), central (GB), and high-elevation (HG) gardens
  - lsmean: numerical value; least-squares mean (estimated marginal mean) calculated with emmeans
  - SE: numerical value; standard error from the regression, using the estimated marginal means (EMMs)
  - df: numerical value; degrees of freedom associated with the model estimate
  - asymp.LCL: numerical value; the asymptotic lower confidence limit for the model estimate
  - asymp.UCL: numerical value; the asymptotic upper confidence limit for the model estimate
  - clim.dist.low, clim.dist.center, clim.dist.high: numerical variables; PCA projections of 2008–2009 climate data with 30-year climate averages
  - Dist.low, Dist.center, Dist.high: numerical variables; pairwise linear geographic distance from each population to the low, central, and high gardens
  - Gen.dist.low, Gen.dist.center, Gen.dist.high: numerical variables; genetic distance calculated from multilocus data
growing_season_climate_with_normals.csv
- Climate data calculated from Daymet and used to estimate climate distances between populations and from each population to each garden.
- Variables:
  - site: numerical variable; seed source population
  - elev: numerical variable; population elevation in meters
  - growing_season: categorical variable; denotes the growing year
  - mean_tmin: numerical variable; Daymet annual mean minimum temperature in degrees Celsius (°C)
  - mean_tmax: numerical variable; Daymet annual mean maximum temperature in degrees Celsius (°C)
  - total_precip: numerical variable; Daymet annual total precipitation in millimeters (mm)
  - mean_25yr_tmin: numerical variable; average minimum temperature between 1980–2005 at the climate sites in degrees Celsius (°C)
  - mean_25yr_tmax: numerical variable; average maximum temperature between 1980–2005 at the climate sites in degrees Celsius (°C)
  - mean_25yr_precip: numerical variable; average precipitation between 1980–2005 at the climate sites in millimeters (mm)
  - tmin_anomaly: numerical variable; the deviation from the average minimum temperature between 1980–2005
  - tmax_anomaly: numerical variable; the deviation from the average maximum temperature between 1980–2005
  - precip_anomaly: numerical variable; the deviation from the average total precipitation between 1980–2005
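The anomaly columns relate each growing season to its long-term average. The Python sketch below assumes the simplest convention, growing-season value minus the 1980–2005 average, so that positive values mean warmer or wetter than normal. That sign convention and the numbers used are illustrative assumptions, not values taken from the dataset.

```python
def climate_anomalies(season, normals):
    """Return anomalies as (growing-season value - long-term average).

    The subtraction order is an assumed convention, not confirmed by the data files.
    """
    return {
        "tmin_anomaly": season["mean_tmin"] - normals["mean_25yr_tmin"],
        "tmax_anomaly": season["mean_tmax"] - normals["mean_25yr_tmax"],
        "precip_anomaly": season["total_precip"] - normals["mean_25yr_precip"],
    }

# Hypothetical values for one site and one growing season.
season = {"mean_tmin": 3.5, "mean_tmax": 17.0, "total_precip": 600.0}
normals = {"mean_25yr_tmin": 2.0, "mean_25yr_tmax": 16.0, "mean_25yr_precip": 900.0}
anomalies = climate_anomalies(season, normals)
```

Under this convention, a drought year would show a strongly negative precip_anomaly.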
climate_distances_to_gardens_pc12.csv
- Climate distance values calculated between each population and each common garden. Euclidean distances were calculated in PC1–PC2 space between each garden and the 25-year climate position of each population.
- Variables:
  - site: categorical variable; seed source population
  - garden: categorical variable; denotes common garden block
  - dist_to_garden: numerical variable; distance values projected into the PCA space using the 2008–2009 growing season values
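For readers unfamiliar with the distance measure, the Python sketch below shows a Euclidean distance between two points in PC1–PC2 space. The PC scores and the site name pop_A are hypothetical; in the dataset, the scores come from the PCA in the climate analysis code.

```python
import math

def climate_distance(population_pc, garden_pc):
    """Euclidean distance between two (PC1, PC2) score pairs."""
    return math.hypot(
        population_pc[0] - garden_pc[0],
        population_pc[1] - garden_pc[1],
    )

# Hypothetical PC1-PC2 scores for one garden and two populations.
garden_scores = {"HWY": (0.0, 0.0)}
population_scores = {"pop_A": (3.0, 4.0), "pop_B": (-1.0, 0.0)}

# One distance per (site, garden) pair, mirroring the site/garden/dist_to_garden layout.
distances = {
    (site, garden): climate_distance(pop_pc, garden_pc)
    for site, pop_pc in population_scores.items()
    for garden, garden_pc in garden_scores.items()
}
```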

CODE

NASA_Daymet_Analysis_2008–2021_V5.Rmd
- Purpose: Extracts daily climate data from NASA Daymet for 23 study populations and summarizes it over multiple years (1980–2005).
- Key packages: daymetr, dplyr, lubridate, tibble, purrr, ggplot2, tidyr
- Chunks:
  1. Environment setup
  2. Prepare list of 23 population sites
  3. Define the growing seasons from October to September
  4. Growing season data from Daymet by site and year
  5. Apply data extraction function to all sites
  6. Download the Daymet data for each site and specific date range
  7. Create a bar plot to visualize climate anomalies (Figure S1)
  8. PCA-based climate Euclidean distance from each garden to all other sites
  9. Calculate climate distances
  10. Visualize PCA plots (Figure S2)
  11. Calculate PCA correlation scores
  12. Generate PC loading table (Table S2)
- Outputs:
  - CSV of annual means for each variable
  - Principal Component Analysis (PCA) of climate variables
  - Figures of climate trends (Figure S1) and PCA results (Figure S2)
  - PC loading table (Table S2)
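The October-to-September growing season defined in chunk 3 can be illustrated with a short Python sketch that assigns a calendar date to its growing season. The function name and the "YYYY-YYYY" label format are hypothetical; the R Markdown file may label seasons differently.

```python
from datetime import date

def growing_season_label(day):
    """Return the October-to-September growing season that contains `day`.

    A season starting in October of year Y is labeled "Y-(Y+1)" (assumed format).
    """
    start_year = day.year if day.month >= 10 else day.year - 1
    return f"{start_year}-{start_year + 1}"

# The season boundary falls between 30 September and 1 October.
labels = [
    growing_season_label(d)
    for d in (date(2008, 10, 1), date(2009, 9, 30), date(2009, 10, 1))
]
```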
shay_pennington_leading_edge_v8.Rmd
- Purpose: Conducts statistical analyses on lifetime fitness using garden transplant experiments.
- Key analyses:
  - Generalized linear mixed models (GLMMs) with zero-inflated negative binomial distributions using glmmTMB
  - Estimated marginal means (EMMs) and pairwise contrasts using emmeans
  - Multiple regressions testing the effects of genetic, geographic, and climate distances on survival and flower production
  - Creation of figures showing relationships between distances and fitness metrics
- Key packages: tidyverse, ggplot2, multcomp, Rmisc, doBy, car, emmeans, glmmTMB, DHARMa, ggeffects, effects, dbplyr
- Chunks:
  1. Environment setup
  2. Dataset formatting for 2009 data
  3. Survival percentage calculations for 2009 data
  4. Zero-inflated regression using glmmTMB for lifetime fitness for 2009 data (Table 1)
  5. Checking for total flowers (lifetime fitness) outliers for 2009 data
  6. Modeled predicted means for garden fitness using the glmmTMB function in the glmmTMB package, version 1.1.10 for 2009 data (Figure 2B)
  7. Survival logistic regression for 2009 data (Table 1; Figure 3A)
  8. Survival local vs foreign analysis for 2009 data using emmeans package, version 1.11.10 (Table 2)
  9. Survival home vs away context analysis for 2009 data (Table 2)
  10. Survival between-climate edges analysis for 2009 data (Table 2)
  11. Lifetime fitness local vs foreign analysis for 2009 data (Table 2)
  12. Lifetime fitness home vs away analysis for 2009 data (Table 2)
  13. Lifetime fitness between-climate edges analysis for 2009 data (Table 2)
  14. Full Tukey post hoc contrasts for 2009 data
  15. Dataset formatting for 2021 data
  16. Survival logistic regression for 2021 data (Table 1; Figure 3B)
  17. Lifetime fitness zero-inflated regression for 2021 data (Table 1; Figure 2B)
  18. Garden survival graphs for 2009 (Figure 2A)
  19. Garden survival graphs for 2021 (Figure 2A)
  20. Survival multiple regression analysis with distance data for 2009
  21. Survival multiple regression for 2009 low garden (Table S3)
  22. Survival multiple regression for 2009 central garden (Table S3)
  23. Survival multiple regression for 2009 high garden (Table S3)
  24. Survival probability plot for 2009 high garden (Figure S5)
  25. Survival probability plot for 2009 low garden (Figure S6)
  26. Lifetime fitness multiple regression analysis with distance data for 2009
  27. Lifetime fitness multiple regression for 2009 low garden (Table S3)
  28. Lifetime fitness multiple regression for 2009 central garden (Table S3)
  29. Lifetime fitness multiple regression for 2009 high garden (Table S3)
  30. Significant lifetime fitness partial regression plots (Figure 4)
  31. Reaction norm plots for lifetime fitness (Figure S4A) and survival (Figure S4B)

- Outputs:
  - Regression plots
  - Model summaries
  - Graphs and figures
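The local vs foreign and home vs away comparisons listed in the chunks above can be pictured with simple arithmetic on group means. The Python sketch below uses the textbook form of the two contrasts; it is not the model-based emmeans contrasts run in the R Markdown file, and the garden labels, group labels, and mean values are hypothetical.

```python
def local_vs_foreign(means, garden, local_group):
    """Local group's mean minus the average of foreign groups in the same garden."""
    local = means[(garden, local_group)]
    foreign = [v for (g, grp), v in means.items() if g == garden and grp != local_group]
    return local - sum(foreign) / len(foreign)

def home_vs_away(means, group, home_garden):
    """A group's mean in its home garden minus its average across away gardens."""
    home = means[(home_garden, group)]
    away = [v for (g, grp), v in means.items() if grp == group and g != home_garden]
    return home - sum(away) / len(away)

# Hypothetical mean flower counts keyed by (garden, population group).
means = {
    ("high", "high-edge"): 10.0,
    ("high", "central"): 6.0,
    ("high", "low-edge"): 4.0,
    ("central", "high-edge"): 4.0,
    ("low", "high-edge"): 2.0,
}
lvf = local_vs_foreign(means, garden="high", local_group="high-edge")
hva = home_vs_away(means, group="high-edge", home_garden="high")
```

Positive values of either contrast are consistent with local adaptation of the focal group.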

Recommended Software

R version 3.6.3 (2020) or higher (The R Foundation for Statistical Computing), used for all analyses in this study

Access information

Climate data was derived from the following sources:

Daymet: Annual Climate Summaries on a 1-km Grid for North America, Version 4 R1: https://daac.ornl.gov/DAYMET/guides/Daymet_Annual_V4R1.html