Data from: The leading edge matters too: fitness and the expression of adaptive differentiation are greatest at the high-elevation edge of a species range
Data files
Oct 27, 2025 version files 711.53 KB
- climate_distances_to_gardens_pc12.csv (2.43 KB)
- dfdata_V2.csv (19.21 KB)
- growing_season_climate_with_normals.csv (7.40 KB)
- mf_lsm_central_30may.csv (3.91 KB)
- mf_lsm_central_fc_may30.csv (3.87 KB)
- mf_lsm_high_30may.csv (3.81 KB)
- mf_lsm_high_fc_may30.csv (3.78 KB)
- mf_lsm_low_30may.csv (3.72 KB)
- mf_lsm_low_fc_may30.csv (3.67 KB)
- monkeys_V2.csv (571.35 KB)
- multreg_figures.Rmd (13.86 KB)
- NASA_Daymet_Analysis_2008–2021_V4.Rmd (13.97 KB)
- README.md (13.13 KB)
- shay_pennington_leading_edge_v4.Rmd (47.43 KB)
Nov 19, 2025 version files 707.79 KB
- climate_distances_to_gardens_pc12.csv (2.43 KB)
- dfdata_V2.csv (21.36 KB)
- growing_season_climate_with_normals.csv (7.40 KB)
- mf_lsm_central_30may.csv (3.91 KB)
- mf_lsm_central_fc_may30.csv (3.87 KB)
- mf_lsm_high_30may.csv (3.81 KB)
- mf_lsm_high_fc_may30.csv (3.78 KB)
- mf_lsm_low_30may.csv (3.72 KB)
- mf_lsm_low_fc_may30.csv (3.67 KB)
- monkeys_V2.csv (571.35 KB)
- multreg_figures_v2.Rmd (13.86 KB)
- NASA_Daymet_Analysis_2008–2021_V4.Rmd (13.97 KB)
- README.md (13.07 KB)
- shay_pennington_leading_edge_v5.Rmd (41.60 KB)
Dec 18, 2025 version files 692.47 KB
- climate_distances_to_gardens_pc12.csv (2.43 KB)
- dfdata_V2.csv (21.36 KB)
- growing_season_climate_with_normals.csv (7.40 KB)
- mf_lsm_central_30may.csv (3.91 KB)
- mf_lsm_central_fc_may30.csv (3.87 KB)
- mf_lsm_high_30may.csv (3.81 KB)
- mf_lsm_high_fc_may30.csv (3.78 KB)
- mf_lsm_low_30may.csv (3.72 KB)
- mf_lsm_low_fc_may30.csv (3.67 KB)
- monkeys_V2.csv (571.35 KB)
- NASA_Daymet_Analysis_2008–2021_V5.Rmd (13.73 KB)
- README.md (12.85 KB)
- shay_pennington_leading_edge_v6.Rmd (40.59 KB)
Dec 19, 2025 version files 692.35 KB
- climate_distances_to_gardens_pc12.csv (2.43 KB)
- dfdata_V2.csv (21.36 KB)
- growing_season_climate_with_normals.csv (7.40 KB)
- mf_lsm_central_30may.csv (3.91 KB)
- mf_lsm_central_fc_may30.csv (3.87 KB)
- mf_lsm_high_30may.csv (3.81 KB)
- mf_lsm_high_fc_may30.csv (3.78 KB)
- mf_lsm_low_30may.csv (3.72 KB)
- mf_lsm_low_fc_may30.csv (3.67 KB)
- monkeys_V2.csv (571.35 KB)
- NASA_Daymet_Analysis_2008–2021_V5.Rmd (13.73 KB)
- README.md (12.83 KB)
- shay_pennington_leading_edge_v7.Rmd (40.49 KB)
Jan 19, 2026 version files 691.47 KB
- climate_distances_to_gardens_pc12.csv (2.43 KB)
- dfdata_V2.csv (21.36 KB)
- growing_season_climate_with_normals.csv (7.40 KB)
- mf_lsm_central_30may.csv (3.91 KB)
- mf_lsm_central_fc_may30.csv (3.87 KB)
- mf_lsm_high_30may.csv (3.81 KB)
- mf_lsm_high_fc_may30.csv (3.78 KB)
- mf_lsm_low_30may.csv (3.72 KB)
- mf_lsm_low_fc_may30.csv (3.67 KB)
- monkeys_V2.csv (571.35 KB)
- NASA_Daymet_Analysis_2008–2021_V5.Rmd (13.73 KB)
- README.md (12.83 KB)
- shay_pennington_leading_edge_v8.Rmd (39.61 KB)
Abstract
This dataset contains raw and processed data from two common garden experiments testing local adaptation and fitness variation across the elevational range of Erythranthe laciniata (Phrymaceae), an annual monkeyflower endemic to the Sierra Nevada, California, USA. Experiments were conducted in 2009 and 2021 at low-, mid-, and high-elevation field gardens spanning the species’ distribution. The dataset includes survival and lifetime fitness measurements (total flower number), population and maternal line identifiers, geographic coordinates, elevation, climate data (NASA Daymet), and associated metadata. These data support analyses presented in Shay & Pennington et al. (2025), The leading edge matters too: fitness and the expression of adaptive differentiation are greatest at the high-elevation edge of a species’ range (Ecology Letters).
https://doi.org/10.6071/M39T04
Description of the data and file structure
This README file was generated on 2025-12-18 by Jackie Shay
Data Description
Dataset Title
The leading edge matters too: fitness and the expression of adaptive differentiation are greatest at the high-elevation edge of a species range
Date of Data Collection
2008–2009 and 2020–2021
Contributors
- Jackie E. Shay, Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, CA, USA
- Lillie K. Pennington, Department of Genetics, University of Georgia, Athens, GA, USA
- Daniel J. Toews, Environmental Systems Graduate Group, University of California, Merced, CA, USA
- Elizabeth Green, Department of Life and Environmental Sciences, University of California, Merced, CA, USA
- Jason P. Sexton, Department of Life and Environmental Sciences, University of California, Merced, CA, USA
Overview
This dataset contains lifetime fitness data collected from cut-leaf monkeyflower (Erythranthe laciniata) across 23 populations in 2008–2009 and 9 of those populations in 2020–2021. Fitness was recorded as fruit count and flower count, and was later simplified to flower count, the lifetime fitness measure used in the analyses. Population and garden identifiers for the common garden experiments are also provided. We provide the R Markdown code for the analysis of these data using generalized linear models, multiple regression analysis, and estimated marginal means. We also provide code for the climate analysis used to characterize the 2020–2021 growing season as a drought year.
Files and variables
Materials are divided into "Data Files" and "Code Files".
DATA
- monkeys_V2.csv - Raw data collected from the 2008–2009 field study containing details and lifetime fitness measurements for 23 populations.
  - Variables:
    - `sample`: numerical variable; sample number that gives each row of data a unique identifier
    - `garden`: categorical variable; denotes the common garden: low-elevation (HWY), central (GB), or high-elevation (HG)
    - `block`: grouping variable; accounts for random effects among observations, in this case garden
    - `pop`: categorical code; seed source population (23 total)
    - `total_flowers`: numerical variable; total flowers per individual plant, used as the fitness measure (count)
    - `fruit_mass_mg`: numerical variable; mean fruit mass per individual, used as a supplemental fitness measure (milligrams)
    - `clim.dist.low`, `clim.dist.center`, `clim.dist.high`: numerical variables; PCA projections of 2008–2009 climate data with 30-year climate averages
    - `Dist.low`, `Dist.center`, `Dist.high`: numerical variables; pairwise linear geographic distance from each population to the low, central, and high garden
    - `Gen.dist.low`, `Gen.dist.center`, `Gen.dist.high`: numerical variables; genetic distance calculated from multilocus data
    - `surv`: numerical variable; binary indicator of whether the plant survived to fruiting
    - `elev`: numerical variable; population elevation in meters
- dfdata_V2.csv - Raw data collected from the 2020–2021 field study containing details and lifetime fitness measurements for 9 populations, grown from seeds collected in 2014.
  - Variables:
    - `sample`: numerical variable; sample number that gives each row of data a unique identifier
    - `tray`: categorical variable; physical tray in which plants were grown
    - `garden`: categorical variable; denotes the common garden: low-elevation (HWY), central (TBD), or high-elevation (TL)
    - `garden_e`: categorical-numerical variable; garden paired with population elevation in meters
    - `pop`: categorical code; seed source population (9 total)
    - `pop_e`: numerical variable; population elevation in meters
    - `generation`: plant generation code to track maternal lines
    - `flower_count`: numerical variable; total flowers per individual plant, used as the fitness measure
    - `final_fruit`: numerical variable; fruit count per individual plant, used as a secondary fitness measure
    - `surv`: numerical variable; binary indicator of whether the plant survived to fruiting
- 1) mf_lsm_low_30may.csv, mf_lsm_central_30may.csv, mf_lsm_high_30may.csv; 2) mf_lsm_low_fc_may30.csv, mf_lsm_central_fc_may30.csv, mf_lsm_high_fc_may30.csv
  - Metadata calculated from the emmeans of lifetime fitness flower count data.
  - Metadata calculated from the emmeans of survival data.
  - Variables:
    - `pop`: categorical code; seed source population (9 total)
    - `elev`: numerical variable; population elevation in meters
    - `group`: categorical code; population edge position: low-edge, central, or high-edge
    - `garden`: categorical variable; denotes the common garden: low-elevation (HWY), central (GB), or high-elevation (HG)
    - `lsmean`: numerical value; least-squares mean calculated with emmeans
    - `SE`: numerical value; standard error from the regression, using the estimated marginal means (EMMs)
    - `df`: numerical value; degrees of freedom associated with the model estimate
    - `asymp.LCL`: numerical value; asymptotic lower confidence limit for the model estimate
    - `asymp.UCL`: numerical value; asymptotic upper confidence limit for the model estimate
    - `clim.dist.low`, `clim.dist.center`, `clim.dist.high`: numerical variables; PCA projections of 2008–2009 climate data with 30-year climate averages
    - `Dist.low`, `Dist.center`, `Dist.high`: numerical variables; pairwise linear geographic distance from each population to the low, central, and high garden
    - `Gen.dist.low`, `Gen.dist.center`, `Gen.dist.high`: numerical variables; genetic distance calculated from multilocus data
- growing_season_climate_with_normals.csv - Climate data calculated from Daymet and used to estimate climate distances between populations and from each population to each garden.
  - Variables:
    - `site`: numerical variable; seed source population
    - `elev`: numerical variable; population elevation in meters
    - `growing_season`: categorical variable; denotes the growing year
    - `mean_tmin`: numerical variable; Daymet annual mean minimum temperature in degrees Celsius (C)
    - `mean_tmax`: numerical variable; Daymet annual mean maximum temperature in degrees Celsius (C)
    - `total_precip`: numerical variable; Daymet annual total precipitation in millimeters (mm)
    - `mean_25yr_tmin`: numerical variable; average minimum temperature between 1980 and 2005 at the climate sites in degrees Celsius (C)
    - `mean_25yr_tmax`: numerical variable; average maximum temperature between 1980 and 2005 at the climate sites in degrees Celsius (C)
    - `mean_25yr_precip`: numerical variable; average precipitation between 1980 and 2005 at the climate sites in millimeters (mm)
    - `tmin_anomaly`: numerical variable; deviation from the 1980–2005 average minimum temperature
    - `tmax_anomaly`: numerical variable; deviation from the 1980–2005 average maximum temperature
    - `precip_anomaly`: numerical variable; deviation from the 1980–2005 average total precipitation
- climate_distances_to_gardens_pc12.csv - Climate distance values calculated between each population and each common garden. Euclidean distances were calculated in PC1–PC2 space between each garden and the 25-year climate position of each population.
  - Variables:
    - `site`: categorical variable; seed source population
    - `garden`: categorical variable; denotes the common garden
    - `dist_to_garden`: numerical variable; distance values projected into the PCA space using the 2008–2009 growing season values
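The distance calculation described for climate_distances_to_gardens_pc12.csv can be illustrated with a minimal Python sketch. This is not the authors' R Markdown code: the function name `pc12_distances`, the standardization by population means and standard deviations, and the example climate values are all assumptions made for illustration. It builds a PCA from long-term population climate, projects one garden into the same space, and takes Euclidean distances on the first two components.

```python
import numpy as np

def pc12_distances(pop_climate, garden_climate):
    """Euclidean distance in PC1-PC2 space from each population to one garden.

    pop_climate: (n_pops, n_vars) long-term climate values per population.
    garden_climate: (n_vars,) climate values for a single garden.
    """
    pop_climate = np.asarray(pop_climate, dtype=float)
    garden_climate = np.asarray(garden_climate, dtype=float)

    # Standardize with the population means and SDs (an assumption here).
    mu = pop_climate.mean(axis=0)
    sd = pop_climate.std(axis=0, ddof=1)
    z_pops = (pop_climate - mu) / sd
    z_garden = (garden_climate - mu) / sd

    # PCA via SVD of the standardized population matrix.
    _, _, vt = np.linalg.svd(z_pops, full_matrices=False)
    axes = vt[:2].T  # loadings for PC1 and PC2, shape (n_vars, 2)

    pop_scores = z_pops @ axes
    garden_scores = z_garden @ axes  # project the garden into the same space
    return np.sqrt(((pop_scores - garden_scores) ** 2).sum(axis=1))

# Made-up values: (tmin in C, tmax in C, precipitation in mm).
pops = [
    [2.0, 15.0, 900.0],
    [0.5, 13.0, 1100.0],
    [-1.0, 11.0, 1300.0],
    [-2.5, 9.5, 1250.0],
]
garden = [-1.0, 11.0, 1300.0]  # same climate as the third population
print(pc12_distances(pops, garden))
```

In this toy input the garden shares its climate with the third population, so that population's distance is zero and the others are positive.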
CODE
- NASA_Daymet_Analysis_2008–2021_V5.Rmd
- Purpose: Extracts daily climate data from NASA Daymet for 23 study populations and summarizes it over multiple years (1980–2005).
- Key packages: `daymetr`, `dplyr`, `lubridate`, `tibble`, `purrr`, `ggplot2`, `tidyr`
- Chunks:
- Environment setup
- Prepare list of 23 population sites
- Define the growing seasons from October to September
- Growing season data from Daymet by site and year
- Apply data extraction function to all sites
- Download the Daymet data for each site and specific date range
- Create a bar plot to visualize climate anomalies (Figure S1)
- PCA-Based Climate Euclidean distance from garden to all others
- Calculate climate distances
- Visualize PCA plots (Figure S2)
- Calculate PCA correlation scores
- Generate PC loading table (Table S2)
- Outputs:
- CSV of annual means for each variable
- Principal Component Analysis (PCA) of climate variables
- Figures of climate trends (Figure S1) and PCA results (Figure S2)
- PC loading table (Table S2)
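As a rough illustration of two steps listed above (the October to September growing season and the anomalies relative to long-term normals), here is a minimal Python sketch. It is separate from the authors' R Markdown script: the function names, the season label format, the made-up daily values, and the sign convention for the anomaly (observed minus normal) are assumptions.

```python
from datetime import date

def growing_season(d):
    """Label a date with its October-September growing season, e.g. '2008-2009'."""
    start = d.year if d.month >= 10 else d.year - 1
    return f"{start}-{start + 1}"

def season_means(daily):
    """Average daily (date, value) records within each growing season."""
    sums = {}
    for d, value in daily:
        key = growing_season(d)
        total, n = sums.get(key, (0.0, 0))
        sums[key] = (total + value, n + 1)
    return {key: total / n for key, (total, n) in sums.items()}

def anomaly(season_value, normal):
    """Deviation from the long-term mean (observed minus normal; assumed sign)."""
    return season_value - normal

# Made-up daily minimum temperatures in degrees Celsius.
daily = [
    (date(2008, 9, 30), 8.0),  # still part of the 2007-2008 season
    (date(2008, 10, 1), 4.0),  # first day of the 2008-2009 season
    (date(2009, 3, 15), 0.0),
    (date(2009, 9, 30), 8.0),  # last day of the 2008-2009 season
]
means = season_means(daily)
print(means["2008-2009"])                       # 4.0
print(anomaly(means["2008-2009"], normal=3.0))  # 1.0
```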
- shay_pennington_leading_edge_v8.Rmd
- Purpose: Conducts statistical analyses on lifetime fitness using garden transplant experiments.
- Key analyses:
- Generalized linear mixed models (GLMMs) with zero-inflated negative binomial distributions using `glmmTMB`
- Estimated marginal means (EMMs) and pairwise contrasts using `emmeans`
- Multiple regressions testing the effects of genetic, geographic, and climate distances on survival and flower production
- Creation of figures showing relationships between distances and fitness metrics
- Key packages: `tidyverse`, `ggplot2`, `multcomp`, `Rmisc`, `doBy`, `car`, `emmeans`, `glmmTMB`, `DHARMa`, `ggeffects`, `effects`, `dbplyr`
- Chunks:
- Environment setup
- Dataset formatting for 2009 data
- Survival percentage calculations for 2009 data
- Zero-inflated regression using glmmTMB for lifetime fitness for 2009 data (Table 1)
- Checking for total flowers (lifetime fitness) outliers for 2009 data
- Modeled predicted means for garden fitness using the glmmTMB function in the glmmTMB package, version 1.1.10 for 2009 data (Figure 2B)
- Survival logistic regression for 2009 data (Table 1; Figure 3A)
- Survival local vs foreign analysis for 2009 data using emmeans package, version 1.11.10 (Table 2)
- Survival home vs away context analysis for 2009 data (Table 2)
- Survival between-climate edges analysis for 2009 data (Table 2)
- Lifetime fitness local vs foreign analysis for 2009 data (Table 2)
- Lifetime fitness home vs away analysis for 2009 data (Table 2)
- Lifetime fitness between-climate edges analysis for 2009 data (Table 2)
- Full Tukey post hoc contrasts for 2009 data
- Dataset formatting for 2021 data
- Survival logistic regression for 2021 data (Table 1; Figure 3B)
- Lifetime fitness zero-inflated regression for 2021 data (Table 1; Figure 2B)
- Garden survival graphs for 2009 (Figure 2A)
- Garden survival graphs for 2021 (Figure 2A)
- Survival multiple regression analysis with distance data for 2009
- Survival multiple regression for 2009 low garden (Table S3)
- Survival multiple regression for 2009 central garden (Table S3)
- Survival multiple regression for 2009 high garden (Table S3)
- Survival probability plot for 2009 high garden (Figure S5)
- Survival probability plot for 2009 low garden (Figure S6)
- Lifetime fitness multiple regression analysis with distance data for 2009
- Lifetime fitness multiple regression for 2009 low garden (Table S3)
- Lifetime fitness multiple regression for 2009 central garden (Table S3)
- Lifetime fitness multiple regression for 2009 high garden (Table S3)
- Significant lifetime fitness partial regression plots (Figure 4)
- Reaction norm plots for lifetime fitness (Figure S4A) and survival (Figure S4B)
- Outputs:
- Regression plots
- Model summaries
- Graphs and figures
Recommended Software
- R version 3.6.3 (2020) or higher (The R Foundation for Statistical Computing) – used for all analyses in this study
Access information
Climate data was derived from the following sources:
- Daymet: Annual Climate Summaries on a 1-km Grid for North America, Version 4 R1 –– https://daac.ornl.gov/DAYMET/guides/Daymet_Annual_V4R1.html
Seeds were collected from 23 populations across the elevational range of E. laciniata using stratified random sampling of ≥60 maternal plants per site. To minimize maternal effects, a refresher generation was grown under controlled greenhouse conditions before planting into field common gardens. In 2009, three experimental gardens were established at low (1000 m), mid (1670 m), and high (3095 m) elevations in Fresno County, CA, with ~100–60 replicate individuals per population per garden. In 2021, three gardens were again established (1000 m, 1555 m, and 2500 m), with nine focal populations represented by 15 maternal lines and three replicates each. Blocks were randomized, overwintered naturally, and censused through flowering. Survival was scored as a binary outcome (flowered or not) and lifetime fitness was measured as the total number of flowers produced per plant. Climate data (mean daily minimum and maximum temperature, precipitation) were extracted from the NASA Daymet V4 dataset (1980–2005 normals, 2008–2009, and 2020–2021 growing seasons). Genetic and geographic distances were obtained from previously published data (Sexton et al. 2016). Data processing and analyses were conducted in R (v4.3.1), including generalized linear mixed models (glmmTMB), post-hoc contrasts (emmeans), and multiple regression of distance metrics. All analysis code is provided to allow full reproducibility.
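Because survival is scored as a binary outcome, a per-population, per-garden survival proportion is a simple first summary. The Python sketch below is an illustration only and is not taken from the authors' R Markdown scripts. The column names `garden`, `pop`, and `surv` follow the variable descriptions in this README, while the function name, the population codes "A" and "B", and the example rows are hypothetical.

```python
from collections import defaultdict

def survival_by_group(rows):
    """Proportion of plants surviving in each (garden, pop) cell.

    rows: iterable of dicts with keys 'garden', 'pop', and 'surv' (0 or 1).
    """
    counts = defaultdict(lambda: [0, 0])  # [survivors, plants]
    for row in rows:
        cell = counts[(row["garden"], row["pop"])]
        cell[0] += int(row["surv"])
        cell[1] += 1
    return {key: survivors / plants for key, (survivors, plants) in counts.items()}

# Hypothetical rows; "A" and "B" are placeholder population codes.
rows = [
    {"garden": "HWY", "pop": "A", "surv": 1},
    {"garden": "HWY", "pop": "A", "surv": 0},
    {"garden": "HWY", "pop": "B", "surv": 0},
    {"garden": "TL", "pop": "A", "surv": 1},
    {"garden": "TL", "pop": "A", "surv": 1},
]
print(survival_by_group(rows))
```

Assuming the CSV headers match the variable names listed above, rows read from dfdata_V2.csv with Python's `csv.DictReader` could be passed to the same function, since `int()` accepts the "0" and "1" strings it would return.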
Changes after Oct 27, 2025:
- `addlowmr_linear` is defined in multreg_figures_v2
- "tray" was added back to the dfdata_V2 file
Changes after Nov 19, 2025:
- Updated R Markdown for ease of reader runs, including organization and naming of script chunks.
- Script updated for misaligned code.
- Duplicate and unused script removed.
Changes after Dec 18, 2025:
- shay_pennington_leading_edge_v7.Rmd updated.
Changes after Dec 19, 2025:
- absolute paths removed to prevent breaking code in shay_pennington_leading_edge_v8.Rmd
