Skip to main content

Data for: Combining range and phenology shifts offers a winning strategy for boreal Lepidoptera

Cite this dataset

Hällfors, Maria et al. (2021). Data for: Combining range and phenology shifts offers a winning strategy for boreal Lepidoptera [Dataset]. Dryad.


Species can adapt to climate change by adjusting in situ or by dispersing to new areas, and these strategies may complement or enhance each other. Here, we investigate temporal shifts in phenology and spatial shifts in northern range boundaries for 289 Lepidoptera species by using long-term data sampled over two decades. While 40% of the species neither advanced phenology nor moved northward, nearly half (45%) used one of the two strategies. The strongest positive population trends were observed for the minority of species (15%) that both advanced flight phenology and shifted their northern range boundaries northward. We show that, for boreal Lepidoptera, a combination of phenology and range shifts is the most viable strategy under a changing climate. Effectively, this may divide species into winners and losers based on their propensity to capitalize on this combination, with potentially large consequences on future community composition.


1. Description of defining the systematically collected data for the phenological analyses

To analyze phenological change in adult flight periods of Lepidoptera, we use systematically collected data from butterfly transects and moth traps across 19 and 25 years, respectively (1999-2017 and 1993-2017). For data on butterflies, we combined two similar surveys of butterflies in agricultural landscapes conducted in Finland during 1999-2017: 1) volunteer transects with varying length and sampling frequency and 2) professional transects with standardized length and sampling frequency.

1) In 1999, a butterfly monitoring network based on volunteer transects was initiated by the Finnish Environment Institute (SYKE). During 1999-2017 the network included 101 sites, with on average 47 sites (ranging from 30-59) recorded annually. At each site, the walking route (transect) is kept constant from year to year, and walked repeatedly during the summer. Along each transect, the number of individuals for each encountered species are recorded from a 5x5x5 m3 cube ahead of the observer (Pollard & Yates 1994). The transects are monitored by volunteer butterfly enthusiasts with high expertise in species recognition. The volunteers are asked to conduct a minimum of seven annual visits per transect, approximately once a fortnight from late May to late August. Weekly counts are recommended, and usually carried out on nearly half of the transects. Due to the northern location of Finland, the butterfly season is typically no longer than 16 weeks, and less than ten weeks in the northernmost transects.

2) During 2001-2014 a total of 68 professional transects of standardized 1 km length and sampling frequency (7 counts per summer) were conducted in southern Finland and the Åland islands (Kuussaari et al. 2007).

Together, the two datasets on butterflies contained 394 767 abundance observations on 93 species at 168 separate locations during 1999-2017. The average annual number of species observed within each transect was 34, ranging from a minimum of 5 to a maximum of 58. The median transect length of the combined data is 1.95 kilometers (mean= 2.41 km) with the shortest one being 530 meters and the longest one 5.12 km. 

Data on moths have been collected in the Finnish national moth monitoring scheme (Nocturna) during 1993-2017 (Leinonen et al. 2016, 2017). Moths are collected using “Jalas” light traps (Jalas 1960) that are run by volunteers across Finland each night across the main active season for moths. The trapping period usually spans from April to October but depends on the latitudinal location and annual weather conditions – trapping usually starts when the snow melts and ends by night frosts. The length of the sampling period per year is therefore more or less constant for each site but varies between sites depending on the length of the season, i.e. it is shorter in the north than in the south, but it covers the entire activity period of moths across the country. Traps are usually emptied once a week upon which moth species and the number of individuals per species are recorded by the volunteers. Coordination and quality control is carried out by experts at the Finnish Environment Institute (Pöyry et al. 2011; Leinonen et al. 2016; 2017). The dataset contained 687 819 observations on 958 taxa identified at the genus or species level and collected at 109 separate locations from 1993-2017.

1.1 Defining data for phenological analyses

The available raw data on adult flight periods for moths and butterflies thus amounted to 1 082 586 abundance observations from 277 sites on 1051 taxa at the genus or species level. We utilized data only at the species level. All the observation dates were converted into Julian dates (day of year; DOY), to represent the phenology of adult flight. For the butterflies, these dates are thus more exact, as the individuals were observed on the specific day recorded. For the moths, however, we use as the observation date the first day of the trapping period although the individual could have ended up in a trap on any day between the set-up of the trap and the day of emptying. In most cases, the moth traps are emptied once a week (66%), but in 9.4% of the data, the interval was longer than ten days (Fig S3 in the supplementary materials of Hällfors et al. 2021). To ensure enough phenological resolution, we removed all sampling events in the moth data with an emptying interval longer than ten days. This implied removing 43 016 species abundance observations from the moth data.

To ensure a comparable span of observations between site-year combinations, we excluded short sampling seasons. This meant excluding site-years where the first trap or transect had not been conducted by DOY 120 (moths) or 150 (butterflies), the last transect was conducted before DOY 225, and the number of sampling events were less than 8 for the moths and 6 for the butterflies. This resulted in excluding 118 821 and 50 335 abundance observations from the moth and butterfly data, respectively. Further, to ensure sufficient data on each analyzed species per year, we included only species with observations from at least ten years during the 19 (butterflies) and 25 (moths) year study periods and with at least 30 observations from each of the included years. From the butterfly data, we additionally omitted the species Leptidea sinapis and L. juvernica because they are very difficult to differentiate in the field.

Using all of the above filters, the data were reduced to 406 217 observations of 244 moth species and 333 625 observations of 45 butterfly species. As a consequence of conservative inclusion, our dataset does not contain rare species. For the remaining 289 species, we obtained distributional data (Section 1.2) and calculated their Northern Range Boundaries (NRBs) for the time periods 1992-1996 and 2013-2017. To exclude an effect of potentially differing phenological timing in recently colonized areas, we additionally excluded records of species from sites located further north of the estimated NRB for 1992-1996. This last step in delimitating the phenological data resulted in excluding another 13 662 observations from the moth, and 6910 observations of the butterfly data. Thus we ended up conducting phenological analyzes on 725 106 abundance observations from 251 sites on 289 species (Data_Phenology_Butterflies_Moths.csv). See main text Fig. 2a in Hällfors et al (2021) and data files (Data_Moth_phenology_unique_sites. csv and Data_Butterfly_Phenology_unique_sites.csv) for the spatial distribution of included sites.

1.2 Defining data for population trend estimation

Long-term trends of butterfly populations were estimated from 101 sites monitored by skilled volunteers from 1999 to 2017 (dataset 1 for butterflies, described in section 1). Because the number of adult butterflies observed each week is characterized by species’ phenology, abundance indices derived from series of counts recorded over the season must account and correct for missing observations. We did this by completing the annual time-series recorded at each site with the expected values for each missing count. These values were predicted from a generalized linear model fitted on the number of individuals observed and that accounts for the shape of the abundance distribution observed for each species and each year within the focal region (Dennis et al. 2013). The shape of the abundance distribution, namely the flight curve, was derived from a spline fitted on the weekly counts observed across sites monitored within the same bioclimatic region (Schmucki et al. 2016). These regional annual flight curves were computed from sites that have been visited at least three times in a given season and where the species was recorded at least twice. We then used the annual flight curves to predict the weekly abundance counts expected for each species, at each site and across each year to complete each time-series, imputing expected values where counts were missing. The annual flight curve was also used to estimate the proportions of the flight period that was sampled at each site. These proportions were used as a weighting factor when estimating each species collated annual abundance indices. The long-term trends were computed from these annual abundance indices collated across 101 sites. For the 45 butterfly species analyzed, trend estimates were derived from a 94 402 abundance observations (951 264 observed individuals) recorded over 11 224 visits conducted between 1999 and 2017 (Data_FIBMS_visit.csv; Data_FIBMS_count.csv;  Data_FIBMS_transect.csv; Data_FIBMS_pheno.csv).

Long-term population trends on moths were estimated from a subset of the moth monitoring data, with less temporal turnover in traps. Only data from traps with at least six years of operation were included, complemented with three traps with a shorter record to support sparser areas of the network, amounting to 70 traps in total. There have been substantial changes in the trap network since 2016, which could affect the reliability of the population trend estimates. Thus, data from 2017 was not used in trend calculations. Overall, the dataset used for trend estimation included 261 568 abundance observations (4 516 979 observed individuals) over 1993-2016 (Data_Trends_Moths.txt).

2. Observational data for shift in the northern range boundary (NRB)

To analyze shift in the NRB, we used observational data on the 289 species that were selected for the study through defining data for the phenological analyses (Section 1). Observational data available through the Insect database and National Butterfly Monitoring Scheme (NAFI; Saarinen et al. 2003) were sourced from the Finnish Biodiversity Information Facility (FinBIF) in December 2019. The Insect database and NAFI are mainly based on observations made by voluntary amateur and professional lepidopterists. These data are independent from the data used in the phenological analyses for the moths and partly so for the butterflies (some of the observations from the volunteer transects are also reported in NAFI). These data were used to estimate shifts in NRB as they cover a larger range of both latitudes and habitats within Finland compared to the systematically collected transect and trap data (Fig. 2 in Hällfors et al. 2021), wherefore their use in estimating shifts in the NRB likely better reflect actual patterns in distributional shifts than the rather sparsely and southwards constrained systematically collected data. The data sourcing from FinBIF was conducted on the superfamily or species level and the following batches were downloaded:, 38387 , 38386 , 38404.  2 474 498 observations on the 289 study species were thereafter separated within the R environment to constitute the raw data for the NRB analyses (Data_Distribution_Raw_Butterflies.csv and Data_Distribution_Raw_Moths.csv).

To measure shifts in NRB over time, we followed the methods presented by Thomas and Lennon (1999) where the average of the ten northernmost latitudes are used to define the NRB during a specific period in time. We defined the first period to range from 1992-1996 (hereafter T1) as this period covers the start of intensified climate warming in Finland (Mikkonen et al. 2014) and allows a comparison with a previous study on butterfly range shifts by Pöyry et al. (2009). This period also covers a sufficient number of observations compared to the number of observations collected prior to the end of the 1980’s (Fig. S2 in the Supplementary Materials of Hällfors et al. 2021), thus enabling a robust description of species distribution areas compared to preceding periods. We defined the second five-year period as 2013-2017 (hereafter called T2). The temporal span of the distributional data in relation to the systematic data used to estimate phenological shift is depicted in Fig. S1 in the Supplementary Materials of Hällfors et al. (2021).

The total number of presence cells for T2 was substantially higher than for T1 due to overall increased sampling effort over time (46 602 vs. 147 548 for the butterflies; 135 442 vs. 566 072 for the moths; Fig S2 in the Supplementary Materials of Hällfors et al. 2021). To avoid effects caused by differences in overall observation intensity, we randomly subsampled (without replacement) the observations in T2 across species to equal the number of observations in T1. This was done separately for butterflies and moths, as the observation patterns for the species groups are likely to differ as partly different groups of people observe butterflies and moths, and as their observation is often based on different methods (Script 2a Subsampling_Distributiondata_Moths_Butterflies.R). The resulting data (Data_Distribution_subsampled_Butterflies.csv and Data_Distribution_Subsampled_Moths.csv) were used to measure the change in prevalence and shift of NRB between T1 and T2 (see Analyses).

3. Analyses

3.1 Shift in phenology

We analyzed phenological shift with the following linear mixed effects model:

Yi=a+Xib+ Zu+ εiwi  

Where Y is the day of year that species i was observed, a is the intercept (average day of year for observing species i), X is the year when the observation occurred (continuous, centered variable), b is the effect of Year on day of year(Y), Z is a design matrix for the site-level random effects u, and ɛ are the residuals weighted by the abundance of species i on the observation day. The model was fitted separately for each species, and resulted in a slope coefficient describing the mean annual shift in adult flight timing over the study period (Script 1 Phenology_lmer_Moths_Butterflies.R)

3.2 Shift in NRB

To estimate shifts in NRB, we calculated the average latitude of the ten northernmost grid cells and the prevalence (proportion of occupied 10x10 km cells for each species out of all cells with observations of any of the species) in both in T1 and T2 (following Thomas & Lennon 1999). We then subtracted the NRB in T2 from that in T1, and similarly the change in prevalence by subtracting the prevalence in T2 from that in T1 (Script 2b NRBshift_PrevalenceChange_calculation.R). The statistical significance of an average shift in NRB across all species can be estimated by modeling the change in kilometers between the periods as a function of change in prevalence. This approach was first presented by Thomas and Lennon (1999) and has been used in similar analyses (e.g. Brommer 2004; Pöyry et al. 2009; Mason et al. 2015). If the intercept is positive and significantly different from zero, the inference is that the species group has, overall, shifted their NRBs more towards the north than expected purely from their change in prevalence. We used this approach to obtain a linear effect estimate of shift in km as a function of change in prevalence (Fig. S4a in the Supplementary Materials of Hällfors et al. (2021)). To obtain a corrected measure of NRB shifts per species, where the effect of prevalence change is reduced, we extracted the residuals of the model. This estimate thus describes the residual shift in NRB that is not explained by the linear effect of prevalence across the studied species. These residuals correlate strongly with the raw shift in kilometers (Fig. S4b in the Supplementary Materials of Hällfors et al. 2021), but are a more conservative metric as the linear effect of prevalence change across species has been removed.

3.3 Population trends

We calculated the population trends from their collated annual abundance indices. This was done separately for butterflies and moths.

For the butterflies, at each site, annual indices were computed from weekly counts, following the method described in Dennis et al. (2013, 2016) and implemented in the rbms R package (Schmucki et al. 2020). Missing week counts were derived from a Poisson generalized linear model (GLM) that included the regional flight curve as an offset (Schmucki et al. 2016). Collated annual abundance indices were then estimated with a weighted Poisson regression, accounting for site, transect length, and using the proportion of the flight curve monitored as weight. Thereby, sites with many missing counts during the flight period had lower weight than well monitored sites. For each species, we calculated the long-term trends with a linear model that we fitted on the log10 transformed collated annual indices, starting with the year the species was first recorded until the last within the 1999-2017 period (Script 3a Butterfly_population_trends_FIBMS.R).

For the moth species, population trends were estimated using the TRIM software (Pannekoek & Van Strien 2005), as implemented in the rtrim R package (Boogart et al. 2020). TRIM uses Poisson regression to estimate annual abundance indices, while accounting for missing observations, site differences, overdispersion and temporal autocorrelation. As a long-term trend estimate, TRIM calculates a regression through the annual indices, and this linear trend slope (on the log scale; the “additive” slope in TRIM) was used as a measure of population trend for the moth species over 1993-2016 (Script 3b Moth_population_trends_rtrim.R). Four species appeared in the dataset after 1993, with the first year of occurrence marking the start of the timeframe for trend calculation.

3.4 Conversion of species responses into categories

The slope coefficients from the phenology models and the residuals of NRB shift formed our main results and are hereafter jointly referred to as the responses (as in climate change response). To enable a comparison of directionality in responses, we converted the continuous results into categories (Table 1 in Hällfors et al. 2021). For the sake of visualization in Fig. 3 of Hällfors et al. (2021), and in order to calculate the percentage of species with population trends in different directions, population trends were also assigned into similar categories: a significantly positive trend was assigned into the “Positive” group; an insignificant trend with any sign into the “Stable” group; and a significantly negative trend into the “Negative” group.

See “Data_Results.csv” for a complete table of NRB shifts, prevalence change, phenology estimates, shifts as categories, and species traits. All data management and analyses were conducted in R studio (R version 3.5.3; R Core Team 2019 and 4.0.5; R Core Team 2021).


Jane and Aatos Erkko Foundation

Academy of Finland, Award: 330739