Source code for R tutorials and dataset for empirical case study on Malurus elegans (red-winged fairy wren)


van de Pol, Martijn; Brouwer, Lyanne (2021), Source code for R tutorials and dataset for empirical case study on Malurus elegans (red-winged fairy wren), Dryad, Dataset,


Biological processes exhibit complex temporal dependencies due to the sequential nature of allocation decisions in organisms’ life-cycles, feedback loops, and two-way causality. Consequently, longitudinal data often contain cross-lags: the predictor variable depends on the response variable of the previous time-step. Although statisticians have warned that regression models that ignore such covariate endogeneity in time series are likely to be inappropriate, this has received relatively little attention in biology. Furthermore, the resulting degree of estimation bias remains largely unexplored.

We use a graphical model and numerical simulations to understand why and how regression models that ignore cross-lags can be biased, and how this bias depends on the length and number of time series. Ecological and evolutionary examples are provided to illustrate that cross-lags may be more common than is typically appreciated and that they occur in functionally different ways.

We show that routinely used regression models that ignore cross-lags are asymptotically unbiased. However, this offers little relief, because conventional methods are biased for most realistically feasible lengths of time series. Furthermore, collecting time series on multiple subjects, such as populations, groups or individuals, does not help to overcome this bias when the analysis focusses on within-subject patterns (often the pattern of interest). Simulations (R tutorials 1 and 2), a literature search and a real-world empirical example on fairy wrens (data archived here, with analyses presented in R tutorial 3) together suggest that approaches that ignore cross-lags are likely biased in the direction opposite to the sign of the cross-lag (e.g. towards detecting density-dependence of vital rates and against detecting life history trade-offs and benefits of group living). Next, we show that multivariate (e.g. structural equation) models can dynamically account for cross-lags, and simultaneously address additional bias induced by measurement error, but only if the analysis considers multiple time series.
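The dependence of this bias on series length can be sketched with a small simulation. The code below is not taken from the archived tutorials; it assumes a deliberately simple, hypothetical model in which the true effect of the predictor on the response is zero (beta = 0) and the predictor depends positively on the previous response (gamma = 0.8), and it fits an ordinary regression slope that ignores this cross-lag. The function names and parameter values are illustrative assumptions only.

```r
# Illustrative sketch only: NOT the code from the archived tutorials.
# Assumed (hypothetical) model:
#   y_t = beta * x_t + e_t
#   x_t = gamma * y_{t-1} + u_t   (the cross-lag)
set.seed(1)

simulate_slope <- function(n_time, beta = 0, gamma = 0.8) {
  e <- rnorm(n_time)  # noise in the response
  u <- rnorm(n_time)  # noise in the predictor
  x <- numeric(n_time)
  y <- numeric(n_time)
  x[1] <- u[1]
  y[1] <- beta * x[1] + e[1]
  for (t in seq_len(n_time)[-1]) {
    x[t] <- gamma * y[t - 1] + u[t]  # predictor depends on previous response
    y[t] <- beta * x[t] + e[t]
  }
  # Ordinary least-squares slope of y on x (with intercept),
  # i.e. a regression that ignores the cross-lag
  cov(x, y) / var(x)
}

# Average estimated slope over many simulated series; because the true
# slope is zero here, this average estimates the bias
mean_slope <- function(n_time, n_sim = 5000) {
  mean(replicate(n_sim, simulate_slope(n_time)))
}

bias_short <- mean_slope(n_time = 6)    # short time series
bias_long  <- mean_slope(n_time = 200)  # long time series
print(c(short = bias_short, long = bias_long))
```

Under these assumptions the average slope for the short series should fall below zero, opposite in sign to the positive cross-lag, and should shrink towards zero for the long series. The archived tutorials remain the reference for the models actually used in the paper.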

We provide guidance on how to identify a cross-lag and subsequently specify it in a multivariate model, which can be far from trivial. Our tutorials with data and R code of the worked examples provide step‐by‐step instructions on how to perform such analyses.

Our study offers insights into situations in which cross-lags can bias analysis of ecological and evolutionary time series and suggests that adopting dynamical models can be important, as this directly affects our understanding of population regulation, the evolution of life histories and cooperation, and possibly many other topics. Determining how strong estimation bias due to ignoring covariate endogeneity has been in the ecological literature requires further study, not least because it may interact with other sources of bias.


The data were collected as part of a long-term study on red-winged fairy wrens (Malurus elegans) in south-west Australia (Pemberton) from 2008 to 2016. In each year, data were collected on group size, offspring production and survival of all group members. See the description in Box 4 of the associated paper, and references therein.

Usage Notes

Tutorials (R Markdown files), R functions (R file) and empirical data (ASCII text file) associated with the paper (5 files in total).

Tutorial1.rmd shows estimation bias in simulated data (see Box 1 & 2 in paper for details).

Tutorial2.rmd illustrates bias due to measurement error and how to account for it (see Box 3 in paper for details).

Tutorial3.rmd explains how to analyze the real-world case study of group living benefits in red-winged fairy wrens (see Box 4 in paper for details).

Melegans.txt contains the empirical data for red-winged fairy wrens (Malurus elegans) for each of the 108 groups (SubjectID) across 9 years (time). Presented are 698 values for adult group size (GroupSize), the number of adults surviving until the next year (Survivors), and group productivity, measured as the number of offspring produced in a year that survive until the next year (Offspring), together with their 1-timestep-lagged variables (OffspringLagged and GroupSizeLagged; LaggedUnavailable=1 indicates a missing lagged value). There are no further missing values; see the description in Box 4 of the paper and references therein for details.

simulation_functions.R contains the R functions used in Tutorials 1-3.
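As a starting point for working with Melegans.txt, the column layout described above could be read and checked as sketched below. This is a sketch under stated assumptions: it assumes the file is whitespace-delimited with a header row, and the helpers `read_melegans()` and `with_lags()` are hypothetical names introduced here, not functions from simulation_functions.R. The demonstration uses a tiny made-up table so that the code runs without the real file; how the real file codes unavailable lagged values is not stated here, so the toy uses 0 as an arbitrary placeholder.

```r
# Sketch only: assumes Melegans.txt is whitespace-delimited with a header row.
# read_melegans() and with_lags() are hypothetical helpers introduced for
# illustration; they are NOT part of simulation_functions.R.
expected_cols <- c("SubjectID", "time", "GroupSize", "Survivors", "Offspring",
                   "OffspringLagged", "GroupSizeLagged", "LaggedUnavailable")

read_melegans <- function(path) {
  dat <- read.table(path, header = TRUE)
  # Stop early if any documented column is absent
  missing_cols <- setdiff(expected_cols, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  dat
}

# Keep only rows for which the lagged variables are available
with_lags <- function(dat) {
  dat[dat$LaggedUnavailable == 0, , drop = FALSE]
}

# Demonstration on a tiny made-up table (values invented, NOT real data)
toy <- data.frame(
  SubjectID         = c(1, 1, 2),
  time              = c(1, 2, 1),
  GroupSize         = c(3, 4, 2),
  Survivors         = c(2, 3, 1),
  Offspring         = c(1, 0, 2),
  OffspringLagged   = c(0, 1, 0),  # 0 is an arbitrary placeholder when unavailable
  GroupSizeLagged   = c(0, 3, 0),
  LaggedUnavailable = c(1, 0, 1)
)
toy_path <- tempfile(fileext = ".txt")
write.table(toy, toy_path, row.names = FALSE, quote = FALSE)

toy_read   <- read_melegans(toy_path)
toy_lagged <- with_lags(toy_read)
print(toy_lagged)
```

With the archived file in the working directory, the same helper would be called as `read_melegans("Melegans.txt")`. Tutorial3.rmd remains the reference for how the authors actually load and analyse the data.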


Australian Research Council, Award: DE130100174