Data from: Evolvability predicts macroevolution under fluctuating selection

Holstad, Agnes 1; Voje, Kjetil 2; Opedal, Øystein 3; Bolstad, Geir 4; Bourg, Salomé 1; Hansen, Thomas 2; Pélabon, Christophe 1

Research facility: Norwegian University of Science and Technology

Published Aug 16, 2023; Updated Apr 25, 2024 on Dryad. https://doi.org/10.5061/dryad.4j0zpc8hx

Data files

Apr 25, 2024 version files 4.81 MB

Abstract

Heritable variation is a prerequisite for evolutionary change. Yet, whether genetic potential for microevolution is relevant on macroevolutionary timescales is debated. Here we show that evolutionary divergence among populations, and to a lesser extent among species, increases with microevolutionary evolvability in both extant and extinct taxa. We evaluate and reject a number of hypotheses put forward to explain this relationship and propose that an effect of evolvability on population and species divergence can be explained by the influence of genetic constraints on populations’ ability to track rapid stationary environmental fluctuations.

This data repository contains the data underlying the article:

Holstad, A., Voje, K. L., Opedal, Ø. H., Bolstad, G. H., Bourg, S., Hansen, T. F. and Pélabon, C. Evolvability predicts macroevolution under fluctuating selection. Science 384, 688–693 (2024)

Contact information for the corresponding author:

Name: Agnes Holstad
Affiliation: Department of Biology, Centre for Biodiversity Dynamics, Norwegian University of Science and Technology; Trondheim, Norway
ORCID ID: https://orcid.org/0000-0003-3154-1857
Email: agnes.holstad@ntnu.no
Alternate Email: agnes.holstad@gmail.com

Co-author ORCID IDs:

Kjetil L. Voje: https://orcid.org/0000-0003-2556-3080
Øystein H. Opedal: https://orcid.org/0000-0002-7841-6933
Geir H. Bolstad: https://orcid.org/0000-0003-1356-8239
Salomé Bourg:
Thomas F. Hansen:
Christophe Pélabon: https://orcid.org/0000-0002-8630-8983

Details on this README file

File format: .md
Author: Agnes Holstad
Date created: 10.08.2023

Description of the data and file structure

The results in the article stem from two separate meta-datasets, both gathered from studies in the primary scientific literature. One meta-dataset contains contemporary populations and species, and the other comprises fossil time series.

The contemporary data

These data comprise traits measured on a ratio scale. Each trait was required to have at least two population (or species) means and at least one estimate of genetic variation.

The contemporary data consists of 2 files:

contemporary_data.txt: The data underlying the main analysis of the contemporary data.
conditional_evolvability.txt: The data underlying Fig. S8 and Table S3.

Details for: contemporary_data.txt

Contributors: Øystein Opedal and Agnes Holstad
Format: .txt, tab delimited
Size: 1 MB
Dimensions: 2698 rows x 47 columns
Missing data codes: NA
Variables:
- studyID: Unique identifier for all traits from the same study
- trait: Trait name as it is given in original study
- trait_UUID: Universally unique identifier for traits measured with the same method by the same group. Divergence is estimated across all populations/species that share the same trait_UUID
- trait.type: The type of trait, e.g., morphological, physiological, life history
- measure: The measurement as described in the original study
- unit: Units the trait is measured in
- dimension: Trait dimension or type of scale. E.g. linear, area, mass/volume, count, growth rate, ratio
- transformation.G: Transformation applied to the trait values before estimation of Va (additive genetic variance), if any. E.g. log_base, sqrt, Z, mean_centering, mean_std
- transformation.P: Transformation applied to the trait values before estimation of Vp (phenotypic variance), if any. E.g. log_base, sqrt, Z, mean_centering, mean_std
- n.fam: Number of families in the genetic analysis
- n.genetic: Number of individuals in the genetic analysis
- n.pheno: Sample size for the phenotypic data
- h2: Heritability
- se.h2: Standard error of h2
- trait.mean: Phenotypic trait mean
- se: Standard error of trait mean
- vp: Phenotypic variance
- se.vp: Standard error of phenotypic variance
- sd: Standard deviation
- va: Additive genetic variance
- se.va: Standard error of additive genetic variance
- estim_method: Estimation method of genetic variance, REML/ML/LS/postmean/postmode
- ve: Environmental variance
- se.ve: Standard error of environmental variance
- cva: Genetic coefficient of variation
- se.cva: Standard error of cva
- evol: Evolvability, mean standardised or proportional genetic variance
- se.evol: Standard error of evolvability
- x100: Whether cva and evolvability are multiplied by 100 (Y/N)
- only_sp: Whether the data are for species only (Y = only species data, N = only population data, B = both)
- environment.g: The environment where the measures for the genetic estimates are taken, e.g., field or common_garden
- environment.p: The environment where the measures for the phenotypic estimates are taken, e.g., field or common_garden
- kingdom
- phylum
- taxon
- order
- family
- genus
- species: Written as Genus_species
- population: Name of the population
- sex: Female/Male/both
- reference: FirstAuthor_year
- journal
- vol
- year: In format YYYY
- DOI
- notes
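
As a minimal sketch of how some of these columns relate, the Python snippet below (not part of the provided R scripts) reads a tab-delimited excerpt, treats NA as missing, and derives mean-standardised evolvability as va / trait.mean^2. The excerpt values are invented. The helper names to_float and mean_scaled are hypothetical, and reading x100 = Y as "the stored value is 100 times the proportion" is an assumption. The formula applies to untransformed traits only, since a variance estimated on log-transformed values is already on a proportional scale.

```python
import csv
import io
import math

# Hypothetical three-row excerpt with a subset of the columns described above.
# The values are invented for illustration and are not taken from the data set.
SAMPLE = (
    "studyID\ttrait.mean\tva\tevol\tx100\n"
    "S1\t10.0\t0.5\tNA\tN\n"
    "S2\t4.0\tNA\t0.02\tN\n"
    "S3\t20.0\t2.0\t0.5\tY\n"
)


def to_float(value):
    """Convert a field to float, mapping the missing-data code NA to None."""
    return None if value == "NA" else float(value)


def mean_scaled(row):
    """Return (evolvability, cva) as proportions; None where not computable."""
    mean = to_float(row["trait.mean"])
    va = to_float(row["va"])
    evol = to_float(row["evol"])
    if evol is not None and row["x100"] == "Y":
        evol /= 100.0  # assumption: undo the x100 scaling flagged in the file
    if evol is None and mean is not None and va is not None:
        evol = va / mean ** 2  # mean-standardised additive genetic variance
    # cva = sqrt(va) / mean, which equals sqrt(evol) on the proportion scale
    cva = None if evol is None else math.sqrt(evol)
    return evol, cva


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
results = {row["studyID"]: mean_scaled(row) for row in rows}
```

To read the real file, replace io.StringIO(SAMPLE) with an open file handle for contemporary_data.txt; the column names used here match those listed above.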

Details for: conditional_evolvability.txt

Contributors: Agnes Holstad
Format: .txt, tab delimited
Size: 9.2 KB
Dimensions: 66 rows x 10 columns
Missing data codes: NA
Variables:
- G_id: Unique identifier for the G matrix
- trait: Trait name as it is given in original study
- trait_UUID: Universally unique identifier for traits measured with the same method by the same group.
- measure: The measurement as described in the original study
- trait.mean: Phenotypic trait mean
- evol: Evolvability, mean standardised or proportional genetic variance
- c_evol: Evolvability conditioned on a trait that represents the size of the organism
- auto: The autonomy of the two traits (focal trait and trait representing size of the organism)
- trait.type: The type of trait, e.g., morphological, physiological, life history
- dimension: Trait dimension or type of scale. E.g. linear, area, mass/volume, count, growth rate, ratio
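
The relationship between evol, c_evol and auto can be illustrated with a short Python sketch. It assumes the commonly used two-trait definitions: conditional evolvability is the variance of the focal trait remaining after accounting for its genetic covariance with the size trait, and autonomy is the ratio of conditional to unconditional evolvability. This README does not state the formulas used to produce the file, so the functions and the 2 x 2 G matrix below are hypothetical.

```python
def conditional_evolvability(g_focal, g_size, g_cov):
    """Evolvability of the focal trait when the size trait is held constant.

    Arguments are elements of a mean-standardised 2 x 2 G matrix:
    the two variances and their covariance.
    """
    return g_focal - g_cov ** 2 / g_size


def autonomy(c_evol, evol):
    """Fraction of the focal trait's evolvability that is independent of size."""
    return c_evol / evol


# Hypothetical mean-standardised G matrix elements (not taken from the data).
evol_focal = 0.004
evol_size = 0.010
covariance = 0.003

c_evol = conditional_evolvability(evol_focal, evol_size, covariance)
auto = autonomy(c_evol, evol_focal)
```

Under these definitions auto lies between 0 and 1, and equals one minus the squared genetic correlation between the two traits.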

The fossil data

The fossil data was retrieved from the database curated by Kjetil L. Voje:

K. L. Voje, Phenotypic Evolution Time Series (PETS) Database, version 1.0 (2023). https://pets.nhm.uio.no

The fossil data comprise time series that each follow one lineage through time, so the samples can be considered populations sampled from the same lineage at different times. We required one or more traits to be measured, with a minimum of two time steps. The trait was also required to be on a ratio scale.

The fossil data consists of 5 files:

fossil_data_consecutive.txt: The data underlying the analyses using evolvability to predict the morphological distance to the consecutive sample throughout the time series.
fossil_data_sum.txt: The data underlying the analyses that use the average evolvability of the time series to predict the total variance of sample means in the time series.
fossil_meta_data.txt: The metadata of the studies and time series, linked to the other files by study ID (stID) and time series ID (tsID).
grey_et_al_2012.txt: Used as an example time series in Figure 3A.
res_grid_search.txt: The code to obtain this data is in "run4_supplementary_figures.R". This data frame can be used to avoid running the analyses that take more than 10 min.

Details for: fossil_data_consecutive.txt

Contributors: Kjetil L. Voje and Agnes Holstad
Format: .txt, tab delimited
Size: 2.2 MB
Dimensions: 10594 rows x 18 columns
Missing data codes: NA
Variables:
- stID: The study ID, that is linked to the "stID" column in the "fossil_meta_data.txt" file.
- tsID: The time series ID, that is linked to the "tsID" column in the "fossil_meta_data.txt" file.
- trait.mean: The natural log of the trait mean of sample i.
- trait.mean2: The natural log of the trait mean of sample i+1.
- sample.var: The raw sample variance of sample i, estimated on a proportional scale, i.e. as var(ln(x)) or var(x)/mean(x)^2.
- sample.var2: The raw sample variance of sample i+1, estimated on a proportional scale.
- diff: The distance to the trait mean of the consecutive sample.
- abs.diff: The absolute distance to the trait mean of the consecutive sample.
- time.diff: The time in millions of years to the consecutive sample.
- sample.size: The number of individuals in sample i.
- sample.size2: The number of individuals in sample i+1.
- max.duration: The maximum possible duration the samples could span, estimated as the total elapsed time of the time series divided by the number of samples.
- time.elapsed: Time elapsed from the first sample in the time series to sample i.
- distance.to.optimum: Distance of the trait mean to the estimated stationary optimum (fitted with an Ornstein-Uhlenbeck process) of the time series.
- taxa
- species: Written as Genus_species
- trait.type: Type of trait dimension or type of scale, e.g., linear, area, count, ratio, percent.
- microfossil: If microfossil (yes/no)
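
The pairing of each sample i with sample i+1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to build the file: the time series values are invented, the function name consecutive_rows is hypothetical, and the sign of diff (later sample minus earlier sample) is assumed because the README does not specify it.

```python
import math

# Hypothetical time series, oldest sample first:
# (age in Myr since the first sample, trait mean, raw variance, sample size)
series = [
    (0.0, 10.0, 1.00, 20),
    (0.5, 11.0, 1.21, 15),
    (1.5, 10.5, 0.90, 30),
]


def consecutive_rows(samples):
    """Pair each sample i with sample i+1, mirroring the columns above."""
    first_age = samples[0][0]
    total_elapsed = samples[-1][0] - first_age
    # Total elapsed time divided by the number of samples, as described above.
    max_duration = total_elapsed / len(samples)
    rows = []
    for (t1, m1, v1, n1), (t2, m2, v2, n2) in zip(samples, samples[1:]):
        diff = math.log(m2) - math.log(m1)  # assumed sign: later minus earlier
        rows.append({
            "trait.mean": math.log(m1),
            "trait.mean2": math.log(m2),
            "sample.var": v1 / m1 ** 2,   # proportional scale
            "sample.var2": v2 / m2 ** 2,
            "diff": diff,
            "abs.diff": abs(diff),
            "time.diff": t2 - t1,
            "sample.size": n1,
            "sample.size2": n2,
            "max.duration": max_duration,
            "time.elapsed": t1 - first_age,
        })
    return rows


rows = consecutive_rows(series)
```

A time series with k samples yields k - 1 rows, which is why this file has many more rows than there are time series.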

Details for: fossil_meta_data.txt

Contributors: Kjetil L. Voje and Agnes Holstad
Format: .txt, tab delimited
Size: 340 KB
Dimensions: 589 rows x 28 columns
Missing data codes: NA
Variables:
- stID: The study ID
- tsID: The time series ID
- popID: The population ID
- description: Description of the trait measure as given in the original study
- citation
- URL: DOI of the study
- total_N: Total sample size of all samples in the time series
- steps: Number of steps in the time series
- interval_MY: Time interval of the entire time series in millions of years
- trait_type: Type of trait dimension or type of scale, e.g., linear, area, count, ratio, percent.
- taxa
- species: Written as Genus_species
- microfossil: If microfossil (yes/no)
- sampling: What sampling type is used for collecting samples, e.g. geological fieldwork, sediment core
- age_model: What model is used for aging the samples
- sediment: Type of sediment
- environment: Type of environment
- period_start
- period_end
- epoch_start
- epoch_end
- age_start
- age_end
- source
- publication_year
- lat
- lon
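
Because the fossil files are linked by stID and tsID, the metadata can be attached to the analysis tables with a simple keyed join. The Python sketch below uses two invented excerpts; the species names and values are placeholders, and the helper read_tsv is hypothetical. Both identifiers are used as the key, which is safe whether or not tsID is unique on its own.

```python
import csv
import io

# Hypothetical excerpts of fossil_meta_data.txt and fossil_data_sum.txt.
META = (
    "stID\ttsID\tspecies\n"
    "1\t1\tGenus_a\n"
    "1\t2\tGenus_b\n"
)
SUMS = (
    "stID\ttsID\tdiv\n"
    "1\t1\t0.012\n"
    "1\t2\tNA\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Index the metadata by the (stID, tsID) pair.
meta_by_key = {(row["stID"], row["tsID"]): row for row in read_tsv(META)}

# Attach the metadata columns to each summary row. Rows without a metadata
# match keep only their own columns.
merged = [
    {**meta_by_key.get((row["stID"], row["tsID"]), {}), **row}
    for row in read_tsv(SUMS)
]
```

Fields are kept as strings here, so the missing-data code NA survives the join unchanged and can be converted afterwards.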

Details for: fossil_data_sum.txt

Contributors: Kjetil L. Voje and Agnes Holstad
Format: .txt, tab delimited
Size: 116 KB
Dimensions: 589 rows x 16 columns
Missing data codes: NA
Variables:
- stID: The study ID
- tsID: Time series ID
- div: Divergence among all fossil samples in the time series, estimated as the variance of the natural log trait means.
- div2.corr: Divergence corrected for sampling error, estimated as div-mean(SE^2).
- mean.var: The average sample variance within a time series weighted on sample size. Estimated on a proportional scale.
- stationary.var: The stationary variance estimated from the Ornstein-Uhlenbeck process fitted to the time series with a stationary optimum.
- alpha: Rate of adaptation towards the optimum estimated from the Ornstein-Uhlenbeck process fitted to the time series with a stationary optimum.
- var.obs: Variance among the sample variance estimates within a time series.
- average.n: Average sample size per time series.
- n.steps: Number of samples in the time series.
- length.in.myr: Time span of the entire time series in millions of years.
- maximum.duration: The maximum possible duration the samples could span, estimated as the total elapsed time of the time series (length.in.myr) divided by the number of samples.
- trait.type: Type of trait dimension or type of scale, e.g., linear, area, count, ratio, percent.
- microfossil: If microfossil (yes/no)
- taxa
- species: Written as Genus_species
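
The divergence measures can be illustrated with a small Python sketch. It assumes that div is the sample variance (n - 1 denominator) of the log trait means, and that the squared standard error of each log mean is approximated by the proportional sample variance divided by the sample size. Neither detail is stated in this README, and the input values are invented.

```python
import math
import statistics

# Hypothetical samples of one time series:
# (trait mean, sample variance on a proportional scale, sample size)
samples = [
    (10.0, 0.010, 20),
    (11.0, 0.012, 10),
    (10.5, 0.008, 30),
]

log_means = [math.log(mean) for mean, _, _ in samples]

# div: variance of the natural-log trait means across samples.
div = statistics.variance(log_means)

# Assumed sampling variance of each log mean: proportional variance / n.
se_squared = [var / n for _, var, n in samples]

# div2.corr: divergence minus the mean squared standard error.
div_corr = div - statistics.mean(se_squared)

# mean.var: sample variance averaged over samples, weighted by sample size.
total_n = sum(n for _, _, n in samples)
mean_var = sum(var * n for _, var, n in samples) / total_n

# average.n and n.steps as described above.
average_n = total_n / len(samples)
n_steps = len(samples)
```

The correction can make div2.corr negative when the sampling error exceeds the observed variance among means, so negative values are not necessarily errors.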

Details for: grey_et_al_2012.txt

Contributors: Kjetil L. Voje and Agnes Holstad
Format: .txt, tab delimited
Size: 359 B
Dimensions: 14 rows x 5 columns
Missing data codes: NA
Variables:
- N: Sample size
- trait_mean: Trait mean
- trait_var: Sample variance
- age_MY: Age of the sample, in millions of years elapsed since the first sample in the time series.
- tsID: Time series ID

Details for: res_grid_search.txt

Contributors: Agnes Holstad and Geir Bolstad
Format: .txt, tab delimited
Size: 870 KB
Dimensions: 10000 rows x 5 columns
Missing data codes: NA
Variables:
- a: The alpha parameter of the two-layered Ornstein-Uhlenbeck process. Measures how fast the optimum returns to its central value.
- r: The r parameter of the two-layered Ornstein-Uhlenbeck process. Measures how fast the trait of the population tracks the optimum.
- a_yr: Half life of the alpha parameter in years, estimated as ln(2)/a.
- r_yr: Half life of the r parameter in years, estimated as ln(2)/r.
- loglik: The log-likelihood for the combination of a and r of the deviance of the observed ln(R) (rate of evolution) from the predicted ln(R).
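
The half-life columns and the use of the grid can be sketched in Python as follows. The grid values are invented, the rates are assumed to be expressed per year (so that ln(2)/rate is in years), and picking the row with the highest log-likelihood is shown only as one plausible way to summarise the grid.

```python
import math


def half_life(rate):
    """Time for the expected distance to the target to halve: ln(2) / rate."""
    return math.log(2) / rate


# Hypothetical three-point excerpt of a grid over a and r (rates per year).
grid = [
    {"a": 1e-3, "r": 1e-2, "loglik": -120.5},
    {"a": 1e-2, "r": 1e-2, "loglik": -98.2},
    {"a": 1e-2, "r": 1e-1, "loglik": -101.7},
]

# Derive the half-life columns as described above.
for row in grid:
    row["a_yr"] = half_life(row["a"])
    row["r_yr"] = half_life(row["r"])

# The grid point with the highest log-likelihood.
best = max(grid, key=lambda row: row["loglik"])
```

A larger rate gives a shorter half-life, so a_yr and r_yr decrease as a and r increase.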

Details on the scripts for the analyses

The analyses and figures of the paper can be reproduced in R using the provided scripts. No other software is required. The packages required for the analyses are listed at the top of each script.

The scripts

run1_contemporary_analyses.R: Prepares and runs the main analyses of the contemporary data.
run2_fossil_analyses.R: Prepares and runs the main analyses of the fossil data.
run3_main_figures.R: Plots the main figures of the paper. Both "run1_contemporary_analyses.R" and "run2_fossil_analyses.R" must be run before this script.
run4_supplementary_figures.R: Runs the supplementary analyses and plots the supplementary figures of the paper. Both "run1_contemporary_analyses.R" and "run2_fossil_analyses.R" must be run before this script.