Data from: Evolvability predicts macroevolution under fluctuating selection
Data files
Apr 25, 2024 version files 4.81 MB

conditional_evolvability.txt

contemporary_data.txt

fossil_data_consecutive.txt

fossil_data_sum.txt

fossil_meta_data.txt

grey_et_al_2012.txt

README.md

res_grid_search.txt
Abstract
Heritable variation is a prerequisite for evolutionary change. Yet, whether genetic potential for microevolution is relevant on macroevolutionary timescales is debated. Here we show that evolutionary divergence among populations, and to a lesser extent among species, increases with microevolutionary evolvability in both extant and extinct taxa. We evaluate and reject a number of hypotheses put forward to explain this relationship and propose that an effect of evolvability on population and species divergence can be explained by the influence of genetic constraints on populations’ ability to track rapid stationary environmental fluctuations.
README: Evolvability and divergence in contemporary and fossil species
This data repository contains the data underlying the article:
Holstad, A., Voje, K. L., Opedal, Ø. H., Bolstad, G. H., Bourg, S., Hansen, T. F. and Pélabon, C. Evolvability predicts macroevolution under fluctuating selection. Science 384, 688–693 (2024)
Contact information on corresponding author:
 Name: Agnes Holstad
 Affiliation: Department of Biology, Centre for Biodiversity Dynamics, Norwegian University of Science and Technology; Trondheim, Norway
 ORCID ID: https://orcid.org/0000-0003-3154-1857
 Email: agnes.holstad@ntnu.no
 Alternate Email: agnes.holstad@gmail.com
Coauthor ORCID IDs:
 Kjetil L. Voje: https://orcid.org/0000-0003-2556-3080
 Øystein H. Opedal: https://orcid.org/0000-0002-7841-6933
 Geir H. Bolstad: https://orcid.org/0000-0003-1356-8239
 Salomé Bourg:
 Thomas F. Hansen:
 Christophe Pélabon: https://orcid.org/0000-0002-8630-8983
Details on this README file
 File format: .md
 Author: Agnes Holstad
 Date created: 10.08.2023
Description of the data and file structure
The results in the article stem from two separate meta datasets, both gathered from studies in the primary scientific literature. One meta dataset contains contemporary populations and species, and the other comprises fossil time series.
The contemporary data
This data comprises traits measured on a ratio scale. Each trait was required to have at least two population (or species) means and one estimate of genetic variation.
The contemporary data consists of 2 files:
 contemporary_data.txt: The data underlying the main analysis of the contemporary data.
 conditional_evolvability.txt: The data underlying Fig. S8 and Table S3.
Details for: contemporary_data.txt
 Contributors: Øystein Opedal and Agnes Holstad
 Format: .txt, tab delimited
 Size: 1 MB
 Dimensions: 2698 rows x 47 columns
 Missing data codes: NA
 Variables:
 studyID: Unique identifier for all traits from the same study
 trait: Trait name as it is given in original study
 trait_UUID: Universally unique identifier for traits measured with the same method by the same group, i.e., divergence is estimated across all populations/species that share a trait_UUID
 trait.type: The type of trait, e.g., morphological, physiological, life history
 measure: The measurement as described in the original study
 unit: Units the trait is measured in
 dimension: Trait dimension or type of scale. E.g. linear, area, mass/volume, count, growth rate, ratio
 transformation.G: If the trait values are transformed prior to estimation of Va (additive genetic variance). E.g. log_base, sqrt, Z, mean_centering, mean_std
 transformation.P: If the trait values are transformed prior to estimation of Vp (phenotypic variance). E.g. log_base, sqrt, Z, mean_centering, mean_std
 n.fam: Number of families in the genetic analysis
 n.genetic: Number of individuals in the genetic analysis
 n.pheno: Sample size for the phenotypic data
 h2: Heritability
 se.h2: Standard error of h2
 trait.mean: Phenotypic trait mean
 se: Standard error of trait mean
 vp: Phenotypic variance
 se.vp: Standard error of phenotypic variance
 sd: Standard deviation
 va: Genetic variance
 se.va: Standard error of genetic variance
 estim_method: Estimation method of genetic variance, REML/ML/LS/postmean/postmode
 ve: Environmental variance
 se.ve: Standard error of environmental variance
 cva: Genetic coefficient of variation
 se.cva: Standard error of cva
 evol: Evolvability, mean standardised or proportional genetic variance
 se.evol: Standard error of evolvability
 x100: Whether cva and evol are multiplied by 100 (Y/N)
 only_sp: Whether the data is for species only (Y), populations only (N), or both (B)
 environment.g: The environment where the measures for the genetic estimates are taken, e.g., field or common_garden
 environment.p: The environment where the measures for the phenotypic estimates are taken, e.g., field or common_garden
 kingdom
 phylum
 taxon
 order
 family
 genus
 species: Written as Genus_species
 population: Name of the population
 sex: Female/Male/both
 reference: FirstAuthor_year
 journal
 vol
 year: In format YYYY
 DOI
 notes
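As a minimal sketch of how these conventions might be handled when loading the file outside the provided R scripts, the Python snippet below parses a tab-delimited table, maps the NA code to a missing value, and puts evol on a common scale using the x100 flag. The inline table is a made-up stand-in with only four of the 47 columns, and dividing by 100 when x100 is Y is an assumption about how the flag is meant to be used, not a step taken from the authors' code.

```python
import csv
import io

# Toy stand-in for contemporary_data.txt (made-up values, 4 of the 47 columns).
SAMPLE = (
    "studyID\tspecies\tevol\tx100\n"
    "S1\tGenus_a\t0.5\tY\n"
    "S1\tGenus_b\tNA\tN\n"
    "S2\tGenus_c\t0.002\tN\n"
)


def parse_value(text):
    """Map the README's missing-data code 'NA' to None, otherwise parse a float."""
    return None if text == "NA" else float(text)


def read_evolvability(handle):
    """Return (species, evolvability as a proportion) for every row."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        evol = parse_value(row["evol"])
        # Assumption: x100 == "Y" means the stored value was multiplied by 100.
        if evol is not None and row["x100"] == "Y":
            evol = evol / 100.0
        rows.append((row["species"], evol))
    return rows


rows = read_evolvability(io.StringIO(SAMPLE))
for species, evol in rows:
    print(species, evol)
```

To use the real file, replace `io.StringIO(SAMPLE)` with an opened handle to contemporary_data.txt.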
Details for: conditional_evolvability.txt
 Contributors: Agnes Holstad
 Format: .txt, tab delimited
 Size: 9.2 KB
 Dimensions: 66 rows x 10 columns
 Missing data codes: NA
 Variables:
 G_id: Unique identifier for the G matrix
 trait: Trait name as it is given in original study
 trait_UUID: Universally unique identifier for traits measured with the same method by the same group.
 measure: The measurement as described in the original study
 trait.mean: Phenotypic trait mean
 evol: Evolvability, mean standardised or proportional genetic variance
 c_evol: Evolvability conditioned on a trait that represents the size of the organism
 auto: The autonomy of the two traits (focal trait and trait representing size of the organism)
 trait.type: The type of trait, e.g., morphological, physiological, life history
 dimension: Trait dimension or type of scale. E.g. linear, area, mass/volume, count, growth rate, ratio
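The relationship between evol, c_evol and auto can be illustrated with a two-trait G matrix. The sketch below assumes the usual definitions: conditional evolvability is the mean-standardised genetic variance of the focal trait that remains when the conditioning trait is held constant, and autonomy is the ratio of conditional to unconditional evolvability. Whether the values in this file were computed in exactly this way is an assumption, and all numbers are made up.

```python
def evolvability(g_yy, mean_y):
    """Mean-standardised additive genetic variance of the focal trait y."""
    return g_yy / mean_y**2


def conditional_evolvability(g_yy, g_xx, g_xy, mean_y):
    """Evolvability of y that remains when the conditioning trait x cannot change."""
    return (g_yy - g_xy**2 / g_xx) / mean_y**2


def autonomy(g_yy, g_xx, g_xy):
    """Fraction of the evolvability of y that is independent of x."""
    return 1.0 - g_xy**2 / (g_xx * g_yy)


# Hypothetical G matrix entries: focal trait y, size trait x.
g_yy, g_xx, g_xy, mean_y = 4.0, 9.0, 3.0, 20.0

e = evolvability(g_yy, mean_y)
c = conditional_evolvability(g_yy, g_xx, g_xy, mean_y)
a = autonomy(g_yy, g_xx, g_xy)
print(e, c, a)
```

Under these definitions c equals e multiplied by a, which gives a quick consistency check on the three columns.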
The fossil data
The fossil data was retrieved from the database curated by Kjetil L. Voje:
K. L. Voje, Phenotypic Evolution Time Series (PETS) Database, version 1.0 (2023). https://pets.nhm.uio.no
The fossil data comprises time series that each follow one lineage through time, so the samples can be considered populations sampled from the same lineage at successive times. We required one or more traits to be measured, with a minimum of two time steps. The trait was also required to be on a ratio scale.
The fossil data consists of 5 files:
 fossil_data_consecutive.txt: The data underlying the analyses using evolvability to predict the morphological distance to the consecutive sample throughout the time series.
 fossil_data_sum.txt: The data underlying the analyses that uses the average evolvability of the time series to predict the total variance of sample means in the time series.
 fossil_meta_data.txt: Giving the meta data of the study and time series, linked to the other files by study ID (stID) and time series ID (tsID).
 grey_et_al_2012.txt: Used as an example time series in Figure 3A.
 res_grid_search.txt: The code to obtain this data is in "run4_supplementary_figures.R". This data frame can be used to avoid running the analyses that take > 10 min.
Details for: fossil_data_consecutive.txt
 Contributors: Kjetil L. Voje and Agnes Holstad
 Format: .txt, tab delimited
 Size: 2.2 MB
 Dimensions: 10594 rows x 18 columns
 Missing data codes: NA
 Variables:
 stID: The study ID, that is linked to the "stID" column in the "fossil_meta_data.txt" file.
 tsID: The time series ID, that is linked to the "tsID" column in the "fossil_meta_data.txt" file.
 trait.mean: The natural log of the trait mean of the sample i.
 trait.mean2: The natural log of the trait mean of the sample i+1.
 sample.var: The raw sample variance of sample i, estimated on a proportional scale, i.e. as var(ln(x)) or var(x)/mean(x)^2.
 sample.var2: The raw sample variance of sample i+1, estimated on a proportional scale.
 diff: The distance to the trait mean of the consecutive sample.
 abs.diff: The absolute distance to the trait mean of the consecutive sample.
 time.diff: The time in million years to the consecutive sample.
 sample.size: The number of individuals in sample i.
 sample.size2: The number of individuals in sample i+1.
 max.duration: The maximum possible duration the samples could span, estimated as the total elapsed time of the time series divided by the number of samples.
 time.elapsed: Time elapsed from the first sample in the time series to sample i.
 distance.to.optimum: Distance of the trait mean to the estimated stationary optimum (fitted with an Ornstein-Uhlenbeck process) of the time series.
 taxa
 species: Written as Genus_species
 trait.type: Type of trait dimension or type of scale, e.g., linear, area, count, ratio, percent.
 microfossil: If microfossil (yes/no)
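The proportional-scale quantities above can be sketched as follows. The snippet compares the two ways of putting a sample variance on a proportional scale, the variance of log-transformed values and the variance divided by the squared mean, and computes a signed and absolute change between consecutive log means. The measurements are made up, and the sign convention (next sample minus current sample) is an assumption; the README only states that diff is the distance to the consecutive sample.

```python
import math
import statistics


def proportional_variance(values):
    """Sample variance divided by the squared sample mean."""
    mean = statistics.fmean(values)
    return statistics.variance(values) / mean**2


def log_variance(values):
    """Sample variance of the natural-log-transformed values."""
    return statistics.variance([math.log(v) for v in values])


def consecutive_change(log_mean_i, log_mean_next):
    """Signed and absolute difference between consecutive log trait means.

    Assumption: the sign is 'next sample minus current sample'.
    """
    diff = log_mean_next - log_mean_i
    return diff, abs(diff)


measurements = [9.5, 10.0, 10.5]  # hypothetical raw trait values in one sample
pv = proportional_variance(measurements)
lv = log_variance(measurements)

# Hypothetical log trait means of sample i and sample i+1.
diff, abs_diff = consecutive_change(math.log(10.5), math.log(10.0))
print(pv, lv, diff, abs_diff)
```

For small amounts of variation the two variance estimates are nearly identical, which is why either can serve as the proportional-scale variance.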
Details for: fossil_meta_data.txt
 Contributors: Kjetil L. Voje and Agnes Holstad
 Format: .txt, tab delimited
 Size: 340 KB
 Dimensions: 589 rows x 28 columns
 Missing data codes: NA
 Variables:
 stID: The study ID
 tsID: The time series ID
 popID: The population ID
 description: Description of the trait measure as given in the original study
 citation
 URL: DOI of the study
 total_N: Total sample size of all samples in the time series
 steps: Number of steps in the time series
 interval_MY: Time interval of the entire time series in millions of years
 trait_type: Type of trait dimension or type of scale, e.g., linear, area, count, ratio, percent.
 taxa
 species: Written as Genus_species
 microfossil: If microfossil (yes/no)
 sampling: What sampling type is used for collecting samples, e.g. geological fieldwork, sediment core
 age_model: What model is used for aging the samples
 sediment: Type of sediment
 environment: Type of environment
 period_start
 period_end
 epoch_start
 epoch_end
 age_start
 age_end
 source
 publication_year
 lat
 lon
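Because the meta data is linked to the other fossil files through stID and tsID, a join on that pair of columns is one way to attach study information to each time series. The Python sketch below uses two tiny made-up tables in place of fossil_meta_data.txt and fossil_data_sum.txt. It assumes that the (stID, tsID) pair identifies a single row in the meta data, which the README does not state explicitly.

```python
import csv
import io

# Toy stand-ins for fossil_meta_data.txt and fossil_data_sum.txt (made-up values).
META = (
    "stID\ttsID\tspecies\tsteps\n"
    "1\t1\tGenus_a\t12\n"
    "1\t2\tGenus_a\t8\n"
)
SUMS = (
    "stID\ttsID\tdiv\n"
    "1\t1\t0.004\n"
    "1\t2\tNA\n"
    "2\t1\t0.010\n"
)


def index_by_series(handle):
    """Index rows by (stID, tsID); assumes the pair is unique per row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["stID"], row["tsID"]): row for row in reader}


meta = index_by_series(io.StringIO(META))

merged = []
for row in csv.DictReader(io.StringIO(SUMS), delimiter="\t"):
    info = meta.get((row["stID"], row["tsID"]))
    if info is None:
        continue  # no matching meta data for this series in the toy table
    merged.append({**info, **row})

for row in merged:
    print(row["stID"], row["tsID"], row["species"], row["div"])
```

Values are kept as strings here, so the NA code passes through unchanged and can be converted at analysis time.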
Details for: fossil_data_sum.txt
 Contributors: Kjetil L. Voje and Agnes Holstad
 Format: .txt, tab delimited
 Size: 116 KB
 Dimensions: 589 rows x 16 columns
 Missing data codes: NA
 Variables:
 stID: The study ID
 tsID: Time series ID
 div: Divergence among all fossil samples in the time series, estimated as the variance of the natural log trait means.
 div2.corr: Divergence corrected for sampling error, estimated as div - mean(SE^2).
 mean.var: The average sample variance within a time series weighted on sample size. Estimated on a proportional scale.
 stationary.var: The stationary variance estimated from the Ornstein-Uhlenbeck process fitted to the time series with a stationary optimum.
 alpha: Rate of adaptation towards the optimum estimated from the Ornstein-Uhlenbeck process fitted to the time series with a stationary optimum.
 var.obs: Variance among the sample variance estimates within a time series.
 average.n: Average sample size per time series.
 n.steps: Number of samples in the time series.
 length.in.myr: Time span of the entire time series in millions of years.
 maximum.duration: The maximum possible duration the samples could span, estimated as the total elapsed time of the time series (length.in.myr) divided by the number of samples.
 trait.type: Type of trait dimension or type of scale, e.g., linear, area, count, ratio, percent.
 microfossil: If microfossil (yes/no)
 taxa
 species: Written as Genus_species
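As a worked illustration of the two divergence measures, the sketch below computes div as the variance of log trait means and then subtracts the mean squared standard error, reading div2.corr as div minus mean(SE^2). Taking SE^2 as the proportional-scale sample variance divided by the sample size, and using the sample (n - 1) variance, are assumptions made for this example. All values are made up.

```python
import statistics


def divergence(log_means):
    """Variance of the natural-log trait means across samples."""
    return statistics.variance(log_means)


def corrected_divergence(log_means, sample_vars, sample_sizes):
    """Divergence minus the average squared standard error of the log means.

    Assumption: SE^2 of each log mean is sample variance / sample size.
    """
    se2 = [var / n for var, n in zip(sample_vars, sample_sizes)]
    return divergence(log_means) - statistics.fmean(se2)


# Hypothetical time series with three samples.
log_means = [2.0, 2.1, 2.2]
sample_vars = [0.004, 0.004, 0.004]
sample_sizes = [20, 20, 20]

div = divergence(log_means)
div_corr = corrected_divergence(log_means, sample_vars, sample_sizes)
print(div, div_corr)
```

The correction matters most for short series with small samples, where sampling error makes up a larger share of the observed variance among means.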
Details for: grey_et_al_2012.txt
 Contributors: Kjetil L. Voje and Agnes Holstad
 Format: .txt, tab delimited
 Size: 359 B
 Dimensions: 14 rows x 5 columns
 Missing data codes: NA
 Variables:
 N: Sample size
 trait_mean: Trait mean
 trait_var: Sample variance
 age_MY: Age of the sample, in million years elapsed since first sample in the time series.
 tsID: Time series ID
Details for: res_grid_search.txt
 Contributors: Agnes Holstad and Geir Bolstad
 Format: .txt, tab delimited
 Size: 870 KB
 Dimensions: 10000 rows x 5 columns
 Missing data codes: NA
 Variables:
 a: The alpha parameter of the two-layered Ornstein-Uhlenbeck process. Measures how fast the optimum returns to its central value.
 r: The r parameter of the two-layered Ornstein-Uhlenbeck process. Measures how fast the trait of the population tracks the optimum.
 a_yr: Half-life of the alpha parameter in years, estimated as ln(2)/a.
 r_yr: Half-life of the r parameter in years, estimated as ln(2)/r.
 loglik: The log-likelihood for the combination of a and r of the deviance of the observed ln(R) (rate of evolution) from the predicted ln(R).
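A short sketch of how the grid might be summarised: the snippet below picks the parameter combination with the highest log-likelihood and converts both rates to half-lives with ln(2)/rate, matching the definitions of a_yr and r_yr. The three-row grid is a made-up stand-in for the 10000-row file, and selecting the single best row is an illustrative choice rather than the authors' procedure.

```python
import csv
import io
import math

# Toy stand-in for res_grid_search.txt (made-up values, 3 of the 5 columns).
GRID = (
    "a\tr\tloglik\n"
    "0.001\t0.01\t-120.5\n"
    "0.002\t0.01\t-118.2\n"
    "0.002\t0.02\t-119.0\n"
)


def half_life(rate):
    """Time for a deviation to halve under exponential decay at this rate."""
    return math.log(2) / rate


rows = [
    {key: float(value) for key, value in row.items()}
    for row in csv.DictReader(io.StringIO(GRID), delimiter="\t")
]

# Grid point with the highest log-likelihood.
best = max(rows, key=lambda row: row["loglik"])
a_half_life = half_life(best["a"])
r_half_life = half_life(best["r"])
print(best, a_half_life, r_half_life)
```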
Details on the script for the analyses
The analyses and figures of the paper can be run in R using the provided scripts. No other software is required. The packages required for the analyses are listed at the top of each script.
The scripts
 run1_contemporary_analyses.R: Prepares and runs the main analyses of the contemporary data.
 run2_fossil_analyses.R: Prepares and runs the main analyses of the fossil data.
 run3_main_figures.R: Plots the main figures of the paper. Run both "run1_contemporary_analyses.R" and "run2_fossil_analyses.R" before this script.
 run4_supplementary_figures.R: Runs the supplementary analyses and plots the supplementary figures of the paper. Run both "run1_contemporary_analyses.R" and "run2_fossil_analyses.R" before this script.
Methods
The data was collected from the primary scientific literature and consists of two independent meta datasets. One meta dataset contains contemporary populations and species, and the other comprises fossil time series. The contemporary data comprises traits on a ratio scale, each required to have at least two population (or species) means and one estimate of genetic variation.