Dryad

Data from: Natural selection drives emergent genetic homogeneity in a century-scale experiment with barley

Cite this dataset

Koenig, Daniel et al. (2024). Data from: Natural selection drives emergent genetic homogeneity in a century-scale experiment with barley [Dataset]. Dryad. https://doi.org/10.5061/dryad.z34tmpgm8

Abstract

Direct observation is central to our understanding of the process of adaptation, but evolution is rarely documented in a large, multicellular organism for more than a few generations. Here, we observe genetic and phenotypic evolution across a century-scale competition experiment, barley composite cross II (CCII). CCII was founded in 1929 with tens of thousands of unique genotypes and has been adapted to local conditions in Davis, CA, USA for 58 generations. We find that natural selection has massively reduced genetic diversity leading to a single clonal lineage constituting most of the population by generation F50. Selection favored alleles originating from similar climates to that of Davis, and targeted genes regulating reproductive development, including some of the most well-characterized barley diversification loci, Vrs1, HvCEN, and Ppd-H1. We chronicle the dynamic evolution of reproductive timing in the population and uncover how parallel molecular pathways are targeted by stabilizing selection to optimize this trait. Our findings point to selection as the predominant force shaping genomic variation in one of the world’s oldest ongoing biological experiments.

README: Data From "Natural selection drives emergent genetic homogeneity in a century-scale experiment with barley"

https://doi.org/10.5061/dryad.z34tmpgm8

Genetic datasets from the sequencing of composite cross II

Description of the data and file structure

-Genotype datasets

CCII_PARENTS_AND_EXOME.vcf.gz

Genotype file for the merged CCII parents and Exome sequencing panel (Russell et al. 2016, Nature Genetics)

FILTERED_PARENTAL_SNPS.vcf

CCII parent SNP calls

FINAL_AFS.txt.gz

Allele counts in CCII progeny pools for each SNP. Columns are in the following order:

Chromosome

Position

Reference_allele

Alternate_allele

Reference_allele_count_Parents

Alternate_allele_count_Parents

Reference_allele_count_F18

Alternate_allele_count_F18

Reference_allele_count_F28

Alternate_allele_count_F28

Reference_allele_count_F58

Alternate_allele_count_F58

SNP_EFF_ANNOTATION

SNP_EFF_EFFECT

SNP_EFF_NEAREST_GENE
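
The column order above can be turned into per-pool allele frequencies with a short script. The following Python sketch is illustrative only: it assumes FINAL_AFS.txt.gz is tab-delimited, has no header row, and stores integer counts, none of which is stated in this README. The function names (parse_afs_line, alt_frequencies, iter_afs) are hypothetical helpers, not part of the dataset.

```python
import gzip

# Column names in the order listed in this README.
COLUMNS = [
    "Chromosome", "Position", "Reference_allele", "Alternate_allele",
    "Reference_allele_count_Parents", "Alternate_allele_count_Parents",
    "Reference_allele_count_F18", "Alternate_allele_count_F18",
    "Reference_allele_count_F28", "Alternate_allele_count_F28",
    "Reference_allele_count_F58", "Alternate_allele_count_F58",
    "SNP_EFF_ANNOTATION", "SNP_EFF_EFFECT", "SNP_EFF_NEAREST_GENE",
]
POOLS = ["Parents", "F18", "F28", "F58"]


def parse_afs_line(line, sep="\t"):
    """Split one line into a dict keyed by the README column names.

    Assumption: fields are tab-separated; pass a different `sep` if not.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def alt_frequencies(record):
    """Return the alternate-allele frequency in each pool.

    A pool with zero total count yields None rather than a division error.
    Assumption: counts are stored as integers.
    """
    freqs = {}
    for pool in POOLS:
        ref = int(record[f"Reference_allele_count_{pool}"])
        alt = int(record[f"Alternate_allele_count_{pool}"])
        total = ref + alt
        freqs[pool] = alt / total if total else None
    return freqs


def iter_afs(path, sep="\t"):
    """Yield one parsed record per non-empty line of the gzipped file.

    Assumption: the file has no header row; skip the first record if it does.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_afs_line(line, sep)
```

Under those assumptions, iterating over iter_afs("FINAL_AFS.txt.gz") and passing each record to alt_frequencies gives the alternate-allele frequency trajectory (Parents, F18, F28, F58) for each SNP.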

GBS_PARENTS_PROGENY.vcf.gz

Individual GBS genotype calls for parents and progeny. Pedigree numbers are as follows. Generation numbers of the CCII are indicated as F**.

F18: 1_####

F28: 2_####

F58: 7_####
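
The pedigree prefixes above can be mapped back to generations when working with sample names. This Python sketch assumes that progeny sample names in the VCF and phenotype files begin with the numeric prefix followed by an underscore, exactly as shown, and that any name not matching one of the three prefixes is a parent or otherwise unclassified. The function name generation_from_pedigree and the sample names in the usage note are hypothetical.

```python
# Prefix-to-generation mapping taken from the list in this README.
PREFIX_TO_GENERATION = {"1": "F18", "2": "F28", "7": "F58"}


def generation_from_pedigree(sample_id):
    """Return 'F18', 'F28', or 'F58' for a progeny pedigree number.

    Returns None when the name has no underscore or an unlisted prefix
    (assumed here to mean the sample is a parent or unclassified).
    """
    prefix, sep, _ = sample_id.partition("_")
    if not sep:
        return None
    return PREFIX_TO_GENERATION.get(prefix)
```

For example, a hypothetical sample named 7_0001 would be assigned to F58, while a name without a listed prefix would return None.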

-Phenotypic datasets for the CCIIselect

Columns are as follows:

Genotype name

rows (Count of seed rows)

tiller_number (Count)

spikes mature_spikes (Count)

immature_spikes (Count)

X100_seed_mass (g)

seed_mass (g)

seed_estimate (Count)

days_to_awn_emergence (Count)

days_to_heading (Count)

plant_height (cm)

spike_length (cm)

spike_width (cm)

awn_length (cm)

days_to_heading_2017 (Count)

Parents_pheno.txt.gz

Trait measurements from greenhouse grow outs of the parents of CCII

Prog_pheno.txt.gz

Trait measurements from greenhouse grow outs of progeny selections of CCII. Pedigree numbers are the same as in the GBS dataset.

-The following datasets are derived from the whole genome datasets above and were the specific input files for the whole genome simulations.

afs_for_sim.txt.gz

Allele frequency data used for input for the simulations

parents_for_sim.txt.gz

Input data for sites with no missing data to simulate the CCII

Funding

National Science Foundation, Award: IOS-2046256