Museum genomics reveals temporal genetic stasis and global genetic diversity in Arabidopsis thaliana
Data files
Jul 09, 2025 version; files total 61.39 MB
Arabidopsis_herbarium-main.zip (24.24 KB)
README.md (2.05 KB)
stem_out_GWAS.zip (49.90 MB)
stem_outLowMiss_allrange.zip (11.47 MB)
Abstract
Global patterns of genetic diversity through time offer a window into evolutionary processes that maintain diversity. Over time, lineages may expand or contract their distribution, causing turnover in population genetic composition. At individual loci, drift, environmental changes, and purifying selection may affect allele frequencies. Museum specimens of widely distributed species offer a unique window into the genetic composition of understudied populations and changes over time. Here, we sequenced genomes of 130 herbarium specimens and 91 new field collections of Arabidopsis thaliana and combined these with published genomes. We sought a broader view of genomic diversity across the species, and to test if population genomic composition is changing through time. Using herbarium specimen sequences, we documented extensive and previously uncharacterized diversity in a range of populations in Africa, populations that are under threat from anthropogenic climate change. Through time, we did not find dramatic changes in the genomic composition of populations. Instead, we found a pattern of genetic change every 100 years of the same magnitude seen when comparing Eurasian populations that are 185 km apart, potentially due to a combination of drift and changing selection. However, we found only mixed signals of polygenic adaptation at phenology and physiology QTL. We did find that genes conserved across eudicots show altered levels of directional allele frequency change, potentially due to variable purifying and background selection. Our study highlights how museum specimens can reveal new dimensions of population diversity and show how wild populations are evolving in recent history.
https://doi.org/10.5061/dryad.31zcrjdz1
Description of the data and file structure
This dataset contains SNP data for the project. One file contains SNPs filtered for missingness and used for population structure analysis; the other contains SNPs used for temporal allele frequency change analysis (essentially a GWAS with year of collection as the 'phenotype').
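To make the "year of collection as phenotype" idea concrete, here is a minimal sketch of a per-SNP test: an ordinary least-squares slope of collection year on allele count. It is an illustration only, not the authors' method; their analysis lives in TimeGWAS5_QTL.R and may account for population structure and other factors that this sketch ignores. The function name, the 0/1/2 allele-count coding, and the use of None for missing calls are assumptions made for the example.

```python
def year_genotype_slope(genotypes, years):
    """Least-squares slope of collection year on allele count for one SNP.

    genotypes: allele counts (0, 1, 2) per sample, or None for a missing call
               (hypothetical coding chosen for this sketch).
    years:     collection year per sample, in the same order.
    Returns None when the slope is undefined (too few calls, or no variation).
    """
    # Keep only samples with a called genotype.
    pairs = [(g, y) for g, y in zip(genotypes, years) if g is not None]
    n = len(pairs)
    if n < 2:
        return None

    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Sum of squares of the genotype and its cross-product with year.
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    if sxx == 0:
        return None  # SNP is monomorphic among the called samples
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    return sxy / sxx


# Toy data: the counted allele is more common in recent collections.
toy_genotypes = [0, 0, 1, 2, 2, None]
toy_years = [1900, 1910, 1950, 1990, 2000, 1800]
print(year_genotype_slope(toy_genotypes, toy_years))
```

A positive slope means the counted allele tends to appear in more recently collected samples; a negative slope means the opposite. A real analysis would also need a significance test and a correction for relatedness among samples.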
Files and variables
File: stem_out_GWAS.zip
Description: SNPs filtered for missingness, used for the temporal allele frequency change analysis. Includes the three standard PLINK-format files: stem_out_GWAS.bed, stem_out_GWAS.bim, and stem_out_GWAS.fam. These can be read with the PLINK software.
File: stem_outLowMiss_allrange.zip
Description: SNPs filtered for missingness, used for the population structure analysis. Includes the three standard PLINK-format files: stem_outLowMiss_allrange.bed, stem_outLowMiss_allrange.bim, and stem_outLowMiss_allrange.fam. These can be read with the PLINK software.
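For readers who want to inspect the files without PLINK, the sketch below reads the three files of a PLINK 1 binary fileset using only the Python standard library. It assumes the standard variant-major .bed layout (three magic bytes, then two bits per sample per variant) and the usual six-column .bim and .fam text files. The function names are hypothetical, and the file paths in the usage note assume the archives have been unzipped into the working directory.

```python
import math

# PLINK 1 .bed header: 0x6c 0x1b, then 0x01 for variant-major layout.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])

# Two-bit genotype code -> number of copies of allele 1 (None = missing).
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def read_fam(path):
    """Return (family ID, within-family ID) for each sample in a .fam file."""
    with open(path) as handle:
        return [tuple(line.split()[:2]) for line in handle if line.strip()]


def read_bim(path):
    """Return (chromosome, variant ID, base-pair position, A1, A2) per variant."""
    variants = []
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            chrom, vid, _cm, pos, a1, a2 = line.split()[:6]
            variants.append((chrom, vid, int(pos), a1, a2))
    return variants


def decode_bed(raw, n_samples, n_variants):
    """Decode raw .bed bytes into one list of allele-1 counts per variant."""
    if raw[:3] != BED_MAGIC:
        raise ValueError("not a variant-major PLINK 1 .bed file")
    bytes_per_variant = math.ceil(n_samples / 4)
    if len(raw) != 3 + bytes_per_variant * n_variants:
        raise ValueError("file size does not match sample and variant counts")

    genotypes = []
    for v in range(n_variants):
        start = 3 + v * bytes_per_variant
        block = raw[start:start + bytes_per_variant]
        row = []
        for s in range(n_samples):
            # Four samples per byte, lowest-order bit pair first.
            code = (block[s // 4] >> (2 * (s % 4))) & 0b11
            row.append(CODE_TO_A1_COUNT[code])
        genotypes.append(row)
    return genotypes


# Synthetic one-variant, four-sample example (not data from this archive).
demo = BED_MAGIC + bytes([0b11100100])
print(decode_bed(demo, n_samples=4, n_variants=1))
```

To apply this to the archived data, one would read stem_out_GWAS.fam and stem_out_GWAS.bim first to get the sample and variant counts, then pass the bytes of stem_out_GWAS.bed to decode_bed. A pure-Python loop is slow on files of this size, so PLINK itself or a dedicated reader is the more practical choice for real analyses.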
Code/software
Scripts for PLINK and R are available at https://github.com/jesserlasky/Arabidopsis_herbarium
The scripts are also archived in this Dryad submission, in the file Arabidopsis_herbarium-main.zip
This directory contains:
aDNA_bionf_Lua.sh - a shell script for bioinformatics of the herbarium specimen sequencing.
PLINKcode.4.sh - a shell script for running PLINK analyses.
PLINK_assist.3_S.R - an R script for analyzing population structure.
TimeGWAS5_QTL.R - an R script for testing for allele frequency change over time and enrichment in putative QTL.
regGWAS_conserved_splineFig2.R - an R script testing global and regional patterns of allele frequency change in conserved genes and making a figure.
bgs_lasky2_stepping_flex.slim - a SLiM simulation script to test how background selection affects allele frequency change over time.