Patterns of genotype-environment association in the eastern North American yellow birch (Betula alleghaniensis Britt.)
Data files
Dec 09, 2025 version files 45.48 MB
Abstract
Understanding how genomic adaptation shapes species’ responses to climate change is essential for developing climate-resilient forests, as shifting conditions increasingly drive range shifts and maladaptation. This study investigates adaptive genomic variation in Betula alleghaniensis (yellow birch), a widely distributed hardwood of eastern North America. Genome-wide SNP variation from 27 populations was analyzed using 3D-genotype-by-sequencing and two genotype–environment association methods: redundancy analysis and Gradient Forests. A total of 124 putatively adaptive loci were identified, linked to extreme minimum temperature, degree-days below 0°C, winter precipitation, and snowfall. Functional annotation revealed roles in stress response and transcriptional regulation. Patterns of adaptive variation showed a latitudinal gradient tied to winter severity and spatially heterogeneous responses to snowfall. Two distinct clusters of adaptive loci were identified along climate gradients, suggesting winter climate plays a dominant role in shaping local adaptation. Future climate projections (SSP5-8.5, 2041–2070) predict substantial shifts in adaptive alleles in the Northeastern Appalachians, Maritimes, and St. Lawrence River regions. Nevertheless, genetic offset across the range was relatively low, suggesting genomic resilience potentially supported by yellow birch’s autohexaploid genome and extensive gene flow, including adaptive introgression from hybridization with other Betula species. These findings support integrating genomic data into forest management.
Dataset DOI: 10.5061/dryad.wwpzgmsxc
Description of the data and file structure
This dataset was collected to investigate spatial patterns of genomic variation and climate adaptation in the eastern North American tree Betula alleghaniensis (yellow birch). Leaf tissue or seedlings were sampled from 27 populations across the species’ range, and genomic DNA was extracted and genotyped using a 3D-GBS protocol. The resulting SNP data were used to characterize population structure, identify climate-associated loci through multiple genotype–environment association methods, model adaptive genomic variation under current and future climate scenarios, and evaluate evolutionary capacity using standing genetic variation and population adaptive index metrics. The associated scripts reproduce all filtering, analysis, and mapping steps used in the study.
Files and variables
File: Cummins_etal._PolyRAD_GEA_R_code.txt
Description: R code implementing the complete analysis pipeline used in the manuscript, including:
- polyRAD filtering of paralogous loci
- hybrid detection and removal
- weighted-mean genotype calculation
- PCA of population structure
- dbMEM geographic structure modeling
- climate variable processing
- four GEA methods (RDA-R, RDA-X, GF-R, GF-X)
- adaptive index computation under current and future climates
- genetic offset estimation
- SGV and PAI calculation
Paths are generic and require user modification.
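The weighted-mean genotype step listed above can be illustrated with a minimal sketch. It assumes the definition given in the polyRAD documentation: the allele copy number averaged over its posterior probabilities and divided by the ploidy, so that values fall between 0 and 1. The function name, input layout, and example probabilities are hypothetical and are not taken from the authors' R script.

```python
def weighted_mean_genotype(posteriors, ploidy=6):
    """Posterior-weighted mean allele copy number, scaled to the range 0-1.

    posteriors[k] is the posterior probability that an individual carries
    k copies of the allele, for k = 0..ploidy (hypothetical input layout).
    """
    if len(posteriors) != ploidy + 1:
        raise ValueError("expected one probability per copy number 0..ploidy")
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    # Expected copy number, normalised in case the inputs do not sum to 1.
    expected_copies = sum(k * p for k, p in enumerate(posteriors)) / total
    return expected_copies / ploidy


# Invented hexaploid example: most probability mass sits on 2 and 3 copies.
example = [0.0, 0.1, 0.5, 0.4, 0.0, 0.0, 0.0]
print(round(weighted_mean_genotype(example), 4))
```

A continuous value of this kind, rather than a hard genotype call, is what allows uncertain hexaploid dosages to be carried into ordination methods such as PCA and RDA.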
Variables
- Not applicable; this is a code file.
File: Cummins_etal_gene_ontology_code.txt
Description: Unix-based workflow used to perform functional annotation of putatively adaptive loci. Includes commands for:
- AUGUSTUS gene prediction
- conversion to GFF3 and BED formats
- BEDTools intersections between SNPs and predicted genes
- eggNOG-mapper functional annotation
Used to assign gene functions, GO terms, and pathways to climate-associated SNPs.
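The SNP-to-gene intersection step can be sketched in a few lines. This is not the BEDTools implementation or the authors' command; it only illustrates the underlying interval test, assuming SNP positions are 1-based (as in VCF) and gene intervals are 0-based and half-open (as in BED). All identifiers and coordinates below are invented for illustration.

```python
from collections import defaultdict


def intersect_snps_with_genes(snps, genes):
    """Return {snp_id: [gene_id, ...]} for SNPs that fall inside a gene.

    snps  : iterable of (chrom, pos, snp_id), pos is 1-based as in VCF
    genes : iterable of (chrom, start, end, gene_id), 0-based half-open as in BED
    """
    genes_by_chrom = defaultdict(list)
    for chrom, start, end, gene_id in genes:
        genes_by_chrom[chrom].append((start, end, gene_id))

    hits = {}
    for chrom, pos, snp_id in snps:
        pos0 = pos - 1  # convert the 1-based VCF position to a 0-based coordinate
        overlapping = [
            gene_id
            for start, end, gene_id in genes_by_chrom.get(chrom, [])
            if start <= pos0 < end
        ]
        if overlapping:
            hits[snp_id] = overlapping
    return hits


# Hypothetical scaffold names, SNP IDs, and gene models.
genes = [("scaffold_1", 100, 500, "g1"), ("scaffold_1", 450, 900, "g2")]
snps = [("scaffold_1", 101, "snp_a"), ("scaffold_1", 460, "snp_b"), ("scaffold_2", 50, "snp_c")]
print(intersect_snps_with_genes(snps, genes))
```

The linear scan per SNP is adequate for a few hundred candidate loci; for genome-scale inputs a sorted or indexed approach, as BEDTools itself uses, would be more appropriate.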
Variables
- Not applicable; this is a code file.
File: second_filters_m4_p80_x0_S3.singleton.unlinked_100k_0.5.imputed_k1.vcf.gz
Description: Compressed VCF file containing 32,295 high-confidence SNPs genotyped from 221 Betula alleghaniensis individuals using the 3D-GBS protocol. This file was generated by the IBIS bioinformatics pipeline (adapter trimming, quality filtering, alignment to Betula pendula, SNP calling with STACKS) and represents the input dataset for all downstream filtering, GEA analyses, adaptive index calculations, and SGV/PAI metrics.
Variables (VCF fields):
- #CHROM: Chromosome or scaffold name
- POS: Genomic position (base pair)
- ID: SNP identifier
- REF: Reference allele
- ALT: Alternate allele(s)
- QUAL: Phred-scaled quality score
- FILTER: Filter status
- INFO: Site-level metadata (e.g., NS, AF, DP)
- FORMAT: Per-sample fields (e.g., GT, AD, DP, GQ, GL)
- Individual columns: Hexaploid genotype calls or allele depths per sample
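As a rough illustration of how these columns fit together, the sketch below splits one tab-separated VCF data line into the nine fixed columns and pairs each sample's values with the keys named in FORMAT. The record, sample names, and FORMAT keys shown are invented; the actual file may carry a different set of per-sample fields.

```python
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_vcf_record(line, sample_names):
    """Split one VCF data line into fixed columns plus per-sample field dicts."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_FIXED_COLUMNS, fields[:9]))
    record["POS"] = int(record["POS"])
    # FORMAT lists the keys (e.g. GT:DP) that each sample column follows.
    format_keys = record["FORMAT"].split(":")
    record["samples"] = {
        name: dict(zip(format_keys, value.split(":")))
        for name, value in zip(sample_names, fields[9:])
    }
    return record


# Invented example line with two hypothetical samples and hexaploid GT calls.
line = "chr1\t1042\tsnp_1\tA\tG\t.\tPASS\tNS=2\tGT:DP\t0/0/0/0/1/1:23\t./././././.:0\n"
record = parse_vcf_record(line, ["ind_01", "ind_02"])
print(record["POS"], record["samples"]["ind_01"]["GT"])
```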
Missing values:
Missing genotypes are indicated by ./. or . in accordance with the VCF 4.2 specification.
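A minimal sketch of how such missing calls might be detected is given below. It assumes that a call counts as missing only when every allele in the GT string is ".", which covers ".", "./.", and a fully missing hexaploid call; partially missing calls are treated as present. The function names are hypothetical.

```python
import re


def is_missing_genotype(gt):
    """True when every allele in a GT string is '.', e.g. '.', './.', './././././.'."""
    alleles = re.split(r"[/|]", gt)  # '/' separates unphased and '|' phased alleles
    return all(allele == "." for allele in alleles)


def missing_rate(genotypes):
    """Fraction of GT strings at a site that are fully missing."""
    genotypes = list(genotypes)
    if not genotypes:
        return 0.0
    return sum(is_missing_genotype(gt) for gt in genotypes) / len(genotypes)


# Invented genotype calls for four samples at one site.
print(missing_rate(["0/0/0/0/0/1", "./././././.", ".", "0/1/1/1/1/1"]))  # prints 0.5
```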
Code/software
The primary genotype dataset is provided as a compressed VCF (.vcf.gz) and can be viewed with any VCF-compatible tool, including free/open-source software such as bcftools/htslib, vcftools, and IGV. The file can also be read directly in R using packages such as VariantAnnotation or vcfR.
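For a quick look at the file without any of these tools, the compressed VCF can also be opened as text with a general-purpose language. The sketch below uses Python's standard gzip module, which should also handle bgzip-compressed files because they are valid multi-member gzip streams; the helper name is hypothetical.

```python
import gzip


def read_sample_names(vcf_gz_path):
    """Return the sample names from the #CHROM header line of a .vcf.gz file."""
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed; the rest are sample names.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached data lines without finding a header
    return []
```

Calling read_sample_names("second_filters_m4_p80_x0_S3.singleton.unlinked_100k_0.5.imputed_k1.vcf.gz") from the folder containing the download should return one name per genotyped individual.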
All downstream analyses are reproduced in the accompanying scripts:
- Cummins_etal._PolyRAD_GEA_R_code.txt (R workflow)
  This script was executed in R v4.4.1 and reproduces paralog filtering, hybrid removal, population structure PCA, climate processing, four genotype–environment association (GEA) methods, adaptive index mapping under current and future climates, genetic offset calculation, and SGV/PAI metrics.
  Key R packages used (all open source from CRAN/R-Forge/Bioconductor) include:
  polyRAD, VariantAnnotation, adegenet, vegan, adespatial, igraph, dplyr, ggplot2, ggforce, terra, sf, raster, ggcorrplot, robust, qvalue, gradientForest, rjags, coda, VennDiagram, rnaturalearth, rnaturalearthdata, tibble, tidyr.
  Users should adjust the generic file paths at the top of each section to match their local folder structure.
- Cummins_etal_gene_ontology_code.txt (command-line ontology workflow)
  This file contains Unix command-line scripts used for gene prediction and functional annotation of climate-associated loci. Required free/open tools and versions:
  - AUGUSTUS v3.5.0 (gene prediction)
  - BEDTools v2.30.0 (SNP–gene intersection and nearest-gene assignment)
  - eggNOG-mapper v2 (orthology-based functional/GO annotation)
  Supporting standard Unix utilities used include gffread, awk, sort, and bedtools closest/intersect.
Together, these scripts fully document the relationship between the raw SNP dataset and all derived results reported in Cummins et al. (2025).
Access information
Other publicly accessible locations of the data:
- N/A
Data was derived from the following sources:
- N/A
