Local adaptation in an alpine finch, GWAS and FST output files

Research facility: Colorado State University

Published Mar 27, 2026 on Dryad. https://doi.org/10.5061/dryad.kwh70rzhs

Data files

Mar 27, 2026 version files 12.37 GB

Abstract

Understanding patterns and mechanisms underlying local adaptation is becoming increasingly important for species conservation amid anthropogenically driven environmental change. Alpine systems are experiencing particularly intense pressure from environmental change resulting from increased rates of warming and corresponding loss of snow and ice. We integrate morphological and genetic analyses to identify traits important for local adaptation in one of the highest elevation breeding birds in North America, the Sierra Nevada Gray-crowned Rosy Finch. We performed an in-depth analysis of how traits with known links to thermoregulation in birds, such as wing length, bill size, and feather microstructure, vary between two populations at sites with contrasting climate and environmental conditions. We identified loci underlying these traits using a genome-wide association study and further examined regions of the genome related to altitude adaptation and cold tolerance using F_ST outlier tests. Together, these results indicate that temperature, food availability, and alpine landscape features may impose multifaceted and potentially conflicting selective pressures on morphological traits important to adaptation in alpine birds. Overall, this work represents one of the first in-depth analyses of the genetic basis of adaptation in an alpine specialist songbird.

Dataset DOI: 10.5061/dryad.kwh70rzhs

Description of the data and file structure

Overview

This repository contains data and analysis outputs associated with the study:

The Genetic and Morphological Basis of Local Adaptation to Elevational Extremes in an Alpine Finch

The dataset stored here includes:

Genome-wide association study (GWAS) outputs
FST outlier analyses
Gene annotation and candidate gene identification outputs

These data were used to investigate patterns of local adaptation in the Gray-crowned Rosy Finch (Leucosticte tephrocotis dawsonii) through the integration of genomic and phenotypic variation.

Study Design Summary

Samples were collected from alpine populations and included:

Morphological measurements (e.g., beak traits, wing chord, body size metrics)
Feather microstructure measurements (e.g., barbule density, node density)
Whole-genome sequencing data

Whole-genome sequencing libraries were prepared using a modified Nextera protocol and sequenced on an Illumina NovaSeq 6000. Reads were processed through a Snakemake pipeline. GWAS analyses were conducted using GEMMA, implementing:

Bayesian sparse linear mixed models (BSLMM)
Univariate linear mixed models (ULMM)

FST outliers were identified using OutFLANK, and candidate genes were identified using BEDTools closest. Functional enrichment was conducted using PANTHER and STRING.

Files and variables

File: bcftools_BCRF.zip

Description: Output from bcftools closest identifying genes nearest to candidate SNPs using the Brown-capped Rosy Finch genome annotation.

Contents include:

SNP position
Closest gene
Distance to gene
Gene annotation

These files were used for candidate gene identification.

Files Inside:

bslmm_0.5_BCRF_soft_distance_new.bed and lmm_snps0.5_BCRF_soft_distance_new.bed are the raw outputs from the bcftools closest function.

genes_trait_bslmm0.5_0.01_25kb_BCRF_soft_new.csv is the output from the bslmm analysis with the associated trait and gene. The "_clean" extension means uncharacterized genes (LOC genes) have been removed.

genes_trait_lmm0.5_25kb_BCRF_soft_new.csv is the same as above.

File: lmm_results.zip

Description: Filtered results from ULMM analysis, including only significant SNPs.

Filtering threshold:

p-value < 5 × 10⁻⁸

Files Inside:

GCRF_merged.SNP.filtered_0.5 file for each trait showing the outlier SNPs for each trait.

lmm_snps_0.5_fst_new.csv shows the fst values for each SNP.

lmm_summary_outliers_0.5_soft.csv is all the outliers for all the traits.

File: bslmm_results.zip

Description: Filtered BSLMM output.

Filtering threshold:

Mean posterior inclusion probability (PIP, gamma) > 0.01

Files Inside:

Here is a file for each trait showing the outlier SNPs given different filtering thresholds (PIP 0.01, 0.1, 0.05).

Summary_.csv* are the summary files for each PIP filtering with all the outlier loci.

hyperparameters are the folder with all the hyperparameter outputs from BSLMM for each trait and each PIP filtering

File: bslmm_output.zip

Description: Raw output files from GEMMA Bayesian sparse linear mixed model (BSLMM).

Includes:

Posterior inclusion probabilities (PIP) (represented as gamma)
Effect size estimates
Model convergence outputs

File: lmm_output.zip

Description: Raw output files from GEMMA univariate linear mixed model (ULMM) analyses.

Includes:

Effect sizes
Standard errors
p-values
SNP identifiers

File: OutFLANK_fst_results_0.005.csv

Description: Results from genome-wide FST outlier detection using OutFLANK (Whitlock & Lotterho,s 2015). OutFLANK identifies loci with unusually high FST values relative to the neutral distribution, suggesting potential selection.

Variables

rs: SNP identifier.
He: Expected heterozygosity at the locus across populations
FST: Corrected FST value estimated by OutFLANK
T1: Numerator of the FST calculation after trimming extreme values.
T2: Denominator of the FST calculation after trimming.
FSTNoCorr: Uncorrected FST value prior to OutFLANK trimming and correction. (Used internally by OUTFLANK)
T1NoCorr: Uncorrected numerator component of FST calculation.
T2NoCorr: Uncorrected denominator component of FST calculation.
meanAlleleFreq: Mean allele frequency across populations.
indexOrder: Original SNP order from input file.
GoodH: Logical flag indicating whether heterozygosity falls within acceptable bounds. TRUE = SNP retained for neutral distribution fitting. FALSE = SNP excluded (often due to low heterozygosity)
qvalues: False discovery rate (FDR) corrected p-values.
pvalues: Two-tailed p-values for each SNP.
pvaluesRightTail: Right-tailed p-values.
OutlierFlag: Logical indicator of whether the SNP is classified as an outlier. TRUE = significant FST outlier. FALSE = not significant

Code/software

The files included in this dataset are outputs from genome-wide association and population differentiation analyses. These results were generated using GEMMA and OutFLANK, both freely available software tools.

Genome-Wide Association Analyses

Genome-wide association analyses were conducted using GEMMA v0.98.1 (Zhou & Stephens 2012; Zhou et al. 2013).

Two models were implemented:

Bayesian Sparse Linear Mixed Model (BSLMM)

BSLMM analyses were performed using:

500,000 burn-in iterations
5,000,000 sampling iterations
Posterior inclusion probability (PIP) threshold: 0.01

These analyses produced the following files:

bslmm_output.zip — raw GEMMA output
bslmm_results.zip — filtered significant SNPs

Univariate Linear Mixed Model (LMM)

Univariate linear mixed models were run using the GEMMA Wald test.

Filtering threshold:

p-value < 5 × 10⁻⁸

These analyses produced:

lmm_output.zip — raw GEMMA output
lmm_results.zip — filtered significant SNPs

GEMMA input files were generated using PLINK v1.9 (Purcell et al. 2007; Chang et al. 2015).

FST Outlier Analysis

Genome-wide FST outlier detection was conducted using OutFLANK (Whitlock & Lotterhos, 2015) implemented in R v4.2.0.

OutFLANK identifies loci under potential selection by:

Fitting a neutral FST distribution
Identifying SNPs with unusually high differentiation

Parameters used:

False discovery rate (FDR): 0.005
Default trimming parameters

This analysis produced:

OutFLANK_fst_results_0.005.csv

Reproducibility

Scripts used to run analyses and generate outputs are available at:

https://github.com/ecnrobertson/GCRF_Pub

This repository contains:

GEMMA workflow scripts
OutFLANK analysis scripts
Filtering and plotting scripts

Software References

Zhou, X. & Stephens, M. (2012). Genome-wide efficient mixed-model analysis. Nature Genetics, 44, 821–824.
Zhou, X., Carbonetto, P., & Stephens, M. (2013). Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genetics, 9, e1003264.
Whitlock, M. C., & Lotterhos, K. E. (2015). Reliable detection of loci responsible for local adaptation. Molecular Ecology, 24, 1039–1051.
Purcell, S. et al. (2007). PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics, 81, 559–575.

Sample Collection

A total of 171 Gray-crowned Rosy Finches were collected using potter traps in the White Mountains (n=98) and Piute Pass (n=73) in June and July, the peak of the species’ breeding season (MacDougall-Shackleton et al., 2000). Birds showing signs of stress were released, resulting in slight variation in the types of data collected for each individual. Blood samples were collected from 150 of the captured birds using the brachial wing vein and stored in Queen’s lysis buffer at room temperature (Owen, 2011; Seutin et al., 1991). At the same time, morphological measurements - tarsus length, wing chord, beak width, beak length, beak depth, and nare length - were also collected from 154 of the birds using an electronic caliper, and 5-10 body feathers were collected from the breast of all individuals. Birds were banded to allow for identification of the individuals and to avoid recapture, then released. All handling and banding of birds was done following the guidelines and protocols of the U.S Geological Survey (USGS) Bird Banding Laboratory (BBL) and complied with permits and permissions from federal and state agencies.

Feather Microstructure Measurements

We measured the microstructure of three feathers for each individual. For each feather, three photos were taken using an Olympus BX51 Microscope: one photo each of an unbroken pennaceous and plumulaceous barb from the center of each region was taken at 4x objective, and one photo of a plumulaceous barbule was taken at 10x objective. Structures were then measured from the photos using the segmented line tool in ImageJ. For the pennaceous and plumulaceous barbule length, we measured one barbule from each region, starting from the distal tip to where the barbule meets the barb. For the pennaceous and plumulaceous barbule density, we drew a 0.5mm line along the middle of a barb, counting the number of barbules along that line. And finally, for plumulaceous node density, we drew a 0.2mm line along the center of a barbule, counting the number of nodes along that line (Fig 2A). Each replicate measure for a feather trait was averaged per individual prior to downstream analysis.

Sequencing and Genotyping

DNA was extracted from blood samples of 150 individuals using Qiagen’s DNeasy Blood and Tissue Extraction Kit protocol and quantified with the Qubit dsDNA HS Assay Kit. We used a modified version of Illumina’s Nextera Library Preparation protocol to prepare WGS libraries and pooled the libraries by equal mass before sequencing. The resulting libraries were sequenced on 2 lanes of an Illumina NovaSeq 6000 at Novogene Corporation.

To process the raw sequence reads and detect variants, we utilized Snakemake (Molder et al. 2021), a workflow management system that provides efficiency, adaptability, and reproducibility. The pipeline we adapted to our species can be found on GitHub (https://github.com/eriqande/mega-non-model-wgs-snakeflow/). Sequence data were trimmed using fastp (S. Chen et al., 2018) to remove adaptor sequences and polyG tails using a sliding window, and then aligned to a Brown-capped Rosy Finch (Leucosticte australis) reference genome (GenBank: GCA_025504685.1) using Burrows-Wheeler Aligner software (H. Li & Durbin, 2009). We marked PCR duplicates using samtools (Li et al., 2009) and read groups (sample, lane, library) were added using picard (http://broadinstitute.githut.io/picard). Individual coverage was estimated using samtools, and given the range of coverage (XX-XX), we downsampled BAM files to 10x using the samtools subsample function (Li et al., 2009). Individual gvcf files were created using GATK’s HaplotypeCaller (Poplin et al., 2018) with high base quality score filters (--min-base-quality-score 33 --minimum-mapping-quality 20) to remove batch effects (Lou and Therkildsen, 2021). We then parallelized the calling of genotypes across 3 million bp regions across the genome using the GenomicsDBImport and GenotypeGVCFs functions (Auwera & O’Connor, 2020; McKenna et al., 2010). GATK versions above 4.0 call missing data as homozygous reference, which can bias downstream analysis. To avoid this, all loci with a depth of zero were manually marked as missing. To remove systematic errors according to GATK variant quality score recalibration (VQSR) best practices, we hard filtered variants with the following parameters: StrandOddsRatio (>3.0), FisherStrand (>60.0), MappingQuality (>40.0), and Quality by Depth (>2.0). We further filtered Single Nucleotide Polymorphisms (SNPs), removing indels, filtering for missingness (<50%), allele frequencies (>0.5 and <0.95), depth (>4x), quality (>30.0) using bcftools ( Li, 2011. After filtering, the resulting high-quality SNPs were passed through BEAGLE 4.1 (Browning & Browning, 2016) to impute missing genotypes.

Local adaptation in an alpine finch, GWAS and FST output files

Data files

Abstract

README: Local adaptation in an alpine finch, GWAS and FST output files

Description of the data and file structure

Overview

Study Design Summary

Files and variables

File: bcftools_BCRF.zip

File: lmm_results.zip

File: bslmm_results.zip

File: bslmm_output.zip

File: lmm_output.zip

File: OutFLANK_fst_results_0.005.csv

Variables

Code/software

Genome-Wide Association Analyses

Bayesian Sparse Linear Mixed Model (BSLMM)

Univariate Linear Mixed Model (LMM)

FST Outlier Analysis

Reproducibility

Software References

Methods

Sample Collection

Feather Microstructure Measurements

Sequencing and Genotyping