Population genomics reveals demographic history and climate adaptation in Japanese Arabidopsis halleri
Data files
Version of Sep 24, 2024 (1.20 GB in total)
- Ah_JP_EU.Q20.MAF005.DP15004300.FM01.vcf.gz (735.70 MB)
- Ah_JP.Q20.MAF005.DP12003600.FM01.vcf.gz (460.24 MB)
- README.md (2.74 KB)
Abstract
Climate oscillations in the Quaternary forced species into major latitudinal or altitudinal range shifts. It has been suggested that adaptation concomitant with range shifts plays a key role in species responses to climate oscillations, but the role of selection for local adaptation to climatic changes remains largely unexplored. Here, we investigated population structure, demographic history, and signatures of climate-driven selection based on genome-wide polymorphism data of 141 Japanese Arabidopsis halleri individuals, with European individuals as outgroups. Coalescent-based analyses suggested genetic differentiation between Japanese subpopulations since the Last Glacial Period (LGP), which would have contributed to shaping the current pattern of population structure. Population demographic analysis revealed population size fluctuations during the LGP, which were particularly prominent since the subpopulations started to diverge (~50 kya). Ecological niche modeling predicted shifts in geographic distribution from southern coastal regions to northern coastal and mountainous areas, possibly in association with the population size fluctuations. Through genome-wide association analyses of bioclimatic variables and selection scans, we investigated whether climate-associated loci are enriched in the extreme tails of selection scans, and demonstrated prevailing signatures of selection, particularly toward a warmer climate in southern subpopulations and a drier environment in northern subpopulations, which may have taken place during or after the LGP. Our study highlights the importance of integrating climate associations, selection scans, and population demographic analyses for identifying genomic signatures of population-specific adaptation, which would also help us predict evolutionary responses to future climate change.
README: Population genomics reveals demographic history and climate adaptation in Japanese Arabidopsis halleri
https://doi.org/10.5061/dryad.1jwstqk3s
This dataset includes the input SNP data and source code used in the analysis of genetic differentiation and climate adaptation in Japanese Arabidopsis halleri.
We obtained re-sequencing data of Japanese (141 individuals) and European (12 Central European and 4 Romanian individuals) A. halleri. We mapped short reads to the A. halleri v2.03 assembly (DOE-JGI, http://phytozome.jgi.doe.gov/) using BWA-MEM 0.7.17-r1188 (Li, 2013) and called variants with the bcftools v1.17 mpileup and call pipeline (Danecek et al., 2021).
Description of the data and file structure
SNP data
Ah_JP_EU.Q20.MAF005.DP15004300.FM01.vcf.gz
- SNP data for Japanese and European individuals.
- SNPs with quality ≤ 20, minor allele frequency ≤ 0.05, total depth ≤ 1,500 or ≥ 4,300, or a fraction of missing individuals > 0.1 were filtered out using bcftools.
- 1,506,831 SNPs were included.
Ah_JP.Q20.MAF005.DP12003600.FM01.vcf.gz
- SNP data for Japanese individuals.
- SNPs with quality ≤ 20, minor allele frequency ≤ 0.05, total depth ≤ 1,200 or ≥ 3,600, or a fraction of missing individuals > 0.1 were filtered out using bcftools.
- 1,013,450 SNPs were included.
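The filtering rules above amount to a single keep/drop decision per SNP. The sketch below restates them in Python for clarity; it is not the bcftools command used to produce these files, which is not given here. The names `SiteStats`, `FILTERS`, and `keep_site` are hypothetical, and the sketch assumes that "total depth" is the read depth summed over all individuals and that any single failed rule removes the SNP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteStats:
    """Hypothetical per-SNP summary values used by the filters."""
    qual: float          # variant quality (QUAL)
    maf: float           # minor allele frequency across individuals
    total_depth: int     # read depth summed over all individuals (assumed)
    missing_frac: float  # fraction of individuals with a missing genotype


# Depth bounds stated for each file; a SNP is kept only strictly inside them.
FILTERS = {
    "Ah_JP_EU": {"dp_min": 1500, "dp_max": 4300},
    "Ah_JP": {"dp_min": 1200, "dp_max": 3600},
}


def keep_site(s: SiteStats, dp_min: int, dp_max: int) -> bool:
    """Return True if the SNP survives every filter described above."""
    return (
        s.qual > 20                           # drop quality <= 20
        and s.maf > 0.05                      # drop MAF <= 0.05
        and dp_min < s.total_depth < dp_max   # drop depth <= min or >= max
        and s.missing_frac <= 0.1             # drop missing fraction > 0.1
    )


site = SiteStats(qual=45.0, maf=0.12, total_depth=3800, missing_frac=0.02)
print(keep_site(site, **FILTERS["Ah_JP_EU"]))  # True
print(keep_site(site, **FILTERS["Ah_JP"]))     # False: depth >= 3,600
```

The same site can pass one filter set and fail the other because the depth window scales with the number of individuals in each file.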
Source code
The R scripts used in this study.
- conduct_snmf.R
- Script used for clustering analysis with sparse non-negative matrix factorization (sNMF).
- conduct_lfmm.R
- Script used for identifying loci associated with local climate through genome-wide association mapping. Association mapping was performed for elevation, 19 bioclimatic variables, and seven principal components of bioclimatic variables (PC1–PC7) using LFMM.
- conduct_rehh.R
- Script used for genome-wide selection scans based on XP-EHH. In this script, we first calculated the integrated extended haplotype homozygosity (EHH) for each SNP site. We then calculated XP-EHH for each SNP in pairwise comparisons between the four Japanese subpopulations.
- calculate_fold_enrichment.R
- Script used for performing enrichment analyses to test whether environmentally associated peaks are enriched in regions under subpopulation-specific selective sweeps.
- In this script, we conducted permutation tests using the "genome rotation" scheme. For each permutation, we calculated the fold enrichment between the set of 4-kb windows associated with bioclimatic variables and a randomly shifted set of 4-kb XP-EHH windows.
- plot_fold_enrichment.R
- Script used to plot the results of the fold enrichment analysis.
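The code of conduct_rehh.R is not reproduced in this README. As a rough illustration of what XP-EHH measures, the sketch below computes an integrated site-specific EHH (EHHS) around one focal SNP in each of two subpopulations and takes the log ratio, following the general definition used by rehh (Gautier et al., 2017). It is a simplified stand-in, not the rehh implementation: the 0/1 haplotype coding without missing data, the 0.05 cutoff, trapezoid integration that stops at the first marker below the cutoff, and the function names `ehhs`, `ies`, and `xpehh_raw` are assumptions, and the genome-wide standardization applied to XP-EHH scores is omitted.

```python
import math

import numpy as np


def ehhs(hap, focal, other):
    """Site-specific EHH (unnormalized): probability that two haplotypes drawn
    without replacement are identical over markers focal..other (inclusive)."""
    lo, hi = sorted((focal, other))
    _, counts = np.unique(hap[:, lo:hi + 1], axis=0, return_counts=True)
    n = hap.shape[0]
    return float(np.sum(counts * (counts - 1)) / (n * (n - 1)))


def ies(hap, pos, focal, limit=0.05):
    """Integrate EHHS over physical distance on both sides of the focal SNP
    (trapezoid rule), stopping at the first marker where EHHS < limit."""
    total = 0.0
    for step in (-1, 1):  # left side, then right side
        prev_idx, prev_val = focal, ehhs(hap, focal, focal)
        idx = focal + step
        while 0 <= idx < hap.shape[1]:
            val = ehhs(hap, focal, idx)
            if val < limit:
                break
            total += 0.5 * (prev_val + val) * abs(pos[idx] - pos[prev_idx])
            prev_idx, prev_val = idx, val
            idx += step
    return total


def xpehh_raw(hap_a, hap_b, pos, focal, limit=0.05):
    """Unstandardized XP-EHH: log ratio of iES in subpopulation A versus B."""
    return math.log(ies(hap_a, pos, focal, limit) / ies(hap_b, pos, focal, limit))


# Toy example: 5 markers, focal SNP at index 2, haplotypes coded 0/1.
pos = [0, 1000, 2000, 3000, 4000]
hap_a = np.zeros((4, 5), dtype=int)  # a single haplotype, sweep-like
hap_b = np.array([[0, 0, 0, 0, 0],
                  [1, 0, 0, 0, 1],
                  [0, 1, 0, 1, 0],
                  [1, 1, 0, 1, 1]])   # haplotypes diverge quickly
print(round(xpehh_raw(hap_a, hap_b, pos, 2), 4))  # 1.0986, i.e. ln(3)
```

A positive value means that haplotypes around the focal SNP stay identical over longer distances in the first subpopulation than in the second, the pattern expected after a recent sweep in the first one.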
Methods
We used whole-genome re-sequencing data of Japanese (141 individuals from 135 populations) and European (12 individuals from 10 Central European populations and 4 individuals from 4 Romanian populations) Arabidopsis halleri individuals. Our analyses include population structure analysis using sparse non-negative matrix factorization (sNMF) (Frichot et al., 2014; Frichot and François, 2015), genome-wide association mapping for bioclimatic variables using latent factor mixed models (LFMM) (Caye et al., 2019; Gain and François, 2021), genome-wide selection scans based on cross-population extended haplotype homozygosity (XP-EHH) (Gautier et al., 2017), and enrichment analysis of the climate-associated loci in the extreme tails of selection scans with a permutation test using the “genome rotation” scheme (Atwell et al., 2010; Horton et al., 2012; Nordborg et al., 2005; Sasaki et al., 2022; Tsuchimatsu et al., 2020).
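To make the genome-rotation permutation test concrete, here is a minimal sketch using numpy. It assumes that each analysis has already been reduced to one boolean flag per 4-kb window (for example, windows containing a top LFMM association and windows in the extreme XP-EHH tail), that all windows are concatenated in genome order and treated as a circle, and that fold enrichment is the observed number of doubly flagged windows divided by the number expected if the two flags were independent. These definitions, the function names, and the one-sided p-value with a +1 correction are assumptions; calculate_fold_enrichment.R may differ in details such as rotating within chromosomes.

```python
import numpy as np


def fold_enrichment(assoc, sweep):
    """Observed / expected number of windows flagged in both sets."""
    n = assoc.size
    observed = np.count_nonzero(assoc & sweep)
    expected = np.count_nonzero(assoc) * np.count_nonzero(sweep) / n
    return observed / expected if expected > 0 else float("nan")


def genome_rotation_test(assoc, sweep, n_perm=1000, seed=0):
    """Compare the observed fold enrichment with a null distribution built by
    circularly shifting the XP-EHH window flags by a random non-zero offset."""
    assoc = np.asarray(assoc, dtype=bool)
    sweep = np.asarray(sweep, dtype=bool)
    rng = np.random.default_rng(seed)
    observed = fold_enrichment(assoc, sweep)
    shifts = rng.integers(1, sweep.size, size=n_perm)  # offsets in [1, n-1]
    null = np.array([fold_enrichment(assoc, np.roll(sweep, s)) for s in shifts])
    # One-sided empirical p-value; the +1 terms count the observed data itself.
    p_value = (1 + np.count_nonzero(null >= observed)) / (n_perm + 1)
    return observed, null, p_value


# Toy example: 1,000 windows with one block of 20 flagged in both analyses.
assoc = np.zeros(1000, dtype=bool)
assoc[100:120] = True
fe, null, p = genome_rotation_test(assoc, assoc.copy(), n_perm=1000)
print(fe, p)
```

Rotating the XP-EHH flags rather than shuffling them keeps neighboring windows together, so the null distribution retains the clustering of signals along the genome caused by linkage, and only their position relative to the association peaks is randomized.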