Adaptive potential of Coffea canephora from Uganda in response to climate change

Cite this dataset

de Aquino, Sinara et al. (2022). Adaptive potential of Coffea canephora from Uganda in response to climate change [Dataset]. Dryad.


Understanding vulnerabilities of plant populations to climate change could help preserve their biodiversity and reveal new elite parents for future breeding programs. To this end, landscape genomics is a useful approach for assessing putative adaptations to future climatic conditions, especially in long-lived species such as trees. We conducted a population genomics study of 207 Coffea canephora trees from seven forests along different climate gradients in Uganda. For this, we sequenced 323 candidate genes involved in key metabolic and defense pathways in coffee. Seventy-one SNPs were found to be significantly associated with bioclimatic variables, and were thereby considered as putatively adaptive loci. These SNPs were linked to key candidate genes, including transcription factors, like DREB-like and MYB family genes controlling plant responses to abiotic stresses, as well as other genes of organoleptic interest, like the DXMT gene involved in caffeine biosynthesis and a putative pest repellent. These climate-associated genetic markers were used to compute genetic offsets, predicting population responses to future climatic conditions based on local climate change forecasts. Using these measures of maladaptation to future conditions, substantial levels of genetic differentiation between present and future diversity were estimated for all populations and scenarios considered. The populations from the Zoka and Budongo forests, in the northernmost zone of Uganda, appeared to have the lowest genetic offsets under all predicted climate change patterns, while populations from Kalangala and Mabira, in the Lake Victoria region, exhibited the highest genetic offsets. The potential of these findings for ex situ conservation strategies is discussed.


Study species and sample selection:
Uganda is divided into sixteen climate zones based on precipitation patterns as defined by Basalirwa (1995), five of which host C. canephora stands. Within these five climate zones, 207 georeferenced trees were sampled from seven wild forests in 2012 and 2014 by the National Agricultural Research Organization (NARO, Uganda) and collaborators of the Institut de Recherche pour le Développement (IRD, Montpellier, France). These forests include: Budongo (n=65), Itwara (n=23), Kibale (n=19), Kalangala (n=10), Mabira (n=25), Malabigambo (n=16) and Zoka (n=49). Populations in Zoka, Budongo, Kalangala, Mabira and Malabigambo occurred in distinct climatic envelopes, while the climatic envelopes in Itwara tended to overlap those of Kibale (Kiwuka et al., 2021). In each targeted forest, leaf samples were collected from five sub-sites that were separated by distances of at least 5 km.

Selection of candidate genes and bait design:
The 323 candidate genes (CGs) selected for the present study have been annotated and/or functionally characterized in previous studies. They all code for candidate proteins already reported to play important roles in central metabolism or in plant responses and adaptation to abiotic stress. The CG sequences were retrieved from the whole genome assembly of C. canephora (Denoeud et al., 2014) according to the annotation available on the Coffee Genome Hub (Dereeper et al., 2015).
Probes were designed to cover each CG coding region as well as the 1 kb upstream and 500 bp downstream flanking regions, so as to include putative regulatory regions. The 120 bp MyBaits® probes were designed with 2X tiling and synthesized by MYcroarray (Ann Arbor, Michigan, USA). A total of 21,306 probes were designed. Each candidate probe was BLASTed against the C. canephora genome (Denoeud et al., 2014) and filtered based on the manufacturer’s stringent criteria (Mariac et al., 2022).
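The tiling layout described above can be illustrated with a short sketch. It is not the manufacturer's design procedure: it assumes that "2X tiling" means consecutive probes are offset by half a probe length (60 bp), that coordinates are 0-based and half-open, and that one extra probe is placed flush with the end of the region. The function name `tile_probes` and its parameters are hypothetical.

```python
def tile_probes(gene_start, gene_end, strand="+", probe_len=120,
                tiling=2, upstream=1000, downstream=500):
    """Return 0-based, half-open (start, end) probe intervals for one gene.

    Assumption: 2X tiling is read as a step of probe_len // tiling,
    so most bases are covered by two overlapping probes.
    """
    # Flanks depend on strand: "upstream" lies before the start on "+",
    # and after the end on "-".
    if strand == "+":
        region_start = max(0, gene_start - upstream)
        region_end = gene_end + downstream
    else:
        region_start = max(0, gene_start - downstream)
        region_end = gene_end + upstream

    step = probe_len // tiling
    probes = []
    pos = region_start
    while pos + probe_len <= region_end:
        probes.append((pos, pos + probe_len))
        pos += step

    # Add one probe flush with the region end so the tail is not left uncovered.
    if probes and probes[-1][1] < region_end:
        probes.append((region_end - probe_len, region_end))
    return probes


# Hypothetical gene on the forward strand spanning positions 5000-7000.
probes = tile_probes(5000, 7000)
print(len(probes), probes[0], probes[-1])
```

Under these assumptions a 2 kb gene with its flanking regions (3.5 kb in total) yields a few dozen probes, which is of the same order as the average of about 66 probes per gene implied by 21,306 probes over 323 genes.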

Library preparation and sequencing:
DNA extractions for the 207 samples were performed at the IRD facilities from silica-gel dried leaves according to a previously described protocol (Mariac et al., 2006). Genomic libraries were constructed using the protocols outlined in Rohland & Reich (2012) and Mariac et al. (2014). The 207 individual libraries were then capture-enriched in pools of 48 libraries using the synthetic RNA MyBaits® probes, according to the MYcroarray protocol (Mariac et al., 2022). The enriched pools were quantified using real-time PCR and combined in equimolar ratios prior to sequencing as 150 bp paired-end reads on one lane of an Illumina HiSeq 3000 sequencer (GeT-PlaGe Platform, GenoToul, Toulouse, France).

SNP genotyping, calling and filtering:
Sequence analysis was performed using scripts published by Mariac et al. (2014) and Scarcelli et al. (2016), which are also available on GitHub.
The mapping step was carried out using BWA MEM 0.7.5a-r405 (Li & Durbin, 2009) with the default option (-B 4) and the C. canephora assembly as reference. SNP calling was done using UnifiedGenotyper in the Genome Analysis Toolkit (GATK v3.6). SNPs located on the selected CG sequences were considered as ‘in-target’ and the others as ‘off-target’. Two successive sets of filters were applied to raw SNPs. We first discarded low quality variants according to the quality criteria recommended by GATK, and selected only biallelic SNPs using VCFtools v0.1.13 (Danecek et al., 2011).
We then applied additional filters for population genetic analyses and for association analyses, i.e. keeping SNPs with no excess of heterozygous genotypes (< 0.8), a minor allele frequency (MAF) greater than 5%, and in approximate linkage equilibrium. For the latter filter, SNPs were processed with PLINK 1.90b4 (Purcell et al., 2007) to retain only SNPs in approximate linkage equilibrium, based on the pairwise correlation between the SNP genotype counts for 100 bp sliding windows with 10 bp steps (option --indep-pairwise). SNPs were considered correlated when r² > 0.5. These filters led to a total of 5,860 SNPs: 4,753 in-target and 1,107 off-target loci.
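The three filters above can be sketched in plain Python to make the thresholds concrete. This is an illustration, not the pipeline used for the dataset: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2, or None when missing), the 0.8 threshold is assumed to apply to the proportion of heterozygous genotypes, and the greedy pruning is a simplified stand-in for the PLINK procedure. All function names are hypothetical.

```python
def snp_stats(genotypes):
    """Return (maf, het_fraction) for one SNP, or None if no genotype is called."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return None
    alt_freq = sum(called) / (2 * n)
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(1 for g in called if g == 1) / n
    return maf, het


def passes_filters(genotypes, min_maf=0.05, max_het=0.8):
    """Keep SNPs with MAF > 5% and a heterozygote fraction below 0.8 (assumed meaning)."""
    stats = snp_stats(genotypes)
    if stats is None:
        return False
    maf, het = stats
    return maf > min_maf and het < max_het


def r_squared(a, b):
    """Squared Pearson correlation between two genotype-count vectors."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation signal
    return sxy * sxy / (sxx * syy)


def prune_ld(snps, window_bp=100, max_r2=0.5):
    """Greedy pruning (simplified, not PLINK's algorithm).

    snps: list of (position, genotypes) sorted by position.
    A SNP is kept unless its r2 exceeds max_r2 with an already-kept SNP
    located within window_bp of it.
    """
    kept = []
    for pos, geno in snps:
        nearby = [(p, g) for p, g in kept if pos - p <= window_bp]
        if all(r_squared(geno, g) <= max_r2 for _, g in nearby):
            kept.append((pos, geno))
    return kept


# Toy data: three copies of the same genotype vector at different positions.
a = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
print(passes_filters(a))
print([pos for pos, _ in prune_ld([(10, a), (50, a), (500, a)])])
```

In the toy example the SNP at position 50 is dropped because it is perfectly correlated with the SNP at position 10 and lies within the same 100 bp window, while the SNP at position 500 is retained because it falls outside that window.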

Bioclimatic data:
Environmental factors (bioclimatic variables BIO1-19, Table S1) were downloaded from the WorldClim database (Fick & Hijmans, 2017) at 30 arc-second resolution (~1 km) for ‘Current conditions ~1960-2000’.


Agropolis Fondation CAPES EMBRAPA, Award: 1402-003

Centre de Coopération Internationale en Recherche Agronomique pour le Développement, Award: 1502-611