
Spatial and temporal genetic variation in Ethiopian barley (Hordeum vulgare L.) landraces as revealed by simple sequence repeat (SSR) markers

Cite this dataset

Dido, Allo Aman et al. (2022). Spatial and temporal genetic variation in Ethiopian barley (Hordeum vulgare L.) landraces as revealed by simple sequence repeat (SSR) markers [Dataset]. Dryad.


Ethiopia is a center of diversity for barley (Hordeum vulgare L.), which is grown across the different agro-ecologies of the country. Unraveling population structure and gene flow on temporal scales helps to evaluate the consequences of physical, demographic and overall environmental changes for the stability and persistence of populations. Here, we examine spatial and temporal genetic variation within and among barley landrace samples collected over a period of four decades (1976-2017), using simple sequence repeat (SSR) markers. Our objective was to evaluate spatial and temporal changes in barley population connectivity associated with closeness of geographic origin and of collection period. Low to high genetic diversity was observed among the landraces, and STRUCTURE, a neighbour-joining tree and discriminant analysis of principal components (DAPC) all revealed three clusters. The cluster analysis revealed a close relationship between landraces from geographically close origins, with genetic distance increasing with geographic distance. The grouping of landraces based on altitudinal classes was influenced by geographic proximity. AMOVA by year category showed that within-population genetic diversity was much higher than between-population genetic diversity and that temporal differentiation was considerably smaller. The low to strong genetic differentiation between landraces from various geographic origins could be attributed to gene flow across the region as a consequence of seed exchange among farmers. Nevertheless, we found some association between changes in population dynamics and contemporary gene flow. The results demonstrate that this set of SSRs was highly informative and useful in generating a meaningful classification of barley germplasm. Furthermore, our data suggest that landraces are a source of valuable germplasm for sustainable agriculture in the context of future climate change, and that in-situ conservation strategies based on farmers' use can conserve the genetic identity of landraces while allowing adaptation to local environments.


Plant Material

A total of 384 barley genotypes, including 376 landraces and 8 cultivars, were used in this study. The commercial cultivars used in this analysis were Abdanie, Guta, Dafo, HB-1964, HB-1966, HB-42, Ardu-12-60B and Aruso (six-rowed barley). These improved commercial varieties were obtained from Holetta and Sinana Agricultural Research Centers in the central and southeastern highlands of Ethiopia, respectively. The landraces were obtained from the Ethiopian Biodiversity Institute (EBI) along with their passport data. For data analysis, the improved varieties were used only to study the relationship within and among barley genotypes. Landraces from regions with a sample size of less than five were merged into adjacent regions to reduce experimental error due to small sample size. This reduced the 42 agro-ecological zones from which the landraces were originally drawn to 15 zones, viz: Oromia (six zones), Amhara (three zones), Tigray (two zones), Southern Nations, Nationalities and Peoples, SNNP (three zones) and Benishangul Gumuz (one zone).

The 384 collected barley landraces comprised 88 landraces collected from Amhara, 188 from Oromia, 42 from SNNP, 57 from Tigray and 9 from Benishangul Gumuz. The major barley-growing highland regions of Ethiopia, Oromia and Amhara, are represented by more samples. The representative samples were carefully selected from the Hordeum accessions available at the Ethiopian Biodiversity Institute (EBI) ex-situ genebank so as to represent different geographical locations of the country, and were obtained together with their passport data. All landraces are of spring growth type, of which 178 are two-rowed, 186 are six-rowed and 20 are of irregular barley type.

Genotyping by SSR markers

Genomic DNA was extracted by the CTAB method (Doyle 1991) from fresh leaves of sampled individuals. Ten individuals per accession were sampled and bulked for genomic DNA extraction. A total of 49 SSR markers, covering the seven chromosomes of the barley genome, were selected for analysis.

Genetic diversity analysis

For each region (locality) and each year, the following summary statistics were calculated using GenAlEx 6.51b2 software and the hierfstat R package (Goudet and Jombart, 2015): number of alleles per locus (Na), number of effective alleles (Ne), Shannon’s information index (I; Keylock, 2005), gene diversity (GD; Nei, 1987), polymorphic information content (PIC; Nagy et al., 2012), observed heterozygosity (Ho), expected heterozygosity (He; Berg and Hamrick, 1997), allelic richness (Ar; El Mousadik and Petit, 1996), the inbreeding coefficient (Fis) and the fixation index among populations (FST; Weir and Cockerham, 1984). He is the heterozygosity expected under Hardy–Weinberg equilibrium and accounts for both the number and the evenness of alleles. The proportion of the total genetic variance found among subpopulations (FST) was also computed within each year using hierfstat.
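As an illustration of how several of these per-locus statistics relate to allele frequencies, the following is a minimal Python sketch. It is not the GenAlEx or hierfstat implementation: the function name `locus_diversity` is hypothetical, the formulas are the textbook uncorrected versions (no small-sample bias correction), and the PIC formula is the widely used one, which is assumed rather than confirmed to be the variant used in the study.

```python
import math

def locus_diversity(freqs):
    """Per-locus diversity statistics from allele frequencies (must sum to 1).

    Hypothetical helper for illustration only; uncorrected textbook formulas.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    n = len(freqs)
    homozygosity = sum(p * p for p in freqs)
    he = 1.0 - homozygosity        # expected heterozygosity under Hardy-Weinberg
    ne = 1.0 / homozygosity        # number of effective alleles
    # Shannon's information index, natural logarithm
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)
    # PIC = He - sum over allele pairs i<j of 2 * p_i^2 * p_j^2
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return {"Na": n, "Ne": ne, "I": shannon, "He": he, "PIC": he - cross}

# Example: a locus with two equally frequent alleles
stats = locus_diversity([0.5, 0.5])
```

For two equally frequent alleles this gives Ne = 2 and He = 0.5, the maximum possible for a biallelic locus. PIC is slightly lower than He because it discounts uninformative matings.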

Inter-individual genetic distances

Nei’s (1983) genetic distance was calculated and used for unrooted phylogeny reconstruction based on the UPGMA method as implemented in PowerMarker software, and the tree was visualized using MEGA-X version 10.2.2 (Sudhir et al. 2018). Inter-individual genetic distances were analysed by principal component analysis (PCA) using adegenet (Jombart, 2008). Principal coordinate analysis (PCoA) was carried out in GenAlEx version 6.51b2 (Peakall and Smouse, 2012) and analysis of molecular variance (AMOVA) was performed with the R package poppr (Kamvar et al. 2014). Linear regression analysis of the PIC, Shannon-Wiener index and PI against altitude and longitude was conducted using Excel. By inverting Wright's formula (Wright, 1951), the value of Nm can be estimated from FST as Nm = (1 - FST) / (4 FST), where N is the size of each population and m is the migration rate between populations. This provides an indirect estimate of gene flow.
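The indirect gene-flow estimate can be written as a short function. This is a minimal sketch of the formula Nm = (1 - FST) / (4 FST) only; the function name `gene_flow_nm` and the example FST values are illustrative and are not results from this dataset.

```python
def gene_flow_nm(fst):
    """Indirect estimate of gene flow (Nm) from FST.

    Inverts Wright's island-model relationship FST = 1 / (4 * Nm + 1).
    Hypothetical helper name; FST must lie in (0, 1].
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must be greater than 0 and at most 1")
    return (1.0 - fst) / (4.0 * fst)

# Illustrative values only (not taken from the study)
nm_moderate = gene_flow_nm(0.20)   # about 1 migrant per generation
nm_weak = gene_flow_nm(0.05)       # about 4.75 migrants per generation
```

Because Nm grows without bound as FST approaches zero, estimates derived from very small FST values are best read as "high gene flow" rather than as precise counts.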

Spatio-temporal genetic variation

To evaluate the effects of sampling site and year of sampling on patterns of genetic variation, we performed a permutation-based multivariate analysis of variance using the function adonis of the vegan package (Oksanen et al., 2017) in R. This method partitions sums of squares for distance matrices in a manner similar to AMOVA, but allows for both nested and crossed factors (Paradis, 2010). We evaluated the effects of sampling site and year of sampling as crossed factors on the matrix of individual genetic distances. Statistical significance was assessed using 9,999 permutations. Given the signal of temporal variability observed, subsequent analyses were done for each year separately. Because the number of sampling sites and the number of individuals sampled per site varied among years, we performed a rarefied bootstrap that normalized to the minimum number of sites per year and the minimum number of individuals per site, to ensure that there was no bias due to unbalanced sampling. We subsampled the data, keeping only 12 sites per year and 5 individuals per site, and performed the analysis of molecular variance (AMOVA) on the subsample.
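The balanced subsampling step could be sketched as follows in Python. This is an assumption-laden illustration, not the authors' script: the function name `rarefy`, the `(year, site, individual)` record layout, and the choice to drop sites with fewer than the required number of individuals are all hypothetical. In a rarefied bootstrap, such a draw would be repeated many times with different seeds and the AMOVA rerun on each draw.

```python
import random
from collections import defaultdict

def rarefy(samples, n_sites=12, n_per_site=5, seed=None):
    """Draw one balanced subsample: n_sites per year, n_per_site per site.

    samples: iterable of (year, site, individual_id) records (assumed layout).
    Sites with fewer than n_per_site individuals are skipped (assumption).
    """
    rng = random.Random(seed)
    by_year = defaultdict(lambda: defaultdict(list))
    for year, site, individual in samples:
        by_year[year][site].append(individual)

    subsample = []
    for year in sorted(by_year):
        sites = by_year[year]
        eligible = sorted(s for s, inds in sites.items() if len(inds) >= n_per_site)
        if len(eligible) < n_sites:
            raise ValueError(f"year {year}: only {len(eligible)} eligible sites")
        for site in rng.sample(eligible, n_sites):
            # sample individuals without replacement within the site
            for individual in rng.sample(sites[site], n_per_site):
                subsample.append((year, site, individual))
    return subsample
```

Each call returns exactly `n_sites * n_per_site` records per year, so every year contributes the same sampling effort to the downstream analysis.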

Clustering analysis

In this study, we searched for genetic groups using discriminant analysis of principal components (DAPC) implemented in the adegenet package (Jombart, 2008) in R. DAPC maximizes differences among clusters while minimizing variation within them, but does not rely on a particular population genetic model, such as Hardy–Weinberg equilibrium, which is unrealistic for outbreeding populations (Whitlock, 1992). For each year, we used the function find.clusters to determine the number of clusters, with the Bayesian information criterion (BIC) used to identify the most probable number of clusters (K) present in the data. DAPC provides membership probabilities to these clusters for each individual, which we examined for geographic structure.

Isolation by distance (IBD)

We tested for IBD by examining the correlation between genetic distance and Euclidean geographic distance between all pairs of individuals. The significance of the correlation between the two distance matrices was assessed by a Mantel test using the mantel.randtest function of the ade4 R package with 9,999 permutations (Dray and Dufour, 2007).
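To make the logic of the Mantel test concrete, here is a minimal pure-Python sketch. It is not the ade4 implementation: the function names are hypothetical, the test is one-sided (it asks whether the observed correlation is larger than expected by chance), and both inputs are assumed to be square, symmetric distance matrices given as nested lists.

```python
import random

def _upper(matrix, order):
    """Upper-triangle entries of a matrix after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def _pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mantel(genetic, geographic, permutations=9999, seed=None):
    """One-sided Mantel test; returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo_vec = _upper(geographic, identity)
    r_obs = _pearson(_upper(genetic, identity), geo_vec)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute individuals, i.e. rows and columns together
        if _pearson(_upper(genetic, order), geo_vec) >= r_obs:
            hits += 1
    # the observed arrangement counts as one of the permutations
    return r_obs, (hits + 1) / (permutations + 1)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent. Shuffling individual cells would break the matrix structure and give an invalid null distribution.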

Spatial structure analysis

The combination of genetic and geographic information can improve our ability to identify loosely differentiated populations and can give precise spatial locations of genetic barriers or hidden clusters (Storfer et al., 2007). Given the weak overall structure (i.e., clusters and IBD; see above), we also tested for cryptic spatial genetic structure within each year using spatial principal component analysis (sPCA; Jombart et al., 2008). As suggested by Jombart et al. (2008), this spatial multivariate method employs Moran's index (I) of spatial autocorrelation (Moran, 1948) to detect global structures. We used the spca function implemented in the adegenet R package (Jombart et al., 2008). We used inverse-distance weighting to define the connections among sampling sites, given that (a) sampling sites were unevenly spread over the study area and (b) we had no a priori hypothesis about their connectivity. Significance was checked using a permutation test (n = 9,999) (Jombart et al., 2008).
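Moran's I with inverse-distance weights, the autocorrelation measure underlying this analysis, can be illustrated for a single variable as follows. This is a simplified sketch rather than the adegenet spca routine, which applies the same idea to multivariate allele-frequency scores. The function name `morans_i`, the plain 1/d weighting, and the skipping of coincident sites are assumptions made for illustration.

```python
import math

def morans_i(coords, values):
    """Moran's I for one variable using inverse-distance weights w_ij = 1 / d_ij.

    coords: list of (x, y) site coordinates; values: one value per site.
    Pairs of sites at zero distance are skipped (assumption).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weighted_cross = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d == 0:
                continue
            w = 1.0 / d
            weight_sum += w
            weighted_cross += w * dev[i] * dev[j]
    variance_sum = sum(d * d for d in dev)
    return (n / weight_sum) * (weighted_cross / variance_sum)

# Six sites on a line: a smooth gradient versus an alternating pattern
sites = [(float(k), 0.0) for k in range(6)]
i_gradient = morans_i(sites, [0, 1, 2, 3, 4, 5])
i_alternating = morans_i(sites, [0, 1, 0, 1, 0, 1])
```

Positive values, as for the gradient, indicate that nearby sites are more similar than expected, which corresponds to global structure. Negative values, as for the alternating pattern, indicate that neighbours differ more than expected.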

Usage notes

The dataset contains no missing values. The only requirement is a request letter to the Ethiopian Biotechnology Institute (EBTi) and the Ethiopian Biodiversity Institute (EBI).