
Data from: Genome-wide association studies across environmental and genetic contexts reveal complex genetic architecture of symbiotic extended phenotypes

Cite this dataset

Batstone, Rebecca et al. (2022). Data from: Genome-wide association studies across environmental and genetic contexts reveal complex genetic architecture of symbiotic extended phenotypes [Dataset]. Dryad.


A goal of modern biology is to develop the genotype-phenotype (G→P) map, a predictive understanding of how genomic information generates trait variation that forms the basis of both natural and managed communities. As microbiome research advances, however, it has become clear that many of these traits are symbiotic extended phenotypes, being governed by genetic variation encoded not only by the host’s own genome, but also by the genomes of myriad cryptic symbionts. Building a reliable G→P map therefore requires accounting for the multitude of interacting genes and even genomes involved in symbiosis. Here we use naturally-occurring genetic variation in 191 strains of the model microbial symbiont Sinorhizobium meliloti paired with two genotypes of the host Medicago truncatula in four genome-wide association studies (GWAS) to determine the genomic architecture of a key symbiotic extended phenotype – partner quality, or the fitness benefit conferred to a host by a particular symbiont genotype, within and across environmental contexts and host genotypes. We define three novel categories of loci in rhizobium genomes that must be accounted for if we want to build a reliable G→P map of partner quality; namely, 1) loci whose identities depend on the environment, 2) those that depend on the host genotype with which rhizobia interact, and 3) universal loci that are likely important in all or most environments.

IMPORTANCE: Given the rapid rise of research on how microbiomes can be harnessed to improve host health, understanding the contribution of microbial genetic variation to host phenotypic variation is pressing, and will better enable us to predict the evolution of (and select more precisely for) symbiotic extended phenotypes that impact host health. We uncover extensive context-dependency in both the identity and functions of symbiont loci that control host growth, which makes predicting the genes and pathways important for determining symbiotic outcomes under different conditions more challenging. Despite this context-dependency, we also resolve a core set of universal loci that are likely important in all or most environments, and thus, serve as excellent targets both for genetic engineering and future coevolutionary studies of symbiosis.


Detailed methods are included in the PDF associated with this dataset. We performed four greenhouse experiments to estimate partner quality phenotypes in Sinorhizobium meliloti. In each experiment, plants from one of two host lines (either A17 or DZA) were grown in single inoculation with each of 191 S. meliloti strains, with three to four replicates per strain per experiment (six to eight total replicates for each plant line x strain combination, N = 2,825 plants total). We measured multiple proxies of partner quality, namely leaf chlorophyll A content, plant height, number of leaves, and above-ground dried shoot biomass.
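The replicate and plant counts above can be cross-checked with a few lines of arithmetic. The sketch below assumes the four experiments were split evenly, two per host line, which is implied by the "six to eight total replicates" figure but not stated outright here. The variable names are illustrative only.

```python
# Design figures quoted in the methods text above.
STRAINS = 191
EXPERIMENTS = 4
HOST_LINES = 2
REPLICATES_PER_EXPERIMENT = (3, 4)  # minimum and maximum per strain per experiment
REPORTED_PLANTS = 2825

# Assumption: experiments are divided evenly between the two host lines.
experiments_per_line = EXPERIMENTS // HOST_LINES

# Replicates for each plant line x strain combination (min, max).
replicates_per_combination = tuple(
    r * experiments_per_line for r in REPLICATES_PER_EXPERIMENT
)

# Total plants if every strain had the minimum or the maximum replication.
plant_bounds = tuple(
    STRAINS * r * EXPERIMENTS for r in REPLICATES_PER_EXPERIMENT
)

print(replicates_per_combination)  # (6, 8)
print(plant_bounds)                # (2292, 3056)
print(plant_bounds[0] <= REPORTED_PLANTS <= plant_bounds[1])  # True
```

The reported N = 2,825 falls between the two bounds, consistent with a mix of three and four replicates per strain per experiment.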

Simultaneously, we sequenced the entire genomes of all 191 S. meliloti strains, called single nucleotide polymorphisms (SNPs, henceforth referred to as variants), and performed genome-wide association tests that accounted for rhizobium population structure and included only unlinked variants. We determined which loci were significantly associated with partner quality using a permutation method, and binned these loci into three categories based on the context-dependency of their phenotypic effects, and thus, their contribution to the layers of the G→P map for each of our symbiotic extended phenotypes, as described in the abstract.
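To give a feel for what a permutation-based significance test looks like, here is a minimal generic sketch for a single variant. It is not the authors' pipeline, which is documented in the accompanying PDF and GitHub repository. In particular, this sketch does not correct for population structure, and the 0/1 allele encoding, the function name `permutation_p_value`, and the difference-in-means statistic are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(genotypes, phenotypes, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for one biallelic variant.

    genotypes:  list of 0/1 allele codes, one per strain (hypothetical encoding).
    phenotypes: list of trait values, one per strain (e.g. mean shoot biomass).
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")
    if 0 not in genotypes or 1 not in genotypes:
        raise ValueError("both alleles must be present among the strains")

    def effect(values):
        # Absolute difference in mean phenotype between the two allele groups.
        ref = [v for g, v in zip(genotypes, values) if g == 0]
        alt = [v for g, v in zip(genotypes, values) if g == 1]
        return abs(mean(alt) - mean(ref))

    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = effect(phenotypes)
    shuffled = list(phenotypes)

    # Shuffling phenotypes across strains breaks any genotype-phenotype link,
    # which yields the distribution of the statistic under the null hypothesis.
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if effect(shuffled) >= observed:
            hits += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Toy data: 20 strains per allele with clearly separated trait values.
toy_genotypes = [0] * 20 + [1] * 20
toy_phenotypes = [1.0 + 0.01 * i for i in range(20)] + [5.0 + 0.01 * i for i in range(20)]
print(permutation_p_value(toy_genotypes, toy_phenotypes) < 0.05)
```

In a real analysis the statistic would typically come from the association model itself, and the permutations would be used to set a genome-wide significance threshold rather than a per-variant p-value.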

Usage notes

All raw data and code to reproduce the analyses are available on GitHub. While we focus the main text on shoot biomass, we additionally have data on leaf chlorophyll A content, plant height, and leaf number, which are accessible via GitHub.


National Science Foundation, Award: IOS-1645875

National Science Foundation, Award: NPGI-1401864

Joint Genome Institute, Award: CSP-1223795

Carl R. Woese Institute for Genomic Biology