Stanton-Geddes, John1; Paape, Timothy1; Epstein, Brendan1; Briskine, Roman1; Yoder, Jeremy1; Mudge, Joann2; Bharti, Arvind K.2; Farmer, Andrew D.2; Zhou, Peng1; Denny, Roxanne1; May, Gregory D.2; Erlandson, Stephanie1; Yakub, Mohammed1; Sugawara, Masayuki1; Sadowsky, Michael J.1; Young, Nevin D.1; Tiffin, Peter1

Published Jul 12, 2013 on Dryad. https://doi.org/10.5061/dryad.pq143

Abstract

Genome-wide association studies (GWAS) have revolutionized the search for the genetic basis of complex traits. To date, GWAS have generally relied on relatively sparse sampling of nucleotide diversity, which is likely to bias results by preferentially sampling high-frequency SNPs not in complete linkage disequilibrium (LD) with causative SNPs. To avoid these limitations, we conducted GWAS with >6 million SNPs identified by sequencing the genomes of 226 accessions of the model legume Medicago truncatula. We used these data to identify candidate genes and the genetic architecture underlying phenotypic variation in plant height, trichome density, flowering time, and nodulation. The characteristics of candidate SNPs differed among traits: candidates for flowering time and trichome density fell in distinct clusters of high LD, and the minor allele frequencies (MAF) of candidates underlying variation in flowering time and height were significantly greater than the MAF of candidates underlying variation in other traits. Candidate SNPs tagged several characterized genes, including the nodulation-related genes SERK2, MtnodGRP3, MtMMPL1, NFP, CaML3 and MtnodGRP3A and the flowering time gene MtFD, as well as uncharacterized genes that are now candidates for further molecular characterization. By comparing sequence-based candidates to candidates identified by in silico 250K SNP arrays, we provide an empirical example of how reliance on even high-density reduced representation genomic markers can bias GWAS results. Depending on the trait, only 30–70% of the top 20 in silico array candidates were within 1 kb of sequence-based candidates. Moreover, the sequence-based candidates tagged by array candidates were heavily biased towards common variants; these comparisons underscore the need for caution when interpreting results from GWAS conducted with sparsely covered genomes.
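The array-versus-sequence comparison reduces to a proximity test: for each top array candidate, ask whether any sequence-based candidate on the same chromosome lies within 1 kb. The sketch below is a minimal illustration, not the authors' code; it assumes each candidate is a hypothetical `(chromosome, position)` tuple, that "within 1 kb" means a distance of at most 1,000 bp, and the function name `fraction_within_window` is invented for this example.

```python
from bisect import bisect_left
from collections import defaultdict


def fraction_within_window(array_hits, seq_hits, window=1000):
    """Fraction of array candidates lying within `window` bp of any
    sequence-based candidate on the same chromosome.

    Both inputs are lists of (chromosome, position) tuples (hypothetical format).
    """
    # Index sequence-based candidate positions by chromosome, sorted for bisection.
    by_chrom = defaultdict(list)
    for chrom, pos in seq_hits:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    if not array_hits:
        return 0.0

    tagged = 0
    for chrom, pos in array_hits:
        positions = by_chrom.get(chrom, [])
        # The nearest sequence-based candidate is either at the insertion
        # point or immediately before it.
        i = bisect_left(positions, pos)
        nearby = positions[max(i - 1, 0):i + 1]
        if any(abs(p - pos) <= window for p in nearby):
            tagged += 1
    return tagged / len(array_hits)


seq_hits = [("chr1", 10_000), ("chr1", 50_000), ("chr2", 5_000)]  # hypothetical
array_hits = [("chr1", 10_800), ("chr1", 30_000), ("chr2", 6_001), ("chr3", 100)]
print(fraction_within_window(array_hits, seq_hits))  # 0.25
```

Only chr1:10,800 is within 1,000 bp of a sequence-based candidate, so the result is 1 of 4. Sorting each chromosome's positions once and bisecting keeps each lookup cheap even for long candidate lists.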

StantonGeddes2013_Medicago_plant_data

Spreadsheet of plant phenotype data collected in a greenhouse experiment for the genome-wide association study. Column headings are: block, pot (individual plant ID), trtmnt (rhizobia treatment applied to the plant: "rhz_12" for a mixture of two rhizobia strains, or "control" for no rhizobia), HM_accession (Medicago HapMap accession), height_1 (height recorded at about two weeks), leaves_1 (number of leaves at about two weeks), height_2 (intermediate height measurement), height_3 (final height before harvest), branch_3 (number of branches on the plant before harvest), flowering date (date the first flower was observed), nodule_above (number of nodules counted in the top 5 cm of roots), nodule_below (number of nodules counted below the top 5 cm of roots). More information is available in the accompanying R Markdown script "StantonGeddes2013_script.Rmd".
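Because the plant data are distributed as a CSV file, the columns can be read by header name. A minimal sketch, assuming the headers are spelled exactly as listed above and that missing counts are blank or `NA` (both assumptions); the derived total nodule count and the helper names `parse_count` and `total_nodules` are hypothetical illustrations, not part of the dataset.

```python
import csv
import io

# Assumed encodings for missing values; check the actual file.
MISSING = {"", "NA"}


def parse_count(text):
    """Return an integer count, or None if the cell is missing."""
    text = text.strip()
    if text in MISSING:
        return None
    return int(float(text))  # tolerates counts written as "12.0"


def total_nodules(rows):
    """Yield (pot, trtmnt, nodule_above + nodule_below) for each plant.

    The total is None when either count is missing.
    """
    for row in rows:
        above = parse_count(row["nodule_above"])
        below = parse_count(row["nodule_below"])
        total = None if above is None or below is None else above + below
        yield row["pot"], row["trtmnt"], total


# Hypothetical sample rows, including only the columns this sketch uses.
sample = io.StringIO(
    "block,pot,trtmnt,HM_accession,nodule_above,nodule_below\n"
    "1,101,rhz_12,HM001,12,30\n"
    "1,102,control,HM001,0,0\n"
    "2,201,rhz_12,HM002,NA,15\n"
)
for pot, trtmnt, total in total_nodules(csv.DictReader(sample)):
    print(pot, trtmnt, total)
# 101 rhz_12 42
# 102 control 0
# 201 rhz_12 None
```

To read the real file instead, replace `sample` with `open("StantonGeddes2013_Medicago_plant_data.csv", newline="")`.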

R script for accession phenotypes

R Markdown script used to compute the accession means that served as phenotypes in the genome-wide association analysis performed with TASSEL.

StantonGeddes2013_script.Rmd
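One simple way to form accession means is to average each trait over the replicate plants of an accession within a treatment. The sketch below does exactly that and nothing more; it is an illustration only, since the R Markdown script may fit a different model (for example, one adjusting for `block`). The function name `accession_means` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def accession_means(rows, trait, missing=("", "NA")):
    """Mean of `trait` for each (HM_accession, trtmnt) group.

    Missing values are skipped; groups with no observed values are omitted.
    """
    groups = defaultdict(list)
    for row in rows:
        value = row[trait].strip()
        if value in missing:
            continue
        groups[(row["HM_accession"], row["trtmnt"])].append(float(value))
    return {key: mean(values) for key, values in groups.items()}


rows = [  # hypothetical plants
    {"HM_accession": "HM001", "trtmnt": "rhz_12", "height_3": "20.0"},
    {"HM_accession": "HM001", "trtmnt": "rhz_12", "height_3": "24.0"},
    {"HM_accession": "HM001", "trtmnt": "control", "height_3": "NA"},
    {"HM_accession": "HM002", "trtmnt": "rhz_12", "height_3": "18.5"},
]
print(accession_means(rows, "height_3"))
# {('HM001', 'rhz_12'): 22.0, ('HM002', 'rhz_12'): 18.5}
```

Grouping by treatment as well as accession keeps inoculated and control plants separate, since nodulation traits are only meaningful under the rhizobia treatment.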

StantonGeddes2013_Medicago_leaf_data

Spreadsheet of leaf data collected by Mohammed Yakub for the genome-wide association analysis. Columns are: pot (unique plant ID corresponding to plants in the "StantonGeddes2013_Medicago_plant_data.csv" file), Trichomes (number of trichomes in the 2 mm long section of petiole immediately below the leaf), width mm (width of the petiole in the area where trichomes were counted), petiole area (petiole width times 2 cm), leaf area (area estimated from a leaf scan with ImageJ), leaf weight (weight of each leaf after at least 24 hours of drying).
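Since the leaf data share the `pot` ID with the plant data, leaf measurements can be linked to accessions with a join on `pot`. A minimal sketch, assuming the column headers are the literal strings listed above (e.g. `"Trichomes"`, `"petiole area"`) and that missing values are blank or `NA`; expressing trichome density as trichomes per unit of recorded petiole area is an illustrative choice, not necessarily the definition used in the study, and `trichome_density` is a hypothetical name.

```python
def trichome_density(leaf_rows, plant_rows, missing=("", "NA")):
    """Map each pot to (HM_accession, Trichomes / petiole area).

    Leaf rows are joined to plant rows on the shared `pot` ID. Rows with an
    unknown pot, a missing value, or a non-positive area are skipped.
    """
    accession_by_pot = {row["pot"]: row["HM_accession"] for row in plant_rows}
    result = {}
    for row in leaf_rows:
        pot = row["pot"]
        count = row["Trichomes"].strip()
        area = row["petiole area"].strip()
        if pot not in accession_by_pot or count in missing or area in missing:
            continue
        area_value = float(area)
        if area_value <= 0:
            continue
        result[pot] = (accession_by_pot[pot], float(count) / area_value)
    return result


plants = [  # hypothetical rows from the plant data
    {"pot": "101", "HM_accession": "HM001"},
    {"pot": "201", "HM_accession": "HM002"},
]
leaves = [  # hypothetical rows from the leaf data
    {"pot": "101", "Trichomes": "30", "petiole area": "1.5"},
    {"pot": "201", "Trichomes": "NA", "petiole area": "1.2"},
    {"pot": "999", "Trichomes": "5", "petiole area": "1.0"},
]
print(trichome_density(leaves, plants))
# {'101': ('HM001', 20.0)}
```

The density is in whatever units the petiole area column was recorded in; pot 999 is dropped because it has no match in the plant data.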

StantonGeddes2013_Medicago_nodule_occupancy

Spreadsheet of nodule occupancy data, described in more detail in the accompanying README file.

R script for nodule occupancy accession phenotypes

R script that calculates nodule occupancy by each rhizobia strain for the genome-wide association analysis.

StantonGeddes2013_Medicago_nodule_occupancy_script.r
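Nodule occupancy for a strain can be expressed as the proportion of typed nodules that the strain occupies. The actual layout of the occupancy spreadsheet is documented only in its README, so this sketch assumes a hypothetical input of one `(accession, strain)` pair per typed nodule and hypothetical strain labels `"A"` and `"B"`; it does not reproduce the logic of the R script above.

```python
from collections import Counter, defaultdict


def strain_occupancy(typed_nodules, focal_strain):
    """Proportion of typed nodules per accession occupied by `focal_strain`.

    `typed_nodules` is an iterable of (accession, strain) pairs, one per nodule
    (hypothetical layout). An accession appears only if at least one of its
    nodules was typed, so the denominator is never zero.
    """
    counts = defaultdict(Counter)
    for accession, strain in typed_nodules:
        counts[accession][strain] += 1
    return {
        accession: strain_counts[focal_strain] / sum(strain_counts.values())
        for accession, strain_counts in counts.items()
    }


nodules = [  # hypothetical typing results
    ("HM001", "A"), ("HM001", "A"), ("HM001", "B"), ("HM001", "A"),
    ("HM002", "B"),
]
print(strain_occupancy(nodules, "A"))
# {'HM001': 0.75, 'HM002': 0.0}
```

Because `Counter` returns 0 for unseen keys, an accession whose nodules were all occupied by the other strain gets an occupancy of 0.0 rather than an error.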

README for Tassel analysis

PDF file describing how the TASSEL analysis was performed and how the resulting data were analysed with R scripts.

MtHap_GWA_README.pdf

shell script for Tassel analysis

Portable Batch System (PBS) shell script for the TASSEL analysis, run on Minnesota Supercomputing Institute servers.

tassel_shell.pbs

R script for Tassel analysis

R script called by the PBS shell script to analyse the TASSEL results.

tassel_results.r
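A common step after an association scan is ranking SNPs by p-value to pull out top candidates, such as the top 20 array candidates mentioned in the abstract. This sketch assumes the TASSEL results have already been parsed into hypothetical `(marker, chromosome, position, p_value)` tuples; it does not reproduce TASSEL's output format or the logic of tassel_results.r, and `top_candidates` is an invented name.

```python
import math


def top_candidates(results, n=20):
    """Return the n SNPs with the smallest p-values, each with -log10(p) appended.

    `results` holds (marker, chromosome, position, p_value) tuples. Entries with
    a missing p-value or one outside (0, 1] are dropped, since -log10 is undefined
    at 0. Ties are broken by chromosome, then position.
    """
    valid = [r for r in results if r[3] is not None and 0 < r[3] <= 1]
    ranked = sorted(valid, key=lambda r: (r[3], r[1], r[2]))
    return [
        (marker, chrom, pos, p, -math.log10(p))
        for marker, chrom, pos, p in ranked[:n]
    ]


results = [  # hypothetical association results
    ("snp1", "chr1", 100, 1e-3),
    ("snp2", "chr1", 200, 1e-8),
    ("snp3", "chr2", 50, 0.5),
    ("snp4", "chr2", 70, None),
    ("snp5", "chr3", 10, 1e-5),
]
for marker, chrom, pos, p, score in top_candidates(results, n=2):
    print(marker, chrom, pos, f"{score:.1f}")
# snp2 chr1 200 8.0
# snp5 chr3 10 5.0
```

Breaking ties deterministically keeps a top-N list reproducible when several SNPs in high LD share the same p-value.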

R functions for Tassel analysis

Helper functions required by the R analysis of TASSEL results.

tassel_analysis_functions.r

Data from: Candidate genes and genetic architecture of symbiotic and agronomic traits revealed by whole-genome, sequence-based association genetics in Medicago truncatula
