Data from: Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize

Citation

Giri, Anju; Burch, Merritt; Buckler, Edward; Ramstein, Guillaume (2021), Data from: Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize, Dryad, Dataset, https://doi.org/10.5061/dryad.b2rbnzsf9

Abstract

Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for prediction within populations. However, it usually cannot capture the complex effects due to combinations of alleles in haplotypes, so accuracy across populations has usually been low. Here we present a novel and cost-effective method for imputing cis haplotype associated RNA expression (HARE, RNA expression of genes by haplotype), study its transferability across tissues, and evaluate genomic prediction models within and across populations. HARE focuses on tightly linked cis-acting causal variants in the immediate vicinity of the gene while excluding trans effects from diffusion and metabolism, so it should be more transferable across tissues and populations. We showed that HARE estimates captured one-third of the variation in gene expression and were more transferable across diverse tissues than measured transcript expression. HARE estimates were used in genomic prediction models evaluated within and across two diverse maize panels, a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel), for predicting 26 complex traits. HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy than measured expression values. The accuracy achieved with measured expression varied across tissues, whereas accuracy with HARE was more stable. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.

Methods

For each line in the NAM and Goodman panels, a haplotype ID was obtained in each reference region from the PHG database using the pathsForMethod function in the rPHG package in R. Details on the PHG database are presented in Franco JAV, Gage JL, Bradbury PJ, Johnson LC, Miller ZR, Buckler ES, et al. A Maize Practical Haplotype Graph Leverages Diverse NAM Assemblies. Genomics; 2020 Aug. doi:10.1101/2020.08.31.268425.

SNP data were imputed from the haplotypes for each line in the NAM and Goodman Association panels from the PHG database. SNPs in the same reference regions as HARE were filtered for minor allele frequency above 0.05 and major allele frequency below 0.95, using TASSEL version 5.0. The SNPs were then LD-filtered to remove SNPs with pairwise R2 > 0.9 within 100 kb windows, using the SNPRelate package in R.
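
The two filtering steps can be illustrated with a minimal Python sketch. It is not the TASSEL or SNPRelate implementation used to produce the files. It assumes biallelic SNPs coded as 0/1/2 allele dosages, uses the squared Pearson correlation of dosages as the LD measure, and applies a simple greedy pruning rule. The function names and the toy data are hypothetical.

```python
def minor_allele_frequency(dosages):
    """MAF of one biallelic SNP from 0/1/2 dosages (None marks a missing call)."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def r_squared(x, y):
    """Squared Pearson correlation of two dosage vectors over jointly called lines."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def filter_and_prune(snps, maf_min=0.05, r2_max=0.9, window_bp=100_000):
    """Keep SNPs with MAF > maf_min, then greedily drop any SNP whose r2 with an
    already kept SNP on the same chromosome within window_bp exceeds r2_max.

    snps: list of (chrom, pos, dosages) tuples sorted by chromosome and position.
    """
    kept = []
    for chrom, pos, dosages in snps:
        if minor_allele_frequency(dosages) <= maf_min:
            continue
        redundant = False
        # kept is sorted, so scanning backwards can stop at the window edge
        for k_chrom, k_pos, k_dosages in reversed(kept):
            if k_chrom != chrom or pos - k_pos > window_bp:
                break
            if r_squared(dosages, k_dosages) > r2_max:
                redundant = True
                break
        if not redundant:
            kept.append((chrom, pos, dosages))
    return kept


# Toy data (invented for illustration): six lines, five SNPs on one chromosome
snps = [
    ("1", 1000, [0, 0, 2, 2, 0, 2]),
    ("1", 2000, [0, 0, 2, 2, 0, 2]),    # identical to the first SNP: pruned
    ("1", 3000, [0, 2, 0, 2, 2, 0]),    # weakly correlated: kept
    ("1", 4000, [0, 0, 0, 0, 0, 0]),    # monomorphic: fails the MAF filter
    ("1", 500000, [0, 0, 2, 2, 0, 2]),  # outside the 100 kb window: kept
]
kept = filter_and_prune(snps)
print([pos for _, pos, _ in kept])
```

SNPRelate's pruning uses its own sliding-window procedure and offers several LD measures, so results from this sketch should not be expected to match the deposited SNP files exactly.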

Usage Notes

File gbs_282_path_minReads1_minTaxa10_splitProb09999_minTransProb35em6_try2.txt is the haplotype matrix for the maize Goodman Association panel. It includes line names in the first column and reference ranges in the first row.

File haplotypes_nam_6277_path_minReads0_minTaxa10_splitProb09999_minTransProb35em6.txt is the haplotype matrix for the maize NAM panel. It also includes line names in the first column and reference ranges in the first row.
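
Both haplotype matrices share the layout described above (line names in the first column, reference ranges in the first row), so a single reader can serve for either file. The sketch below uses only the Python standard library and assumes the files are tab-delimited, which is not stated in this record; adjust the delimiter if they are not. The function name and the toy values are hypothetical.

```python
import csv
import io


def read_haplotype_matrix(handle, delimiter="\t"):
    """Read a haplotype matrix into {line_name: {ref_range: haplotype_id}}.

    The first row holds reference ranges and the first column holds line names.
    Haplotype IDs are kept as strings, exactly as they appear in the file.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    ref_ranges = header[1:]  # header[0] labels the line-name column
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = dict(zip(ref_ranges, row[1:]))
    return ref_ranges, matrix


# Toy stand-in for a real file; the range labels and IDs are invented
example = io.StringIO("taxa\tR1\tR2\nB73\t101\t205\nMo17\t102\t205\n")
ref_ranges, haplotypes = read_haplotype_matrix(example)
print(haplotypes["Mo17"]["R1"])  # prints 102
```

For a real file, pass an open file handle instead, for example `open(path, newline="")`.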

File phg_ref_range_genes_v3_v5_nondup.csv cross-maps PHG reference ranges in v5 to the corresponding genes in v3.
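
As a rough illustration of how the cross-map file could be used, the sketch below relabels one line's haplotype IDs from reference ranges to gene names. The column names `ref_range_id` and `v3_gene` are hypothetical, since the actual header of the CSV is not described here; check the file and substitute the real names. It also assumes each reference range maps to at most one gene. All values are toy data.

```python
import csv
import io


def load_range_to_gene(handle, range_col="ref_range_id", gene_col="v3_gene"):
    """Build {reference_range: gene} from a CSV with a header row.

    range_col and gene_col are placeholder column names, not the file's real ones.
    """
    mapping = {}
    for record in csv.DictReader(handle):
        mapping[record[range_col]] = record[gene_col]
    return mapping


def haplotypes_by_gene(line_haplotypes, range_to_gene):
    """Re-key one line's {ref_range: haplotype_id} by gene, dropping unmapped ranges."""
    return {
        range_to_gene[ref_range]: hap_id
        for ref_range, hap_id in line_haplotypes.items()
        if ref_range in range_to_gene
    }


# Toy cross-map and one toy line; every identifier here is invented
cross_map = io.StringIO("ref_range_id,v3_gene\nR1,geneA\nR2,geneB\n")
range_to_gene = load_range_to_gene(cross_map)
print(haplotypes_by_gene({"R1": "101", "R2": "205", "R3": "9"}, range_to_gene))
```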

File filtered005_hare_goodman_sharedPositions_SNPRelate.txt is the SNP file for the Goodman Association panel. The SNPs are from HARE regions, filtered for allele frequency greater than 0.05 and less than 0.95 and LD-pruned for R2 > 0.9 within 100 kb windows. The first column contains the line names in the Goodman Association panel.

File filtered005_hare_nam_sharedPositions_SNPRelate.txt is the SNP file for the NAM panel. The SNPs are from HARE regions, filtered for allele frequency greater than 0.05 and less than 0.95 and LD-pruned for R2 > 0.9 within 100 kb windows. The first column contains the line names of the NAM RILs.

Funding

Agricultural Research Service, Award: 8062-21000-043-02S

National Science Foundation, Award: 1822330