Using a single reference genome for genome-wide association studies (GWAS) limits the gene space represented to that of a single accession. This limitation can complicate the identification and characterization of genes located within presence/absence variations (PAVs). In this study, we present the draft de novo genome assembly of PHJ89, an Oh43-type inbred line. Using three separate reference genome assemblies (B73, PH207, and PHJ89) that represent the predominant germplasm groups of maize, we generated, for each reference, a whole-seedling gene expression matrix and a single nucleotide polymorphism (SNP) matrix from a panel of 942 diverse inbred lines. We identified 34,447 (B73), 39,672 (PH207), and 37,436 (PHJ89) transcripts that are not present in the respective reference genome assembly. GWAS was conducted in the 942-inbred panel using both the SNP data and the expression values to map sugarcane mosaic virus (SCMV) resistance. The results highlight the impact of alternative reference genomes on gene discovery: GWAS for SCMV resistance using expression values as a surrogate measure of PAV robustly detected the physical location of a known resistance gene with the B73 reference, which contains the gene, but not with the PH207 reference. This study provides the Oh43-type PHJ89 genome assembly as a resource, together with SNP and expression data for 942 individuals generated using three different reference genomes.
data_dryad_readme_18Jan19.txt
Readme.
942_FPKM_B73_genes_w_feature.txt
Expression abundances of B73 v4 annotated genes (protein coding, tRNA, miRNA, and lincRNA genes).
942_FPKM_B73_RTAs.txt
Expression abundances of B73-derived novel transcripts.
942_FPKM_PH207_genes.txt
Expression abundances of PH207 annotated genes.
942_FPKM_PH207_RTAs.txt
Expression abundances of PH207-derived novel transcripts.
942_FPKM_PHJ89_genes.txt
Expression abundances of PHJ89 annotated genes.
942_FPKM_PHJ89_RTAs.txt
Expression abundances of PHJ89-derived novel transcripts.
942_FPKM_LOCONF_B73_genes_w_feature.txt
Expression abundances of B73 v4 annotated genes (protein coding, tRNA, miRNA, and lincRNA genes).
942_FPKM_LOCONF_B73_RTAs.txt
Expression abundances of B73-derived novel transcripts.
942_FPKM_LOCONF_PH207_genes.txt
Expression abundances of PH207 annotated genes.
942_FPKM_LOCONF_PH207_RTAs.txt
Expression abundances of PH207-derived novel transcripts.
942_FPKM_LOCONF_PHJ89_genes.txt
Expression abundances of PHJ89 annotated genes.
942_FPKM_LOCONF_PHJ89_RTAs.txt
Expression abundances of PHJ89-derived novel transcripts.
B73_plus_RTAs_snp_matrix_995785.txt.gz
SNP calls against B73 plus B73-derived novel transcripts.
PH207_plus_RTAs_snp_matrix_988252.txt.gz
SNP calls against PH207 plus PH207-derived novel transcripts.
PHJ89_plus_RTAs_snp_matrix_995238.txt.gz
SNP calls against PHJ89 plus PHJ89-derived novel transcripts.
Trinity_B73_unmapped_transcriptome_assembly.fasta
B73-derived novel transcripts.
Trinity_PH207_unmapped_transcriptome_assembly.fasta
PH207-derived novel transcripts.
Trinity_PHJ89_unmapped_transcriptome_assembly.fasta
PHJ89-derived novel transcripts.
B73_plus_RTAs_snp_matrix_imputed.zip
widiv_942g_979873SNPs_imputed_filteredGenos_withRTA_AGPv4.hmp.txt - Imputed SNP calls against B73 plus B73-derived novel transcripts (RTAs). SNPs on contigs that are not part of the 10 chromosomes are coded as being on chromosome 11, and SNPs on RTAs are coded as being on chromosome 12.
widiv_942g_column_name_converter.txt - Converts genotype names between unimputed (column 'Original') and imputed (column 'NoSpaces').
PH207_plus_RTAs_snp_matrix_imputed.zip
widiv_942g_971213SNPs_imputed_filteredGenos_withRTA_PH207ref.hmp.txt - Imputed SNP calls against PH207 plus PH207-derived novel transcripts (RTAs). SNPs on RTAs are coded as being on chromosome 11, and SNPs on contigs that are not part of the 10 chromosomes are coded as being on chromosome 12 (the reverse of the coding used in the B73 file).
widiv_942g_column_name_converter.txt - Converts genotype names between unimputed (column 'Original') and imputed (column 'NoSpaces').
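Because the meaning of chromosome codes 11 and 12 differs between the B73 and PH207 imputed files, it may help to decode them explicitly before combining results. The following Python sketch is illustrative only and is not part of the deposited data: the function names and category labels are hypothetical, and it assumes that the column name converter file is tab-delimited with a header row containing 'Original' and 'NoSpaces'. The sample genotype names in the usage lines are made up.

```python
import csv
import io

# Chromosome codes 11 and 12 as described in this readme.
# Their meaning is swapped between the B73 and PH207 imputed files.
SPECIAL_CHROM_CODES = {
    "B73": {11: "unplaced_contig", 12: "RTA"},
    "PH207": {11: "RTA", 12: "unplaced_contig"},
}


def classify_chrom(reference, chrom_code):
    """Return 'chromosome', 'unplaced_contig', or 'RTA' for a chromosome code."""
    code = int(chrom_code)
    if 1 <= code <= 10:
        return "chromosome"
    try:
        return SPECIAL_CHROM_CODES[reference][code]
    except KeyError:
        raise ValueError(
            f"unexpected chromosome code {chrom_code!r} for reference {reference!r}"
        ) from None


def load_name_map(handle, delimiter="\t"):
    """Map unimputed genotype names ('Original') to imputed names ('NoSpaces').

    Assumes a delimited text file with a header row; the tab delimiter is an
    assumption and may need to be changed.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row["Original"]: row["NoSpaces"] for row in reader}


# Usage with made-up, in-memory data (not taken from the real files).
sample = io.StringIO("Original\tNoSpaces\nLine A\tLine_A\n")
name_map = load_name_map(sample)
kind_b73 = classify_chrom("B73", 12)      # RTA in the B73 file
kind_ph207 = classify_chrom("PH207", 12)  # unplaced contig in the PH207 file
```

Keeping the two code tables in one dictionary keyed by reference makes the swap visible in a single place, which reduces the chance of treating RTA SNPs as unplaced-contig SNPs when results from both references are compared.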
phj89_final_asm.no_desc.min_1k.fa
Genome assembly of Z. mays PHJ89 - scaffolds of at least 1 kb.
phj89_final_asm.no_desc.fa
Genome assembly of Z. mays PHJ89 - all scaffolds.
BUSCO_result.zip
Archive containing the output files from running BUSCO on the PHJ89 genome assembly.
phj89_gene_models.hc.gff3
Generic feature format file (gff3) of high-confidence genes in PHJ89.
phj89_gene_models.hc.cdna.fa
Transcript sequences (cDNA) of high-confidence genes in PHJ89.
phj89_gene_models.hc.cds.fa
Coding sequences (CDS) of high-confidence genes in PHJ89.
phj89_gene_models.hc.pep.fa
Protein sequences of high-confidence genes in PHJ89.
phj89_gene_models.hc.func_anno.txt
Functional annotation of high-confidence genes in PHJ89.
phj89_gene_models.lc.gff3
Generic feature format file (gff3) of low-confidence genes in PHJ89.
phj89_gene_models.lc.cdna.fa
Transcript sequences (cDNA) of low-confidence genes in PHJ89.
phj89_gene_models.lc.cds.fa
Coding sequences (CDS) of low-confidence genes in PHJ89.
phj89_gene_models.lc.pep.fa
Protein sequences of low-confidence genes in PHJ89.
phj89_gene_models.lc.func_anno.txt
Functional annotation of low-confidence genes in PHJ89.