Data from: Globally-deployed sorghum aphid resistance gene RMES1 is vulnerable to biotype shifts but being bolstered by RMES2
Data files
Nov 29, 2023 version (61.91 GB total)
- dart_haitipop_associaiton_analysis.zip (189.60 MB)
- geographic_mapping.zip (35.78 KB)
- NIL_genome_characterization.zip (60.23 GB)
- README.md (8.05 KB)
- resequencing_SAPBAP_association_analysis.zip (1.49 GB)
Abstract
Durable host plant resistance (HPR) to insect pests is critical for sustainable agriculture. Natural variation exists for aphid HPR in sorghum (Sorghum bicolor), but the genetic architecture and phenotype have not been clarified for most sources. To assess the threat of a sorghum aphid (Melanaphis sorghi) biotype shift, we characterized the phenotype of Resistance to Melanaphis sorghi 1 (RMES1) and the contributing HPR architecture in globally admixed populations selected under severe aphid infestation in Haiti. We developed and sequenced RMES1 near-isogenic lines and found that RMES1 reduces sorghum aphid fecundity but not bird cherry-oat aphid (Rhopalosiphum padi) fecundity, suggesting a discriminant HPR response typical of gene-for-gene interaction. Analyzing whole-genome resequencing of a global diversity panel, we found that resistant alleles at a second gene, RMES2, were more frequent than RMES1 resistant alleles in landraces and historic breeding lines. RMES2 contributed early- and mid-season aphid resistance in a segregating F2 population, whereas RMES1 was significant only for mid-season fitness. In a fixed population with high aphid resistance, both RMES1 and RMES2 were selected for, demonstrating a lack of significant antagonistic pleiotropy. Resistance associations co-located with cyanogenic glucoside biosynthesis genes support additional HPR sources. Globally, therefore, a vulnerable HPR source (RMES1) is bolstered by a second, common source of resistance in breeding programs (RMES2), which may be staving off a biotype shift.
README: Data from: Globally-deployed sorghum aphid resistance gene RMES1 is vulnerable to biotype shifts but being bolstered by RMES2
https://doi.org/10.5061/dryad.rv15dv4f6
Data and scripts relevant for four analyses are included here:
- NIL genome characterization - newly generated resequencing data of near-isogenic lines (NILs) and their parents were used to determine the genomic contribution of the NILs
- Resequencing SAP,BAP association - previously generated phenotype data (Poosapati et al., 2022) and resequencing data (Boatwright et al., 2022; LeBauer et al., 2020) were combined to generate new marker-trait association data
- Haitian breeding population association - newly generated DArT sequencing and phenotype data were used to generate marker-trait association data
- Geographic mapping of RMES1 and RMES2
Description of the data and file structure
NIL genome characterization
- Raw .fastq files are included in NIL_genome_characterization/data. SCAR_1, SCAR_2, and SCAR_5 are RMES1+; SCAR_2 and SCAR_3 are Rmes1-. SCAR_1 and SCAR_3 are used as NIL+ and NIL-, respectively.
- Scripts for processing fastq files through VCF generation are in NIL_genome_characterization/scripts.
- The remaining files in NIL_genome_characterization/scripts are used to parallelize the pipeline. The final filtered output is in NIL_genome_characterization/results.
- The RTx430v2.1 genome required for mapping can be downloaded from https://phytozome-next.jgi.doe.gov/info/SbicolorRTx430_v2_1.
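The *-array*.sh scripts are presumably run as job arrays that look up their input in the numbered-*.tsv lists. A minimal Python sketch of that lookup, not taken from the scripts themselves: it assumes numbered-samples.tsv is tab-delimited with a header row naming the columns `index` and `sample`, and that jobs run under SLURM, which sets `SLURM_ARRAY_TASK_ID`. The helper `sample_for_task` and the example table are hypothetical.

```python
import csv
import io
import os


def sample_for_task(tsv_text: str, task_id: int) -> str:
    """Return the sample whose 'index' column equals task_id.

    Assumes (hypothetically) a tab-delimited table with a header row
    containing 'index' and 'sample' columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if int(row["index"]) == task_id:
            return row["sample"]
    raise KeyError(f"no sample with index {task_id}")


if __name__ == "__main__":
    # Hypothetical excerpt of numbered-samples.tsv (index 1-28, sample S1-S28)
    example = "index\tsample\n1\tS1\n2\tS2\n3\tS3\n"
    # SLURM sets SLURM_ARRAY_TASK_ID for array jobs; fall back to 1 when run locally
    task = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
    print(sample_for_task(example, task))
```

The same pattern would apply to the chromosome and scaffold-group lists, with the column names adjusted to those files.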
NIL genome characterization
- data
  - 1_S1_L003_R1_001.fastq.gz #raw fastq data for sample 1, read 1
  - 1_S1_L003_R2_001.fastq.gz #raw fastq data for sample 1, read 2
  - (remaining 27 samples continue)
- scripts and resources
  - chromosomes.tsv #list of chromosome groups (ID, chrom listing chromosomes 1-10)
  - gdb2vcf-chromo-array.sh #generate VCF from the GVCF database for chromosomes using GATK GenotypeGVCFs
  - gdb2vcf-sg-array.sh #generate VCF from the GVCF database for scaffold groups using GATK GenotypeGVCFs
  - genomics-db-array.sh #build the GVCF database using GATK GenomicsDBImport
  - genomics-db-scaffold-array.sh #build the GVCF database from scaffolds using GATK GenomicsDBImport
  - gvcf-array-job.sh #generate GVCF files using GATK HaplotypeCaller
  - map-array-job.sh #map reads to the reference using BWA
  - mark-dups.sh #mark duplicate reads using Picard
  - numbered-chromosomes_Chr.tsv #list of chromosomes used by scripts (index and chr = ChrXX)
  - numbered-chromosomes.tsv #list of chromosomes used by scripts (index and chr = chromosome_XX)
  - numbered-sample-bams.tsv #list of sample BAMs used by scripts (index, sample IDs, bamopts = input field for BAM files)
  - numbered-samples.tsv #list of samples used by scripts (index 1-28, sample S1-S28)
  - numbered-scaff-groups.tsv #list of scaffolds used by scripts (index, sg = scaffold group)
  - numbered-units.tsv #metadata relevant for bioinformatic analysis and interpretation (index, sample = SXX in running order, library metadata [library, platform, flowcell, lane], sample ID = genotype name and replicate, adapter barcode, path, kilobyte size of samples)
  - scaffold_groups.tsv #list of scaffold groups (ID, chrom = scaffold name, length, and cumulative length)
  - trim-samples.sh #trim samples using Trimmomatic
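Trimming and mapping take reads as R1/R2 pairs. A hedged Python sketch for pairing the raw files, assuming every file follows the Illumina naming pattern shown for sample 1 (`<name>_S<number>_L<lane>_R<read>_001.fastq.gz`); the regular expression and the `pair_fastqs` helper are illustrative, not part of the dataset scripts.

```python
import re
from collections import defaultdict

# Illumina-style name: <name>_S<number>_L<lane>_R<read>_001.fastq.gz (assumed for all files)
FASTQ_RE = re.compile(
    r"^(?P<name>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_\d{3}\.fastq\.gz$"
)


def pair_fastqs(filenames):
    """Group R1/R2 files into (R1, R2) tuples keyed by (sample number, lane)."""
    groups = defaultdict(dict)
    for fn in filenames:
        m = FASTQ_RE.match(fn)
        if m is None:
            raise ValueError(f"unexpected file name: {fn}")
        groups[(int(m["snum"]), int(m["lane"]))][m["read"]] = fn
    pairs = {}
    for key, reads in sorted(groups.items()):
        # Every sample/lane must have exactly one read 1 and one read 2 file
        if set(reads) != {"1", "2"}:
            raise ValueError(f"incomplete pair for sample/lane {key}: {reads}")
        pairs[key] = (reads["1"], reads["2"])
    return pairs


if __name__ == "__main__":
    files = ["1_S1_L003_R2_001.fastq.gz", "1_S1_L003_R1_001.fastq.gz"]
    print(pair_fastqs(files))
    # {(1, 3): ('1_S1_L003_R1_001.fastq.gz', '1_S1_L003_R2_001.fastq.gz')}
```

Failing loudly on an unpaired file is deliberate: a missing mate would otherwise surface much later as a mapping error.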
Resequencing SAP,BAP association
beagle_PI_Sorghum_d8.665samples.vcf.gz contains genotype data for the available SAP and BAP lines in VCF format.
Poosapati_Traits_norm.txt contains phenotype data, previously normalized and published (Poosapati et al., 2022).
The tassel_pca.sh script was used to generate popstr_PC.txt and popstr_PC_rmes2fixed.txt, which contain population structure (PC1, PC2, PC3) and the fixed covariate for RMES2.
The tassel_glm.sh and tassel_glm_rmes2fixed.sh scripts were used to generate marker-trait associations.
Top association outputs can be found in the supplementary data of the accompanying article.
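TASSEL's GLM tests each marker in a fixed-effect linear model with the population-structure PCs as covariates; the rmes2fixed variant adds RMES2 as a further fixed covariate. The numpy sketch below illustrates that model on simulated data. It is not TASSEL's implementation: `marker_glm`, the simulated genotypes, and the 0/1/2 marker coding are assumptions, and it returns an F statistic rather than a p-value.

```python
import numpy as np


def marker_glm(y, covariates, marker):
    """Fit y ~ intercept + covariates + marker by ordinary least squares.

    Returns (marker_effect, F), where F compares the full model with the
    reduced model lacking the marker (1 numerator degree of freedom).
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    x0 = np.column_stack([np.ones(n), np.asarray(covariates, dtype=float)])
    x1 = np.column_stack([x0, np.asarray(marker, dtype=float)])
    beta0, *_ = np.linalg.lstsq(x0, y, rcond=None)
    beta1, *_ = np.linalg.lstsq(x1, y, rcond=None)
    rss0 = float(np.sum((y - x0 @ beta0) ** 2))  # reduced model residual SS
    rss1 = float(np.sum((y - x1 @ beta1) ** 2))  # full model residual SS
    df_resid = n - x1.shape[1]
    f_stat = (rss0 - rss1) / (rss1 / df_resid)
    return float(beta1[-1]), f_stat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    pcs = rng.normal(size=(n, 3))                     # stand-in for PC1-PC3
    rmes2 = rng.integers(0, 3, size=n).astype(float)  # hypothetical RMES2 genotype
    marker = rng.integers(0, 3, size=n).astype(float) # hypothetical test marker
    y = (0.5 * marker + 0.3 * rmes2 + pcs @ np.array([0.2, -0.1, 0.05])
         + rng.normal(scale=0.5, size=n))
    # RMES2 as an extra fixed covariate, mirroring the idea of the rmes2fixed run
    effect, f = marker_glm(y, np.column_stack([pcs, rmes2]), marker)
    print(round(effect, 2), round(f, 1))
```

Converting F to a p-value needs the F(1, n - p) survival function, for example `scipy.stats.f.sf`, omitted here to keep the sketch numpy-only.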
Resequencing SAP,BAP association
- tassel_pca.sh #generates PCA data (popstr_PC) used for GWAS using TASSEL
- beagle_PI_Sorghum_d8.665samples.vcf.gz #imputed VCF data for 665 genotypes in the SAP and BAP
- Poosapati_Traits_norm.txt #resistance traits normalized and reported previously in Poosapati et al. 2022 (first column = PI genotype, second column = normalized phenotypes)
- popstr_PC_rmes2fixed.txt #population structure (PC1-PC3) (same as popstr_PC.txt) (first column = PI genotype, columns 2-4 = PC1-PC3)
- popstr_PC.txt #population structure (PC1-PC3) (first column = PI genotype, columns 2-4 = PC1-PC3)
- tassel_glm_rmes2fixed.sh #generates associations for resistance with RMES2 as fixed covariate using TASSEL
- tassel_glm.sh #generates associations for resistance using TASSEL
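The phenotype and population-structure files are both keyed by PI genotype in the first column, so rows must be matched before any model is fitted. A minimal sketch, assuming whitespace-delimited files with a single header row and `NA` for missing values (both assumptions; check the actual file headers before use). `read_keyed_table` and `align` are hypothetical helpers.

```python
import math


def read_keyed_table(text, has_header=True):
    """Parse a whitespace-delimited table keyed by its first column.

    Remaining columns become floats; 'NA' becomes NaN (assumed missing-value code).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if has_header:
        lines = lines[1:]
    table = {}
    for ln in lines:
        key, *values = ln.split()
        table[key] = [math.nan if v == "NA" else float(v) for v in values]
    return table


def align(pheno, covar):
    """Return genotypes present in both tables (sorted) and their matched rows."""
    shared = sorted(set(pheno) & set(covar))
    return shared, [pheno[g] for g in shared], [covar[g] for g in shared]


if __name__ == "__main__":
    # Hypothetical excerpts; real IDs and headers come from the dataset files
    pheno = read_keyed_table("Taxa\tresistance\nPI_A\t0.5\nPI_B\tNA\n")
    covar = read_keyed_table("Taxa\tPC1\tPC2\tPC3\nPI_B\t0.1\t0.2\t0.3\nPI_C\t1\t2\t3\n")
    print(align(pheno, covar))
```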
Haitian breeding population association
Data generated by DArT sequencing of Haitian breeding lines are in Report_SilicoDArT_1.csv, Report_SNP_2.csv, and Report_SNP_mapping_2.csv.
- DArT_data_processing.R was used to process the raw data.
- HBP_2021_Dart_Indeldata_filtered_IndCall0.5_SNPcall0.5_Rep0.9_numeric_BeagleImputed.Rdata and HBP_2021_Dart_SNPdata_filtered_IndCall0.5_SNPcall0.5_Rep0.9_numeric_BeagleImputed.Rdata contain imputed genotype data.
- HBP_2021_PopulationStructure_PCs_and_Hclust_and_PhenoPC.csv and Kinship_rTassel_2021HBP_SNPs.Rdata contain the population structure, phenotype, and kinship matrix data for the Haitian lines.
- The script ASreml_MTA.R was used to generate marker-trait associations.
- Top association outputs can be found in the supplementary data of the accompanying article.
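The genotype file names suggest the DArT calls were filtered at individual call rate 0.5, SNP call rate 0.5, and reproducibility 0.9 before Beagle imputation. A hypothetical numpy sketch of such a filter: the thresholds are read from the file names, while the filtering order (markers first, then individuals), the NaN coding of missing calls, and the `filter_dart` helper are assumptions, not a description of DArT_data_processing.R.

```python
import numpy as np


def filter_dart(geno, rep_avg, min_snp_call=0.5, min_rep=0.9, min_ind_call=0.5):
    """Filter a markers x individuals matrix in which NaN marks a missing call.

    Markers are filtered on call rate and reproducibility first, then
    individuals on call rate over the retained markers (assumed order).
    Returns (filtered matrix, kept-marker mask, kept-individual mask).
    """
    geno = np.asarray(geno, dtype=float)
    rep_avg = np.asarray(rep_avg, dtype=float)
    snp_call = 1.0 - np.isnan(geno).mean(axis=1)  # per-marker call rate
    keep_m = (snp_call >= min_snp_call) & (rep_avg >= min_rep)
    sub = geno[keep_m]
    ind_call = 1.0 - np.isnan(sub).mean(axis=0)   # per-individual call rate
    keep_i = ind_call >= min_ind_call
    return sub[:, keep_i], keep_m, keep_i


if __name__ == "__main__":
    nan = np.nan
    # Hypothetical 0/1/2-coded calls: 3 markers x 4 individuals
    geno = [[0, 1, 2, nan], [nan, nan, nan, 1], [2, 2, nan, nan]]
    out, km, ki = filter_dart(geno, rep_avg=[0.95, 0.99, 0.8])
    print(out, km, ki)
```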
Haitian breeding population association
- ASreml_MTA.R #generates associations with survival using ASReml
- DArT_data_processing.R #R script for processing raw DArT sequencing data
- HBP_2021_Dart_Indeldata_filtered_IndCall0.5_SNPcall0.5_Rep0.9_numeric_BeagleImputed.Rdata #imputed genotype data
- HBP_2021_Dart_SNPdata_filtered_IndCall0.5_SNPcall0.5_Rep0.9_numeric_BeagleImputed.Rdata #imputed genotype data
- HBP_2021_PopulationStructure_PCs_and_Hclust_and_PhenoPC.csv #population structure, phenotype, and kinship data for genotypes (Taxa = genotype ID, PC1 - PC5, ID = running number, plate ID = dart genotyping plate, phenotype data = stand count, rows, number of leaves without colonies, yellowing binary score, survival binary score)
- Kinship_rTassel_2021HBP_SNPs.Rdata #kinship data determined with TASSEL (rTASSEL)
- Report_SilicoDArT_1.csv #raw DArT sequencing data received
- Report_SNP_2.csv #raw DArT sequencing data received
- Report_SNP_mapping_2.csv #raw DArT sequencing data received
Geographic mapping
Chr06_09_SNPs.recode.vcf and Chr09_INDEL_62669680.recode.vcf contain RMES1 and RMES2 allelic data (resequencing, unimputed) for lines with known landrace or breeding origin. The script geographic_mapping.R was used to plot allele information. Metadata columns marked NA and the sample ID column were not used in the analysis.
Geographic mapping
- Chr06_09_SNPs.recode.vcf #unimputed marker data for RMES1 and RMES2 resistance associated SNPs
- Chr09_INDEL_62669680.recode.vcf #unimputed marker data for RMES2 resistance associated INDEL
- geographic_mapping.R #R script for generating geographic maps and pie charts
- Sorghum_Metadata.csv #metadata and germplasm origin for genotypes (ISnum = NA, PInumber = NA, Duplicate remove = libraries kept, LIB = sample ID matching vcf genotypes, Sample = alternative sample ID, PlantName = informative name, Germplasm = NA/unused, BotanicalType = NA/unused, CountryOfOrigin = NA/unused, Region = NA/unused, lat = latitude, lon = longitude, category_cv = germplasm origin designation)
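The pie charts summarize allele states per germplasm origin. A minimal Python sketch of that tally, assuming VCF sample columns join to Sorghum_Metadata.csv through LIB and that groups come from category_cv. Which allele confers resistance is not stated here, so the sketch reports the ALT allele frequency only; `alt_freq_by_group` is hypothetical and not taken from geographic_mapping.R.

```python
from collections import defaultdict


def alt_allele_counts(sample_field):
    """Return (alt_count, called_alleles) for a VCF sample field such as '0/1' or '1|1:0,5:5'."""
    gt = sample_field.split(":", 1)[0]  # GT is assumed to be the first FORMAT key
    called = [a for a in gt.replace("|", "/").split("/") if a != "."]
    # Any called non-reference allele counts as ALT (multiallelic sites are pooled)
    alt = sum(1 for a in called if a != "0")
    return alt, len(called)


def alt_freq_by_group(sample_gt, sample_group):
    """ALT allele frequency per group (e.g. category_cv), skipping missing calls."""
    alt = defaultdict(int)
    total = defaultdict(int)
    for sample, field in sample_gt.items():
        group = sample_group.get(sample)
        if group is None:  # sample absent from the metadata join
            continue
        a, n = alt_allele_counts(field)
        alt[group] += a
        total[group] += n
    return {g: alt[g] / total[g] for g in total if total[g] > 0}


if __name__ == "__main__":
    # Hypothetical genotypes at one site, keyed by LIB sample ID
    gts = {"A": "0/0", "B": "0/1", "C": "1/1", "D": "./.", "E": "1|1:0,5:5"}
    groups = {"A": "landrace", "B": "landrace", "C": "breeding", "D": "breeding", "E": "breeding"}
    print(alt_freq_by_group(gts, groups))
    # {'landrace': 0.25, 'breeding': 1.0}
```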
Sharing/Access information
Data was derived from the following sources:
Methods
NIL genome characterization - newly generated resequencing data of near-isogenic lines (NILs) and their parents were used to determine the genomic contribution of the NILs
Resequencing SAP,BAP association - previously generated phenotype data (Poosapati et al., 2022) and resequencing data (Boatwright et al., 2022; LeBauer et al., 2020) were combined to generate new marker-trait association data
Haitian breeding population association - newly generated DArT sequencing and phenotype data were used to generate marker-trait association data
Geographic mapping - plotting allele information of RMES1 and RMES2 with geographic coordinates or breeding origin metadata