
Genotyping-by-sequencing data for a Haitian sorghum breeding program

Cite this dataset

Morris, Geoffrey et al. (2021). Genotyping-by-sequencing data for a Haitian sorghum breeding program [Dataset]. Dryad.


Rapid environmental change can lead to extinction of populations or evolutionary rescue via genetic adaptation. In the past several years, smallholder and commercial cultivation of sorghum (Sorghum bicolor), a global cereal and forage crop, has been threatened by a global outbreak of an aggressive new biotype of sugarcane aphid (SCA; Melanaphis sacchari). Here we characterized genomic signatures of adaptation in a Haitian sorghum breeding population, which had been recently founded from admixed global germplasm, extensively intercrossed, and subjected to intense selection under SCA infestation. We conducted evolutionary population genomics analyses of 296 post-selection Haitian lines compared to 767 global accessions at 159,683 single nucleotide polymorphisms. Despite intense selection, the Haitian population retains high nucleotide diversity through much of the genome due to diverse founders and an intercrossing strategy. A genome-wide fixation (FST) scan and geographic analyses suggest that adaptation to SCA in Haiti is conferred by a globally rare East African allele of RMES1, which has also spread to other breeding programs in Africa, Asia, and the Americas. De novo genome sequencing data for SCA-resistant and SCA-susceptible lines revealed putative causative variants at RMES1. Convenient low-cost markers were developed from the RMES1 selective sweep and successfully predicted resistance in independent U.S. × African breeding lines and eight U.S. commercial and public breeding programs, demonstrating the global relevance of the findings. Together, the findings highlight the potential of evolutionary genomics to develop adaptive trait breeding technology and the value of global germplasm exchange to facilitate evolutionary rescue.


Genotypes for the 296 Haitian breeding lines were generated by genotyping-by-sequencing (GBS). Genomic DNA digestion, ligation, and PCR amplification followed previously described methods (Morris et al. 2013). Libraries were sequenced as single-end 100-cycle reads on an Illumina HiSeq 2500 (Illumina, San Diego, CA, USA) at the University of Kansas Medical Center, Kansas City, MO, USA. A total of 220 million reads for the Haitian breeding population (HBP) were combined with published data for the global diversity panel (GDP; Morris et al. 2013) for SNP calling. SNPs were called with the TASSEL 5 GBS v2 pipeline (Glaubitz et al. 2014). Reads were aligned to the BTx623 sorghum reference genome v.3.1 (McCormick et al. 2018) with the Burrows-Wheeler Aligner (Li and Durbin 2009). The SNPs were filtered for 20% missingness, then the remaining missing data were imputed using BEAGLE 4.0 (Browning and Browning 2016).
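The missingness filter described above can be illustrated with a small Python sketch. It is not the TASSEL implementation. It assumes that "filtered for 20% missingness" means a SNP is kept when at most 20% of lines lack a call, and that missing calls are coded as `N` (or `NN`). The function names and the toy genotypes are hypothetical.

```python
def missing_fraction(calls, missing_codes=("N", "NN")):
    """Return the fraction of genotype calls at one SNP that are missing."""
    if not calls:
        raise ValueError("a site needs at least one genotype call")
    return sum(call in missing_codes for call in calls) / len(calls)


def filter_sites(genotypes, max_missing=0.20):
    """Keep only SNPs whose missing fraction is at most max_missing.

    Assumption: the 20% threshold is an upper bound on missing calls per SNP.
    """
    return {
        site: calls
        for site, calls in genotypes.items()
        if missing_fraction(calls) <= max_missing
    }


# Toy data (hypothetical site names, five lines per site).
toy_genotypes = {
    "S1_100": ["A", "G", "A", "A", "N"],  # 1 of 5 missing
    "S1_250": ["C", "N", "N", "C", "C"],  # 2 of 5 missing
    "S2_075": ["T", "T", "T", "T", "T"],  # none missing
}

kept = filter_sites(toy_genotypes)
print(sorted(kept))
```

In this toy example the second site exceeds the threshold and is dropped, while the other two are retained for imputation.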

Usage notes

Two files are provided:

(1) haiti_global_all_together.hmp.txt

These are SNP genotypes for the Haitian sorghum breeding lines (N = 296) and global reference lines (N = 767) in hapmap format, generated by the TASSEL GBS pipeline. More information on hapmap format is available here (

(2) key.csv

This is the key file for the TASSEL GBS pipeline, which can be used to regenerate the SNP genotype calls from raw sequencing reads. Note that raw sequencing reads are not provided here on Dryad due to the large file sizes, but can be found on the NCBI Sequence Read Archive ( under accession PRJNA757369. See Glaubitz et al. 2014 PLOS ONE for more information on the use of key files in the TASSEL GBS pipeline.
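For readers who want to inspect the hapmap genotype file, file (1) above, without TASSEL, the following Python sketch shows one way to read it. It assumes the usual hapmap layout: a tab-delimited table with 11 metadata columns (starting with `rs#`, `alleles`, `chrom`, `pos`) followed by one column per line. This layout has not been checked against the deposited file. The function name and the toy records are hypothetical.

```python
import csv
import io

N_FIXED = 11  # assumed number of hapmap metadata columns before the sample columns


def read_hapmap(handle):
    """Parse a tab-delimited hapmap table into sample names and per-site records."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[N_FIXED:]
    sites = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        sites.append(
            {
                "id": row[0],
                "alleles": row[1],
                "chrom": row[2],
                "pos": int(row[3]),
                "calls": row[N_FIXED:],  # one genotype code per sample
            }
        )
    return samples, sites


# Toy hapmap text with three hypothetical lines and two hypothetical SNPs.
toy_rows = [
    ["rs#", "alleles", "chrom", "pos", "strand", "assembly#", "center",
     "protSampleID", "assayLID", "panelLID", "QCcode", "lineA", "lineB", "lineC"],
    ["S1_100", "A/G", "1", "100", "+", "NA", "NA", "NA", "NA", "NA", "NA", "A", "G", "R"],
    ["S1_250", "C/T", "1", "250", "+", "NA", "NA", "NA", "NA", "NA", "NA", "C", "N", "C"],
]
toy_text = "\n".join("\t".join(row) for row in toy_rows) + "\n"

samples, sites = read_hapmap(io.StringIO(toy_text))
print(samples)
print(sites[0]["pos"], sites[0]["calls"])
```

To read the real file, the same function could be called on a handle opened with `open("haiti_global_all_together.hmp.txt", newline="")`, provided the file follows the layout assumed here.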


United States Agency for International Development, Award: AID-OAA-LA-16-00003

United States Department of Energy, Award: DE-AC02-05CH11231