SNP genotypes of the International Institute of Tropical Agriculture Cowpea Core
Data files
Sep 26, 2023 version files 717.98 MB
Abstract
Cowpea (Vigna unguiculata [L.] Walp.) is a member of the family Fabaceae, subfamily Faboideae (a.k.a. Papilionoideae) and tribe Phaseoleae, along with other “warm season” legumes such as soybean, common bean, mung bean, adzuki bean and Bambara groundnut. The International Institute of Tropical Agriculture (IITA) in Ibadan, Nigeria maintains the world’s largest collection of cowpea germplasm, with 16,460 accessions as of September 2023. Information on these accessions is available online (https://my.iita.org/accession2/collection.jspx?id=1), including passport data, characterization descriptors and images. A subset known as the “IITA Cowpea Core” comprises nearly 2,100 accessions (Mahalakshmi et al. 2007; doi: 10.1017/S1479262107837166). Here we present single nucleotide polymorphism (SNP) data for most of the accessions in the IITA Cowpea Core, along with a few simple observations resulting from analyses of the SNP data.
The SNP data were generated using the Illumina iSelect Cowpea Consortium Array, described in Muñoz-Amatriaín et al. 2017 (doi: 10.1111/tpj.13404). A characteristic of this platform is that missing data (a.k.a. “nocall”) is nearly always attributable to one of two causes: a SNP assay that failed technically and is excluded from all samples (“zeroed”), or the absence of a DNA segment in the genome or a sequence difference near the SNP position that precludes a successful assay. As a consequence, the frequency of “nocall” provides a broad indicator of whether or not a given accession is in the same species as cowpea. A few accessions in the IITA Cowpea Core are outliers, as follows. Three accessions (TVu-14726, TVu-16409, TVu-8383) had more than 31,000 nocalls (after excluding 1,863 zeroed SNPs) and fairly low (181 or 246) or moderate (1,337) numbers of heterozygous calls, indicating that these are not in the same species as cowpea. The passport data from IITA list TVu-8383 as “wild” and TVu-14726 as “landrace”. There is no additional passport information on TVu-16409, but based on its SNP characteristics TVu-16409 is clearly also not in the same species as cowpea. Three other accessions (TVu-5540, TVu-6968, TVu-14935) have from 8,985 to 10,025 nocalls (after excluding 1,863 zeroed SNPs), which indicates that these accessions are more closely related to cowpea but are also from a different species. Among these three accessions, TVu-14935 also had 18,134 heterozygous SNPs, which indicates that the plant representing this accession was not highly inbred. Residual heterozygosity is a common characteristic among single plant representatives of germplasm accessions, which begin as one or more seeds collected from their original open-pollinating location and then proceed through a variable number of selfed generations to become more inbred in germplasm collections.
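The nocall and heterozygous-call counts discussed above could be tallied per accession along the following lines. This is a minimal sketch, not the procedure used to produce the reported numbers. It assumes that each genotype call is a two-character string such as "AA" or "AG" and that missing data is "--". The function names and the two cutoffs (5,000 and 20,000) are hypothetical; the cutoffs were chosen only to fall between the nocall ranges reported above and are not thresholds defined by the dataset.

```python
from typing import Dict, List

NOCALL = "--"  # missing-data marker used in the dataset


def summarize_accession(calls: List[str], zeroed: List[bool]) -> Dict[str, int]:
    """Count nocalls and heterozygous calls for one accession.

    `calls` holds one genotype call per SNP; `zeroed` flags SNP assays
    that failed technically and are skipped entirely.
    """
    nocalls = 0
    hets = 0
    for call, is_zeroed in zip(calls, zeroed):
        if is_zeroed:
            continue  # zeroed assays are missing for every sample
        if call == NOCALL:
            nocalls += 1
        elif len(call) == 2 and call[0] != call[1]:
            hets += 1  # two different alleles, e.g. "AG"
    return {"nocall": nocalls, "het": hets}


def nocall_group(nocalls: int, low_cutoff: int = 5_000,
                 high_cutoff: int = 20_000) -> str:
    """Bin an accession by nocall count (illustrative cutoffs only)."""
    if nocalls > high_cutoff:
        return "distant outlier"
    if nocalls > low_cutoff:
        return "closer outlier"
    return "typical"


# Toy example: five SNPs, one of them zeroed.
toy_calls = ["AA", "AG", "--", "--", "GG"]
toy_zeroed = [False, False, False, True, False]
toy_summary = summarize_accession(toy_calls, toy_zeroed)
```

With real data, `calls` would be one accession column from either sheet of the spreadsheet, and `zeroed` would mark the SNPs whose calls are "--" for all samples.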
Five accessions that are stated in the passport data to be from three V. unguiculata subspecies other than subspecies unguiculata had the same range of nocalls (708 to 943) as accessions reported to be subspecies unguiculata, consistent with the expectation that the cultivated subspecies all are closely related to each other. Two of these (TVu-3661 and TVu-3662) are stated in the passport data to be subspecies dekindtiana, which is generally considered to be the reservoir of variation for subspecies unguiculata, and one (TVu-3657) is stated to be subspecies cylindrica. The other two (TVu-3652, TVu-3656) of these five accessions are stated to be subspecies sesquipedalis, which has been well documented to be readily crossable with subspecies unguiculata. It should be noted also that there are several other accessions from Asia that are not specifically marked as sesquipedalis. Analysis of the overall population structure of the IITA Cowpea Core places these Asian accessions within the same sub-population as the accessions stated to be subspecies sesquipedalis. Based on principal component analysis, five sub-populations are evident among the IITA Cowpea Core: one from West Africa represented by Sanzi, another from West Africa represented by Suvita-2, one from Asia represented by TZ30 and ZN016, one from Northeast Africa, Europe and California represented by CB5-2, and one from South and East Africa represented by UCR779.
It is anticipated that the IITA Cowpea Core SNP dataset can provide a useful resource for a number of genome-wide association studies (GWAS) and decisions related to germplasm management.
README: SNP Genotypes of the International Institute of Tropical Agriculture Cowpea Core
https://doi.org/10.6086/D19Q37
The two-sheet SNP dataset is provided as a .xlsx file, which is a zipped, XML-based file format that can be opened with Microsoft Excel (Office 2007 or later), LibreOffice Calc, Google Sheets, Apache OpenOffice and others. The URLs for the cited references are as follows.
Liang et al. 2023; https://doi.org/10.1002/tpg2.20319
Lonardi et al. 2019; https://doi.org/10.1111/tpj.14349
Mahalakshmi et al. 2007; https://doi.org/10.1017/S1479262107837166
Muñoz-Amatriaín et al. 2017; https://doi.org/10.1111/tpj.13404
Muñoz-Amatriaín et al. 2021; https://doi.org/10.1002/leg3.95
Description of the data and file structure
Missing data is indicated as "--". Of the 51,128 attempted SNP assays using the Illumina iSelect Cowpea Consortium Array, a total of 1,863 SNP assays that failed technically were "zeroed out" in the Illumina GenomeStudio workspace as missing data ("--") for all samples, leaving a total of 49,265 SNP assays whose data were further considered. As noted in the methods, and in reference to Liang et al. 2023 Table S03, 677 additional SNPs were excluded. Most of these 677 excluded SNP assays provided frequencies of heterozygous calls that are impossible given the highly inbred nature of the accessions, leaving a total of 48,588 SNPs with data deemed to be reliable for further analyses. For these filtered 48,588 SNPs, missing data is most often indicative of one of two situations. One is presence/absence variation (PAV), which for many SNPs involves two or more adjacent SNPs (when sorted by position) that are absent in unison across a large portion of accessions. The other is a mismatch between the target sequence abutting a SNP position and the SNP-interrogating oligonucleotide, a situation that precludes determination of the SNP allele. The frequency of this latter situation parallels the genetic distance between a given accession and the set of subspecies unguiculata accessions from which the SNP assays were designed.
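The filtering arithmetic above, together with one way the same filters might be applied to a table of calls, can be sketched as follows. This is an illustrative sketch only. The SNP names in the toy data are invented, and the set of excluded SNPs is passed in by the caller because the exact values used in the exclusion-recommendation column are not described here.

```python
from typing import Dict, List, Set

NOCALL = "--"  # missing-data marker used in the dataset


def is_zeroed(calls: List[str]) -> bool:
    """True if a SNP is missing for every sample (a 'zeroed' assay)."""
    return bool(calls) and all(call == NOCALL for call in calls)


def missing_fraction(calls: List[str]) -> float:
    """Fraction of samples with no call for one SNP (non-empty input)."""
    return sum(call == NOCALL for call in calls) / len(calls)


def reliable_snps(calls_by_snp: Dict[str, List[str]],
                  excluded: Set[str]) -> List[str]:
    """Keep SNPs that are neither zeroed nor flagged for exclusion."""
    return [name for name, calls in calls_by_snp.items()
            if not is_zeroed(calls) and name not in excluded]


# Counts stated in the text.
attempted = 51_128
after_zeroing = attempted - 1_863      # zeroed assays removed
after_exclusion = after_zeroing - 677  # further SNPs excluded

# Toy data with hypothetical SNP names; each list is one SNP across samples.
toy = {
    "snp_a": ["AA", "--", "GG", "AA"],  # partly missing, retained
    "snp_b": ["--", "--", "--", "--"],  # zeroed
    "snp_c": ["AG", "AG", "AG", "AG"],  # flagged by the caller below
}
kept = reliable_snps(toy, excluded={"snp_c"})
```

For a retained SNP, `missing_fraction` gives a quick view of how widespread the missing data is, which may help when looking for adjacent SNPs that are absent in unison.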
The first sheet in the SNP data table, named “v1.0_Watson” (Watson strand), has the following column organization.
Column A: SNP name, sorted alphanumerically by name.
Column B: chromosome number (or unmapped contig name) as per Liang et al. 2023.
Column C: nucleotide position within chromosome or unmapped contig as per Liang et al. 2023.
Column D: the allele call on the Watson strand of the assembled genome of IT97K-499-35, as per Liang et al. 2023.
Column E: the alternate allele on the Watson strand at the SNP position, as per Liang et al. 2023.
Column F: the two possible SNP alleles on the Watson strand.
Column G: the two possible SNP alleles considering the iSelect Forward Strand.
Column H: the orientation of the iSelect Forward Strand for SNP calls relative to the Watson strand.
Column I: recommendation to exclude the SNP from data analyses.
Columns J through BZX: the set of SNP calls for each of 2,043 accessions, which includes 2,036 IITA Cowpea Core accessions and the 7 assembled genomes described in Lonardi et al. 2019 (IT97K-499-35 only) and Liang et al. 2023.
The second sheet in the SNP data table, named “iSelect_FWD” (iSelect Forward Strand), has the following column organization.
Column A: SNP name, sorted alphanumerically by name.
Columns B through BZP: the set of SNP calls for each of 2,043 accessions, which includes 2,036 IITA Cowpea Core accessions and the 7 assembled genomes described in Lonardi et al. 2019 (IT97K-499-35 only) and Liang et al. 2023.
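When reading the sheets programmatically, it can help to translate the spreadsheet column letters given above into numeric positions. The short sketch below does this with a hypothetical helper, `column_index`, and checks that both accession ranges span 2,043 columns. It uses only the column letters stated above and does not read the file itself.

```python
def column_index(letters: str) -> int:
    """Convert spreadsheet column letters to a 1-based index (A=1, Z=26, AA=27)."""
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column letters: {letters!r}")
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def column_span(first: str, last: str) -> int:
    """Number of columns from `first` through `last`, inclusive."""
    return column_index(last) - column_index(first) + 1


# Accession columns on each sheet, as described above.
watson_accessions = column_span("J", "BZX")   # sheet "v1.0_Watson"
forward_accessions = column_span("B", "BZP")  # sheet "iSelect_FWD"
```

Both spans evaluate to 2,043, matching the 2,036 IITA Cowpea Core accessions plus the 7 assembled genomes.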
Sharing/Access information
No additional Sharing/Access information.
Code/Software
No Code/Software.
Methods
Young, tender leaf tissue was excised from one plant of each accession of the IITA Cowpea Core collection, then dried prior to DNA extraction. A total of 1,789 plants that provided leaf tissue were grown at the International Institute of Tropical Agriculture (IITA) in Ibadan, Nigeria and 15 at the IITA in Kano, Nigeria. Leaves from 1,655 of these plants were dried inside sealable plastic bags containing packets of silica gel, then shipped at ambient temperature from IITA (Ibadan and Kano) to the University of California in Riverside (UCR), California, USA. An additional 232 tissue samples from IITA Cowpea Core accessions that were included in the “UCR Minicore” (Muñoz-Amatriaín et al. 2021; doi: 10.1002/leg3.95) were prepared the same way (young leaves, dried with silica gel packets) from individual plants grown in greenhouses at UCR. DNA was extracted at UCR from each of these 1,887 dried leaf samples using either Qiagen (https://www.qiagen.com/us) DNeasy Plant or Macherey-Nagel (https://www.mn-net.com/us/) NucleoMag kits. DNA was prepared at IITA from desiccated leaf tissue of an additional 149 accessions using a CTAB method, then these DNA solutions were sent to UCR at ambient temperature. All of these 2,036 DNA samples (10 µL each) were arranged in 96-well plates at UCR at concentrations ranging from 50 to 450 ng/µL, then transported to the University of Southern California Molecular Genomics Core facility (https://uscnorriscancer.usc.edu/molecular-genomics-core/) for single nucleotide polymorphism (SNP) genotyping using the Illumina (https://www.illumina.com/) iSelect Cowpea Consortium Array, which was described in Muñoz-Amatriaín et al. 2017. The tissue production, DNA extraction and genotyping occurred incrementally over a period of 6.5 years from May 2014 through November 2020.
Raw SNP data and sample sheets were transferred to UCR by FTP, then imported into the Illumina GenomeStudio software using a cluster file developed in 2014 to 2015 at UCR for broad cowpea germplasm, then exported as “Forward Strand” in a tab-delimited text file. SNP data from seven diverse cowpea accessions (CB5-2, IT97K-499-35, Sanzi, Suvita-2, TZ30, UCR779, ZN016) that have been assembled as described in Liang et al. 2023 (doi: 10.1002/tpg2.20319) also were included, taking the total number of accessions in this dataset to 2,043. There was no apparent difference in the quality or completeness of the SNP data, regardless of the DNA extraction method or DNA concentration in this range. The orientation of the cowpea iSelect “Forward Strand” is arbitrary relative to the “Watson” strand of the assembled genome sequence of accession IT97K-499-35 (Lonardi et al. 2019; doi: 10.1111/tpj.14349). So, in addition to providing the SNP data as iSelect “Forward Strand”, which has been used for numerous publications, here we provide the SNP data in a spreadsheet containing two sheets, with the first sheet being the SNPs according to the “Watson” strand, and the second sheet being the SNPs as iSelect “Forward Strand”. Additional information in the spreadsheet includes chromosome (or contig if unmapped), nucleotide position in IT97K-499-35, and other information indicating confidence to use or exclude the data for a given SNP, as described in Liang et al. 2023 Table S03.
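Because the iSelect “Forward Strand” may be on either strand relative to “Watson”, converting a call between the two representations amounts to complementing its bases when the orientations differ. The sketch below illustrates the idea under stated assumptions: calls are two-character nucleotide strings, missing data is "--", and the orientation is supplied as a boolean. How the orientation column in the spreadsheet actually encodes this, and how alleles are ordered within a heterozygous call, are not specified here, so `forward_to_watson` is a hypothetical helper rather than the conversion used to build the dataset.

```python
NOCALL = "--"  # missing-data marker used in the dataset

# Maps each nucleotide to its complement on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def forward_to_watson(call: str, same_orientation: bool) -> str:
    """Express an iSelect Forward Strand call on the Watson strand.

    The call is a pair of alleles at one position, not a sequence,
    so each base is complemented in place without reversing the order.
    """
    if call == NOCALL or same_orientation:
        return call
    return call.translate(COMPLEMENT)


# Toy examples.
unchanged = forward_to_watson("AG", same_orientation=True)
flipped = forward_to_watson("AG", same_orientation=False)
still_missing = forward_to_watson(NOCALL, same_orientation=False)
```

Applying the same function twice with `same_orientation=False` returns the original call, so it can also be used to go from Watson back to Forward Strand.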
Usage notes:
The two-sheet SNP dataset is provided as a .xlsx file, which is a zipped, XML-based file format that can be opened with Microsoft Excel (Office 2007 or later), LibreOffice Calc, Google Sheets, Apache OpenOffice and others.