Genetic data of 326 diverse common beans and genetic and phenotypic data of two RIL populations segregating for pod shattering
Data files
Jan 16, 2026 version files 169.82 MB
- Final-Pv-326-geno-110095-SNP.vcf (166.04 MB)
- PodShatteringAndTwistQTLmappingData.xlsx (3.77 MB)
- README.md (2.76 KB)
Jan 19, 2026 version files 169.83 MB
Abstract
Here, we investigate the loss of seed dispersal via pod shattering during common bean (Phaseolus vulgaris L.) domestication. We identified PvMYB26 mutations in all three main gene pools of common bean, including an 8 kb deletion in Middle American lines eliminating the gene’s transcription start site and promoter; a frameshift/truncation deletion in the independently domesticated Andean population; and another frameshift/truncation insertion in the genetic background of the "undomesticated" debouckii population. Mutants with the 8 kb deletion express PvMYB26 at <1% of the level of wild types and produce 44% less pod lignin. RNA in situ hybridization and fluorescence microscopy show that PvMYB26 is expressed in the lignified fiber layer of pods, while mutants show no visible expression and have a greatly reduced fiber layer. Sequencing of 327 accessions revealed that the mutation is nearly diagnostic for domestication status among Middle American common bean and identified a 125 kb hard selective sweep, indicating the gene’s importance in domestication. The main Andean frameshift mutation is found in 84.5% of Andean domesticates but 0% of wild lines, while the debouckii truncation was identified in six domesticated lines of Race Peru, suggesting a third proto-domestication of common bean may have occurred in Ecuador and/or northern Peru. Wild haplotypes most like Middle American domesticates are found in eastern Jalisco, Mexico, strongly suggesting West-Central Mexico as the site of common bean domestication and the rise of agriculture in Middle America.
Dataset DOI: 10.5061/dryad.bnzs7h4qs
Description of the data and file structure
This data deposition includes 1) a VCF file of 110,095 genome-wide SNPs of 326 diverse common beans; and 2) genetic and phenotypic data of two RIL populations, which segregate for pod shattering, used for QTL mapping in ASMap and R/qtl.
Dataset 1: Genome-wide SNPs are presented as a VCF file, aligned to the 5-593 genome. All 326 samples of common bean were sequenced at 12x-16x depth with 150 bp paired-end sequencing on a NovaSeq 6000. Information for the reference sequence can be found here: https://phytozome-next.jgi.doe.gov/info/Pvulgaris5_593_v1_1
FASTA for that reference sequence is stored in Pvulgaris5_593_696_v1.0.fa.gz available at: https://data.jgi.doe.gov/refine-download/phytozome?organism=Pvulgaris5-593&expanded=696&phytozome_version=14&_gl=11wkerk2_gaMTQ3Mzg4ODIyLjE3NTkxNjIyNDM._ga_YBLMHYR3C2*czE3NjE4NTc2NzgkbzE2JGcwJHQxNzYxODU3Njc4JGo2MCRsMCRoMA
SNPs were called using mpileup, VarScan, and GATK.
Dataset 2: The data used for QTL mapping were derived from the Khufu method of skim sequencing, conducted by the HudsonAlpha Institute for Biotechnology. Two biparental RIL populations were involved: one descended from a cross between Canario 707 and PI 638850, and the other from a cross between Orca and PI 638850.
The populations were used to conduct QTL mapping of pod shattering and the related trait of pod twists.
Files and variables
File: PodShatteringAndTwistQTLmappingData.xlsx
Description: Genotype and phenotype data for two populations, Canario 707 x Wild PI 638850 ("CxW") and Orca x Wild PI 638850 ("OxW"), on separate tabs. The data are formatted to be read into ASMap and R/qtl.
Variables
- Pod shattering phenotypes (two-year averages)
- Pod twist phenotypes (two-year averages)
- RIL name
- Genome-wide SNP data
File: Final-Pv-326-geno-110095-SNP.vcf
Description: VCF (Variant Call Format) file of 110,095 genome-wide SNPs of 326 diverse accessions of P. vulgaris.
File: Celebioglu_et_al._VCF_sample_domestication_status_and_gene_pool.xlsx
Description: Sample metadata for Final-Pv-326-geno-110095-SNP.vcf, describing the domestication status and gene pool of origin at PvMYB26.
Code/software
The genotype and phenotype data in PodShatteringAndTwistQTLmappingData.xlsx are formatted for QTL mapping of the RIL populations in ASMap and R/qtl. The VCF file can be opened with a wide variety of bioinformatics software.
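As a quick sanity check, the VCF can also be inspected with the Python standard library alone. The sketch below is illustrative and not part of the deposited workflow: the function name `summarize_vcf` is hypothetical, and it assumes an uncompressed, tab-delimited VCF whose `#CHROM` header line has the nine standard fixed columns followed by one column per sample. The demo records are made up and do not come from the dataset.

```python
import io


def summarize_vcf(handle):
    """Return (sample_names, n_records) for an uncompressed VCF text stream."""
    samples = []
    n_records = 0
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines carry no genotype data
        if line.startswith("#CHROM"):
            # Header line: 9 fixed columns (CHROM..FORMAT), then one per sample
            samples = line.rstrip("\n").split("\t")[9:]
            continue
        if line.strip():
            n_records += 1  # one variant record per non-empty data line
    return samples, n_records


# Tiny in-memory example with made-up records (not taken from the dataset)
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "Chr01\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t1/1\n"
    "Chr01\t250\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t0/0\n"
)
demo_samples, demo_count = summarize_vcf(demo)
```

To apply this to the deposited file, open Final-Pv-326-geno-110095-SNP.vcf with `open(...)` and pass the handle to `summarize_vcf`. If the file matches its description, the sample list should have 326 entries and the record count should be 110,095, assuming each SNP occupies one record line.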
Access information
Other publicly accessible locations of the data:
- None
Changes after Jan 16, 2026: We added sample metadata describing the domestication status and gene pool of origin at PvMYB26 in the file Celebioglu_et_al._VCF_sample_domestication_status_and_gene_pool.xlsx.
