A whole-genome resequencing–based SNP dataset for lentil
Data files
May 13, 2026 version files (81.61 GB)
- lentil_wgs_238samples.vcf.gz (81.61 GB)
- README.md (1.03 KB)
May 13, 2026 version files (81.61 GB)
- lentil_wgs_238samples.vcf.gz (81.61 GB)
- README.md (1.01 KB)
Abstract
Lentil is an important legume crop with significant nutritional value and plays a pivotal role in environmentally sustainable agricultural systems. Whole-genome resequencing data for lentil have not previously been reported. In this study, we resequenced a total of 238 Lens accessions, including 112 cultivated lentils, 71 landraces, and 55 accessions of wild species, and generated a comprehensive map of lentil genome variation with 103,290,296 single nucleotide polymorphisms (SNPs). Overall, the highest numbers of variants were observed in wild species. Population genomic analysis revealed that lentil was first domesticated in the Near East. Among wild species, L. orientalis showed the closest relationship with cultivated lentils. Demographic history analysis indicated that L. orientalis and L. culinaris diverged around 12 Kya. Scans for selective sweeps indicated that traits including flowering time and disease resistance might have been under continuous selection during domestication. The genetic architecture of Fusarium root rot resistance and other economically and agronomically important traits was also characterized. Two genes, one encoding a toll-interleukin receptor nucleotide-binding site leucine-rich repeat (TIR-NBS-LRR) protein and the other an auxin-binding protein, were predicted as candidate genes associated with Fusarium root rot resistance. This study provides valuable genomic resources for both basic and applied efforts to understand and exploit the genetic basis of important traits for lentil crop improvement via molecular breeding.
Dataset DOI: 10.5061/dryad.n02v6wxbh
Description of the data and file structure
This lentil genomic variant dataset contains ~103 million single nucleotide polymorphisms (SNPs) identified by whole-genome resequencing of 238 lentil samples. Sequencing reads were aligned to the lentil reference genome CDC Redberry v2.0. Variants were called with GATK v4 and stored in Variant Call Format (VCF, v4.2) for downstream analysis.
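Because the file is described as VCF v4.2 covering 238 samples, a quick header check can confirm both before any heavy processing. The following is a minimal sketch using only the Python standard library. The function names `read_vcf_header` and `read_vcf_header_from_file` are hypothetical, and the toy header in the demo is invented for illustration rather than copied from the dataset.

```python
import gzip
import io


def read_vcf_header(stream):
    """Collect '##key=value' meta lines and the sample names from a VCF text stream.

    Reading stops at the '#CHROM' column line, so no variant records are parsed.
    """
    meta = {}
    samples = []
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("##"):
            key, _, value = line[2:].partition("=")
            meta.setdefault(key, []).append(value)
        elif line.startswith("#CHROM"):
            # The first 9 columns are fixed (CHROM ... FORMAT); the rest are samples.
            samples = line.split("\t")[9:]
            break
        else:
            break  # reached a record without seeing a column line
    return meta, samples


def read_vcf_header_from_file(path):
    """Open a gzip-compressed VCF in text mode and read only its header."""
    with gzip.open(path, "rt") as handle:
        return read_vcf_header(handle)


# Toy header (hypothetical content, two made-up samples) to show the return values.
toy = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "##source=toy_example\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
)
toy_meta, toy_samples = read_vcf_header(toy)
print(toy_meta["fileformat"], len(toy_samples))
```

On the real file, `read_vcf_header_from_file("lentil_wgs_238samples.vcf.gz")` would be expected to report the VCF version and a sample list whose length can be compared against the 238 samples stated above. Only the header is decompressed, so the check stays cheap despite the 81.61 GB file size.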
Files and variables
File: lentil_wgs_238samples.vcf.gz
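At 81.61 GB compressed, the file is best processed as a stream rather than loaded into memory. The sketch below counts SNP records per chromosome one line at a time, again with the standard library only. The function names are hypothetical, and the chromosome names and records in the demo are made up for illustration. It counts a record as a SNP when REF and every ALT allele are single bases, which is an assumption about how one might define a SNP here, not necessarily the authors' filtering criteria.

```python
import gzip
import io
from collections import Counter


def count_snps_by_chrom(lines):
    """Count records whose REF and all ALT alleles are single A/C/G/T bases."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t", 5)  # only CHROM..ALT are needed
        if len(fields) < 5:
            continue  # malformed record
        chrom, ref, alt = fields[0], fields[3], fields[4]
        alts = alt.split(",")
        if len(ref) == 1 and all(len(a) == 1 and a in "ACGT" for a in alts):
            counts[chrom] += 1
    return counts


def count_snps_in_file(path):
    """Stream a gzip-compressed VCF from disk without loading it into memory."""
    with gzip.open(path, "rt") as handle:
        return count_snps_by_chrom(handle)


# Toy records (hypothetical): two SNPs on chr1, one indel on chr1, one SNP on chr2.
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n"
    "chr1\t100\t.\tA\tG\t60\tPASS\t.\tGT\t0/1\n"
    "chr1\t200\t.\tC\tT,G\t60\tPASS\t.\tGT\t1/2\n"
    "chr1\t300\t.\tAT\tA\t60\tPASS\t.\tGT\t0/1\n"
    "chr2\t50\t.\tG\tC\t60\tPASS\t.\tGT\t1/1\n"
)
toy_counts = count_snps_by_chrom(toy_vcf)
print(dict(toy_counts))
```

A full pass over the real file with `count_snps_in_file` will take considerable time in pure Python. For routine work, an indexed, compiled tool is likely a better fit, and this sketch is meant mainly to show the record layout.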
Access information
The whole-genome resequencing raw data used to generate the SNP dataset are available at the National Center for Biotechnology Information-Sequence Read Archive (NCBI-SRA) database (https://www.ncbi.nlm.nih.gov/sra) under accession number PRJNA1128027.
Change Log
Version 2: minor edit in README. No file changed.
