NBS-LRR PacBio sequencing from a Glycine max diversity panel
Data files
Oct 23, 2024 version; files total 16.45 GB
- Hodge_G.max_NBS-LRR_Pacbio_Reads.tar (16.45 GB)
- README.md (1.15 KB)
- Renseq_Library_Dictionary.csv (1.08 KB)
Abstract
Numerous sources of putative novel resistance genes against Phytophthora sojae (Rps genes) have been identified and their loci mapped in soybean (Glycine max L. Merr.), but cloning has remained elusive. We used resistance gene enrichment sequencing (RenSeq) to identify putative resistance genes in 20 plant introductions (PIs) and in differentials of the cultivar Williams carrying rps, Rps1c, Rps3a, and Rps8. DNA from these genotypes was enriched and sequenced using more than 25,000 80-nt baits designed for nucleotide-binding leucine-rich repeat (NLR) encoding sequences. Overall, the 20 plant introductions carried more variants in NLR-encoding genes at the Rps loci on Chrs 3, 7, 13, and 18 than the Williams differentials for rps, Rps1c, Rps1k, Rps3a, and Rps8. Candidate genes encoding Rps1c, Rps3a, and Rps8 were proposed based on sequence differences among the differentials. Among the 20 PIs, there may be additional alleles on Chrs 3, 13, and 18, and PI 399079 may carry two new alleles, at the Chr 3 and Chr 7 loci. A unique NLR on Chr 8 was identified in PI 200553. New alleles were also identified on Chrs 3 and 18 when the PI and resistant bulks were compared to susceptible RILs. This study demonstrates the utility of RenSeq as an efficient method to identify and predict specific novel NLR genes in landrace soybean germplasm that confer resistance to P. sojae, and to obtain gene-specific markers that facilitate their introgression into modern cultivars.
https://doi.org/10.5061/dryad.v15dv424q
This dataset contains 34 FASTQ files of PacBio HiFi reads generated from 34 Glycine max genotypes. These genotypes carry known and novel sources of NBS-LRR-mediated resistance to Phytophthora sojae. The NBS-LRR genes were bait-captured from each genotype separately and prepared as individual libraries. All libraries were then pooled and sequenced on the PacBio platform.
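Because each library is a separate FASTQ file, a quick sanity check after download is to count the reads and bases in each one. The sketch below assumes the standard four-line FASTQ record (header, sequence, "+" separator, quality line); the function names read_fastq and fastq_stats and the small in-memory example are illustrative only and are not part of the dataset.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch for " + header.strip())
        # The read ID is the first whitespace-delimited token after "@".
        yield header[1:].split()[0], seq, qual


def fastq_stats(handle: TextIO) -> dict:
    """Return read count, total bases, mean read length, and longest read."""
    lengths = [len(seq) for _, seq, _ in read_fastq(handle)]
    if not lengths:
        return {"reads": 0, "bases": 0, "mean_length": 0.0, "max_length": 0}
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
    }


example = io.StringIO("@read1 ccs\nACGTACGT\n+\nIIIIIIII\n@read2\nACG\n+\nIII\n")
print(fastq_stats(example))
# {'reads': 2, 'bases': 11, 'mean_length': 5.5, 'max_length': 8}
```

For a real library file, pass open(path) or, if the files turn out to be gzip-compressed, gzip.open(path, "rt").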
Description of the data and file structure
Each FASTQ file contains the raw PacBio HiFi reads from one sequencing library. The libraries have been packaged together into a single gzip/tar compressed archive, which must be downloaded and then decompressed to access the FASTQ files, e.g. with tar -xvzf filename. A separate file, Renseq_Library_Dictionary.csv, lists the Glycine max genotype (cultivar or plant introduction) to which each sequencing library corresponds.
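The extraction can also be done from Python with the standard-library tarfile module. Opening the archive with mode "r:*" lets tarfile detect whether it is gzip-compressed or a plain tar, so there is no need to know in advance whether the z flag applies. The column names of Renseq_Library_Dictionary.csv are not documented here, so this sketch assumes hypothetical "library" and "genotype" columns; check the file's header row and pass the real names instead. The FASTQ file extensions matched below are also an assumption.

```python
import csv
import tarfile
from pathlib import Path
from typing import Dict, List, TextIO

# Assumed FASTQ suffixes; adjust if the archive uses different names.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def extract_fastqs(archive: str, dest: str) -> List[Path]:
    """Extract a gzip or plain tar archive (auto-detected) and list the FASTQ files."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, ".." traversal, device files, etc.
            tar.extractall(out, filter="data")
        else:
            tar.extractall(out)
    return sorted(
        p for p in out.rglob("*") if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )


def load_dictionary(
    handle: TextIO, library_col: str = "library", genotype_col: str = "genotype"
) -> Dict[str, str]:
    """Map library name to genotype; the default column names are hypothetical."""
    reader = csv.DictReader(handle)
    return {row[library_col].strip(): row[genotype_col].strip() for row in reader}
```

A typical call would be extract_fastqs("Hodge_G.max_NBS-LRR_Pacbio_Reads.tar", "reads"), followed by load_dictionary(open("Renseq_Library_Dictionary.csv", newline="")) with the column names adjusted to match the CSV.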
Sharing/Access information
NA
Code/Software
Software used included Minimap2, Clair3, SnpEff, and Haplosaurus.
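The parameters used with these tools are not stated. As a hedged illustration of the first step only, the sketch below builds a Minimap2 command that aligns one HiFi library to a reference with the map-hifi preset (available in Minimap2 2.19 and later) and pipes the SAM output into samtools sort. The reference path, thread count, and the use of samtools are assumptions rather than settings reported for this dataset; the Clair3, SnpEff, and Haplosaurus steps are not shown.

```python
import shlex
import subprocess
from typing import List, Tuple


def alignment_commands(
    reference: str, reads: str, out_bam: str, threads: int = 8
) -> Tuple[List[str], List[str]]:
    """Return (minimap2, samtools sort) argument lists for one HiFi library."""
    # -a: write SAM; -x map-hifi: preset for PacBio HiFi reads (assumed choice).
    minimap2 = ["minimap2", "-ax", "map-hifi", "-t", str(threads), reference, reads]
    # Read SAM from stdin ("-") and write a coordinate-sorted BAM to out_bam.
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    return minimap2, sort


def run_alignment(reference: str, reads: str, out_bam: str, threads: int = 8) -> None:
    """Run minimap2 | samtools sort; both tools must be on PATH."""
    mm2_cmd, sort_cmd = alignment_commands(reference, reads, out_bam, threads)
    mm2 = subprocess.Popen(mm2_cmd, stdout=subprocess.PIPE)
    sort = subprocess.Popen(sort_cmd, stdin=mm2.stdout)
    mm2.stdout.close()  # lets minimap2 get SIGPIPE if samtools exits early
    sort_rc = sort.wait()
    mm2_rc = mm2.wait()
    if mm2_rc != 0 or sort_rc != 0:
        raise RuntimeError(
            "alignment failed: " + shlex.join(mm2_cmd) + " | " + shlex.join(sort_cmd)
        )
```

Building the argument lists separately from running them keeps the command easy to inspect or log before launching a long alignment.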
The data are PacBio HiFi long reads sequenced on PacBio Sequel SMRT Cells. The dataset has not been processed; these are the raw FASTQ files from the experiment.
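Because the files are unprocessed, any read-quality filtering is left to the user. FASTQ quality strings encode Phred scores as ASCII characters offset by 33, so a read's mean quality can be computed directly from its quality line. The sketch below assumes that Phred+33 encoding and uses a hypothetical Q20 cutoff (the threshold commonly used to define HiFi reads). It averages per-base error probabilities rather than raw Q values, because averaging Q values overstates quality when a few bases are poor.

```python
import math
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def mean_read_quality(quality: str, offset: int = 33) -> float:
    """Mean read quality, averaged in error-probability space and converted back to Q."""
    scores = phred_scores(quality, offset)
    if not scores:
        raise ValueError("empty quality string")
    mean_error = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)


def passes_filter(quality: str, min_q: float = 20.0) -> bool:
    """Keep reads whose mean quality meets a hypothetical Q20 threshold."""
    return mean_read_quality(quality) >= min_q


print(phred_scores("!+5?I"))  # [0, 10, 20, 30, 40]
# "I!" has Q40 and Q0 bases: the naive mean of Q values is 20,
# but the error-probability mean is only about Q3.
print(round(mean_read_quality("I!"), 2))
```

Averaging in probability space is the more conservative choice: a single Q0 base contributes an error probability of 1, which dominates the mean as it should.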