SNP markers used for QTL mapping in the inbred lines

Rahman, Habibur1; Megha, Swati1; Buchwaldt, Miles2; Parkin, Isobel2; Kebede, Berisso1; Nikzad, Azam1

Research facility: University of Alberta

Published Feb 23, 2024 on Dryad. https://doi.org/10.5061/dryad.8sf7m0crw

Data files

Feb 23, 2024 version files 3.16 MB

Brassica_napus_SNP-SNP5743_6.csv (3.15 MB)
README.txt (3.30 KB)

Feb 23, 2024 version files 3.16 MB

Brassica_napus_SNP-SNP5743_6.csv (3.15 MB)
README.md (3.57 KB)

Abstract

Young leaves of the 175 inbred lines and their seven parents were collected from seedlings grown in a greenhouse. About 200 mg of bulked leaf sample from three plants of a line was placed in a 2 ml safe-lock Eppendorf tube and stored at −80 °C for one night prior to crushing using a Mixer Mill (TissueLyser II, Qiagen, Germany). Genomic DNA was extracted using the SIGMA DNA extraction kit (Sigma-Aldrich, St. Louis, MO, USA) following the manufacturer's instructions. DNA concentration and purity of the samples were assessed using a NanoDrop 2000c spectrophotometer (Thermo Scientific, Wilmington, DE, USA). The samples were processed and sequenced using the tunable genotyping-by-sequencing (tGBS®) method by Data2Bio (Ames, IA, USA). Genomic DNA was digested using two restriction enzymes, NspI (5′-RCATG^Y-3′) and BfuCI/Sau3AI (5′-^GATC-3′), which created 3′ and 5′ overhangs, respectively. Two single-stranded oligos, one containing a sample-specific internal barcode and the other a universal oligo, were ligated to the complementary 3′ and 5′ overhangs, respectively. The treated DNA of all 175 inbred lines and seven parents was pooled for construction of the tGBS library and sequencing. The raw sequence data were demultiplexed by barcode, which was subsequently removed bioinformatically from each sequence. The barcode-trimmed sequence reads of each genotype were further trimmed using the trimming software Lucy (Chou & Holmes, 2001; Li & Chou, 2004) to remove low-quality reads based on a Phred quality score threshold of Q15.

Dataset DOI: 10.5061/dryad.8sf7m0crw

Description of the data and file structure

The dataset consists of one data file, Brassica_napus_SNP-SNP5743_6.csv.

SNP discovery using genotyping-by-sequencing

Young leaves of the 184 inbred lines and their seven parents were collected from seedlings grown in a greenhouse. About 200 mg of bulked leaf sample from three plants of a line was placed in a 2 ml safe-lock Eppendorf tube and stored at −80 °C for one night prior to crushing using a Mixer Mill (TissueLyser II, Qiagen, Germany). Genomic DNA was extracted using the SIGMA DNA extraction kit (Sigma-Aldrich, St. Louis, MO, USA) following the manufacturer's instructions. DNA concentration and purity of the samples were assessed using a NanoDrop 2000c spectrophotometer (Thermo Scientific, Wilmington, DE, USA). The samples were processed and sequenced using the tunable genotyping-by-sequencing (tGBS®) method by Data2Bio (Ames, IA, USA). Genomic DNA was digested using two restriction enzymes, NspI (5′-RCATG^Y-3′) and BfuCI/Sau3AI (5′-^GATC-3′), which created 3′ and 5′ overhangs, respectively. Two single-stranded oligos, one containing a sample-specific internal barcode and the other a universal oligo, were ligated to the complementary 3′ and 5′ overhangs, respectively. The treated DNA of all 191 samples (184 lines + 7 parents) was pooled for construction of the tGBS library and sequencing. The raw sequence data were demultiplexed by barcode, which was subsequently removed bioinformatically from each sequence. The barcode-trimmed sequence reads of each genotype were further trimmed using the trimming software Lucy (Chou & Holmes, 2001; Li & Chou, 2004) to remove low-quality reads based on a Phred quality score threshold of Q15.
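The demultiplexing and barcode-removal step described above can be illustrated with a minimal Python sketch. It is not the Data2Bio pipeline: the barcode sequences, the sample names, the function name demultiplex and the exact-prefix matching rule are all assumptions made for illustration.

```python
def demultiplex(reads, barcode_to_sample):
    """Assign reads to samples by their leading barcode and strip the barcode.

    reads: iterable of read sequences (strings).
    barcode_to_sample: dict mapping a barcode sequence to a sample name
    (hypothetical values; real barcodes come from the sequencing provider).
    Returns (reads grouped by sample, list of unassigned reads).
    """
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    # Try longer barcodes first so a short barcode that happens to be a
    # prefix of a longer one cannot capture the longer barcode's reads.
    barcodes = sorted(barcode_to_sample, key=len, reverse=True)
    for read in reads:
        for barcode in barcodes:
            if read.startswith(barcode):  # assumption: exact match only
                sample = barcode_to_sample[barcode]
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            unassigned.append(read)
    return by_sample, unassigned


# Toy input: two hypothetical barcodes and three short reads.
example_barcodes = {"ACGT": "line_001", "TTGCA": "parent_1"}
example_reads = ["ACGTCATGCCC", "TTGCACATGTTT", "GGGGCATG"]
grouped, leftover = demultiplex(example_reads, example_barcodes)
```

In this toy input the third read carries no known barcode and is returned in the unassigned list rather than being forced into a sample.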

The quality-trimmed reads were aligned to the B. napus reference genome GCA_000751015.1 (Chalhoub et al. 2014, Brassica_napus_v4.1) with bowtie2 version 2.2.0 using the --local, --sensitive, -k 50 and --score-min L,0,0.8 parameters (Langmead & Salzberg, 2012). SNP calling was based on reads that aligned to a single location in the reference genome, using the Genome Analysis Toolkit (GATK) version 3.2.0 UnifiedGenotyper tool with the parameters -glm BOTH and -ploidy 2 (DePristo et al., 2011). SNPs with a minor allele frequency (MAF) of less than 5% and heterozygous calls (heterozygous loci) were treated as missing data, and inbred lines with more than 24% missing data were eliminated from the analysis. Based on this, a total of 5,743 SNP markers were retained and used for association mapping.
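The filtering rules above can be sketched in Python as follows. This is an illustration only, not the authors' code: the genotype encoding (two-letter strings such as "AA", with None for a missing call), the function names, the assumption of biallelic SNPs, and the order of the steps (mask heterozygous calls, drop low-MAF markers, then drop lines with too many missing calls) are all assumptions.

```python
from collections import Counter

MISSING = None  # hypothetical encoding of a missing genotype call


def mask_heterozygous(calls):
    """Turn heterozygous calls (e.g. "AG") into MISSING; keep homozygous ones."""
    return [c if c is not MISSING and c[0] == c[1] else MISSING for c in calls]


def minor_allele_frequency(calls):
    """MAF over non-missing homozygous calls, assuming a biallelic SNP."""
    counts = Counter(c[0] for c in calls if c is not MISSING)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - max(counts.values()) / total


def filter_genotypes(lines, genotypes, maf_min=0.05, max_missing=0.24):
    """lines: list of line names; genotypes: dict marker -> calls in line order.

    Returns (names of retained lines, sorted names of retained markers).
    """
    masked = {m: mask_heterozygous(calls) for m, calls in genotypes.items()}
    kept = {m: calls for m, calls in masked.items()
            if minor_allele_frequency(calls) >= maf_min}
    kept_lines = []
    for i, line in enumerate(lines):
        n_missing = sum(1 for calls in kept.values() if calls[i] is MISSING)
        if kept and n_missing / len(kept) <= max_missing:
            kept_lines.append(line)
    return kept_lines, sorted(kept)


# Toy data: four lines, four markers (all values are made up).
toy_lines = ["L1", "L2", "L3", "L4"]
toy_genotypes = {
    "snp1": ["AA", "AA", "GG", "GG"],
    "snp2": ["CC", "CC", "CC", "CC"],   # monomorphic, MAF = 0
    "snp3": ["TT", "TA", "AA", None],   # one heterozygous, one missing call
    "snp4": ["GG", "GG", "CC", "CC"],
}
retained_lines, retained_markers = filter_genotypes(toy_lines, toy_genotypes)
```

The thresholds are exposed as the parameters maf_min and max_missing so that other cut-offs can be tried on the same genotype table without changing the functions.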

Of the 184 inbred lines genotyped with SNP markers, nine had more than 37% missing data and were therefore excluded from the association analysis. Thus, a total of 175 inbred lines were used in GWAS and further analyses; these lines were derived from 34 F2 and 22 BC1 plants of the above-mentioned six interspecific crosses. Normally, GWAS is carried out using a larger population carrying a wide diversity of alleles. However, the use of a small inbred population derived from a single cross, for example, 122 doubled haploid lines of barley (Hordeum vulgare L.) (Hu et al., 2018), has shown potential for mapping different agronomic traits with an association mapping approach. As mentioned above, the population used in this study was theoretically expected to be segregating for a part of the C genome and to carry the allelic diversity of six B. oleracea accessions; therefore, the use of 175 lines in association mapping could be justified.

Code/software

Excel
