Data from: Genotyping analysis of over 130,000 CIMMYT bread wheat breeding lines: A decade-long effort in optimizing wheat genotyping
Data files
Feb 12, 2021 version files 8.09 MB
Feb 18, 2021 version files 8.09 MB
Jan 19, 2026 version files 3.74 GB
chi2_quality_test.R (1.47 KB)
CIMMYT_Filtered.130K.GIDs.hmp.txt.zip (772.59 MB)
CIMMYT_Filtered.130K.GIDs.vcf.gz (2.95 GB)
fisher_quality_test.R (1.24 KB)
genotype_count_vcf.pl (2.40 KB)
get_SNP_1filter_passed.sh (2.39 KB)
inbred_quality_test.R (392 B)
key_file_of_CIMMYT_bread_wheat_breeding_lines_from_years_2013-2023.xlsx (11.31 MB)
README.md (2.16 KB)
SRA_fastq_files_CIMMYT_bread_wheat_breeding_lines_2013-2023.xlsx (43.21 KB)
tassel_pipeline.sh (2.19 KB)
Abstract
A total of 130,247 bread wheat breeding lines from the years 2013-2023, developed by the International Maize and Wheat Improvement Center (CIMMYT), were genotyped. We used genotyping-by-sequencing (GBS) to construct 636 GBS libraries and sequenced them on the Illumina platform to generate FASTQ files. The key file contains metadata such as sample name, flowcell, lane number, and the barcode used for multiplexing samples. The FASTQ file for a given sample can be identified from its library. The raw reads are available at the National Center for Biotechnology Information (NCBI) under BioProject accessions PRJNA498085 (2013-2020), PRJNA901877 (2021), PRJNA901925 and PRJNA901462 (2022), and PRJNA1044425 (2023).
https://doi.org/10.5061/dryad.37pvmcvjq
Description of the data and file structure:
1. This file lists the names of the FASTQ files generated by sequencing the respective GBS libraries, together with their SRA accessions. The Tassel GBS pipeline requires FASTQ files to follow a specific naming pattern, so FASTQ files downloaded from NCBI should be renamed to the file names listed in column 'file_name_for_Tassel'.
SRA_fastq_files_CIMMYT_bread_wheat_breeding_lines_2013-2023.xlsx
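The renaming described for File 1 can be sketched as follows. This is a minimal illustration, assuming the spreadsheet has been exported to CSV. Only the column name 'file_name_for_Tassel' comes from this README; the accession column name `sra_accession`, the `<accession>.fastq.gz` download naming, and all values shown are hypothetical.

```python
import csv
import io

# Hypothetical excerpt of File 1 exported to CSV. Only 'file_name_for_Tassel'
# is a real column name; 'sra_accession' and every value are made up.
FILE1_CSV = """sra_accession,file_name_for_Tassel
SRR0000001,FLOWCELLA_1_fastq.txt.gz
SRR0000002,FLOWCELLB_3_fastq.txt.gz
"""

def build_rename_plan(file1_csv_text,
                      accession_col="sra_accession",
                      target_col="file_name_for_Tassel"):
    """Map each downloaded file (assumed '<accession>.fastq.gz') to its Tassel name."""
    reader = csv.DictReader(io.StringIO(file1_csv_text))
    return {f"{row[accession_col]}.fastq.gz": row[target_col] for row in reader}

rename_plan = build_rename_plan(FILE1_CSV)
for src, dst in rename_plan.items():
    # Print the plan only; swap in os.rename(src, dst) to apply it on disk.
    print(f"{src} -> {dst}")
```

Printing the plan before renaming makes it easy to check the mapping against File 1 by eye.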
2. This file has information about the samples. The Tassel GBS pipeline uses the first four columns to identify sequencing reads present in the FASTQ files.
key_file_of_CIMMYT_bread_wheat_breeding_lines_from_years_2013-2023.xlsx
3. The steps below show how to extract a subset of samples for analysis:
a. From File 2, select all rows for the sample names of interest, and keep the header row.
b. Identify the GBS libraries to download from column "LibraryPlateID" of File 2.
c. Using File 1, look up the SRA accessions for these GBS libraries and download the corresponding FASTQ files from NCBI.
d. Compress each FASTQ file and rename it to the corresponding file name listed in column 'file_name_for_Tassel' of File 1.
e. Use the key file from step a and the FASTQ files from step d as inputs to the Tassel GBS pipeline.
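Steps a and b above can be sketched in a few lines. This is a minimal illustration, assuming File 2 has been exported to CSV. "LibraryPlateID" is the column named in this README; the other column names (`Flowcell`, `Lane`, `Barcode`, `FullSampleName`) and all values are hypothetical stand-ins and should be checked against the actual key file.

```python
import csv
import io

# Hypothetical excerpt of File 2 (key file) exported to CSV. Only
# 'LibraryPlateID' is taken from the README; other names/values are assumed.
KEY_CSV = """Flowcell,Lane,Barcode,FullSampleName,LibraryPlateID
FLOWCELLA,1,ACGT,GID001,LIB001
FLOWCELLA,1,TGCA,GID002,LIB001
FLOWCELLB,3,ACGT,GID003,LIB002
"""

def subset_key_file(key_csv_text, wanted_samples,
                    sample_col="FullSampleName",
                    library_col="LibraryPlateID"):
    """Return (header, matching rows, sorted library IDs) for the wanted samples."""
    reader = csv.DictReader(io.StringIO(key_csv_text))
    rows = [row for row in reader if row[sample_col] in wanted_samples]
    libraries = sorted({row[library_col] for row in rows})
    return reader.fieldnames, rows, libraries

header, subset_rows, libraries = subset_key_file(KEY_CSV, {"GID001", "GID003"})

# Step a: write the subset key file (header plus selected rows) as tab-delimited text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=header, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(subset_rows)
subset_key_text = out.getvalue()

# Step b: the libraries whose FASTQ files need to be downloaded.
print(libraries)
```

The library IDs collected here are the ones to look up in File 1 for step c.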
4. The scripts to analyze the data:
chi2_quality_test.R
fisher_quality_test.R
genotype_count_vcf.pl
get_SNP_1filter_passed.sh
inbred_quality_test.R
tassel_pipeline.sh
5. Genotype data of 130,247 bread wheat breeding lines in VCF and HapMap formats.
CIMMYT_Filtered.130K.GIDs.hmp.txt.zip
CIMMYT_Filtered.130K.GIDs.vcf.gz
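Because the VCF file is large, it can help to stream it rather than load it whole. The sketch below reads only the header of a gzipped VCF to list the sample names, relying on the standard VCF layout in which sample columns follow the nine fixed columns of the `#CHROM` line. The in-memory demo data, including the sample and marker names, is hypothetical and is not taken from the dataset.

```python
import gzip
import io

def vcf_sample_names(handle):
    """Return sample IDs from the '#CHROM' header line of an open text VCF."""
    for line in handle:
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed VCF fields; samples start at column 10.
            return line.rstrip("\n").split("\t")[9:]
    return []

# Tiny hypothetical VCF, gzip-compressed in memory for demonstration.
demo_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tGID001\tGID002\n"
    "1A\t100\tS1A_100\tA\tG\t.\tPASS\t.\tGT\t0/0\t1/1\n"
)
buffer = io.BytesIO(gzip.compress(demo_vcf.encode("utf-8")))

with gzip.open(buffer, "rt") as fh:
    samples = vcf_sample_names(fh)

print(samples)
```

For the real file, pass the path to `gzip.open` in place of the in-memory buffer.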
Version changes
18-Feb-2021: Only abstract was modified.
04-Nov-2025: Additional data for bread wheat breeding lines from the years 2021-2023 were added to Files 1 and 2. The scripts used to process and filter the sequencing data were added. The complete genotype data of the CIMMYT bread wheat breeding lines from the years 2013-2023 were added.
- Shrestha, Sandesh; Adhikari, Laxman; Crain, Jared et al. (2025). Genotyping analysis of over 130,000 CIMMYT bread wheat breeding lines: A decade‐long effort in optimizing wheat genotyping. The Plant Genome. https://doi.org/10.1002/tpg2.70148
