Phenotypic variation and genome-wide association studies of main culm panicle node number, maximum node production rate, and degree-days to heading in rice
Data files
Version of Jan 12, 2022 (total file size: 417.62 MB)
- 854832_SNP_220_sample_matrix_hmp.txt (417.62 MB)
- README_Sanchez_etal_dataset.txt (1.58 KB)
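The `_hmp.txt` suffix of the genotype matrix suggests the HapMap text format used by tools such as TASSEL. The sketch below streams such a file row by row. It assumes, without confirmation from the dataset description, the standard HapMap layout of 11 metadata columns followed by one genotype column per accession; the function name `read_hapmap` and the toy data are hypothetical.

```python
import io
from typing import Iterator, List, TextIO, Tuple

# Assumption: standard HapMap layout with 11 fixed metadata columns
# (rs#, alleles, chrom, pos, strand, assembly#, center, protLSID,
# assayLSID, panelLSID, QCcode), then one genotype column per accession.
N_META = 11

SnpRow = Tuple[str, str, int, List[str]]  # (snp_id, chrom, pos, calls)


def read_hapmap(handle: TextIO) -> Tuple[List[str], Iterator[SnpRow]]:
    """Return accession names and a lazy iterator over SNP rows.

    Rows are read on demand, so the handle must stay open while iterating;
    this keeps memory use flat even for a file of several hundred megabytes.
    """
    header = handle.readline().rstrip("\n").split("\t")
    samples = header[N_META:]

    def rows() -> Iterator[SnpRow]:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= N_META:
                continue  # skip blank or truncated lines
            yield fields[0], fields[2], int(fields[3]), fields[N_META:]

    return samples, rows()


# Hypothetical two-accession example standing in for the real file.
TOY = (
    "rs#\talleles\tchrom\tpos\tstrand\tassembly#\tcenter\tprotLSID\t"
    "assayLSID\tpanelLSID\tQCcode\tacc1\tacc2\n"
    "S1_100\tA/G\t1\t100\t+\tNA\tNA\tNA\tNA\tNA\tNA\tA\tG\n"
)
samples, rows = read_hapmap(io.StringIO(TOY))
print(samples)     # ['acc1', 'acc2']
print(list(rows))  # [('S1_100', '1', 100, ['A', 'G'])]
```

For the real file, calling `read_hapmap` inside `with open("854832_SNP_220_sample_matrix_hmp.txt") as fh:` should report 220 accession names if the layout assumption holds.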
Abstract
To understand the genetic basis of main culm panicle node number, maximum node production rate, and degree-days to heading in rice (Oryza sativa), we conducted genome-wide association studies using a diversity panel of 220 rice accessions and 854,832 SNP markers generated by genotyping-by-sequencing (GBS) at 1X coverage. The raw genotype data were filtered to retain single nucleotide polymorphisms (SNPs) with less than 50% missing data and a minor allele frequency (MAF) greater than 5%. The 1,075,302 SNP markers that passed this initial filter were then imputed using BEAGLE V4.0. After imputation, the dataset was filtered a second time, removing SNPs with MAF below 5% or with more than 5% missing data. A total of 854,832 SNPs were used in the genome-wide association analyses. The dataset presented here contains the genotypes of the 220 rice accessions at these 854,832 SNP markers.
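The two filtering passes described above can be sketched per SNP as follows. This is a minimal illustration, not the authors' pipeline. It assumes single-letter IUPAC genotype calls with `N` (or any unrecognized code) treated as missing, two alleles counted per diploid call, and the threshold boundaries shown (strict `>` before imputation, inclusive `>=`/`<=` after). The BEAGLE V4.0 imputation step between the passes is not reproduced, and all function names are hypothetical.

```python
from collections import Counter
from typing import Sequence

# Assumption: single-letter IUPAC calls; anything else (e.g. "N") is missing.
# Each diploid call contributes two alleles.
IUPAC = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def missing_rate(calls: Sequence[str]) -> float:
    """Fraction of accessions with no usable genotype call."""
    if not calls:
        return 1.0  # treat an empty row as entirely missing
    return sum(1 for c in calls if c not in IUPAC) / len(calls)


def minor_allele_freq(calls: Sequence[str]) -> float:
    """Frequency of the second most common allele among non-missing calls."""
    counts: Counter = Counter()
    for c in calls:
        if c in IUPAC:
            counts.update(IUPAC[c])  # adds both alleles of the call
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # fully missing or monomorphic
    return counts.most_common(2)[1][1] / total


def passes_first_filter(calls: Sequence[str]) -> bool:
    """Before imputation: keep SNPs with < 50% missing data and MAF > 5%."""
    return missing_rate(calls) < 0.50 and minor_allele_freq(calls) > 0.05


def passes_second_filter(calls: Sequence[str]) -> bool:
    """After imputation: drop SNPs with MAF < 5% or > 5% missing data."""
    return minor_allele_freq(calls) >= 0.05 and missing_rate(calls) <= 0.05


# Hypothetical calls for one SNP across 20 accessions.
snp = ["A"] * 19 + ["G"]
print(minor_allele_freq(snp))     # 0.05
print(passes_first_filter(snp))   # False: MAF is not strictly above 5%
print(passes_second_filter(snp))  # True
```

The toy SNP shows why the exact boundary convention matters: a marker sitting at exactly 5% MAF is dropped by a strict `>` test but kept by an inclusive one. The source does not state which convention was used.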