Instituto Nacional de Investigación Agropecuaria (INIA) Rice Breeding Program Historical Dataset
Data files
Aug 07, 2023 version, files total 70.89 MB
- GenomicInformation.txt (59.13 MB)
- Lines.txt (1.01 MB)
- Phenotypes.txt (9.29 MB)
- README.md (11.87 KB)
- SNPs.txt (1.39 MB)
- Trials.txt (64.32 KB)
Abstract
Breeding programs generate vast amounts of data, which are often scattered across separate files. This hinders the application of modern breeding tools such as multi-environment analyses and genomic selection. This dataset is the result of consolidating 23 years of phenotypic, pedigree, and genomic records from the Uruguayan national rice breeding program. It gathers all the available data from 1997 to 2020 corresponding to field trials, blast nurseries, laboratory analyses of milling and cooking quality, pedigree information, and genomic information for selected advanced breeding lines. Records were recovered for 996 trials in 12 locations over a span of 23 years, 91,636 field plots with information on 14 phenotypic variables, pedigree for 19,447 genotypes, and genomic information on 61,260 SNP markers for 965 genotypes. The dataset is structured in Trials, Phenotypes, Lines, Genomic Information, and SNP tables. Genotype identification has been coded.
Methods
A detailed description of data collection and processing is available in the corresponding manuscript submitted to the journal Crop Science. Briefly, to unify the data, a uniform spreadsheet format with standardized field names was designed, and one spreadsheet was created per trial. An R script was written to merge each trial's data into a single table. All tables were then unified into one containing all the available INIA Rice Breeding Program (IRBP) information. A relational database structure was defined by identifying the elements that constituted the system and the key fields that connected them. Variable names and the levels of all categorical variables were standardized.
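The unification and key-field steps can be illustrated with a minimal sketch. The original processing was done in R; the Python below is only an illustration of the idea, not the authors' script. The field names (`trial_id`, `line_id`, `yield`, `year`), the alias map, and the sample rows are all hypothetical and are not taken from the dataset.

```python
# Hypothetical mapping from ad hoc column headers to standardized field names.
FIELD_ALIASES = {
    "trial": "trial_id",
    "Trial_ID": "trial_id",
    "genotype": "line_id",
    "Line": "line_id",
    "Yield_kg_ha": "yield",
}


def standardize(row):
    """Rename the keys of one record to the standardized field names."""
    return {FIELD_ALIASES.get(key, key): value for key, value in row.items()}


def merge_trials(trial_sheets):
    """Stack the rows of every per-trial sheet into one table (a list of dicts)."""
    merged = []
    for sheet in trial_sheets:
        merged.extend(standardize(row) for row in sheet)
    return merged


def join_on_key(left, right, key):
    """Attach to each left row the fields of the right row sharing the same key."""
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key], {})  # empty dict when no related row exists
        joined.append({**match, **row})
    return joined


# Two made-up per-trial sheets with inconsistent headers.
sheet_a = [{"trial": "T1", "genotype": "G001", "Yield_kg_ha": "8200"}]
sheet_b = [{"Trial_ID": "T2", "Line": "G002", "yield": "7900"}]

# A made-up trials table, connected to the phenotype rows through `trial_id`.
trials = [
    {"trial_id": "T1", "year": "1997"},
    {"trial_id": "T2", "year": "1998"},
]

phenotypes = merge_trials([sheet_a, sheet_b])
combined = join_on_key(phenotypes, trials, "trial_id")
```

Keeping the tables separate and linking them through a shared key, rather than storing one wide table, avoids repeating trial-level information on every plot row.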
Usage notes
Data files are in txt format.
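The field delimiter is not stated here, so a reader may want to detect it before loading a file. The following is a minimal sketch under that assumption: it guesses among tab, semicolon, and comma from the header line. The helper names, the candidate list, and the sample content are hypothetical; the README.md distributed with the data should be treated as the authority on the actual layout.

```python
import csv
import io

# Candidate delimiters to try; an assumption, since the format is not documented here.
CANDIDATE_DELIMITERS = ["\t", ";", ","]


def guess_delimiter(header_line):
    """Return the candidate that occurs most often in the header line."""
    return max(CANDIDATE_DELIMITERS, key=header_line.count)


def read_table(handle):
    """Read a delimited text stream into a list of dicts keyed by the header."""
    text = handle.read()
    first_line = text.splitlines()[0] if text else ""
    reader = csv.DictReader(io.StringIO(text), delimiter=guess_delimiter(first_line))
    return list(reader)


# Made-up stand-in for the content of one of the txt files.
sample = io.StringIO("trial_id\tlocation\tyear\nT1\tLoc1\t1997\n")
rows = read_table(sample)
```

With a real file, the same function could be called on a handle opened with `open("Trials.txt", newline="")`, passing whatever encoding the file actually uses.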