Data from: Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection
Duitama, Jorge, Centro Internacional de Agricultura Tropical
Silva, Alexander, Centro Internacional de Agricultura Tropical
Sanabria, Yamid, Louisiana State University, Louisiana State University Agricultural Center
Cruz, Daniel Felipe, Centro Internacional de Agricultura Tropical
Quintero, Constanza, Centro Internacional de Agricultura Tropical
Ballen, Carolina, Centro Internacional de Agricultura Tropical
Lorieux, Mathias, International Center for Tropical Agriculture, Institut de Recherche pour le Développement
Scheffler, Brian
Farmer, Andrew, National Center for Genome Resources
Torres, Edgar, Centro Internacional de Agricultura Tropical
Oard, James, Louisiana State University, Louisiana State University Agricultural Center
Tohme, Joe, Centro Internacional de Agricultura Tropical
Publication date: April 21, 2016
Publisher: Dryad
https://doi.org/10.5061/dryad.8hg32
Citation
Duitama, Jorge et al. (2016), Data from: Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection, Dryad, Dataset, https://doi.org/10.5061/dryad.8hg32
Abstract
Recent advances in sequencing technologies and bioinformatics have revealed the genomic background of rice, a staple food for the poor, and provided the basis for developing large genomic variation databases covering thousands of cultivars. Proper analysis of this massive resource is expected to give novel insights into the structure, function, and evolution of the rice genome, and to aid the development of rice varieties through marker assisted selection or genomic selection. In this work we present sequencing and bioinformatics analyses of 104 rice varieties belonging to the major subspecies of Oryza sativa. We identified repetitive elements and recurrent copy number variation covering about 200 Mbp of the rice genome. Genotyping of over 18 million polymorphic locations within O. sativa allowed us to reconstruct the individual haplotype patterns shaping the genomic background of elite varieties used by farmers throughout the Americas. Based on a reconstruction of the alleles of the gene GBSSI, we identified novel genetic markers for selecting varieties with high amylose content. We expect that both the analysis methods and the genomic information described here will be of great use to the rice research community and to other groups carrying out similar sequencing efforts in other crops.
Usage notes
Structural variants WGSOryza_CIAT_LSU_USDA_NCGR
Structural variants identified in the 104 varieties analyzed in this study, in GFF format (one file per sample). Genomic coordinates are relative to IRGSP1.0.
WGSOryza_CIAT_LSU_USDA_NCGR_SV.tar.gz
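Since the archive holds one GFF file per sample, a reader may want to stream the features without unpacking everything to disk. The following is only a minimal sketch: it assumes the files follow the standard nine-column, tab-separated GFF layout and are UTF-8 text. The function names (parse_gff_line, iter_structural_variants) are hypothetical, and the member names, feature types, and attribute keys inside the archive are not documented here, so they are left uninterpreted.

```python
import io
import tarfile

# Standard GFF column names; the dataset's own type/attribute values are not assumed.
GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_gff_line(line):
    """Split one GFF feature line into a dict; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF_COLUMNS, fields))
    record["start"] = int(record["start"])  # coordinates relative to IRGSP1.0
    record["end"] = int(record["end"])
    return record


def iter_structural_variants(archive_path):
    """Yield (member_name, record) for every feature in every regular file of the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            handle = archive.extractfile(member)
            for raw in io.TextIOWrapper(handle, encoding="utf-8"):
                record = parse_gff_line(raw)
                if record is not None:
                    yield member.name, record
```

For example, iterating over iter_structural_variants("WGSOryza_CIAT_LSU_USDA_NCGR_SV.tar.gz") yields one (file name, feature) pair at a time, which can then be grouped by sample or filtered by the "type" column once the actual values used in the files have been inspected.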
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr1_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 1 from base pairs 1 to 15,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr2_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 2 from base pairs 1 to 15,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr3_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 3 from base pairs 1 to 15,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr1_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 1 from base pairs 15,000,000 to 30,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr1_2.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 1 from base pairs 30,000,000 to 43,270,923.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr2_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 2 from base pairs 15,000,000 to 30,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr2_2.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 2 from base pairs 30,000,000 to 35,937,250.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr3_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 3 from base pairs 15,000,000 to 30,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr3_2.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 3 from base pairs 30,000,000 to 36,413,819.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr5_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 5 from base pairs 1 to 15,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr5_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 5 from base pairs 15,000,000 to 29,958,434.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr6_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 6 from base pairs 1 to 16,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr6_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 6 from base pairs 16,000,000 to 31,248,787.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr7_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 7 from base pairs 1 to 15,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr7_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 7 from base pairs 15,000,000 to 29,697,621.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr8_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 8 from base pairs 1 to 12,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr8_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 8 from base pairs 12,000,000 to 24,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr8_2.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 8 from base pairs 24,000,000 to 28,443,022.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr4_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 4 from base pairs 1 to 10,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr4_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 4 from base pairs 10,000,000 to 20,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr4_2.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 4 from base pairs 20,000,000 to 30,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr4_3.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 4 from base pairs 30,000,000 to 35,502,694.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr9_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 9 from base pairs 1 to 12,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr9_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 9 from base pairs 12,000,000 to 23,012,720.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr10_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 10 from base pairs 1 to 12,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr10_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 10 from base pairs 12,000,000 to 23,207,287.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr11_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 11 from base pairs 1 to 15,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr11_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 11 from base pairs 15,000,000 to 29,021,106.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr12_0.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 12 from base pairs 1 to 12,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr12_1.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 12 from base pairs 12,000,000 to 24,000,000.
WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated_Chr12_2.vcf
SNPs and small indels identified in the 104 varieties analyzed in this study. All genotypes have an NGSEP genotyping quality score greater than or equal to 40. Genomic coordinates are relative to IRGSP1.0. This file contains chromosome 12 from base pairs 24,000,000 to 27,531,856.
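Because each chromosome is split across several VCF files, it can help to look up which file covers a given IRGSP1.0 position before downloading anything. The sketch below transcribes the chunk end coordinates from the file descriptions above. The names CHUNK_ENDS, PREFIX, and candidate_files are hypothetical. The descriptions list each internal boundary (for example 15,000,000 on chromosome 1) in two adjacent files and do not say which one actually holds a variant at that exact position, so the function returns every file whose stated range includes the position rather than guessing.

```python
# Chunk end coordinates per chromosome, transcribed from the file descriptions.
CHUNK_ENDS = {
    1: [15_000_000, 30_000_000, 43_270_923],
    2: [15_000_000, 30_000_000, 35_937_250],
    3: [15_000_000, 30_000_000, 36_413_819],
    4: [10_000_000, 20_000_000, 30_000_000, 35_502_694],
    5: [15_000_000, 29_958_434],
    6: [16_000_000, 31_248_787],
    7: [15_000_000, 29_697_621],
    8: [12_000_000, 24_000_000, 28_443_022],
    9: [12_000_000, 23_012_720],
    10: [12_000_000, 23_207_287],
    11: [15_000_000, 29_021_106],
    12: [12_000_000, 24_000_000, 27_531_856],
}

PREFIX = "WGSOryza_CIAT_LSU_USDA_NCGR_Q40_annotated"


def candidate_files(chromosome, position):
    """Return the VCF file names whose stated range includes the 1-based position."""
    ends = CHUNK_ENDS[chromosome]
    if not 1 <= position <= ends[-1]:
        raise ValueError(
            f"position {position} is outside chromosome {chromosome} (1 to {ends[-1]})"
        )
    # Each chunk starts where the previous one ends, as stated in the descriptions.
    starts = [1] + ends[:-1]
    return [
        f"{PREFIX}_Chr{chromosome}_{index}.vcf"
        for index, (start, end) in enumerate(zip(starts, ends))
        if start <= position <= end
    ]


# An arbitrary position on chromosome 6 falls in a single file.
print(candidate_files(6, 1_500_000))
# A boundary position is listed in two descriptions, so both files are returned.
print(candidate_files(1, 15_000_000))
```

Returning a list keeps the boundary ambiguity visible to the caller; once the files are downloaded, checking the first and last POS values in each would settle which file holds the boundary coordinate.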