A public mid-density genotyping platform for cultivated Blueberry
Data files
Aug 26, 2024 version files 3.67 MB
- 20200819-BI-Blueberry_10K_SNPs_forDArT_3K.txt
- blueberry_allele_db_v017.fa
- README.md
Abstract
Small public breeding programs face many barriers to adopting technology, particularly in creating and using genetic marker panels for genomics-based selection decisions. Here we report the creation of a DArTag panel of 3,000 loci distributed across the tetraploid blueberry genome for use in molecular breeding and genomic prediction. This marker panel brings cost-effective and rapid genotyping capabilities to public and private breeding programs. The open access provided by this platform will allow genetic data sets generated on the marker panel to be compared and joined across projects, institutions, and countries. This genotyping resource has the power to make routine genotyping a reality for any breeder of blueberry.
README: A public mid-density genotyping platform for cultivated Blueberry
https://doi.org/10.5061/dryad.j6q573nnc
The blueberry 3K DArTag panel contains 3K marker loci evenly distributed throughout the blueberry genome. DArT generates genotyping results in several formats, among which the MADC (Missing Allele Discovery Count) format provides all the microhaplotypes (54-81 bp) discovered from the amplicons of the 3K marker loci. These microhaplotypes contain the target SNPs specified in the assay design as well as off-target SNPs present in the flanking amplicon sequences. BI created a microhaplotype database for the 3K blueberry marker loci.
Description of the data and file structure
This project consists of the following datasets:
- 20200819-BI-Blueberry_10K_SNPs_forDArT_3K.txt: The 3K DArTag panel probe information, which contains the target SNP coordinates, their 180 bp flanking sequences, and the reference genome used for SNP discovery.
- blueberry_allele_db_v017.fa: The microhaplotype database v17, which contains the microhaplotypes detected by genotyping more than 9,000 blueberry samples with the 3K DArTag panel.
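As a minimal sketch of how the microhaplotype database might be inspected, the following Python reads FASTA records and tallies sequence lengths, which could be compared against the 54-81 bp range stated above. It assumes that `blueberry_allele_db_v017.fa` is a standard FASTA file (inferred only from the `.fa` extension) and makes no assumption about how the header lines are structured. The function names `read_fasta` and `length_histogram` are illustrative, not part of the dataset.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Assumes standard FASTA: a '>' header line followed by one or more
    sequence lines. The header is returned verbatim (minus the '>').
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(records):
    """Count how many sequences occur at each length."""
    return Counter(len(seq) for _, seq in records)
```

A possible use, assuming the file has been downloaded to the working directory:

```python
with open("blueberry_allele_db_v017.fa") as handle:
    for length, count in sorted(length_histogram(read_fasta(handle)).items()):
        print(length, count)
```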
Methods
The blueberry 3K DArTag panel was created from whole-genome skim sequencing (WGS) of 31 cultivated blueberry accessions focused on elite North American breeding lines. A total of 600K SNPs were discovered in the WGS data. A high-confidence set of 10K SNPs was then identified using the following criteria: 1) not located within 5 bp of an indel; 2) QUAL > 30; 3) minimum and maximum read depths of 20 and 1500, respectively; 4) at each heterozygous site, at least one read supporting the reference allele and two reads supporting the alternative allele; 5) no missing genotype per SNP position; 6) a minor allele frequency greater than 0.25; 7) not located in transposable elements or within 1 kb of chromosome termini; 8) even genomic distribution, with most SNPs located in genic regions. The 10K SNPs were submitted for QC to DArT (Diversity Arrays Technology Pty Ltd, www.diversityarrays.com), and a 3K SNP set was selected from them. Additionally, a few experimentally validated SNPs were force-included in the panel.
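The per-SNP criteria above (1 through 7) can be sketched as a simple filter. This is an illustration only, not the pipeline used to build the panel: the `SnpRecord` class and all of its field names are hypothetical, the depth is assumed to be a single site-level value, and the handling of boundary values (for example, whether a SNP exactly 5 bp from an indel is excluded) is an assumption. Criterion 8, even genomic distribution, is a property of the whole set rather than of a single SNP and is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SnpRecord:
    # All field names are hypothetical, chosen for this sketch only.
    qual: float                              # variant QUAL score
    depth: int                               # read depth at the site (assumed site-level)
    dist_to_nearest_indel: int               # bp to the closest indel
    het_read_support: List[Tuple[int, int]]  # (ref_reads, alt_reads) per heterozygous call
    n_missing_genotypes: int                 # samples without a genotype call
    maf: float                               # minor allele frequency
    in_transposable_element: bool
    dist_to_chrom_end: int                   # bp to the nearest chromosome terminus


def passes_filters(snp: SnpRecord) -> bool:
    """Return True if the SNP meets criteria 1-7 as interpreted in this sketch."""
    checks = [
        snp.dist_to_nearest_indel > 5,                # 1) not within 5 bp of an indel
        snp.qual > 30,                                # 2) QUAL > 30
        20 <= snp.depth <= 1500,                      # 3) read depth between 20 and 1500
        all(ref >= 1 and alt >= 2                     # 4) read support at heterozygous sites
            for ref, alt in snp.het_read_support),
        snp.n_missing_genotypes == 0,                 # 5) no missing genotypes
        snp.maf > 0.25,                               # 6) minor allele frequency > 0.25
        not snp.in_transposable_element               # 7) outside transposable elements and
        and snp.dist_to_chrom_end > 1000,             #    more than 1 kb from chromosome termini
    ]
    return all(checks)
```

Keeping each criterion as a separate entry in `checks` makes it straightforward to report which filter removed a given SNP if that is needed.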