Onion (Allium cepa) pseudoreference genome

Labate, Joanne; Glaubitz, Jeffrey; Havey, Michael

Published Aug 13, 2020 on Dryad. https://doi.org/10.5061/dryad.6wwpzgmwg

Data files

Aug 13, 2020 version (7.23 MB total)

onion-pseudogenome.fasta (7.23 MB)

Abstract

Onion (Allium cepa) is not highly tractable for development of molecular markers due to its large (16 gigabases per 1C) nuclear genome. Single nucleotide polymorphisms (SNPs) are useful for genetic characterization and marker-aided selection of onion because of their codominance and common occurrence in elite germplasm. We completed genotyping by sequencing (GBS) to identify SNPs in onion using 46 F2 plants, the parents of the F2 plants (Ailsa Craig 43 and Brigham Yellow Globe 15-23), two doubled haploid (DH) lines (DH2107 and DH2110), and plants from 94 accessions in the USDA National Plant Germplasm System (NPGS). SNPs were called using the TASSEL 3.0 Universal Network Enabled Analysis Kit (UNEAK) bioinformatics pipeline. Sequences from the F2 and DH plants were used to construct a pseudo-reference genome against which genotypes from all accessions were scored. Quality filters were used to identify a set of 284 high-quality SNPs, which were placed onto an existing genetic map for the F2 family. Accessions showed a moderate level of diversity (mean H_e = 0.341) and evidence of inbreeding (mean F = 0.592). GBS is promising for SNP discovery in onion, although the lack of a reference genome required extensive custom scripts for bioinformatics analyses to identify high-quality markers.
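
As a point of reference for the diversity statistics above, the following sketch computes expected heterozygosity (H_e = 1 minus the sum of squared allele frequencies) and the inbreeding coefficient (F = 1 - H_o/H_e) for a single SNP from diploid genotype calls. These are the standard textbook definitions; whether the study used exactly these estimators (for example, with or without a small-sample correction) is an assumption, and the function names and toy genotypes are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """H_e = 1 - sum(p_i^2) over allele frequencies p_i.

    genotypes: list of (allele, allele) tuples for one SNP,
    with missing calls already removed.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    total = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def observed_heterozygosity(genotypes):
    """H_o = fraction of individuals carrying two different alleles."""
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)


def inbreeding_coefficient(genotypes):
    """F = 1 - H_o / H_e; returns NaN for a monomorphic SNP (H_e == 0)."""
    h_e = expected_heterozygosity(genotypes)
    if h_e == 0:
        return float("nan")
    return 1.0 - observed_heterozygosity(genotypes) / h_e


# Toy data (hypothetical): 4 AA, 2 AG, and 4 GG individuals.
toy = [("A", "A")] * 4 + [("A", "G")] * 2 + [("G", "G")] * 4
h_e = expected_heterozygosity(toy)   # both alleles at frequency 0.5
f_value = inbreeding_coefficient(toy)  # heterozygote deficit gives F > 0
```

A positive F, as in the toy data, indicates fewer heterozygotes than expected under random mating, which is the pattern the accessions showed on average. Per-SNP values would then be averaged to obtain a mean H_e and mean F.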

Methods

Forty-six F2 plants and the two parents of the onion (Allium cepa) mapping population Brigham Yellow Globe 15-23 x Ailsa Craig 43 were genotyped, along with two doubled haploid (DH) onion lines, DH2107 and DH2110, which served as completely homozygous controls. Genotyping by sequencing (GBS) was performed on an Illumina HiSeq 2000 with two to four replicates of every DNA sample. GBS libraries were prepared at Cornell University’s Genomic Diversity Facility using the restriction enzyme EcoT22I and assayed in 96-plex format using standard protocols.

SNP calling on the 46 F2 plants, the two parents, and the two DH lines was performed with the TASSEL 3.0 Universal Network Enabled Analysis Kit (UNEAK) bioinformatics pipeline, which does not require a reference genome. Over 70,000 raw SNPs were scored in these samples. Quality filters were then applied to SNPs as follows: not heterozygous in either DH line, minor allele frequency greater than or equal to 30%, minimum genotypic read depth of seven, maximum missing data of 10%, and conforming to the expected 1:2:1 segregation ratio (goodness-of-fit > 0.01) within the F2 family.

For the resulting 752 SNPs, the MSTMap software tool was used to construct a genetic linkage map using a grouping LOD criterion of p < 1 x 10^-7. This gave 701 SNPs in 15 linkage groups (LGs) with ≥ 15 markers each; the remaining 51 markers were not placed on a linkage group. The number of SNPs per LG ranged from 15 to 90, and the estimated size of LGs ranged from 52 to 327 cM.

Because UNEAK treats redundant, reverse-complement tags from opposite strands as separate markers, 171 redundant tag pairs were eliminated from this linkage map. A pseudo-reference genome was then constructed by concatenating one tag from each of the 530 non-redundant, mapped tag pairs into a single pseudo-molecule. To prevent spurious alignment across two distinct pseudo-reference tags, adjacent tags in the pseudo-reference were separated by a span of at least 32 A nucleotides.
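
The construction just described (drop reverse-complement duplicates, then join one tag per locus with poly-A spacers) can be illustrated with a minimal sketch. This is not the authors' script: the function names, the toy tags, and the sequence name are hypothetical, the spacer is fixed at exactly 32 A's although the text says "at least 32", and redundancy is assumed to mean exact reverse complementarity.

```python
SPACER = "A" * 32  # assumption: exactly 32 A's between adjacent tags

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_reverse_complement_duplicates(tags):
    """Keep the first tag seen from each forward / reverse-complement pair."""
    seen = set()
    kept = []
    for tag in tags:
        tag = tag.upper()
        # A tag and its reverse complement share one canonical form.
        canonical = min(tag, reverse_complement(tag))
        if canonical not in seen:
            seen.add(canonical)
            kept.append(tag)
    return kept


def build_pseudo_reference(tags, spacer=SPACER):
    """Concatenate tags into one pseudo-molecule, spacer between neighbours."""
    return spacer.join(tags)


def to_fasta(name, seq, width=60):
    """Format a single sequence as FASTA text with fixed-width lines."""
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# Toy tags (hypothetical); the second is the reverse complement of the first.
toy_tags = ["ACGTAC", "GTACGT", "CCGGTT"]
unique_tags = drop_reverse_complement_duplicates(toy_tags)
pseudo_reference = build_pseudo_reference(unique_tags)
fasta_text = to_fasta("toy_pseudo_reference", pseudo_reference)
```

The spacer keeps a read from aligning across the junction of two unrelated tags, since any alignment spanning a junction would have to cover the full run of A's.
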
The purpose of the pseudo-reference was to allow discovery, in 94 diverse onion accessions, of additional SNPs within each tag pair locus that were not segregating in the mapping population, thereby reducing the ascertainment bias that would result from surveying populations using only SNPs discovered in a single F2 family.
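
The SNP quality filters listed in the methods can also be made concrete. The sketch below applies them to one SNP under several assumptions that the text does not confirm: genotypes are coded as "AA", "AB", "BB", or None; calls below the depth threshold are treated as missing; minor allele frequency is computed within the F2 family; and the goodness-of-fit value is read as a chi-square p-value with 2 degrees of freedom. All names and toy data are hypothetical.

```python
import math


def mask_low_depth(calls, depths, min_depth=7):
    """Set genotype calls supported by fewer than min_depth reads to None."""
    return [c if d >= min_depth else None for c, d in zip(calls, depths)]


def segregation_p_value(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit p-value against a 1:2:1 ratio (2 df)."""
    n = n_aa + n_ab + n_bb
    expected = (n / 4, n / 2, n / 4)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the survival function is exactly exp(-x / 2).
    return math.exp(-chi2 / 2)


def passes_filters(f2_calls, dh_calls, min_maf=0.30, max_missing=0.10,
                   min_p=0.01):
    """Return True if one SNP passes all of the filters described above."""
    # Doubled haploid controls must not be heterozygous.
    if any(call == "AB" for call in dh_calls):
        return False
    called = [call for call in f2_calls if call is not None]
    if not called:
        return False
    if 1 - len(called) / len(f2_calls) > max_missing:
        return False
    n_aa, n_ab, n_bb = (called.count(g) for g in ("AA", "AB", "BB"))
    freq_a = (2 * n_aa + n_ab) / (2 * len(called))
    if min(freq_a, 1 - freq_a) < min_maf:
        return False
    return segregation_p_value(n_aa, n_ab, n_bb) > min_p


# Toy F2 family of 46 plants (hypothetical counts close to 1:2:1).
good_snp = ["AA"] * 11 + ["AB"] * 23 + ["BB"] * 12
distorted_snp = ["AA"] * 20 + ["AB"] * 6 + ["BB"] * 20
keep_good = passes_filters(good_snp, dh_calls=["AA", "BB"])
keep_distorted = passes_filters(distorted_snp, dh_calls=["AA", "BB"])
```

In this toy case the first SNP is retained and the second is rejected for segregation distortion, even though its allele frequencies are balanced.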
