FC309 genome assembly and annotation files
Abstract
Sugar beet (Beta vulgaris L.) is a global source of table sugar and animal fodder. Here we report a highly contiguous, haplotype-phased genome assembly and annotation for sugar beet line FC309. Both assembled haplomes for FC309 are the largest and most contiguous beet genome assemblies reported to date, and their gene annotation sets capture over 1,500 additional protein-coding loci compared to prior beet genome annotations. These new genomic resources were used to identify novel quantitative trait loci (QTL) for Fusarium yellows resistance from the FC309 genetic background using an F2 mapping-by-sequencing approach. The highest QTL signals were detected on Chromosome 3, spanning approximately 10 Mbp in both haplomes. A parallel transcriptome profiling experiment identified candidate genes within the Chromosome 3 QTL with plausible roles in disease response, including NBS-LRR genes with expression trends supporting a role in resistance. Investigation of genetic variants in these candidate genes found one major disease-resistance protein containing high-effect variants of interest. Collectively, the genomic resources for FC309 presented here are foundational tools for comparative genomics and for mapping other traits in the FC309 background, and can serve as a reference genome for other beet studies owing to their contiguity, completeness, and high-quality gene annotations.
https://doi.org/10.5061/dryad.wstqjq2x5
Description of the data and file structure
Genome assembly and annotation files for sugar beet line FC309 developed as described in Todd et al. "A fully phased, chromosome-scale genome of sugar beet line FC309 enables the discovery of Fusarium yellows resistance QTL" published in DNA Research.
Files and variables
File: FC309.zip
Description: Two directories containing genome assembly and annotation files for FC309 haplome 1 (v1.1.0) and FC309 haplome 2 (v1.2.0). Each haplome directory contains the assembly file (FASTA format, USDA_Bvulg_FC309_v1.X.0.fasta), the genome annotation file (GFF3 format, Masked-FC309v1.X.0-extra-contigs-publish.gff3), which gives the genomic coordinates of the final protein-coding loci, the nucleotide sequences of the annotated protein-coding genes (Masked-FC309v1.X.0-extra-contigs-publish.CDS.fna and Masked-FC309v1.X.0-extra-contigs-publish.genes.fna), and the corresponding protein sequences (Masked-FC309v1.X.0-extra-contigs-publish.protein.faa). In these file names, X is the haplome number (1 or 2). Each directory also contains a readme file that describes the genome annotation pipeline that was used.
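The naming pattern above can be expanded programmatically. The following is a minimal Python sketch, assuming only the file name templates given in the description; the function name expected_files and the dictionary keys are hypothetical labels chosen for illustration, and the directory names inside FC309.zip are not assumed.

```python
# Haplome number -> version string, as stated in the file description.
HAPLOME_VERSIONS = {1: "v1.1.0", 2: "v1.2.0"}


def expected_files(haplome: int) -> dict:
    """Return the file names expected for one haplome (hypothetical helper).

    Only the name templates from the description are used; the readme file
    is omitted because its exact name is not given.
    """
    version = HAPLOME_VERSIONS[haplome]
    prefix = f"Masked-FC309{version}-extra-contigs-publish"
    return {
        "assembly": f"USDA_Bvulg_FC309_{version}.fasta",
        "annotation": f"{prefix}.gff3",
        "cds": f"{prefix}.CDS.fna",
        "genes": f"{prefix}.genes.fna",
        "protein": f"{prefix}.protein.faa",
    }
```

Calling expected_files(1) or expected_files(2) gives the five data file names for that haplome, which can then be checked against the contents of the extracted archive.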
Usage Notes
All files (FASTA and GFF3) are plain text and can be opened in any text viewer, if necessary by appending ".txt" to the file name. These files can also be viewed using open-source tools such as Artemis (https://sanger-pathogens.github.io/Artemis/Artemis/) or JBrowse (https://jbrowse.org/jb2/).
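Because the files are plain text, they can also be read with a few lines of code. The following is a minimal Python sketch using only the standard library; the function names read_fasta and count_gff3_features are hypothetical, and the sketch assumes standard FASTA headers beginning with ">" and standard nine-column, tab-separated GFF3 rows. It is not part of the published pipeline.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_gff3_features(lines: Iterable[str]) -> Counter:
    """Count GFF3 rows by feature type (column 3), skipping comments."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:  # ignore anything that is not a feature row
            counts[fields[2]] += 1
    return counts
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example the handle returned by open() on the assembly FASTA or the annotation GFF3 file after extracting FC309.zip. Sequence lengths can then be taken from the yielded sequences, and the feature counts give a quick summary of the annotated loci.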
High-molecular-weight DNA was isolated from a single plant of the FC309 sugar beet line for PacBio HiFi sequencing. Young, dark-treated leaf tissue was collected from the same plant for DoveTail Omni-C proximity ligation library preparation and Illumina sequencing. PacBio HiFi and DoveTail Omni-C reads were assembled using the software package Hifiasm to produce a phased contig-level assembly. The two phased contig-level assemblies were scaffolded using the DoveTail HiRise method to break mis-joined contigs and to anchor and orient contigs into pseudochromosomes. The final assemblies, named USDA_Bvulg_FC309_v1.1 (haplome 1) and USDA_Bvulg_FC309_v1.2 (haplome 2), were independently annotated using the GenSAS genome annotation pipeline to identify and mask repetitive regions of the genome and to identify protein-coding loci.
