Blueprint for phasing and assembling the genomes of heterozygous polyploids: Application to the octoploid genome of strawberry
Data files
Nov 08, 2021 version (1.27 GB of files)
Abstract
The challenge that allelic diversity poses for haplotype assembly is exemplified by polyploid genomes, which contain homoeologous chromosomes of shared ancestry as well as significant homologous variation within each ancestral subgenome. Cultivated strawberry (Fragaria × ananassa) and its progenitors are outbred octoploids that preserve up to eight homologous and homoeologous alleles per locus. This creates a significant risk of haplotype collapse, haplotype switching, and chimeric fusions during assembly. Using PacBio HiFi (third-generation) long reads, we assembled the genome of ‘Royal Royce’, a day-neutral octoploid F. × ananassa hybrid from the University of California. Our goal was to produce subgenome- and haplotype-resolved assemblies of all 56 chromosomes that accurately reconstruct the haploid chromosome complements contributed by each parent. Previous work has shown that partitioning sequences by parental phase supports direct assembly of haplotypes in heterozygous diploid species. We combined the accuracy of HiFi reads with pedigree-informed sequencing to partition long reads by phase and thereby reduce the downstream risk of subgenomic chimeras during assembly. We then used an octoploid strawberry recombination breakpoint map containing 3.6 million variants to identify and break chimeric junctions and to scaffold the phase-1 and phase-2 octoploid assemblies. Before scaffolding and gap-filling, the N50 contiguity of the phase-1 and phase-2 assemblies was 11 Mb. The final haploid assembly represented seven of its 28 chromosomes as single contiguous sequences and averaged fewer than three gaps per pseudomolecule. We also re-annotated the octoploid genome, producing a custom F. × ananassa repeat library and an improved set of gene models based on IsoSeq transcript data and an expansive RNA-seq expression atlas. Here we present ‘FaRR1’, a gold-standard reference genome of F. × ananassa cultivar ‘Royal Royce’, to support future genomic research and molecular breeding of allo-octoploid strawberry.
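The idea of partitioning reads by parental phase can be illustrated with a minimal trio-binning sketch: k-mers found in only one parent's reads act as markers, and each offspring read is assigned to the parent whose specific k-mers it contains more of. This is a generic illustration under stated assumptions, not the authors' pipeline. The function names, the toy k value, and the simple majority rule are all hypothetical, and real trio-binning tools typically also filter out low-count (error) k-mers and use much longer k.

```python
"""Toy trio-binning sketch: assign offspring reads to a parental phase.

Assumption: a generic illustration only, not the FaRR1 assembly pipeline.
"""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_set(seq: str, k: int) -> set[str]:
    """All canonical k-mers in seq (empty if seq is shorter than k)."""
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def parent_specific_kmers(reads_a, reads_b, k):
    """Return (k-mers seen only in parent A's reads, k-mers seen only in parent B's)."""
    a = set().union(*(kmer_set(r, k) for r in reads_a))
    b = set().union(*(kmer_set(r, k) for r in reads_b))
    return a - b, b - a


def assign_phase(read, a_only, b_only, k):
    """Label a read by which parent's specific k-mers it contains more of.

    Hypothetical labels 'phase1'/'phase2' mirror the file naming (parent A /
    parent B); ties, including reads with no informative k-mers, stay 'unassigned'.
    """
    ks = kmer_set(read, k)
    hits_a, hits_b = len(ks & a_only), len(ks & b_only)
    if hits_a > hits_b:
        return "phase1"
    if hits_b > hits_a:
        return "phase2"
    return "unassigned"


if __name__ == "__main__":
    k = 4  # toy value chosen for readability
    a_only, b_only = parent_specific_kmers(["AAAACCCC"], ["AAAAGGGG"], k)
    for read in ["AAACCC", "GGGTTT", "AAAGGG", "TTTT"]:
        print(read, assign_phase(read, a_only, b_only, k))
```

Canonical k-mers make the assignment strand-independent, so a read and its reverse complement (here `AAACCC` and `GGGTTT`) land in the same phase, while a read made only of k-mers shared by both parents (`TTTT`) stays unassigned.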
Usage notes
***WARNING: THIS DATA SUBMISSION CONTAINS FILES ASSOCIATED WITH THREE SEPARATE GENOME ASSEMBLIES:
- files with the prefix 'farr1.' are associated with the Royal Royce synthetic haploid genome (recommended for most user applications)
- files with the prefix 'farr1_phase1.' are associated with the Royal Royce phase1 (parent haplotype A) genome
- files with the prefix 'farr1_phase2.' are associated with the Royal Royce phase2 (parent haplotype B) genome
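Because all three assemblies share the `farr1` stem, files from different assemblies are easy to mix up. A small helper like the following can sort downloaded files by assembly before analysis. It is a sketch that assumes only the three prefixes listed above; the helper name and the example file names are hypothetical.

```python
from pathlib import PurePath

# Prefix -> assembly, as listed in the usage notes.
ASSEMBLY_BY_PREFIX = {
    "farr1_phase1.": "phase1 (parent haplotype A)",
    "farr1_phase2.": "phase2 (parent haplotype B)",
    "farr1.": "synthetic haploid",
}


def assembly_for(path: str) -> str:
    """Return which FaRR1 assembly a data file belongs to, based on its name prefix."""
    name = PurePath(path).name  # ignore any leading directories
    for prefix, assembly in ASSEMBLY_BY_PREFIX.items():
        if name.startswith(prefix):
            return assembly
    raise ValueError(f"{name!r} does not match any FaRR1 file prefix")


if __name__ == "__main__":
    # Hypothetical file names, used only to illustrate the prefixes.
    for f in ["farr1.example.fa.gz", "data/farr1_phase2.example.gff3"]:
        print(f, "->", assembly_for(f))
```

Note that 'farr1_phase1.' does not start with 'farr1.' (the sixth character is '_' rather than '.'), so the three prefixes never overlap and the matching order does not affect the result.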