Assembly, annotation, and comparison of Macrophomina phaseolina isolates from strawberry and other hosts

Cite this dataset

Burkhardt, Alyssa et al. (2019). Assembly, annotation, and comparison of Macrophomina phaseolina isolates from strawberry and other hosts [Dataset]. Dryad.


Background: Macrophomina phaseolina is a fungal plant pathogen with a broad host range, but one genotype was shown to exhibit host preference/specificity on strawberry. This pathogen lacked a high-quality genome assembly and annotation, and little was known about genomic differences among isolates from different hosts.

Results: We used PacBio sequencing and Hi-C scaffolding to produce nearly complete genome assemblies for M. phaseolina isolates representing the strawberry-specific genotype and another genotype recovered from alfalfa. The strawberry isolate assembly had 59 contigs/scaffolds with an N50 of 4.3 Mb. The alfalfa isolate assembly had an N50 of 5.0 Mb and 14 nuclear contigs, half of which included telomeres. Both genomes were annotated with MAKER using transcript evidence generated in this study, yielding over 13,000 predicted protein-coding genes. Comparison with closely related fungal species identified groups of genes unique to each isolate. Structural comparisons between the isolates revealed large-scale rearrangements, including chromosomal inversions and translocations. To include isolates representing a range of pathogen genotypes, an additional 30 isolates were sequenced with Illumina, assembled, and compared to the strawberry genotype assembly. Within the limits of comparing Illumina and PacBio assemblies, no conserved structural rearrangements were identified among the isolates from the strawberry genotype compared to those from other hosts, but some candidate genes were identified that were largely present in isolates of the strawberry genotype and absent in other genotypes.

Conclusions: High-quality reference genomes of M. phaseolina allowed the identification of structural changes associated with a genotype that has a host preference toward strawberry and will enable future comparative genomics studies. More complete assemblies allow structural rearrangements to be assessed more fully and ensure greater representation of all the genes. Work with Illumina data from additional isolates suggests that some genes are predominantly present in isolates of the strawberry genotype, but additional work is needed to confirm the role of these genes in pathogenesis. Additional work is also needed to complete the scaffolding of smaller contigs identified in the strawberry genotype assembly and to determine whether unique genes in the strawberry genotype play a role in pathogenicity.


The genomes were assembled from PacBio reads using the FALCON assembler and were polished with Illumina reads. Genomes were annotated using MAKER. For a full description of the methods, please see the corresponding manuscript.

Usage notes

The data for two separate genomes of the fungus Macrophomina phaseolina are provided. The Al1 data sets are from an isolate recovered from alfalfa and the 11_12 data sets are from an isolate recovered from strawberry.

For each isolate, the final assembly (also available at NCBI) and the corresponding GFF annotation file are provided. FASTA files containing all of the protein and transcript sequences for each assembly are also included.
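As a rough illustration of how these files might be inspected, the following Python sketch reads sequence lengths from a FASTA file, computes an N50, and counts features of a given type in a GFF file. It uses only the standard library. The function names and the file names in the comments are hypothetical and are not the names used in this dataset. The sketch assumes a tab-delimited GFF with nine columns and stops at a `##FASTA` directive if one is present.

```python
def read_fasta_lengths(path):
    """Return {sequence_id: length} for every record in a FASTA file."""
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the ID.
                current = line[1:].split()[0]
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


def n50(lengths):
    """Return the N50: the length L of the contig at which the running
    total of contigs sorted longest-first reaches half the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def count_gff_features(path, feature_type="gene"):
    """Count rows whose third column equals feature_type."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequence section, no more features
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 9 and fields[2] == feature_type:
                count += 1
    return count


# Example usage (hypothetical file names, substitute the downloaded files):
# lengths = read_fasta_lengths("Al1_assembly.fasta")
# print(len(lengths), "sequences; N50 =", n50(lengths.values()))
# print(count_gff_features("Al1_annotation.gff"), "gene features")
```

Values computed this way may differ from those reported above if the files contain sequences, such as non-nuclear contigs, that were excluded from the reported statistics.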


Funding

California Department of Food and Agriculture’s Specialty Crop Block Grant Program, Award: SCB14052

California Strawberry Commission