Data from: Genome resources of Xanthomonas vasicola strains from various hosts: Reference-guided chromosome and plasmid assemblies for enhanced pathogen genomics
Data files
Dec 06, 2025 version files 55.49 MB
- BCC1689_pXvvH.fasta (29.65 KB)
- BCC1689.fasta (4.98 MB)
- BCC169_pXvvC.fasta (29.01 KB)
- BCC169.fasta (4.95 MB)
- BCC42_pLMG911.2.fasta (48.11 KB)
- BCC42_pXvvA1.fasta (115.72 KB)
- BCC42_pXvvB1.fasta (23.72 KB)
- BCC42.fasta (4.86 MB)
- BCC448_pXvvD.fasta (33.87 KB)
- BCC448.fasta (5.02 MB)
- BCC510_pXvvA2.fasta (114.01 KB)
- BCC510_pXvvE.fasta (36.13 KB)
- BCC510_pXvvF.fasta (55.03 KB)
- BCC510_pXvvG.fasta (27.48 KB)
- BCC510.fasta (4.86 MB)
- BCC516_pXvhA.fasta (34.68 KB)
- BCC516_pXvhB.fasta (2.10 KB)
- BCC516.fasta (5.01 MB)
- BCC777_pXvhA.fasta (35.26 KB)
- BCC777_pXvhB.fasta (2.10 KB)
- BCC777.fasta (4.98 MB)
- BCC931_pLMG911.2.fasta (48.14 KB)
- BCC931_pXvvB2.fasta (24.64 KB)
- BCC931.fasta (4.93 MB)
- BCC932_pLMG911.2.fasta (48.15 KB)
- BCC932_pXvvB3.fasta (24.76 KB)
- BCC932.fasta (4.94 MB)
- J12_pLMG911.2.fasta (48.10 KB)
- J12_pXvvA4.fasta (115.72 KB)
- J12_pXvvB5.fasta (23.33 KB)
- J12_pXvvF.fasta (55.03 KB)
- J12.fasta (4.86 MB)
- J5_pLMG911.2.fasta (48.11 KB)
- J5_pXvvA3.fasta (115.71 KB)
- J5_pXvvB4_.fasta (23.33 KB)
- J5_pXvvF.fasta (55.03 KB)
- J5.fasta (4.87 MB)
- metadata.csv (616 B)
- README.md (4.69 KB)
Abstract
This dataset comprises eleven whole-genome assemblies of Xanthomonas vasicola strains isolated from four economically important hosts: Eucalyptus spp., maize, sorghum, and sugarcane. It includes FASTA files of the assembled chromosome and plasmid sequences for each strain. All assemblies are chromosome-level, with the contigs assembled into a single chromosome using reference-guided scaffolding. The assemblies were generated from Illumina short-read data using a pipeline that included an initial de novo assembly performed with Unicycler, reference-guided scaffolding of chromosomes using Ragout, and iterative plasmid discovery and assembly using BLAST against the PLSDB database and RagTag.
Chromosome-scale and plasmid genome assemblies of eleven Xanthomonas vasicola strains isolated from Eucalyptus, maize, sorghum and sugarcane
Overview
This dataset contains the whole-genome assemblies for eleven strains of the plant pathogen Xanthomonas vasicola, isolated from four different host plants: Eucalyptus spp., maize, sorghum, and sugarcane. These chromosome-scale and plasmid assemblies provide a resource for comparative genomics, host adaptation studies, virulence gene analysis, and evolutionary research.
These are the final, revised assemblies associated with the accepted manuscript.
Corresponding Manuscript
Genome resources of Xanthomonas vasicola strains from various hosts: reference-guided chromosome and plasmid assemblies for enhanced pathogen genomics
Nomakula Y. Zim, Anna E. J. Yssel, and Teresa A. Coutinho
Journal of Plant Pathology (2025, Accepted)
Data and Code Availability
- Genome Assemblies (this dataset):
This repository contains the final FASTA files for all chromosome and plasmid assemblies.
- NCBI GenBank:
Assemblies have been submitted to GenBank under BioProject PRJNA1195417.
At the time of Dryad data submission, the public release on GenBank was pending; therefore, the Dryad files are the authoritative versions associated with the publication.
- Analysis Code:
All scripts, environments, and workflow materials are permanently archived on Zenodo:
https://doi.org/10.5281/zenodo.17634337
File Manifest and Structure
This dataset includes the chromosome and plasmid assemblies for 11 Xanthomonas vasicola strains.
All sequences are in FASTA format. Filenames correspond to the strain name and plasmid identifier.
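As a quick sanity check after download, each FASTA file can be read and summarised by sequence length and GC content. The following is a minimal sketch using only the Python standard library; the function names `read_fasta` and `sequence_stats` are illustrative and not part of the dataset or the authors' archived workflow, and the toy record is invented.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_stats(seq: str) -> Dict[str, float]:
    """Return the length in bp and the GC percentage of one sequence.

    Ambiguous bases (e.g. N) count towards the length but not towards GC.
    """
    upper = seq.upper()
    length = len(upper)
    gc = upper.count("G") + upper.count("C")
    return {
        "length_bp": length,
        "gc_percent": 100.0 * gc / length if length else 0.0,
    }


if __name__ == "__main__":
    # Toy record for illustration only; not taken from the dataset.
    demo = [">toy_record", "ATGC", "GGCC"]
    for name, seq in read_fasta(demo):
        print(name, sequence_stats(seq))
```

To apply this to a real file, pass an open file handle, for example `read_fasta(open("BCC42.fasta"))`.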
Included Files
Chromosome assemblies
- BCC42.fasta
- BCC169.fasta
- BCC448.fasta
- BCC510.fasta
- BCC516.fasta
- BCC777.fasta
- BCC931.fasta
- BCC932.fasta
- BCC1689.fasta
- J5.fasta
- J12.fasta
Plasmid assemblies
- BCC42_pLMG911.2.fasta
- BCC42_pXvvA1.fasta
- BCC42_pXvvB1.fasta
- BCC169_pXvvC.fasta
- BCC448_pXvvD.fasta
- BCC510_pXvvA2.fasta
- BCC510_pXvvE.fasta
- BCC510_pXvvF.fasta
- BCC510_pXvvG.fasta
- BCC516_pXvhA.fasta
- BCC516_pXvhB.fasta
- BCC777_pXvhA.fasta
- BCC777_pXvhB.fasta
- BCC931_pLMG911.2.fasta
- BCC931_pXvvB2.fasta
- BCC932_pLMG911.2.fasta
- BCC932_pXvvB3.fasta
- BCC1689_pXvvH.fasta
- J5_pLMG911.2.fasta
- J5_pXvvA3.fasta
- J5_pXvvB4_.fasta
- J5_pXvvF.fasta
- J12_pLMG911.2.fasta
- J12_pXvvA4.fasta
- J12_pXvvB5.fasta
- J12_pXvvF.fasta
Metadata file
metadata.csv
Each FASTA file represents a completed, circularised assembly. The plasmid filenames indicate the corresponding plasmid names (e.g., pXvvA1, pLMG911.2).
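The naming convention can be used to group files by strain programmatically. The sketch below assumes that everything before the first underscore is the strain name and everything after it is the plasmid name; `AssemblyFile` and `parse_assembly_filename` are hypothetical helpers. It also assumes that the trailing underscore in `J5_pXvvB4_.fasta` is a stray character rather than part of the plasmid name, which the source does not confirm.

```python
from typing import NamedTuple, Optional


class AssemblyFile(NamedTuple):
    strain: str
    plasmid: Optional[str]  # None for a chromosome file


def parse_assembly_filename(filename: str) -> AssemblyFile:
    """Split '<strain>.fasta' or '<strain>_<plasmid>.fasta' into its parts."""
    suffix = ".fasta"
    if not filename.endswith(suffix):
        raise ValueError(f"not a FASTA filename: {filename!r}")
    stem = filename[: -len(suffix)]
    # Split on the first underscore only; plasmid names may contain dots.
    strain, _, plasmid = stem.partition("_")
    # Assumption: a trailing underscore (as in J5_pXvvB4_.fasta) is stray.
    plasmid = plasmid.rstrip("_")
    return AssemblyFile(strain, plasmid or None)


if __name__ == "__main__":
    for name in ("BCC42.fasta", "BCC42_pLMG911.2.fasta", "J5_pXvvB4_.fasta"):
        print(name, "->", parse_assembly_filename(name))
```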
The metadata.csv file links strain names to host species, genome sizes, GC content, N50 values, and plasmid counts.
Column Descriptions for metadata.csv
This file contains key information for each assembled strain:
- Strain_name: Strain identifier (matches FASTA prefix).
- Host_of_isolation: Host plant (Eucalyptus, maize, sorghum, or sugarcane).
- Genome_size_Mbp: Total length of the assembly in megabase pairs (Mbp), as the column name indicates.
- GC_mol%: GC content of the assembly, expressed as a percentage.
- Contig_N50: N50 of the initial Unicycler contigs (before scaffolding).
- Number_of_plasmids: Number of plasmids identified and assembled.
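A minimal sketch of loading the metadata with Python's standard `csv` module follows. It assumes that metadata.csv is comma-delimited, has a header row with exactly the column names listed above, and stores Contig_N50 and Number_of_plasmids as plain integers; none of this has been verified against the file itself. The rows in the demo are invented placeholders, not values from the dataset.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def load_metadata(handle) -> List[dict]:
    """Read metadata rows and convert numeric columns to numbers."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "strain": row["Strain_name"].strip(),
            "host": row["Host_of_isolation"].strip(),
            "genome_size_mbp": float(row["Genome_size_Mbp"]),
            "gc_percent": float(row["GC_mol%"]),
            "contig_n50": int(row["Contig_N50"]),          # assumed integer
            "n_plasmids": int(row["Number_of_plasmids"]),  # assumed integer
        })
    return rows


def strains_by_host(rows: List[dict]) -> Dict[str, List[str]]:
    """Group strain names by host of isolation."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["host"]].append(row["strain"])
    return dict(grouped)


if __name__ == "__main__":
    # Invented placeholder rows; the real values are in metadata.csv.
    toy = io.StringIO(
        "Strain_name,Host_of_isolation,Genome_size_Mbp,GC_mol%,"
        "Contig_N50,Number_of_plasmids\n"
        "EXAMPLE_A,maize,5.0,63.0,100000,2\n"
        "EXAMPLE_B,sorghum,4.9,63.1,90000,1\n"
    )
    print(strains_by_host(load_metadata(toy)))
```

For the real file, replace the `io.StringIO` object with `open("metadata.csv", newline="")`.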
Methods Summary
Assemblies were generated from Illumina paired-end reads using the following workflow:
- Initial Assembly:
De novo assembly was performed using Unicycler v0.5.1.
- Chromosome Scaffolding:
Linear contigs were ordered and oriented with Ragout v2.2 using a reference-guided approach.
- Plasmid Discovery & Assembly:
Unplaced contigs were searched against PLSDB with BLASTN.
Significant hits (≥90% identity and ≥500 bp) guided plasmid scaffolding using RagTag v2.1.0.
- Circularisation:
Chromosome and plasmid assemblies were circularised and start positions standardised using Circlator (fixstart).
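The hit-filtering criterion in the plasmid discovery step can be illustrated as follows. This is a sketch under stated assumptions, not the authors' script (their code is archived on Zenodo): it assumes BLASTN was run with the default tabular output (`-outfmt 6`, whose third and fourth columns are percent identity and alignment length), and it interprets "≥500 bp" as alignment length. The function name and the toy lines are invented for illustration.

```python
from typing import Iterable, List, Tuple

MIN_IDENTITY = 90.0          # percent identity threshold from the methods
MIN_ALIGNMENT_LENGTH = 500   # bp; assumed to refer to alignment length

Hit = Tuple[str, str, float, int]  # (query, subject, identity, length)


def filter_hits(
    lines: Iterable[str],
    min_identity: float = MIN_IDENTITY,
    min_length: int = MIN_ALIGNMENT_LENGTH,
) -> List[Hit]:
    """Keep tabular BLAST hits that meet both thresholds."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if identity >= min_identity and length >= min_length:
            hits.append((query, subject, identity, length))
    return hits


if __name__ == "__main__":
    # Invented example lines in default -outfmt 6 column order.
    toy = [
        "contig_7\tTOY_PLASMID_1\t95.50\t1200\t54\t0\t1\t1200\t1\t1200\t0.0\t2000",
        "contig_8\tTOY_PLASMID_1\t88.00\t3000\t360\t0\t1\t3000\t1\t3000\t0.0\t3500",
        "contig_9\tTOY_PLASMID_2\t99.00\t300\t3\t0\t1\t300\t1\t300\t1e-150\t540",
    ]
    for hit in filter_hits(toy):
        print(hit)
```

In this toy input only `contig_7` passes: `contig_8` falls below the identity threshold and `contig_9` below the length threshold.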
Definitions and Notes
- Chromosome-scale: Assemblies represent a full-length chromosome reconstructed via reference-guided scaffolding.
- Reuse Potential: The datasets support comparative genomics, pangenomics, phylogenetics, and virulence gene discovery.
Contact Information
For questions regarding this dataset, please contact:
Prof. Teresa A. Coutinho
Email: teresa.coutinho@up.ac.za
