Orthogroup alignments of Clostridium spp. isolates
Data files
Jun 17, 2026 version, files total 35.25 MB
astral_alignments.zip - 6.34 MB
README.md - 1.79 KB
SNP_alignments_and_trees.zip - 17.28 MB
supermatrix_alignments.zip - 11.62 MB
Abstract
De novo genome assemblies for 200+ Clostridium species isolates were generated with shovill v.1.1.0 and annotated with Bakta v.1.8.1 using the db-light v.5 database. Orthofinder v.3.0.1b1 assigned orthology to the annotated proteins, and a subset of the resulting orthogroups was selected for phylogenetic reconstruction. For the supermatrix analysis, 264 single-copy orthologs were aligned with mafft v.7.453 and concatenated. For the gene tree-species tree reconciliation performed with Astral-Pro, 900 orthogroups were aligned with mafft. SNP alignments were generated with snp-sites v2.5.1 from consensus genomes of 200+ Clostridium species isolates. Consensus genomes were produced with SAMTools/BCFtools v1.4.1 after read mapping to closely related reference genomes (Hall, Alaska, and CDC_67071) with BWA-MEM v0.7.15-r1142-dirty. Polymorphisms due to recombination were identified and masked with Gubbins v2.3.4. Maximum likelihood phylogenies were generated from the total SNP and recombinant-free SNP alignments using IQ-TREE2 v2.4.0 and the TVM+F+ASC+G4 substitution model.
This repository contains multiple sequence alignments and phylogenies produced for the study "Whole Genome Sequencing of Neurotoxin-Producing Clostridium Species in New York State to Bolster Epidemiological Investigations and Reveal Patterns of Diversity and Distribution."
astral_alignments.zip
This zipped file contains multifasta alignment files for 900 orthogroups and associated maximum likelihood (ML) phylogenies, generated in IQ-TREE2 under an LG+F substitution model.
File endings:
.aln.fa - multifasta alignment files
.treefile - individual tree files
astral_orthogroup_annotations.txt contains the protein annotations associated with each orthogroup.
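Before working with the .aln.fa files, it can help to confirm that each one parses cleanly and that every sequence in it has the same aligned length. The following is a minimal sketch in Python using only the standard library. The function names `read_fasta` and `alignment_length` are hypothetical helpers written for this illustration, not part of any tool used to build the dataset.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}, joining wrapped lines.

    The sequence id is taken as the first whitespace-delimited token of the
    header line (an assumption about how the headers are formatted).
    """
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("empty FASTA header")
            current = tokens[0]
            if current in records:
                raise ValueError(f"duplicate sequence id: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def alignment_length(records: Dict[str, str]) -> int:
    """Return the shared aligned length, or raise if sequences differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one aligned length, found {sorted(lengths)}")
    return lengths.pop()
```

Any iterable of text lines works as input, so an alignment can be read from an open file handle, or directly from the archive by wrapping a `zipfile.ZipFile.open()` handle in `io.TextIOWrapper`.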
supermatrix_alignments.zip
This zipped file contains the 264 single-copy ortholog alignments in fasta format that were used to generate the supermatrix, the final supermatrix itself, and the resulting tree produced by IQ-TREE2 under an LG+F+R4 substitution model.
File endings:
.aln.fa - orthogroup alignments in fasta format
.fasta - supermatrix
.tree - supermatrix ML tree
supermatrix_orthogroup_annotations.txt contains the protein annotations associated with each orthogroup.
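Concatenating per-orthogroup alignments into a supermatrix can be sketched as follows. This is an illustration of the general technique, not the procedure actually used to build the deposited supermatrix. It assumes each alignment has already been loaded as a dictionary keyed by isolate name, and that the same isolate carries the same key in every alignment; in practice the sequence headers may be gene identifiers that need mapping to isolates first. The function name `concatenate_alignments` is hypothetical.

```python
from typing import Dict, List


def concatenate_alignments(
    alignments: List[Dict[str, str]], gap: str = "-"
) -> Dict[str, str]:
    """Join alignments end to end, one row per isolate.

    Isolates missing from an alignment are padded with gap characters so
    that every row of the supermatrix has the same length.
    """
    # Collect every isolate name seen in any alignment.
    taxa = sorted({name for aln in alignments for name in aln})
    parts: Dict[str, List[str]] = {name: [] for name in taxa}

    for index, aln in enumerate(alignments):
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"alignment {index} has unequal sequence lengths")
        width = lengths.pop()
        for name in taxa:
            # Use the aligned sequence if present, otherwise an all-gap block.
            parts[name].append(aln.get(name, gap * width))

    return {name: "".join(blocks) for name, blocks in parts.items()}
```

Because the 264 orthologs are described as single-copy, padding should rarely be needed, but handling the missing-isolate case keeps the sketch safe for partial alignments.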
SNP_alignments_and_trees.zip
This zipped file contains SNP alignments and resulting trees for Clostridium botulinum Hall, Alaska, and CDC_67071 reference groupings.
File endings:
allSNPs.fasta - SNP alignment that includes all SNP positions (clonal and recombinant)
allSNPs.tree - ML phylogeny built from the allSNPs.fasta alignment in IQ-TREE2 under the TVM+F+ASC+G4 substitution model
maskedSNPs.fasta - SNP alignment with recombinant positions masked
maskedSNPs.tree - ML phylogeny built from the maskedSNPs.fasta alignment in IQ-TREE2 under the TVM+F+ASC+G4 substitution model
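One common use of SNP alignments like these is to compare pairwise SNP distances between isolates with and without recombinant positions. The sketch below counts differences only at positions where both sequences carry an unambiguous base (A, C, G, or T). It assumes that masked or missing positions are encoded as N, gap, or another non-ACGT character, which this README does not state explicitly. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

UNAMBIGUOUS = frozenset("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count sites where both sequences have an unambiguous base and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS and a != b
    )


def pairwise_snp_distances(records: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return {(id_1, id_2): distance} for every unordered pair of sequences."""
    return {
        (x, y): snp_distance(records[x], records[y])
        for x, y in combinations(sorted(records), 2)
    }
```

Running the same function on an allSNPs.fasta alignment and the matching maskedSNPs.fasta alignment would show how much of the distance between two isolates falls in regions flagged as recombinant.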
