Phylogenomic analysis of chitinase
Data files
Aug 07, 2022 version files 4.36 MB
Abstract
Supplemental Information: Phylogenomic analysis of chitinase
Picocyanobacterial sequences for genes involved in chitin degradation and peptidoglycan recycling pathways were found nested within branches of cyanobacterial genes, indicating vertical inheritance of peptidoglycan recycling. By contrast, picocyanobacterial sequences for chitinase (ChiA and ChiA-like) and N-acetylglucosamine kinase (NagK) were nested within non-cyanobacterial taxa, indicating Horizontal Gene Transfer (HGT) to picocyanobacteria after their divergence from other cyanobacteria. To contextualize the HGT of chitinase genes into ancestors of marine SynPro (Synechococcus and Prochlorococcus), we examined their phylogenetic relationships to similar sequences found within other bacteria. Picocyanobacterial chitinases contained two major chitin-binding domains that were homologous to different chitinase sequence variants found within other bacterial genomes. Gene sequence alignments suggest that the marine SynPro variant is likely the product of a fusion of two genes that were both acquired from Planctomycetes via HGT.
Methods
Phylogenetic analysis
Sequences were collected from the GenBank database for the following chitin degradation pathway proteins: the N-terminal region of ChiA/ChiA-like, the C-terminal region of ChiA/ChiA-like, UgpA, UgpE, NagZ, NagK, NagA, and NagB. Orthologs found within Prochlorococcus MIT1303 were used as protein search queries in BLASTP, with the top 500 or 250 hits recovered in each case. Each set of sequences was then aligned in MAFFT with the automatic algorithm selection option. Aligned sequences were then used for phylogenetic reconstruction in IQTree with automatic best-fitting model selection. All sequence alignment and phylogenetic data files are available in SI data (Data S1, SI Text 1 files), with alignment and tree filenames in each case describing the algorithms and parameters used for these reconstructions. Several BLAST hits for the ChiA and ChiA-like genes in SynPro overlapped, with some protein sequences containing multiple domains homologous to different chitinase orthologs in other bacteria. A phylogenomic analysis of our alignment data showed that the SynPro variant was likely the result of a fusion between genes from Planctomycetes, either before or after horizontal gene transfer into SynPro (see SI Text 1 and SI Text 1 files for a detailed analysis of the protein fusion history).
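The three steps above (BLASTP search, MAFFT alignment, IQTree reconstruction) can be sketched as a set of command lines. This is only an illustration under stated assumptions, not the commands used to produce the deposited files: the file names, the "nr" database, the "iqtree" executable name, and the helper function itself are hypothetical, and the exact parameters used for each reconstruction are the ones recorded in the deposited filenames and IQTree logs. The sketch also leaves out the retrieval of full-length hit sequences between the BLASTP and MAFFT steps.

```python
def build_pipeline_commands(query_faa, hits_fasta, aligned_fasta,
                            max_hits=500, blast_db="nr"):
    """Return argument lists for a BLASTP -> MAFFT -> IQ-TREE run.

    All file names and the database name are placeholders (assumptions).
    Nothing is executed here; the lists can be passed to subprocess.run().
    """
    # Protein search with the query ortholog; keep the top N hits
    # (the Methods mention 500 or 250 depending on the protein).
    blastp = [
        "blastp",
        "-query", query_faa,
        "-db", blast_db,
        "-max_target_seqs", str(max_hits),
        "-outfmt", "6",          # tabular hit table
        "-out", "hits.tsv",
    ]
    # MAFFT with automatic algorithm selection; the alignment is written
    # to standard output, so the caller must redirect it to aligned_fasta.
    mafft = ["mafft", "--auto", hits_fasta]
    # IQ-TREE with ModelFinder ("-m MFP"): select the best-fitting model,
    # then infer the tree under that model.
    iqtree = ["iqtree", "-s", aligned_fasta, "-m", "MFP"]
    return {"blastp": blastp, "mafft": mafft, "iqtree": iqtree}
```

For example, `build_pipeline_commands("query.faa", "hits.fasta", "aligned.fasta", max_hits=250)` returns the three argument lists for a 250-hit search; the MAFFT list would be run with its standard output redirected to the alignment file.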
Usage notes
SI Text 1: Phylogenomic analysis of Chitinase Domains within SynPro proteins ChiA and ChiA-like.
Supplemental Information text document (.pdf) containing a detailed phylogenomic analysis of ChiA and ChiA-like protein regions for picocyanobacteria (prepared by G. Fournier).
SI Text 1 files.
Supporting data files for the Phylogenomic analysis of chitinase domains ChiA and ChiA-like (SI Text 1). A README.txt file contains an index of all files in all subfolders.
This zip file contains a folder of files for different BLAST search results, sequences, alignments, trees, and analyses performed under different alignment strategies and reconstructions under different IQTree models. Alignment files are in FASTA format (.fasta), and IQTree tree files are included in Newick format (.tree) along with log files (.log.txt). Figures 1-4 for SI Text 1 are included as .png files. A spreadsheet file (.xlsx) of Planctomycetes taxa strain information is also included. Two .pdf files contain detailed output from MAFFT for alignments of different sequences for ChiA/ChiA-like regions. Also included are two subfolders for the N-terminal and C-terminal regions of the ChiA/ChiA-like alignment: ChiA_ChiA_like_1745-3039_IQtree_Outfiles and ChiA_ChiA_like_6077-7399_IQtree_Outfiles. Within these subfolders are alignments in FASTA format (.fasta) and the following IQTree output files: log (.log), report (.iqtree), consensus tree (.contree), and Newick tree (.treefile). Taxonomic names, rankings and sequence identifiers are contained within the NEXUS-formatted tree files (.figTree), which can be opened in FigTree to visualize taxonomy.
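Before reusing one of the FASTA alignment files, it can be useful to confirm that it parses cleanly and that all sequences have the same aligned length. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and not part of the deposited data, and the parser assumes unique sequence headers (a repeated header would overwrite the earlier record).

```python
def parse_fasta(lines):
    """Parse FASTA text (an open file or any iterable of lines).

    Returns a dict mapping each header (without '>') to its sequence.
    Assumes headers are unique within the file.
    """
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_summary(records):
    """Report sequence count and whether all sequences share one length."""
    lengths = sorted({len(seq) for seq in records.values()})
    return {
        "n_sequences": len(records),
        "lengths": lengths,
        "is_aligned": len(lengths) == 1,
    }
```

A typical call would be `alignment_summary(parse_fasta(open(path)))`, where `path` points to one of the .fasta files in the archive; an unaligned sequence file would report more than one length.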
Data S1: Chitinase pathway final alignment and tree files.
Final data files supporting the phylogenetic analysis of chitin degradation and peptidoglycan recycling pathways. A README.txt file contains an index of all files in all subfolders.
This zip file contains folders labeled by gene/domain: UgpE, UgpA, NagZ, NagB, NagA, NAG_Kinase, ChiA_ChiA_like. Within each labeled folder are a .txt file suffixed 'ReadMe', a .pdf file suffixed 'Rooting' describing how the tree was rooted, a FASTA-formatted alignment file (.fasta), and a final IQTree Newick tree file (.tree) with labels based on the algorithms and parameters used. In some cases (NagZ, NagA), the original gene sequence data (.faa) are included along with FASTA-formatted sequence data (.fasta) for different BLAST search results. For ChiA/ChiA-like, IQTree tree files (.tree) and report files (.iqtree) are included for the separate N-terminal and C-terminal regions. Within each labeled gene/domain folder containing the final alignment is another subfolder suffixed 'Outfiles' containing the final IQTree output files: log (.log), report (.iqtree), consensus tree (.contree), and Newick tree (.treefile). Taxonomic names, rankings and sequence identifiers are contained within the NEXUS-formatted tree files (.figTree), which can be opened in FigTree to visualize taxonomy.
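The folder layout described above can be checked after download with a short script. The sketch below takes a list of archive member paths and reports, for each gene/domain folder named above, which of the expected file types (.fasta alignment and .tree file) were not found. The function name and the example archive name are hypothetical, and the check assumes the gene/domain folder names appear as directory names somewhere in each path.

```python
from pathlib import PurePosixPath

# Gene/domain folder names as listed in the Data S1 description.
GENE_FOLDERS = ("UgpE", "UgpA", "NagZ", "NagB", "NagA",
                "NAG_Kinase", "ChiA_ChiA_like")


def missing_file_types(paths, required=(".fasta", ".tree")):
    """Map each gene folder to the required extensions not found in it.

    `paths` is an iterable of '/'-separated archive paths, such as the
    output of zipfile.ZipFile.namelist(). Folders with nothing missing
    are omitted from the result.
    """
    found = {gene: set() for gene in GENE_FOLDERS}
    for raw in paths:
        path = PurePosixPath(raw)
        for gene in GENE_FOLDERS:
            # Match the folder name against directory components only.
            if gene in path.parts[:-1]:
                found[gene].add(path.suffix.lower())
    report = {}
    for gene, seen in found.items():
        missing = sorted(set(required) - seen)
        if missing:
            report[gene] = missing
    return report
```

With the standard library, the paths could come from `zipfile.ZipFile("Data_S1.zip").namelist()`, where "Data_S1.zip" is a placeholder for the downloaded archive name; an empty result means every listed folder contains at least one file of each required type.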