Siderophore synthetase-receptor gene coevolution reveals habitat and pathogen-specific bacterial iron interaction networks
Data files
Oct 07, 2024 version files (1.42 GB)
- Dta_Code_V4.zip (1.42 GB)
- README.md (5.80 KB)
Abstract
Bacterial social interactions play crucial roles in various ecological, medical, and biotechnological contexts. However, predicting these interactions from genome sequences is notoriously difficult. Here, we developed bioinformatic tools to predict whether secreted iron-scavenging siderophores stimulate or inhibit the growth of community members. Siderophores are chemically diverse and can be stimulatory or inhibitory depending on whether bacteria possess or lack corresponding uptake receptors. We focused on 1928 representative Pseudomonas genomes and developed an experimentally validated co-evolution algorithm to match encoded siderophore synthetases to corresponding receptor groups. We derived community-level iron interaction networks to show that siderophore-mediated interactions differ across habitats and lifestyles. Specifically, dense networks of siderophore sharing and competition were observed among environmental and non-pathogenic species, while small, fragmented networks occurred among human-associated and pathogenic species. Altogether, our sequence-to-ecology approach empowers the analyses of social interactions among thousands of bacterial strains and offers opportunities for targeted intervention to microbial communities.
README: Siderophore synthetase-receptor gene coevolution reveals habitat- and pathogen-specific bacterial iron interaction networks
https://doi.org/10.5061/dryad.z08kprrpg
Description of the data and file structure
Pyoverdine-mediated-interaction
Files and variables
File: Data_Code.zip
Description:
#Programs generating the data and figures for this work are:
The folder "./Figure1/" contains the data and programs for Figure 1 and Figures S1-S2
* Figure1\data\pure_species.csv stores the species information of 986 single-receptor producers.
* Figure1\data\colorcode17.xlsx stores the color codes of the 17 siderophore receptor groups found in single-receptor producers. Rec_group is the group ID; number is the number of single-receptor producer strains in each rec_group; the remaining columns give the RGB and Hex color codes.
* Figure1\Figure1a_iTOL\phy_1928_midpoint.tre and iTOL_annotation_editor_v1_4_Excel.xlsm are the input files for the iTOL web tool (http://huttenhower.sph.harvard.edu/galaxy/). The former is the phylogenetic tree of the 1928 Pseudomonas strains, and the latter holds the data for the four outer circles of the tree. The tree was constructed with the PhyloPhlAn3 pipeline (doi:10.1038/s41467-020-16366-7), which covers the identification of phylogenetic markers, multiple sequence alignment, and the inference of phylogenetic trees. In this analysis, we used more than 400 universal genes defined by PhyloPhlAn as phylogenetic markers.
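The color-code file described above stores each receptor group's color as both RGB and Hex. As a minimal sketch of how the two encodings relate, the Python helpers below convert a Hex code to RGB. The function names `hex_to_rgb` and `to_unit_rgb` are hypothetical and not part of the deposited code, and whether the file stores RGB on a 0-255 or a 0-1 scale is not stated in this README, so both forms are shown.

```python
def hex_to_rgb(hex_code):
    """Convert 'RRGGBB' or '#RRGGBB' to a tuple of 0-255 integers."""
    digits = hex_code.strip().lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    # Each pair of hex digits encodes one channel: red, green, blue.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def to_unit_rgb(rgb):
    """Scale 0-255 integers to the 0-1 floats that MATLAB color triplets use."""
    return tuple(channel / 255 for channel in rgb)


if __name__ == "__main__":
    # Arbitrary example color, not taken from colorcode17.xlsx.
    print(hex_to_rgb("#FF8000"))
    print(to_unit_rgb(hex_to_rgb("#FF8000")))
```

The 0-1 form is included because the main software for this dataset is MATLAB, which expects RGB triplets in that range.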
The folder "./Figure2/" contains the data and programs for Figure 2
* Figure2\FigureS4_Rcode\data\phylo_out55.csv holds information on all Pseudomonas strains whose whole genomes were sequenced in the laboratory. Its columns are:
  * num: the strain number.
  * strains: the strain name.
  * use: use1 marks the strains used in this experiment and w0 marks the unused strains.
  * withsyn_rec1: based on genome data mining, syn_rec marks strains with both siderophore synthesis genes and receptor genes, rec marks strains with receptor genes only, and no marks strains with neither.
  * synnum: the number of synthesis genes in each strain.
  * syng and syng1: grouping information obtained by distance clustering of the synthesis gene sequences.
  * recnum: the number of receptor genes in each strain.
  * sptype and sptype1: the strain type. pure (A), partial (B), cheater (C), and Rno (D) denote single-receptor producers, multi-receptor producers, non-producers, and Pseudomonas strains that have no siderophore system at all, respectively.
  * RFU: the pyoverdine fluorescence measured in the experiment. Of the 55 Pseudomonas strains in our laboratory, only 24 were used for experiments after deduplication based on the similarity of their synthesis genes, so only 24 rows have a value. The other 31 strains were not assayed, and their RFU cells are empty.
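Because RFU is blank for the strains that were not assayed, any analysis of the fluorescence data has to skip those rows first. The Python sketch below shows one way to do that with the standard `csv` module. It assumes the file has a header row with the column names listed above and that unmeasured cells are empty strings; the function name `load_measured_strains` and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def load_measured_strains(csv_file):
    """Return (strain, RFU) pairs for rows whose RFU cell is not blank."""
    measured = []
    for row in csv.DictReader(csv_file):
        rfu = (row.get("RFU") or "").strip()
        if rfu:  # a blank RFU cell means the strain was not assayed
            measured.append((row["strains"], float(rfu)))
    return measured


# Made-up rows for illustration only; real values live in phylo_out55.csv.
SAMPLE = """num,strains,use,sptype,RFU
1,strainA,use1,pure,1234.5
2,strainB,w0,cheater,
3,strainC,use1,partial,87
"""

if __name__ == "__main__":
    print(load_measured_strains(io.StringIO(SAMPLE)))
```

On the real file, the same function could be called with an opened file handle, for example `open(path, newline="")`, and would be expected to return 24 pairs if the assumptions above hold.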
The folder "./Figure3/" contains the data and programs for Figure 3
* Figure3\data\1928strain_country_infor_1.xlsx records the isolation source information of 1928 strains of Pseudomonas.
* Figure3\data\colorcode_new.csv is the color code table. The article defines 94 siderophore receptor groups, of which only 17 occur in single-receptor producers. To keep the colors distinguishable, we assign colors only to those 17 receptor groups, so the last column (Hex) contains 17 color codes and is empty for all other receptor groups.
* Figure3\data\recgroup_new.csv stores the groups and types of 4547 receptors: rec_groupid is the group ID of the receptor; strainnames is the strain name of the receptor; sptypeid is the type of receptor, 1 represents the self-receptor in a single-receptor producer, 2 represents the self-receptor in a multi-receptor producer, 3 represents the cheating-receptor in a multi-receptor producer, and 4 represents the cheating-receptor in a non-producer.
* Figure3\output\cytoscope_ironnetwork_1013\FigureS7_venn\venn.xlsx is the data file for the Venn diagrams in Figure S7. s-sg, p-sg, w-sg, and h-sg are the numbers of siderophore groups in soil, plant, water, and human habitats, respectively; s-behiv, p-behiv, w-behiv, and h-behiv are the numbers of siderophore function groups in soil, plant, water, and human habitats, respectively.
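The sptypeid codes in recgroup_new.csv are easier to work with once they are mapped to the receptor types listed above. The Python sketch below tallies receptors per type. It assumes the CSV has a header row with the column names given in this README and that sptypeid is stored as a number; `count_receptor_types`, `SPTYPE_LABELS`, and the sample rows are hypothetical and not part of the deposited code or data.

```python
import csv
import io
from collections import Counter

# Labels follow the sptypeid description in this README.
SPTYPE_LABELS = {
    1: "self-receptor, single-receptor producer",
    2: "self-receptor, multi-receptor producer",
    3: "cheating-receptor, multi-receptor producer",
    4: "cheating-receptor, non-producer",
}


def count_receptor_types(csv_file):
    """Count receptors per type, decoding the numeric sptypeid column."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # int(float(...)) also accepts codes exported as "1.0".
        code = int(float(row["sptypeid"]))
        counts[SPTYPE_LABELS.get(code, f"unknown code {code}")] += 1
    return counts


# Made-up rows for illustration only; real values live in recgroup_new.csv.
SAMPLE = """rec_groupid,strainnames,sptypeid
1,strainA,1
1,strainB,4
2,strainC,3
2,strainD,4
"""

if __name__ == "__main__":
    for label, n in count_receptor_types(io.StringIO(SAMPLE)).items():
        print(label, n)
```

Run on the real file, the counts would be expected to sum to 4547 if every row carries one of the four documented codes.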
The folder "./Figure4/" contains the data and programs for Figure 4 and Figure S7
* Figure4\data\1928strain_country_infor_1.xlsx and Figure3\data\1928strain_country_infor_1.xlsx are the same data file, recording the isolation source information of 1928 strains of Pseudomonas.
The folder "./Figure5/" contains the data and programs for Figure 5
* Figure5\rec_num records the number of receptors in each of the 1928 Pseudomonas strains.
Code/software
The main software used is MATLAB (no specific version is required). In addition, the phylogenetic tree was constructed with the PhyloPhlAn3 pipeline (doi:10.1038/s41467-020-16366-7). PhyloPhlAn is a comprehensive pipeline that covers the identification of phylogenetic markers, multiple sequence alignment, and the inference of phylogenetic trees. In this analysis, we used more than 400 universal genes defined by PhyloPhlAn as phylogenetic markers. The network figures were made with Cytoscape 3.7.1, and the final presentation of the phylogenetic tree in Figure S4 was produced with R 4.2.1.
Access information
Other publicly accessible locations of the data:
Methods
All genome data were collected from open datasets (e.g., the Pseudomonas Genome Database). The siderophore synthetase and receptor genes were obtained through genomic data mining. Please see the corresponding article for details.