Dryad

Metagenomic bins and biosynthetic gene clusters in gut bacteria of turtle ants

Cite this dataset

Duplais, Christophe (2021). Metagenomic bins and biosynthetic gene clusters in gut bacteria of turtle ants [Dataset]. Dryad. https://doi.org/10.5061/dryad.s7h44j168

Abstract

Cephalotes are herbivorous ants (>115 species) that feed on low-nitrogen food sources and rely on gut symbionts to supplement their diet by recycling nitrogenous food waste into amino acids. These conserved gut symbionts, which belong to five bacterial orders, have been studied previously for their primary nitrogen metabolism; however, little is known about their ability to biosynthesize specialized metabolites, which can play a role in interactions between bacterial communities living in close proximity in the gut. We investigated the diversity of biosynthetic gene clusters (BGCs) producing specialized metabolites in the genomes and metagenomes of conserved gut symbionts by studying 17 Cephalotes species collected across several geographical areas. Our results reveal that (1) mining metagenomes and mining genomes yield complementary results for retrieving BGCs, especially when bacterial isolates are difficult to culture; (2) the conserved gut symbionts involved in the nutritional symbiosis have a large diversity of BGCs from different chemical families; and (3) the phylogenetic analysis of BGCs encoding the production of arylpolyenes, non-ribosomal peptides (NRP), polyketides (PK), and siderophores shows high similarity between BGCs of a single symbiont across different ant host species, and between BGCs originating from different bacterial orders within a single host species. Additionally, BGCs were found in four of the five conserved symbionts, which co-occur in the hindgut; the exception is one major player (Opitutales), which is localized alone in the midgut and lacks BGCs. This spatial isolation prevents direct interaction of Opitutales with the other symbionts and suggests that BGCs have an essential role for symbionts living in close proximity. Together, these findings pave the way for studying the mechanisms of BGC conservation and evolution in gut symbiont genomes and the role of bacterial specialized metabolites in the multipartite mutualism with Cephalotes turtle ants.

Methods

Genomes and metagenomes analysis

The 14 genomes of cultured gut bacteria and 18 metagenomes of Cephalotes gut bacteria were obtained from JGI-IMG version 5.0 (Tables S1 and S2, respectively). They originate from the previous projects Gs0085494 (“Cephalotes varians microbial communities from the Florida Keys, USA”), Gs0114286 (“Symbiotic bacteria isolated from Cephalotes varians”), Gs0117930 (“Cephalotes ants gut microbiomes”) and Gs0118097 (“Symbiotic bacteria isolated from Cephalotes rohweri”).

The 14 cultured isolate genomes are all part of the Cephalotes core microbiome: two Cephaloticoccus genomes (order: Opitutales, class: Opitutae; one genome from each ant host), four Ventosimonas genomes (order: Pseudomonadales, class: Gammaproteobacteria; one genome from C. varians and three genomes from C. rohweri), six Burkholderiales genomes (class: Betaproteobacteria; four genomes from C. varians and two genomes from C. rohweri), one Rhizobiales genome (class: Alphaproteobacteria; from C. varians), and one Xanthomonadales genome (class: Gammaproteobacteria; from C. rohweri). All these bacteria belong to the Proteobacteria phylum, except Opitutales, which belong to the Verrucomicrobia phylum.

The metagenomic data used in this study include two metagenomes from the same species, C. varians. However, the sequencing quality of the C. varians PL010W metagenome (number of reads, GC content) differs markedly from that of all the other metagenomes, including C. varians PL005W; it was therefore excluded from our analysis. The metagenomes were analyzed with the software Anvi’o version 5.5 to sort the different bacterial families composing each metagenome into distinct bins. In this analysis, the FASTA sequence of a metagenome is used to create a contig database and a profile database. The contig database is then visualized, and bins are created manually to maximize completeness while minimizing contamination. Finally, the software CheckM version 1.1 was used to verify the completeness, contamination and strain homogeneity of each bin, and to identify the taxonomic lineage of each bin.
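The scripts used to compare metagenomes are not part of this record. As a rough illustration of the kind of summary statistics (sequence count, total length, GC content) that can flag an outlier such as PL010W, the following Python sketch may help. The function name `fasta_stats` and the example contigs are hypothetical, and the GC fraction is computed over all bases, including any ambiguous ones.

```python
from io import StringIO


def fasta_stats(handle):
    """Return (n_sequences, total_bases, gc_fraction) for FASTA text.

    `handle` is any iterable of lines, e.g. an open file.
    The GC fraction is computed over all bases, so ambiguous
    bases such as N lower the value slightly.
    """
    n_seqs = 0
    total = 0
    gc = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1  # each header starts a new sequence
            continue
        seq = line.upper()
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
    gc_fraction = gc / total if total else 0.0
    return n_seqs, total, gc_fraction


# Hypothetical two-contig example (not taken from the dataset).
example = StringIO(">contig_1\nATGCGC\nGGAT\n>contig_2\nATATAT\n")
print(fasta_stats(example))  # (2, 16, 0.375)
```

The same function can be applied to any of the ".fasta" bin files by passing an open file handle instead of the in-memory example.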

Bacterial biosynthetic gene clusters analysis

The bacterial biosynthetic gene clusters (BGCs) of each genome and each metagenomic bin were analyzed with antiSMASH version 5.0 with the following analysis options: strict detection, and activation of search for KnownClusterBlast, ClusterBlast, SubClusterBlast and Active site finder. BGCs smaller than 5 kb were then filtered out and excluded from the analysis. The taxonomic classification of each cluster was verified to the genus level using the software BLAST+. There was no identification conflict between CheckM and BLAST+. NaPDoS (https://npdomainseeker.sdsc.edu/, accessed July 2020) was used to classify the ketosynthase (KS) domain and condensation (C) domain sequences of the NRP and PK BGCs retrieved from the genome and metagenome mining analyses and to infer the KS and C phylogenies.
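The 5 kb size filter can be sketched in Python as follows. This is a minimal illustration, not the authors' script: it assumes each ".gbk" file holds a single BGC region whose length is given on a standard GenBank LOCUS line, and the helper names `locus_length` and `keep_bgc` and the example records are hypothetical.

```python
import re

MIN_BGC_LENGTH_BP = 5000  # 5 kb threshold stated in the Methods

# A GenBank LOCUS line carries the record length, e.g.
# "LOCUS       region001    12345 bp    DNA     linear   BCT 01-JAN-1980"
_LOCUS_LENGTH = re.compile(r"^LOCUS\s+\S+\s+(\d+)\s+bp\b")


def locus_length(gbk_text):
    """Return the length in bp from the first LOCUS line, or None if absent."""
    for line in gbk_text.splitlines():
        match = _LOCUS_LENGTH.match(line)
        if match:
            return int(match.group(1))
    return None


def keep_bgc(gbk_text, min_length=MIN_BGC_LENGTH_BP):
    """Return True if the record is at least `min_length` bp long."""
    length = locus_length(gbk_text)
    return length is not None and length >= min_length


# Hypothetical LOCUS lines (not taken from the dataset).
short_bgc = "LOCUS       c00001_region001     3200 bp    DNA     linear   BCT 01-JAN-1980\n"
long_bgc = "LOCUS       c00002_region001    41250 bp    DNA     linear   BCT 01-JAN-1980\n"
print(keep_bgc(short_bgc), keep_bgc(long_bgc))  # False True
```

Records without a parseable LOCUS line are rejected rather than kept, so that malformed files are flagged for manual inspection instead of silently passing the filter.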

Usage notes

The data contain:

(1) the metagenomic bins (name of the file: Chanson_et_al_Cephalotes_metagenomic_bins) in ".fasta" files. The file name corresponds to the name of the ant with which each metagenomic bin is associated.

(2) the biosynthetic gene clusters detected in the genomes of cultured isolates of bacterial symbionts (name of the file: Chanson_et_al_Cephalotes_genomes_BGCs) in ".gbk" files. The file name corresponds to the name of the ant from which each BGC was obtained.

(3) the biosynthetic gene clusters detected in the metagenomic bins (name of the file: Chanson_et_al_Cephalotes_genomes_BGCs) in ".gbk" files. The file name corresponds to the name of the ant from which each BGC was obtained, followed by the number of the metagenomic bin containing the BGC.

Funding

Agence Nationale de la Recherche, Award: ANR-10-LABX-25-01

National Science Foundation, Award: NSF DEB 1900357