Diversity, adaptation and metabolic potential of the microbiome in biofilms from a high-temperature hot spring

Goh, KianMau¹; Tan, Jia Hao¹; Liew, Kok Jun¹; Sani, Rajesh K.²; Pointing, Stephen B.³; Chan, Kok-Gan⁴

Published May 14, 2024 on Dryad. https://doi.org/10.5061/dryad.sn02v6xcd

Abstract

Hot spring microbiomes have garnered significant research attention, ranging from the diversity of prokaryotic communities to their genes and functional potential. While cyanobacteria-rich biofilms, characterized by warm temperatures, have been extensively studied, high-temperature streamer biofilm communities (SBC) devoid of photosynthetic ability have received limited investigation. Here, we studied the biofilm of the Dusun Tua (DT) hot spring, which has a temperature of 75°C and a pH of 7.6. This grey-tan colored biofilm appeared at sites where water had slowed down following the deposition of plants and inorganic debris along the hot spring after a flood event. Amplicon sequencing of the V3-V4 regions of 16S rRNA showed that dominant phyla included the Aquificota, Chloroflexota, and Desulfobacterota, together with other abundant amplicon sequence variants from the Bacteroidota, Deinococcota, Hydrothermae and Armatimonadota. These microbial populations appeared to be distinct from other reported SBCs from Yellowstone National Park in the USA and Rehai Hot Springs in China. Additional shotgun sequencing of the DT biofilm revealed functional insights, which were compared to counterparts obtained from low-temperature biofilms to identify possible thermophilic traits. GC content of tRNA and amino acid preferences were found to be clear indicators of thermophilicity. However, other signatures such as reverse gyrase, heat shock proteins, and average GC content of the genome may not be reliable indicators. The genome-centric analyses revealed that DT biofilm members were primarily chemo-organoheterotrophic, chemolithoautotrophic, and chemolithoheterotrophic. We speculate that the biofilm could utilize plant litter as a carbon source, but the efficiency of this process is estimated to be low because rapid water flux would quickly remove dissolved organic carbon.
The results of this study enhance current understanding of microbial diversity, thermal adaptation and metabolic processes related to carbon, nitrogen, sulfur and other metabolisms for hot springs in tropical climates with high allochthonous plant litter inputs.

# README: Diversity, adaptation and metabolic potential of the microbiome in biofilms from a high-temperature hot spring

https://doi.org/10.5061/dryad.sn02v6xcd

Raw reads for amplicon and shotgun sequencing data are publicly available in NCBI under BioProject number PRJNA1082868 and BioSample accessions SAMN40219291 and SAMN40219292. The contigs for medium- and high-quality metagenome-assembled genomes (MAGs) are available in this open-access Dryad repository. This dataset consists of the MAG FASTA files of a biofilm found in a hot spring.


## Description of the data and file structure

The MAGs can be used in microbiology and bioinformatics analyses. Examples include genome-to-genome comparison between an individual MAG and closely related MAGs or cultured microbial genomes. The FASTA files can be used for phylogenetic tree construction. The files also contain ample gene sequences suitable for gene mining. There are no missing-data codes. The files are named from bin.1.XXX.fa to bin.61.XXX.fa, where XXX is orig, strict, or permissive, a short designation generated by metaWRAP that describes the approach used in the binning process. Please refer to metaWRAP (https://github.com/bxlab/metaWRAP) to understand the differences. The phylum or taxonomic affiliation of each MAG is not provided in this dataset. Interested users may consult the related published article.
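As a minimal sketch of how the file-naming scheme above could be handled programmatically, the following Python snippet parses names of the form bin.N.XXX.fa and counts files per metaWRAP designation. The function names and the example file list are hypothetical and are not part of the dataset or of metaWRAP.

```python
import re
from collections import Counter

# Names such as "bin.12.strict.fa"; the three designations are those listed in this README.
BIN_NAME = re.compile(r"^bin\.(\d+)\.(orig|strict|permissive)\.fa$")


def parse_bin_name(filename):
    """Return (bin_number, designation), or None if the name does not match."""
    match = BIN_NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def summarize_bins(filenames):
    """Count how many MAG files carry each metaWRAP designation."""
    counts = Counter()
    for name in filenames:
        parsed = parse_bin_name(name)
        if parsed is not None:
            counts[parsed[1]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example = ["bin.1.orig.fa", "bin.2.strict.fa", "bin.61.permissive.fa", "notes.txt"]
    print(summarize_bins(example))
```

In practice the file list could come from `os.listdir` on the folder where the dataset was unpacked.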


## Sharing/Access information

Raw reads for amplicon and shotgun sequencing data are publicly available in NCBI with BioProject number PRJNA1082868 and BioSample accessions SAMN40219291 and SAMN40219292. 

Links to other publicly accessible locations of the data:
https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1082868




## Code/Software

No special code was used for this work. Standard procedures were carried out in accordance with the steps provided at this link: https://github.com/bxlab/metaWRAP

Methodology

Sampling and DNA extraction

The Dusun Tua (DT) hot spring is located at 3.136838613811952, 101.83446401515266. Sampling trips were performed in July 2022 (labelled DT-7) and November 2022 (DT-11). Samples of the gray-tan biofilm were randomly collected, immediately transported to the laboratory, and frozen at −20 °C within 12 h. Bulk genomic DNA was extracted from the samples using the SPINeasy™ DNA extraction kit (MP Biomedicals, Irvine, CA, USA) according to the manufacturer’s recommended protocol. DT biofilm samples were lysed by bead beating for 3 cycles (20 Hz) of 1 minute each. The extracted DNA was evaluated using a NanoDrop™ 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA), a Qubit 3.0 Fluorometer (Invitrogen, Merelbeke, Belgium), and 1% w/v agarose gel electrophoresis.

Shotgun sequencing and data processing

Duplicate metagenomic DNA extracts from each sampling time were pooled prior to fragmentation with a Covaris sonicator (Covaris, Woburn, USA). The fragmented DNA was then used for dual-indexed, paired-end library construction following the Illumina DNA Prep kit protocol (Illumina, San Diego, USA). The constructed libraries were analyzed on a Qubit 3.0 Fluorometer and an Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, USA). Shotgun metagenome sequencing was carried out on an Illumina NovaSeq 6000 in PE 150 mode (paired-end 150 bp). A minimum output of 20 Gb (equivalent to approximately 66.5 million paired-end reads) was reserved for each sample. The resulting raw paired-end reads were trimmed and filtered using seqtk v1.4. In addition, the same genomic DNA sample underwent Nanopore long-read sequencing. Library preparation directly followed the Nanopore-recommended protocol SQK-NBD112-24 (Q20+ chemistry on R10.4 flow cell), omitting genome shearing to prioritize longer reads. The Nanopore Ligation Sequencing Kit SQK-LSK112 protocol was followed, including AMPure treatment, adapter ligation, purification, and Qubit quantification. Samples were loaded onto a MinION R10.4 flow cell in a Mk1C device running MinKNOW v.22.05.8. Basecalling was performed using MinKNOW coupled with Guppy v6.1.5, configured with FLO-MIN112-Super-Accurate settings. After basecalling, barcodes were removed and reads were trimmed and filtered; reads meeting Q20 criteria (<5% error) were then subjected to de novo assembly using Flye v2.9 (parameters: --nano-hq --meta). Co-assembly was then performed with the pooled Illumina clean reads (DT-7 and DT-11) together with the Nanopore long-read data using MegaHIT v1.29.
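The relationship between sequencing yield and read count quoted above can be checked with a short calculation. This is a sketch that assumes 1 Gb means 10^9 bases and that every read is exactly 150 bp; the function name is hypothetical. Under those assumptions 20 Gb corresponds to roughly 66.7 million read pairs, close to the approximate figure given in the text.

```python
def paired_reads_for_yield(total_bases, read_length=150):
    """Number of read pairs needed to reach total_bases with 2 x read_length sequencing."""
    bases_per_pair = 2 * read_length  # forward read plus reverse read
    return total_bases / bases_per_pair


if __name__ == "__main__":
    pairs = paired_reads_for_yield(20e9)  # assumption: 20 Gb = 20 x 10^9 bases
    print(f"{pairs / 1e6:.1f} million read pairs")
```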

The taxonomic classification of the assembled contigs was carried out using Kraken2 v2.1.3. The tRNAs within the metagenome were identified using tRNAscan-SE v2.0.12. The genes or coding regions of the metagenome were predicted using Prodigal v2.6.3. Thermophilic adaptation was assessed by comparing the thermophilic DT sample with the psychrophilic GF_Ru and GF_NZ samples (0.5-7.7°C, pH 7.54-7.68) on several parameters. The raw reads for the GF_Ru and GF_NZ samples were obtained from NCBI, and contigs were generated using the procedure described earlier. The overall and tRNA GC content of the samples was obtained by running a Python script against the contigs and tRNA sequences. The amino acid composition of the samples was also obtained by running a Python script against the coding regions of the samples. Sequences of antifreeze proteins were retrieved from the UniProt database. The reference sequences of reverse gyrases were also retrieved from the UniProt Reviewed protein database. Genes encoding heat shock proteins were identified from a manually curated database, the heat shock protein information resource (HSPIR). The proteins of interest were then identified using DIAMOND v2.1.9 based on the aforementioned databases.
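The Python scripts mentioned above are not included in this dataset. As a minimal sketch of what such calculations might look like, the snippet below computes a pooled GC fraction from nucleotide sequences and residue frequencies from protein sequences. All function names and file names are hypothetical, and it is an assumption that amino acid composition is computed from translated protein sequences (such as those Prodigal can output) rather than from nucleotide coding sequences.

```python
from collections import Counter


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequences):
    """Pooled GC fraction over all sequences; characters other than A, C, G, T, U are ignored."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"] + counts["U"]
    total = gc + at
    return gc / total if total else 0.0


def amino_acid_composition(proteins):
    """Fraction of each of the 20 standard residues across all protein sequences."""
    standard = "ACDEFGHIKLMNPQRSTVWY"
    counts = Counter()
    for prot in proteins:
        # Stop symbols (*) and ambiguous residues such as X are skipped.
        counts.update(res for res in prot.upper() if res in standard)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in standard} if total else {}
```

For example, `gc_content(seq for _, seq in read_fasta("contigs.fa"))` would give the overall GC fraction of a contig file, where `contigs.fa` is a placeholder name.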

Metagenome-assembled genome and metabolic pathway reconstruction

Co-assembled contigs of Illumina shotgun data and Nanopore data were used to construct MAGs using the binning tools MetaBAT2 v2.12.1, CONCOCT v1.0.0, and MaxBin2 v2.2.6, followed by the bin refinement and reassembly modules of metaWRAP v1.3. The quality of the MAGs was assessed using CheckM v1.2.2. Taxonomic classification of the MAGs was performed using GTDB-Tk v2.3.2 with the Genome Taxonomy Database (GTDB) as a reference. The phylogenetic trees of the MAGs were constructed using IQ-TREE v2.0.6 with default parameters and ModelFinder for substitution model selection. The phylogenetic trees generated were subsequently visualized via iTOL. The metabolic functional profiles of the MAGs were predicted using METABOLIC v4.0.
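The exact completeness and contamination cut-offs used to label MAGs as medium or high quality are not stated here. The sketch below assumes the widely used MIMAG-style thresholds (high: more than 90% completeness and less than 5% contamination; medium: at least 50% completeness and less than 10% contamination) applied to CheckM-style percentages. The function name and example values are hypothetical, and the rRNA and tRNA requirements of the full MIMAG high-quality definition are not checked.

```python
def mag_quality(completeness, contamination):
    """Classify a MAG from completeness and contamination percentages.

    Thresholds are assumed MIMAG-style cut-offs, not values confirmed by this dataset.
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Hypothetical (completeness, contamination) values, for illustration only.
    for values in [(95.0, 2.0), (70.0, 8.0), (40.0, 1.0)]:
        print(values, mag_quality(*values))
```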
