Analysis of nitrous oxide reductase diversity from wastewater: a SINTAX database

Cite this dataset

Schacksen, Patrick; Nielsen, Jeppe (2024). Analysis of nitrous oxide reductase diversity from wastewater: a SINTAX database [Dataset]. Dryad.


This study explores the genetic landscape of nitrous oxide (N2O) reduction in wastewater treatment plants (WWTPs) by profiling 1083 high-quality metagenome-assembled genomes (HQ MAGs) derived from 23 Danish full-scale WWTPs. The analysis focuses on the distribution and diversity of nitrous oxide reductase (nosZ) genes, key players in N2O reduction, and their connection to other nitrogen metabolism pathways. A custom pipeline for clade-specific nosZ gene identification outperformed existing methods, revealing the presence of 503 nosZ sequences in 489 MAGs. Notably, 48.7% of the MAGs harboured nosZ genes, with clade II dominating (92.3%).

Taxonomic profiling reveals the distribution of nosZ clade I and clade II-containing MAGs, emphasizing the dominance of Bacteroidota and Pseudomonadota. Notably, Chloroflexota exhibits unexpected affiliations with nosZ clade I. The taxonomic diversity of non-denitrifying N2O-reducers is also explored, highlighting the presence of these organisms in Bacteroidota, Chloroflexota, and other phyla.

README: Analysis of nitrous oxide reductase diversity from wastewater: a SINTAX database

The dataset is a SINTAX-formatted database containing 443 full-length, clade-specific nosZ sequences. The data were sourced from Singleton et al., 2021.

Description of the data and file structure

The dataset is a SINTAX-formatted database: FASTA sequences with the associated taxonomic information joined to each sequence.
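As an illustration of what "joined taxonomic information" means in practice, the sketch below parses one SINTAX-style FASTA header. It assumes the general SINTAX convention of a sequence ID followed by a `tax=` field of comma-separated `rank:name` pairs. The example header, the sequence ID, and the helper name `parse_sintax_header` are hypothetical and are not taken from the database itself.

```python
def parse_sintax_header(header: str) -> tuple[str, dict[str, str]]:
    """Split a SINTAX-style header into (sequence ID, {rank letter: taxon name})."""
    # Drop the leading '>' and split the header into ';'-separated fields.
    fields = header.lstrip(">").strip().split(";")
    seq_id = fields[0]
    taxonomy: dict[str, str] = {}
    for field in fields[1:]:
        if field.startswith("tax="):
            # Each entry looks like 'p:Bacteroidota' (rank letter, colon, name).
            for entry in field[len("tax="):].split(","):
                rank, _, name = entry.partition(":")
                if rank and name:
                    taxonomy[rank] = name
    return seq_id, taxonomy


# Hypothetical header, for illustration only.
example = ">nosZ_example_001;tax=d:Bacteria,p:Bacteroidota,g:Hypothetical_genus;"
seq_id, taxonomy = parse_sintax_header(example)
print(seq_id, taxonomy)
```

The headers in the actual file may carry additional fields or ranks, so it is worth inspecting a few records before relying on this layout.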

To use a database in this format, software such as ONT-AmpSeq can map nosZ amplicon sequencing data from Oxford Nanopore against it.

The process of identifying and filtering the sequences is described in: Unraveling the genetic potential of nitrous oxide reduction in wastewater treatment: Insights from metagenome-assembled genomes (publication in process).

Sharing/Access information

Software available on GitHub (see link in Related Works).


High-quality metagenome-assembled genomes (HQ MAGs; 1083 in total) were obtained from 23 Danish full-scale WWTPs (Singleton et al., 2021). Initially, these HQ MAGs were processed with Prodigal v2.6.2 to predict protein-coding genes, which were then extracted and translated into proteins. The predicted gene nucleotide sequences were compared to NCBI GenBank v234 using the BLASTn algorithm and annotated using KEGG elements through EnrichM v0.5.0 to identify nosZ genes. The translated proteins were aligned to high-quality full-length clade I (n=20) and clade II (n=46) NosZ protein sequences from the Functional Gene Pipeline and Repository (FUNGENE) database v9.9.11 using the BLASTp algorithm. Subsequently, the translated proteins were searched against 3 full-length NosZ HMM files (1 for clade I (638 aa) and 2 for clade II (765 and 656 aa)) obtained from FUNGENE using the hmmsearch algorithm.

Identified nosZ genes were manually filtered based on length criteria (1050-2200 bp or 350-800 aa). The taxonomy of each MAG was associated with the nosZ genes identified in it and retained throughout the analysis. The identified nosZ genes were aligned using MUSCLE v5.0.1428, and the alignment was used to construct a maximum-likelihood phylogenetic tree with IQ-TREE v2.0 using the best-fit model and 1000 ultrafast bootstrap iterations. The resulting tree was used to manually filter out misclassified genes.
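The length criteria above can be sketched as a simple filter. This is a minimal illustration and not the authors' pipeline, in which the filtering was done manually. Treating the bounds as inclusive is an assumption, and the function name and toy sequences are hypothetical.

```python
# Length thresholds as stated in the methods; inclusive bounds are an assumption.
NT_RANGE = (1050, 2200)  # nucleotide length in bp
AA_RANGE = (350, 800)    # protein length in aa


def passes_length_filter(sequence: str, is_protein: bool) -> bool:
    """Return True if the sequence length falls within the stated range."""
    low, high = AA_RANGE if is_protein else NT_RANGE
    return low <= len(sequence) <= high


# Toy nucleotide sequences of different lengths (placeholders, not real genes).
candidates = {
    "gene_short": "A" * 900,
    "gene_ok": "A" * 1900,
    "gene_long": "A" * 2500,
}
kept = [
    name
    for name, seq in candidates.items()
    if passes_length_filter(seq, is_protein=False)
]
print(kept)
```

Only the 1900 bp placeholder falls inside the nucleotide range, so it is the only entry retained.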


Independent Research Fund Denmark, Award: 9041-00367B, Technology and Production Sciences