Decoupling silicon metabolism from carbon and nitrogen assimilation poises diatoms to exploit episodic nutrient pulses in a coastal upwelling system
Data files
Feb 20, 2024 version files 3.63 GB
-
DYEatom_dedup_annot.csv
-
DYEatom_inc_counts_annot.csv
-
DYEatom_Inc_counts.csv
-
dyeatom_inc_noRNA_orfs_filtered.faa
-
dyeatom_inc_noRNA_orfs_filtered.ffn
-
README.md
Abstract
Diatoms serve as the major link between the marine carbon (C) and silicon (Si) biogeochemical cycles through their contributions to primary productivity and requirement for Si during cell wall formation. Although several culture-based studies have investigated the molecular response of diatoms to Si and nitrogen (N) starvation and replenishment, diatom silicon metabolism has been understudied in natural populations. A series of deckboard Si-amendment incubations were conducted using surface water collected in the California Upwelling Zone near Monterey Bay. Steep concentration gradients in macronutrients in the surface ocean coupled with substantial N and Si utilization led to communities with distinctly different macronutrient states: replete (‘healthy’), low N (‘N-stressed’), and low N and Si (‘N- and Si-stressed’). Biogeochemical measurements of Si uptake combined with metatranscriptomic analysis of communities incubated with and without added Si were used to explore the underlying molecular response of diatom communities to different macronutrient availability. Metatranscriptomic analysis revealed that N-stressed communities exhibited dynamic shifts in N and C transcriptional patterns suggestive of compromised metabolism. Expression patterns in communities experiencing both N and Si stress imply that the presence of Si stress may partially ameliorate N stress and dampen the impact on organic matter metabolism. This response builds upon previous observations that the regulation of C and N metabolism is decoupled from Si limitation status, where Si stress allows the cell to optimize the metabolic machinery necessary to respond to episodic pulses of nutrients. Several well-characterized Si-metabolism associated genes were found to be poor molecular markers of Si physiological status; however, several uncharacterized Si-responsive genes were revealed to be potential indicators of Si stress or silica production.
README: Assembly, annotation, and read counts from silicic acid and nitrogen stress experiments of marine microbial eukaryotes
https://doi.org/10.5061/dryad.3bk3j9kr4
More information about the collection, preparation, and analysis of samples can be found within the associated full length article.
Associated article: Decoupling silicon metabolism from carbon and nitrogen assimilation poises diatoms to exploit episodic nutrient pulses in a coastal upwelling system
Michael A. Maniscalco, Mark A. Brzezinski, Jeffrey W. Krause, Kimberlee Thamatrakoln
doi: 10.3389/fmars.2023.1291294
File descriptions:
- DYEatom_Inc_counts.csv: Counts for all unique orfs within the metatranscriptomic assembly.
- columns:
- transcript- id of each transcript in metatrancriptome assembly
- Si2control1_S94 - read counts mapped to each transcript from biological rep 1, control sample at Station 2
- Si2control2_S95 - read counts mapped to each transcript from biological rep 2, control sample at Station 2
- Si2control3_S96 - read counts mapped to each transcript from biological rep 3, control sample at Station 2
- Si2plusSi1_S97 - read counts mapped to each transcript from biological rep 1, Si addition sample at Station 2
- Si2plusSi2_S98 - read counts mapped to each transcript from biological rep 2 Si addition sample at Station 2
- Si3control1_S82 - read counts mapped to each transcript from biological rep 1, control sample at Station 4
- Si3control2_S83 - read counts mapped to each transcript from biological rep 2, control sample at Station 4
- Si3control3_S84 - read counts mapped to each transcript from biological rep 3, control sample at Station 4
- Si3plusSi1_S85 - read counts mapped to each transcript from biological rep 1, Si addition sample at Station 4
- Si3plusSi2_S86 - read counts mapped to each transcript from biological rep 2, Si addition sample at Station 4
- Si3plusSi3_S87 - read counts mapped to each transcript from biological rep 3, Si addition sample at Station 4
- Si8control1_S88 - read counts mapped to each transcript from biological rep 1, control sample at Station 11
- Si8control2_S89 - read counts mapped to each transcript from biological rep 2, control sample at Station 11
- Si8control3_S90 - read counts mapped to each transcript from biological rep 3, control sample at Station 11
- Si8plusSi1_S91 - read counts mapped to each transcript from biological rep 1, Si addition sample at Station 11
- Si8plusSi2_S92 - read counts mapped to each transcript from biological rep 2, Si addition sample at Station 11
- Si8plusSi3_S93 - read counts mapped to each transcript from biological rep 3, Si addition sample at Station 11
- NAs represent transcripts without counts in a given sample. It can not be called a zero as that would represent an absence of that transcript in a sample, and deeper sequencing may identify a low level presence.
- DYEatom_dedup_annot.csv: Taxonomic and functional annotation for all unique orfs within the metatranscriptomic assembly.
- columns:
- taxon_id,
- qseqid- transcript id from metatrancriptome (queryid)
- sseqid- subject id of gene within marineref2 database
- evalue- evalue from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- bitscore- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- organism- marineref2 taxonomic annotation down to species
- original_annot- marineref2 functional annotation
- sfams_annot_id- marineref2 assigned annotation
- protcluster_annot_id- marineref2 assigned annotation
- subject_id- marineref2 assigned annotation of phylodb database. Values are pipe delimited label then value for gi, tr, kegg_gene_id
- phylodb_protein- from blast search of qseq against subject seq (sseq) of gene in marineref2 database. Protein level functional annotation
- phylodb_taxon_id- from blast search of qseq against subject seq (sseq) of gene in marineref2 database. NCBI taxid.
- tigrfam_accession- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- tigrfam_description- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- pfam_name- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- pfam_accession- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- pfam_description- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- sag_name- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- sag_cluster- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- superfamily_description- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- swissprot_name- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- ,swissprot_description- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- interpro_number- from blast search of qseq against subject seq (sseq) of gene in marineref2 database
- KO- Assigned from GHOST search
- tax1- Kingdom level taxonomic annotation
- tax2- Phylum level taxonomic annotation
- ncbi_genus- genus level taxonomic annotation
- kegg_gene_id- KEGG gene_id
- GHOSTX_score- GHOST alignment search score. GHOST is used from KEGG server as a faster alternative to BLAST
- phylum.R- taxonomy assigned by NCBI taxid lookup table (https://github.com/marchettilab/metatranscriptomicsPipeline/blob/master/NCBI_keeling_masterlist.csv)
- class.R- taxonomy assigned by NCBI taxid lookup table (https://github.com/marchettilab/metatranscriptomicsPipeline/blob/master/NCBI_keeling_masterlist.csv)
- order.R- taxonomy assigned by NCBI taxid lookup table (https://github.com/marchettilab/metatranscriptomicsPipeline/blob/master/NCBI_keeling_masterlist.csv)
- family.R- taxonomy assigned by NCBI taxid lookup table (https://github.com/marchettilab/metatranscriptomicsPipeline/blob/master/NCBI_keeling_masterlist.csv)
- genus.R- taxonomy assigned by NCBI taxid lookup table (https://github.com/marchettilab/metatranscriptomicsPipeline/blob/master/NCBI_keeling_masterlist.csv)
- species.R- taxonomy assigned by NCBI taxid lookup table (https://github.com/marchettilab/metatranscriptomicsPipeline/blob/master/NCBI_keeling_masterlist.csv)
- keeling.R- taxonomy assigned by NCBI taxid lookup table (https://github.com/marchettilab/metatranscriptomicsPipeline/blob/master/NCBI_keeling_masterlist.csv)
- strain- taxonomy assigned from marineref2 database annotation
- species- taxonomy assigned from marineref2 database annotation
- genus- taxonomy assigned from marineref2 database annotation
- family- taxonomy assigned from marineref2 database annotation
- order- taxonomy assigned from marineref2 database annotation
- class- taxonomy assigned from marineref2 database annotation
- phylum- taxonomy assigned from marineref2 database annotation
- kingdom- taxonomy assigned from marineref2 database annotation
- superkingdom- taxonomy assigned from marineref2 database annotation
- NAs fill cells for transcripts that did have a given type of annotation to assign
- DYEatom_inc_counts_annot.csv: Counts for each orf with taxonomic annotation.
- joined version of the "DYEatom_dedup_annot.csv" and "DYEatom_Inc_counts.csv" files with descriptions matching those of the above tables
- rows with no counts in any samples were removed
- dyeatom_inc_noRNA_orfs_filtered.ffn: Nucleotide sequences for orfs identified from the metatranscriptomic assembly.
- dyeatom_inc_noRNA_orfs_filtered.faa: Amino acid sequences for orfs identified from the metatranscriptomic assembly.
Methods
Metatranscriptome analysis
Seawater samples for metatranscriptome analysis were collected by filtration at <5 psi onto 47 mm, 1.2 µm pore-size polycarbonate membranes, flash frozen in liquid N2, and stored at -80°C. Total RNA was extracted from filters using the Trizol-RNeasy method with an additional bead beating step. In brief, filters were transferred to tubes containing RNase/DNase free 100 µm zirconia/silica beads with 1 mL of Trizol reagent, vortexed for 2 min, incubated at room temperature for 5 min, vortexed again for 2 min before following the standard Trizol RNA extraction protocol. The upper aqueous phase was transferred to a clean 1.5 mL tube with an equal volume of 70% alcohol, mixed, and added to QIAgen RNAeasy column. The standard RNeasy protocol with the optional on-column DNase treatment was followed.
RNA was quantified using a Qubit® RNA HS Assay, and RNA integrity was determined using the Agilent Bioanalyzer RNA 6000 Pico kit eukaryotic assay. Total RNA (500 ng) was used for library prep with Illumina TruSeq RNA Library Prep kit v2 with mRNA purified using Oligo-dT bead capture polyA tails. Library quality was assessed on an Agilent Bioanalyzer and quantified via Qubit before being pooled in equal quantities. The pooled libraries were sequenced at UC-Davis Genome Center on HiSeq 4000 in lanes of paired-end-150bp flow cell.
Primer and adapter sequences were removed from raw sequencing reads with trimmomatic v0.38 (Bolger et al., 2014). Paired-end reads were merged with interleave-reads.py from the khmer package v2.1.2 before removal of rRNA reads with sortmeRNA v2.0 utilizing the built in SILVA 16s, 18s, 23s, and 28s databases (Kopylova et al., 2012). Remaining reads were separated (using the deinterleave_fastq.sh https://gist.github.com/nathanhaigh/3521724) and assembled into contiguous sequences (contigs) with megahit v1.1.3 using ‘meta-large’ setting (Li et al., 2015). Open reading frames (ORFs) were predicted from assembled contigs using FragGeneScan v1.31 (Rho et al., 2010), and orfs shorter than 150 bp were removed. Reads counts for remaining orfs were estimated using salmon v0.6.0 `quant` quasi-mapping with seqBias and gcBias features.
Orfs were annotated based on best homology (lowest E-value) using BLASTP v.2.5.0+ with an E-value threshold of <10-3. Taxonomic and functional annotations were assigned using the MarineRefII (roseobase.org/data), a custom databased maintained by the Moran Lab at the University of Georgia that includes sequences from PhyloDB v.1.076 which contains 24,509,327 peptides from 19,962 viral, 230 archaeal, 4,910 bacterial and 894 eukaryotic taxa, including peptides from KEGG, GenBank, JGI, ENSEMBL, CAMERA and 410 taxa from the Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP), as well as the associated KEGG functional and NCBI taxonomic annotations through a related SQL database. NCBI taxonomic annotations were further curated (Cohen et al., 2017); (https://github.com/marchettilab/metatranscriptomicsPipeline) to ensure the use of consistent taxonomic ranks. KEGG Ortholog (KO) classifications of diatom urea transporters, nitrite reductase, ammonium transporters were manually verified against known gene phylogenies and edited accordingly (Smith et al., 2019).
Silaffin-like genes were identified in Pseudo-nitzschia multiseries (CLN-47 v1) genome, as described previously (Scheffel et al., 2011), using an initial signal peptide screen of translated nucleotide sequences and subsequence sliding window search for amino acid (AA) sequences containing a 100-2000 long segment with ≥10% lysine and ≥18% serine residues. Full length P. multiseries putative silaffin amino acid sequences were subsequently screened for the presence the characteristic silaffin-like motifs using the FIMO program within the Multiple EM for Motif Elicitation (MEME suite v.5.4.1) motif scanning software package (Grant et al., 2011). Additional functional annotation was performed using BLASTP with the NCBI nr database (E-value <10-10) and KEGG GhostKOALA (score <40). This process identified 157 putative unique P. multiseries silaffin-like genes (PNSLs) which were used with BLASTP+ v2.5.0 (E-value <10-5) to identify homologous sequences within our metatranscriptome assembly. To identify the distribution of these genes in diatoms and other microalgae, BLASTP was used against a database containing the 67 translated transcriptomes from the MMETSP (Keeling et al., 2014). PNSLswith no significant hits (E-value <10-10) to non-diatom genes were considered “diatom-specific”.
For genes of interest that did not have an associated KO number, such as silicon transporters (SITs) and silicanin-1 (Sin-1), manual annotation was performed as previously described (Durkin et al., 2016; Brembu et al., 2017a; Kotzsch et al., 2017) using BLASTP+ v2.5.0 best hit with e-value cutoff of 10-5. Putative SIT sequences were further classified to clade designations, using a maximum-likelihood tree constructed from a reference alignment (Durkin et al. 2016) with RAxML version 8.2.12 – PROTGAMMAWAGF substitution model and 100 bootstrap replicates (Stamatakis, 2014). A HMM-profile was constructed from the reference alignment (using hmmbuild 3.2.1) and used with hmmalign to align amino acid sequences corresponding to SIT orfs. Alignment and tree files were packaged using taxtastic v.0.8.3, and SIT orfs were placed on the reference tree using pplacer v.1.1.alpha19 with posterior probability calculated (Matsen et al., 2010). The most closely related reference sequence was assigned to each SIT orf using guppy v.1.1.alpha19 (Matsen et al., 2010).
Prior to differential abundance analysis, raw counts were aggregated within diatom genera by KO number. For genes lacking a KO number, e.g. SITs, ISIPs, Sin1, etc., aggregation was done based on gene assignment through KEGG annotation or BLASTX query of supplemental databases. Genus-specific aggregation of functionally annotated reads reduces redundancy and allows the use of tools originally designed for single organism RNAseq analysis (e.g. DESeq2, edgeR). This is necessary for microbial community transcriptomic analysis due to methodological and computational difficulties in resolving species-level, differential transcript abundance (Toseland et al., 2014; Alexander et al., 2015; Kopf et al., 2015; Cohen et al., 2017, 2021; Hu et al., 2018; Lampe et al., 2018). However, this approach does not resolve species-level contig abundance, nor does it a priori resolve clade-specific abundance patterns for genes that belong to multigene families. In the case of SITs, individual clades were identified through phylogenetic analysis as described above, and manually annotated to allow interrogation of SIT transcripts at the clade level. Additionally, annotations for urea transporters 1 and 2 were grouped together because of their close phylogenic relationship and shared abundance pattern in response to nitrogen supply (Smith et al., 2019). A similar approach for other multigene families, such as ammonium transporters and nitrate transporters, was not used because abundance of these genes in response to nitrogen supply does not appear to be clade-specific (Smith et al., 2019).
Metabolic pathway groupings
Genes indicative of nitrogen limitation were selected based on culture-based studies characterizing the response of the diatom Phaeodactylum tricornutum to depletion and replenishment of nitrate, nitrite, and ammonia (Smith et al., 2019). Using independent transcriptomic studies of Si-depleted and replenished Thalassiosira pseudonana cultures (Shrestha et al., 2012; Smith et al., 2016a; Brembu et al., 2017a), we identified a subset of genes conserved across those studies in either their response to silicon starvation or association with silica production and denote these here as SiLA (Si-limitation associated) or SiPA (silica production associated), respectively. SiLA genes were identified as those with a >2-fold increase in transcript abundance >4 h-post Si-starvation and that did not increase expression in response to Si replenishment. Similarly, SiPA genes were identified as those with a >2-fold increase in transcript abundance >4 h-post Si replenishment and did not increase expression in response to Si starvation. The 2-fold threshold was used to provide confidence that the response would be robust while the response time of >4 h was chosen to exclude genes involved in the initial response to changes in Si(OH)4 concentration and ensure that the change in expression was persistent. The rationale for this is that for the expression of a gene to be useful as a molecular marker in natural communities, it should be robust and persistent given that the biogeochemical history of a phytoplankton community is usually unknown prior to sampling. Putative T. pseudonana SiLA and SiPA genes were screened for RXL and KXXK silaffin motifs as described above. Gene annotations included for carbon metabolism included genes from T. pseudonana CCMP 133, F. cylindrus CP 1102, and P. tricornutum CCAP 1055 included in the KEGG modules: carbon fixation in photosynthetic organisms, photosynthesis, photosynthesis - antennae protein, as well as KEGG class-3 modules for ATP synthesis and carbon fixation. Genes annotated as carbonic anhydrase, DIC uptake, and flavodoxin were also included.
Statistical analysis
The normalization and differential abundance analysis of metatranscriptome data was analyzed within each taxonomic group using edgeR v.3.32.1 (Robinson et al., 2010). Significance between samples was determined using exactTest with tagwise dispersion and corrected for multiple testing with a significance threshold of FDR <0.05 (Benjamini and Hochberg, 1995). Significant differential abundance of genes related to carbon, nitrogen, and silicon metabolism was visualized through heatmaps. Differential enrichment of gene sets was assessed using mroast within the limma v.3.50.3 package with default settings (Wu et al., 2010) which assigns statistical significance to a set of genes within a metabolic pathway of interest and returns the number of genes detected (n), the percent of the gene set that was differentially expressed, and the p-value.
Principle component analysis was performed using the log-transformed normalized counts for the 2500 most variable genes and supplemental environmental data. PCA biplots were created with ggplot2 v.3.3.5 and factoextra v.1.0.7.