Transcriptomic analysis of light-induced genes in Nasonia vitripennis: possible implications for circadian light entrainment pathways

Wang, Yifan; Wertheim, Bregje; Beukeboom, Leo; Hut, Roelof

Published Sep 12, 2023 on Dryad. https://doi.org/10.5061/dryad.jq2bvq8dq

Abstract

Circadian entrainment to the environmental day-night cycle is essential for the optimal use of environmental resources. In insects, opsin-based photoreception in the compound eye and ocelli, and CRYPTOCHROME1 (CRY1) in circadian clock neurons are thought to be involved in sensing photic information, but genetic regulation of circadian light entrainment in species without light-sensitive CRY1 remains unclear. To elucidate a possible CRY1-independent light transduction cascade, we analysed light-induced gene expression through RNA-sequencing in Nasonia vitripennis. Entrained wasps were subjected to a light pulse in the subjective night to reset the circadian clock and light-induced changes in gene expression were characterized at four different time points in wasp heads. We used co-expression, functional annotation, and transcription factor binding motif analyses to gain insight into the molecular pathways in response to acute light stimulus and form a hypothesis about the circadian light resetting pathway. Maximal gene induction was found after 2h of light stimulation (1432 genes), including the opsin opblue and the core clock genes cry2 and npas2. Pathway and cluster analyses revealed light activation of glutamatergic and GABA-ergic neurotransmission, including CREB and AP-1 transcription pathway signalling. This suggests that circadian photic entrainment in Nasonia may require pathways that are similar to mammals. We propose a model for hymenopteran circadian light resetting that involves opsin-based photoreception, glutamatergic neurotransmission, and gene induction of cry2 and npas2 to reset the circadian clock.

Provenance for this README

File name: README.MD
Authors: Yifan Wang
Other contributors: Bregje Wertheim, Leo W. Beukeboom, Roelof A. Hut
Date created: 2022-12-31

Dataset Version and Release History

Current Version:
- Number: 1.0.0
- Date: 2023-02-14
- Persistent identifier:
- Summary of changes: n/a

Dataset Attribution and Usage

Dataset Title: Data for the article "Transcriptomic analysis of light-induced genes in Nasonia vitripennis: possible implications for circadian light entrainment pathways"
Persistent identifier: DOI
Dataset contributors: Yifan Wang, Leo W. Beukeboom, Bregje Wertheim, Roelof A. Hut
Dataset citation:

> Wang, Y,

Contact Information

Name: Yifan Wang
Affiliations: Evolutionary genetics group, Groningen Institute for Evolutionary Life Sciences, University of Groningen
ORCID ID: https://orcid.org/0000-0002-6541-7435
Email: yifan.wang@rug.nl
Alternate Email: y.wang01@umcg.nl
Address: e-mail preferred
Alternative Contact: postdoctoral PI
- Name: Bregje Wertheim
- Affiliations: Evolutionary genetics group, Groningen Institute for Evolutionary Life Sciences, University of Groningen
- ORCID ID: https://orcid.org/0000-0001-8555-1925
- Email: b.wertheim@rug.nl
- Address: Nijenborgh 7, 9747 AG Groningen, Room 5172.0674

Detailed Information of the Dataset

Description of the dataset and the project:
This dataset contains processed RNAseq data of Nasonia vitripennis.

The aim of this experiment was to analyze the circadian light input pathway in Nasonia via transcriptomic analysis.
Nasonia vitripennis wasps were first entrained under a 14:10 LD (light:dark) cycle for a week. On day 8 at ZT18, the animals received a treatment (either a light treatment or a dark treatment as control). We then collected samples after 30 minutes, 1 hour, 2 hours, and 4 hours of treatment. Based on these experiments, we analyzed the RNAseq data and propose a novel light transduction pathway in Nasonia, involving several photoreceptors, neurotransmitters, signalling pathways, and the induction of core clock genes.
Methods of data collection/generation: see manuscript for more details
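The sampling scheme described above can be sketched as a small design grid. This is only an illustration: the labels "light" and "dark" and the variable names are assumptions, and the sampling time is assumed to be the ZT18 treatment onset plus the treatment duration.

```python
from itertools import product

# Hypothetical labels; the dataset's own treatment codes may differ.
TREATMENTS = ("light", "dark")
DURATIONS_H = (0.5, 1, 2, 4)   # 30 min, 1 h, 2 h, 4 h of treatment
TREATMENT_ONSET_ZT = 18        # treatment starts at ZT18 on day 8


def design():
    """List every treatment x duration condition with its sampling time in ZT."""
    return [
        {
            "treatment": treatment,
            "duration_h": duration,
            "sampling_zt": (TREATMENT_ONSET_ZT + duration) % 24,
        }
        for treatment, duration in product(TREATMENTS, DURATIONS_H)
    ]


conditions = design()
for condition in conditions:
    print(condition)
```

The grid has two treatments and four durations, so it yields eight conditions; the number of replicates per condition is recorded in sampleinfo.csv and is not assumed here.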

Data and File Overview

Summary Metrics:

File count: 11
File formats: .gtf, .txt, .csv, .stats

Naming Conventions:

File names include the experiment performed and the manuscript figure or table that the data are used for

Table of contents:

Table_S1._Summary_of_RNA-seq_data_in_Nasonia_vitripennis.csv
Table_S2._Genome_mapping_stats.stats
Table_S3._List_of_all_DEGs.csv
Table_S4._List_of_enriched_GO_terms_after_different_duration_of_light_exposure.csv
Table_S5._List_of_enriched_KEGG_terms_after_different_duration_of_light_exposure.csv
Table_S6._List_of_enriched_GO_terms_enriched_in_the_Nasonia_head_based_on_time-course_clusters.csv
Table_S7._List_of_enriched_motifs_in_each_time-course_cluster.csv
sampleinfo.csv
GOTERMS14_10122021.csv
gene_count_matrix_annotatedstringtie.csv
novel_sites_allhits.stringtie.gtf

File details

Details for Table_S1._Summary_of_RNA-seq_data_in_Nasonia_vitripennis.csv

Description: this file contains information on the raw RNAseq reads data
Format: .csv
Size: 2KB
Dimensions: 21 rows x 10 columns
Variables:
- Sample name: the name of the RNAseq samples
- Treatment: the treatment Nasonia received
- Treatment duration: how long the treatment was
- Raw reads: how many reads in the RNAseq raw data files
- Clean reads: how many reads in the RNAseq data files after preprocessing
- GC content (%): the percentage of GC in the RNAseq data
- Length: the length of reads
- Total mapped: how many reads were mapped to the genome
- Multiple mapped: how many reads were mapped to multiple locations in the genome
- Uniquely mapped: how many reads were uniquely mapped
Missing data codes: there is no missing data in this file
Other information: this file was created with MultiQC; the scripts can be found on GitHub.
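As a hedged illustration of how the read-count columns relate to each other, the sketch below computes mapping rates relative to the clean reads. It assumes the count columns hold plain integers (the real file may format them differently), and the rows shown are made up for the example.

```python
import csv
import io


def mapping_rates(csv_text):
    """Return {sample name: (total mapped %, uniquely mapped %)} relative to clean reads."""
    rates = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        clean = int(row["Clean reads"])
        total = int(row["Total mapped"])
        unique = int(row["Uniquely mapped"])
        rates[row["Sample name"]] = (100 * total / clean, 100 * unique / clean)
    return rates


# Made-up rows for illustration only; these are not values from the dataset.
example = """Sample name,Clean reads,Total mapped,Uniquely mapped
demo_A,1000,950,900
demo_B,2000,1800,1700
"""

rates = mapping_rates(example)
print(rates)
```

To run this on the real table, read the file contents into a string and pass it to `mapping_rates`, adjusting the parsing if the columns contain thousands separators or percentages.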

Details for Table S2. Genome mapping stats

Description: this file contains output from the gffcompare software and summarizes the quality of the genome mapping.
Format: .stats
Size: 2KB
Other information: this file can be opened in a text editor and can be reproduced with scripts that can be found on GitHub.

Details for Table S3. List of all DEGs

Description: this file contains information on all differentially expressed genes (DEGs) in this analysis
Format: .txt
Size: 680KB
Dimensions: 7545 rows x 13 columns
Variables:
- NCBI.GeneID: the gene name
- Symbol: the gene symbol
- Description: the description of gene function
- TreatmentDuration: the duration of treatment received
- Regulation: the outcome of the statistical analysis of DEGs, either upregulated, downregulated, or n.s. (not significant)
- Tcseq_cluster: cluster number based on TCseq clustering analysis
- baseMean: the calculated baseMean from the statistical test
- logFC: the calculated log fold change from the statistical test
- pvalue: the p value from the statistical test
- FDR: -log10(p value) from the statistical test
- rlog: rlog transformation of the raw count data
- z.rlog: z-score transformation of the rlog-transformed data
Missing data codes: there is no missing data in this file
Other information: this file was created by an R script that can be found on GitHub.
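A minimal sketch of how the Regulation and TreatmentDuration columns could be used to count DEGs per time point. The delimiter is a parameter because the file may be comma- or tab-separated, and the duration labels in the example ("2h", "4h") are hypothetical; only the "n.s." code is taken from the variable description above.

```python
import csv
import io
from collections import Counter


def tally_degs(table_text, delimiter=","):
    """Count significant genes per (TreatmentDuration, Regulation) pair, skipping n.s. rows."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(table_text), delimiter=delimiter):
        if row["Regulation"] != "n.s.":
            counts[(row["TreatmentDuration"], row["Regulation"])] += 1
    return counts


# Made-up rows for illustration only; gene IDs and duration labels are hypothetical.
example = """NCBI.GeneID,TreatmentDuration,Regulation
1,2h,upregulated
2,2h,downregulated
3,2h,n.s.
4,4h,upregulated
5,2h,upregulated
"""

deg_counts = tally_degs(example)
print(deg_counts)
```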

Details for Table S4. List of enriched GO terms after different duration of light exposure

Description: this file contains GO analysis results based on each time point
Format: .csv
Size: 7KB
Dimensions: 78 rows x 9 columns
Variables:
- GO.ID: the ID of that GO term
- Term: the name of that GO term
- Annotated: the number of annotated genes belonging to this GO term
- Significant: the number of significant genes found in this GO term
- Expected: the number of genes that should be found in this GO term to be significant
- pvalue: the p value from the GO overrepresentation analysis
- category: the type of GO terms, BP (biological process), MF (molecular function), CC (cellular component)
- Percentage: significant genes/annotated genes
- Group: which group this GO term belongs to
Missing data codes: there is no missing data in this file
Other information: this file was created by an R script that can be found on GitHub.

Details for Table S5. List of enriched KEGG terms after different duration of light exposure

Description: this file contains data from KEGG pathway analysis for each time point
Format: .csv
Size: 2KB
Dimensions: 8 rows x 10 columns
Variables:
- ID: the KEGG pathway ID
- Description: the KEGG pathway description
- GeneRatio: the ratio between significant genes and genes in this pathway
- BgRatio: the ratio between genes in this pathway and all annotated genes
- p value: p value from KEGG analysis
- q value: q value from KEGG analysis
- geneID: the gene names in this KEGG pathway
- Count: the number of DEGs in this KEGG pathway
- timepoint: the timepoint this belongs to
Missing data codes: there is no missing data in this file
Other information: this file was created by an R script that can be found on GitHub.
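The ratio columns can be converted to numbers with a small helper. This sketch assumes that GeneRatio and BgRatio are stored as "k/n" strings and that geneID is a "/"-separated list, which is a common layout for KEGG enrichment output but is not confirmed for this file; the example values are made up.

```python
from fractions import Fraction


def parse_ratio(text):
    """Convert a ratio string such as '1/4' to a float (assumed 'k/n' layout)."""
    return float(Fraction(text.strip()))


def split_gene_ids(text, sep="/"):
    """Split a separator-delimited gene list, dropping empty entries."""
    return [gene for gene in text.split(sep) if gene]


# Made-up values for illustration only.
gene_ratio = parse_ratio("1/4")
genes = split_gene_ids("100001/100002/100003")
print(gene_ratio, genes)
```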

Details for Table S6. List of enriched GO terms enriched in the Nasonia head based on time-course clusters

Description: this file contains GO analysis for each time series cluster
Format: .csv
Size: 8KB
Dimensions: 101 rows x 9 columns
Variables:
- GO.ID: the ID of that GO term
- Term: the name of that GO term
- Annotated: the number of annotated genes belonging to this GO term
- Significant: the number of significant genes found in this GO term
- Percentage: significant genes/annotated genes
- Expected: the number of genes that should be found in this GO term to be significant
- elimFisher: the p value from the GO overrepresentation analysis
- Module: the time-series cluster number
- category: the type of GO terms, BP (biological process), MF (molecular function), CC (cellular component)
Missing data codes: there is no missing data in this file
Other information: this file was created by an R script that can be found on GitHub.

Details for Table S7. List of enriched motifs in each time-course cluster

Description: this file contains information for enriched motifs in each time-course cluster
Format: .csv
Size: 55KB
Dimensions: 506 rows x 10 columns
Variables:
- Motif Name: the name of the motifs
- Consensus: the sequence of the motifs
- P-value: the p value from motif analysis
- Log P-value: the log transformation of the p values
- q-value (Benjamini): the Benjamini-corrected p values
- # of Target Sequences with Motif: number of target sequences with this motif
- % of Target Sequences with Motif: the percentage of target sequences with this motif
- # of Background Sequences with Motif: the number of background sequences with this motif
- % of Background Sequences with Motif: the percentage of background sequences with this motif
- cluster: the cluster number from time course clustering
Missing data codes: there is no missing data in this file
Other information: this file was created by scripts that can be found on GitHub.
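As a sketch of how this table might be queried, the helper below lists motifs per cluster that pass a q-value cutoff. The 0.05 threshold is an arbitrary illustrative choice, not a value taken from the manuscript, and the rows are made up.

```python
import csv
import io
from collections import defaultdict


def significant_motifs(csv_text, q_max=0.05):
    """Group motif names by cluster, keeping rows with q-value (Benjamini) <= q_max."""
    by_cluster = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["q-value (Benjamini)"]) <= q_max:
            by_cluster[row["cluster"]].append(row["Motif Name"])
    return dict(by_cluster)


# Made-up rows for illustration only; motif names are placeholders.
example = """Motif Name,q-value (Benjamini),cluster
demo_motif_1,0.001,1
demo_motif_2,0.2,1
demo_motif_3,0.04,2
"""

motifs = significant_motifs(example)
print(motifs)
```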

Details for sampleinfo.csv

Description: this file contains information about the RNAseq samples
Format: .csv
Size: 2KB
Dimensions: 21 rows x 12 columns
Variables:
- FileName: the name of the RNAseq file
- Rawsamplename: the name of the raw sample
- Sample: sample name that is used in analysis
- Treatment: the kind of treatment
- TreatmentDuration: the length of the treatment
- LaneID: the sequence lane
- Batch: the sequence batch
- RNAextractiontime: the date of RNA extraction
- RNAextractionround: the round of RNA extraction
- Generation: the generation of Nasonia
- Generationnumber: the generation number of Nasonia
- Biological_variables: biological variable name of the samples
Missing data codes: there is no missing data in this file
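A minimal sketch of grouping samples by experimental condition using the Treatment and TreatmentDuration columns. The sample names and condition labels in the example are hypothetical.

```python
import csv
import io
from collections import defaultdict


def samples_by_group(csv_text):
    """Map each (Treatment, TreatmentDuration) pair to the list of Sample names in it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[(row["Treatment"], row["TreatmentDuration"])].append(row["Sample"])
    return dict(groups)


# Made-up rows for illustration only; names and labels are hypothetical.
example = """Sample,Treatment,TreatmentDuration
demo_1,light,2h
demo_2,light,2h
demo_3,dark,2h
"""

groups = samples_by_group(example)
print(groups)
```

Grouping in this way makes it easy to check how many replicates each condition has before comparing light and dark samples.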

Details for GOTERMS14_10122021.csv

Description: this file contains GO annotations used in the analysis
Format: .csv
Size: 700KB
Dimensions: 9818 rows x 2 columns
Variables:
- ref_gene_id: the name of gene
- GO: GO IDs of the gene
Missing data codes: there is no missing data in this file
Other information: this file was created by scripts that can be found on GitHub.

Details for gene_count_matrix_annotatedstringtie.csv

Description: this file contains the final gene counts matrix by StringTie
Format: .csv
Size: 700KB
Dimensions: 15905 rows x 21 columns
Variables:
- gene_id: the name of gene
- column 2-21: RNAseq sample name and the raw count data for each gene in each sample
Missing data codes: there is no missing data in this file
Other information: this file was created by scripts that can be found on GitHub.
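The layout described above (a gene_id column followed by one raw-count column per sample) can be summarized per sample as follows. This is a sketch that assumes integer counts; the gene and sample names in the example are made up.

```python
import csv
import io


def library_sizes(csv_text):
    """Sum the raw counts in every sample column (all columns except gene_id)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = [name for name in reader.fieldnames if name != "gene_id"]
    totals = dict.fromkeys(samples, 0)
    for row in reader:
        for sample in samples:
            totals[sample] += int(row[sample])
    return totals


# Made-up matrix for illustration only; not values from the dataset.
example = """gene_id,demo_1,demo_2
geneA,10,0
geneB,5,7
geneC,0,3
"""

sizes = library_sizes(example)
print(sizes)
```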

Details for novel_sites_allhits.stringtie.gtf

Description: this file contains the final transcriptome annotation assembled by StringTie
Format: .gtf
Size: 97374KB
Metadata: the transcriptome .gtf file produced by StringTie assembly (see scripts on GitHub for details). This version was generated after preprocessing and functional annotation, in which novel genes without BLAST hits were removed.
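For readers unfamiliar with the format, a GTF record is a tab-separated line with nine fields, the last of which holds `key "value";` attribute pairs. The parser below is a minimal sketch of that general layout; the example line is made up and does not come from this file.

```python
import re

# Matches attribute pairs of the form: key "value";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Parse one GTF feature line into a dict with typed coordinates and attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF feature line must have 9 tab-separated fields")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": dict(ATTRIBUTE.findall(attributes)),
    }


# Made-up record for illustration only.
example = 'chr_demo\tStringTie\ttranscript\t100\t500\t1000\t+\t.\tgene_id "DEMO.1"; transcript_id "DEMO.1.1";'
record = parse_gtf_line(example)
print(record)
```

Comment lines starting with `#` are not feature lines and should be skipped before calling the parser.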

Sharing/access Information

This dataset has not yet been deposited anywhere else.

END OF THE README
