Transcriptomic analysis of light-induced genes in Nasonia vitripennis: possible implications for circadian light entrainment pathways
Data files
Files in the version of Sep 12, 2023 (103.33 MB in total):
- gene_count_matrix_annotatedstringtie.csv (1.67 MB)
- GOTERMS14_10122021.csv (716.55 KB)
- novel_sites_allhits.stringtie.gtf (99.71 MB)
- README.md (11.69 KB)
- sampleinfo.csv (1.40 KB)
- Table_S1._Summary_of_RNA-seq_data_in_Nasonia_vitripennis.csv (1.93 KB)
- Table_S2._Genome_mapping.stats (1.34 KB)
- Table_S3._List_of_all_DEGs.csv (1.14 MB)
- Table_S4._List_of_enriched_GO_terms_after_different_duration_of_light_exposure.csv (7.07 KB)
- Table_S5._List_of_enriched_KEGG_terms_after_different_duration_of_light_exposure.csv (1.33 KB)
- Table_S6._List_of_enriched_GO_terms_enriched_in_the_Nasonia_head_based_on_time-course_clusters.csv (7.27 KB)
- Table_S7._List_of_enriched_motifs_in_each_time-course_cluster.csv (55.38 KB)
Abstract
Circadian entrainment to the environmental day-night cycle is essential for the optimal use of environmental resources. In insects, opsin-based photoreception in the compound eye and ocelli, and CRYPTOCHROME1 (CRY1) in circadian clock neurons are thought to be involved in sensing photic information, but the genetic regulation of circadian light entrainment in species without light-sensitive CRY1 remains unclear. To elucidate a possible CRY1-independent light transduction cascade, we analysed light-induced gene expression through RNA-sequencing in Nasonia vitripennis. Entrained wasps were subjected to a light pulse in the subjective night to reset the circadian clock, and light-induced changes in gene expression were characterized at four different time points in wasp heads. We used co-expression, functional annotation, and transcription factor binding motif analyses to gain insight into the molecular pathways responding to an acute light stimulus and to form a hypothesis about the circadian light resetting pathway. Maximal gene induction was found after 2 h of light stimulation (1432 genes), including the opsin opblue and the core clock genes cry2 and npas2. Pathway and cluster analyses revealed light activation of glutamatergic and GABAergic neurotransmission, including CREB and AP-1 transcription pathway signalling. This suggests that circadian photic entrainment in Nasonia may require pathways similar to those in mammals. We propose a model for hymenopteran circadian light resetting that involves opsin-based photoreception, glutamatergic neurotransmission, and gene induction of cry2 and npas2 to reset the circadian clock.
Provenance for this README
- File name: README.md
- Authors: Yifan Wang
- Other contributors: Bregje Wertheim, Leo W. Beukeboom, Roelof A. Hut
- Date created: 2022-12-31
Dataset Version and Release History
- Current Version:
- Number: 1.0.0
- Date: 2023-02-14
- Persistent identifier:
- Summary of changes: n/a
Dataset Attribution and Usage
- Dataset Title: Data for the article “Transcriptomic analysis of light-induced genes in Nasonia vitripennis: possible implications for circadian light entrainment pathways”
- Persistent identifier: DOI
- Dataset contributors: Yifan Wang, Leo W. Beukeboom, Bregje Wertheim, Roelof A. Hut
- Dataset citation: Wang, Y,
Contact Information
- Name: Yifan Wang
- Affiliations: Evolutionary genetics group, Groningen Institute for Evolutionary Life Sciences, University of Groningen
- ORCID ID: https://orcid.org/0000-0002-6541-7435
- Email: yifan.wang@rug.nl
- Alternate Email: y.wang01@umcg.nl
- Address: e-mail preferred
- Alternative Contact: postdoctoral PI
- Name: Bregje Wertheim
- Affiliations: Evolutionary genetics group, Groningen Institute for Evolutionary Life Sciences, University of Groningen
- ORCID ID: https://orcid.org/0000-0001-8555-1925
- Email: b.wertheim@rug.nl
- Address: Nijenborgh 7, 9747 AG Groningen, Room 5172.0674
Detailed Information of the Dataset
- Description of the dataset and the project:
This dataset contains processed RNAseq data of Nasonia vitripennis.
The aim of this experiment was to analyze the circadian light input pathway in Nasonia through transcriptomic analysis.
Nasonia vitripennis was first entrained under a 14:10 light:dark (LD) cycle for one week. On day 8 at ZT18, the animals received a treatment (either a light treatment or, as a control, a dark treatment).
We then collected samples after 30 min, 1 h, 2 h, and 4 h of treatment.
Based on the RNAseq analysis, we propose a novel light transduction pathway in Nasonia involving several photoreceptors, neurotransmitters, signalling pathways, and the induction of core clock genes.
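The sampling scheme can be expressed in Zeitgeber time (ZT) to see where each sample falls in the light-dark cycle. A minimal Python sketch, assuming the usual convention that ZT0 is lights-on, so that under 14:10 LD the dark phase runs from ZT14 to ZT24; the constant and function names are hypothetical:

```python
# Minimal sketch: express the sampling scheme in Zeitgeber time (ZT).
# Assumption: ZT0 = lights on, so under 14:10 LD the dark phase spans ZT14 to ZT24.
LIGHT_HOURS = 14
TREATMENT_START_ZT = 18.0  # treatment onset on day 8 (from the README)

# Treatment durations at which samples were collected, in hours
DURATIONS_H = [0.5, 1.0, 2.0, 4.0]


def sampling_zt(start_zt: float, duration_h: float) -> float:
    """ZT at which a sample was collected, wrapped into [0, 24)."""
    return (start_zt + duration_h) % 24


def in_dark_phase(zt: float, light_hours: int = LIGHT_HOURS) -> bool:
    """True if zt falls in the scheduled dark phase of the LD cycle."""
    return zt >= light_hours


if __name__ == "__main__":
    for d in DURATIONS_H:
        zt = sampling_zt(TREATMENT_START_ZT, d)
        print(f"{d:g} h of treatment -> sampled at ZT{zt:g}, dark phase: {in_dark_phase(zt)}")
```

Under these assumptions the four samples are collected at ZT18.5, ZT19, ZT20, and ZT22, all before the scheduled lights-on at ZT24, so every sample falls within the night.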
Methods of data collection/generation: see manuscript for more details
Data and File Overview
Summary Metrics:
- File count: 11
- File formats: .gtf, .txt, .csv, .stats
Naming Conventions:
- File names describe the content of each file; supplementary tables are named after their table number in the manuscript
Table of contents:
- Table_S1._Summary_of_RNA-seq_data_in_Nasonia_vitripennis.csv
- Table_S2._Genome_mapping.stats
- Table_S3._List_of_all_DEGs.csv
- Table_S4._List_of_enriched_GO_terms_after_different_duration_of_light_exposure.csv
- Table_S5._List_of_enriched_KEGG_terms_after_different_duration_of_light_exposure.csv
- Table_S6._List_of_enriched_GO_terms_enriched_in_the_Nasonia_head_based_on_time-course_clusters.csv
- Table_S7._List_of_enriched_motifs_in_each_time-course_cluster.csv
- sampleinfo.csv
- GOTERMS14_10122021.csv
- gene_count_matrix_annotatedstringtie.csv
- novel_sites_allhits.stringtie.gtf
File details
Details for Table_S1._Summary_of_RNA-seq_data_in_Nasonia_vitripennis.csv
- Description: this file contains summary statistics of the RNAseq reads, from raw reads to genome mapping
- Format: .csv
- Size: 2KB
- Dimensions: 21 rows x 10 columns
- Variables:
- Sample name: the name of the RNAseq samples
- Treatment: the treatment Nasonia received
- Treatment duration: how long the treatment was
- Raw reads: the number of reads in the raw RNAseq data files
- Clean reads: the number of reads remaining after preprocessing
- GC content (%): the GC percentage of the reads
- Length: the read length
- Total mapped: the number of reads mapped to the genome
- Multiple mapped: the number of reads mapped to multiple locations in the genome
- Uniquely mapped: the number of reads mapped to a single location in the genome
- Missing data codes: there is no missing data in this file
- Other information: this file was created with MultiQC; the scripts are available on GitHub.
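Mapping rates are not given as a separate column, but they can be derived from the read counts. A minimal sketch, assuming the CSV headers match the variable names above exactly and that the counts are stored as plain numbers (no thousands separators or units); the helper names are hypothetical and the demo row is made up:

```python
import csv
import io


def load_rows(handle):
    """Read a CSV file handle into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


def mapping_rates(rows):
    """Per sample: (% of clean reads mapped, % of clean reads uniquely mapped).

    Assumes headers match the README variable names and counts are plain numbers.
    """
    rates = {}
    for row in rows:
        clean = float(row["Clean reads"])
        total = float(row["Total mapped"])
        unique = float(row["Uniquely mapped"])
        rates[row["Sample name"]] = (100 * total / clean, 100 * unique / clean)
    return rates


if __name__ == "__main__":
    # Made-up values for illustration only; they are not taken from the dataset.
    demo = io.StringIO(
        "Sample name,Clean reads,Total mapped,Uniquely mapped\n"
        "demo_sample,1000,950,900\n"
    )
    print(mapping_rates(load_rows(demo)))
```

For the deposited table, open `Table_S1._Summary_of_RNA-seq_data_in_Nasonia_vitripennis.csv` with `newline=""` and pass the handle to `load_rows`. Rates are expressed relative to clean reads here, which is one reasonable choice; raw reads would work equally well as the denominator.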
Details for Table S2. Genome mapping stats
- Description: this file contains the summary statistics produced by the gffcompare software, which describe how well the assembled transcripts match the reference genome annotation.
- Format: .stats
- Size: 2KB
- Other information: this file can be opened in any plain-text editor, and it can be reproduced with the scripts available on GitHub.
Details for Table S3. List of all DEGs
- Description: this file contains information on all DEGs in this analysis
- Format: .csv
- Size: 680KB
- Dimensions: 7545 rows x 13 columns
- Variables:
- NCBI.GeneID: the NCBI gene identifier
- Symbol: the gene symbol
- Description: the description of gene function
- TreatmentDuration: the duration of the treatment received
- Regulation: the outcome of the statistical test for each gene, either upregulated, downregulated, or n.s. (not significant)
- Tcseq_cluster: the cluster number based on TCseq clustering analysis
- baseMean: the baseMean calculated by the statistical test
- logFC: the log fold change calculated by the statistical test
- pvalue: the p value from the statistical test
- FDR: -log10(p value) from the statistical test
- rlog: rlog transformation of the raw count data
- z.rlog: z-score of the rlog-transformed values
- Missing data codes: there is no missing data in this file
- Other information: this file was created by an R script available on GitHub.
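To summarise Table S3 per time point, the rows can be tallied by TreatmentDuration and Regulation. A minimal sketch, assuming the file is comma-separated (the file listing gives a .csv extension), has one row per gene and treatment duration, and uses exactly the Regulation labels listed above; the function names are hypothetical:

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a comma-separated file into a list of dicts (delimiter is assumed)."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def count_degs(rows):
    """Number of distinct genes per (TreatmentDuration, Regulation).

    Rows labelled "n.s." are skipped; duplicate gene IDs are counted once.
    """
    genes = defaultdict(set)
    for row in rows:
        if row["Regulation"] == "n.s.":
            continue
        key = (row["TreatmentDuration"], row["Regulation"])
        genes[key].add(row["NCBI.GeneID"])
    return {key: len(ids) for key, ids in genes.items()}
```

Usage: `count_degs(load_rows("Table_S3._List_of_all_DEGs.csv"))`. Counting distinct gene IDs with a set, rather than rows, guards against a gene appearing more than once for the same duration.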
Details for Table S4. List of enriched GO terms after different duration of light exposure
- Description: this file contains GO analysis results based on each time point
- Format: .csv
- Size: 7KB
- Dimensions: 78 rows x 9 columns
- Variables:
- GO.ID: the ID of that GO term
- Term: the name of the GO term
- Annotated: the number of annotated genes belonging to this GO term
- Significant: the number of significant genes found in this GO term
- Expected: the number of significant genes expected in this GO term by chance
- pvalue: the p value from the GO overrepresentation analysis
- category: the type of GO terms, BP (biological process), MF (molecular function), CC (cellular component)
- Percentage: significant genes/annotated genes
- Group: the group this GO term belongs to
- Missing data codes: there is no missing data in this file
- Other information: this file was created by an R script available on GitHub.
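The Expected, Significant, and Percentage columns are related by simple arithmetic. A minimal sketch, assuming the convention of the Bioconductor package topGO (the elimFisher column name in Table S6 suggests it, but this README does not say so), where Expected = Annotated × (significant genes in the gene universe / all annotated genes in the universe), and assuming Percentage is expressed in percent; the names and numbers are illustrative only:

```python
def expected_significant(annotated: int, total_significant: int, total_genes: int) -> float:
    """Significant genes expected in a term by chance (topGO-style convention)."""
    return annotated * total_significant / total_genes


def percentage(significant: int, annotated: int) -> float:
    """Share of a term's annotated genes that are significant, in percent."""
    return 100 * significant / annotated


def fold_enrichment(significant: int, annotated: int, total_significant: int, total_genes: int) -> float:
    """Observed / expected significant genes for one GO term."""
    return significant / expected_significant(annotated, total_significant, total_genes)


if __name__ == "__main__":
    # Illustrative numbers only: a term with 50 annotated genes, 15 of them
    # significant, in a universe of 10,000 genes of which 1,000 are significant.
    print(expected_significant(50, 1000, 10000))  # 5.0
    print(percentage(15, 50))                     # 30.0
    print(fold_enrichment(15, 50, 1000, 10000))   # 3.0
```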
Details for Table S5. List of enriched KEGG terms after different duration of light exposure
- Description: this file contains data from KEGG pathway analysis for each time point
- Format: .csv
- Size: 2KB
- Dimensions: 8 rows x 10 columns
- Variables:
- ID: the KEGG pathway ID
- Description: the KEGG pathway description
- GeneRatio: the ratio between significant genes and genes in this pathway
- BgRatio: the ratio between the number of genes in this pathway and the total number of annotated genes
- p value: p value from KEGG analysis
- q value: q value from KEGG analysis
- geneID: the gene names in this KEGG pathway
- Count: the number of DEGs in this KEGG pathway
- timepoint: the time point this result belongs to
- Missing data codes: there is no missing data in this file
- Other information: this file was created by an R script available on GitHub.
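If, as in clusterProfiler output, GeneRatio and BgRatio are stored as text fractions such as `6/120`, the two can be compared to obtain an enrichment factor for each pathway. A minimal sketch under that assumption; the function names and values are hypothetical:

```python
from fractions import Fraction


def parse_ratio(text: str) -> Fraction:
    """Parse a ratio stored as text, e.g. "6/120", into an exact fraction."""
    numerator, denominator = text.split("/")
    return Fraction(int(numerator), int(denominator))


def enrichment_factor(gene_ratio: str, bg_ratio: str) -> float:
    """How many times more often the pathway occurs among DEGs than in the background."""
    return float(parse_ratio(gene_ratio) / parse_ratio(bg_ratio))


if __name__ == "__main__":
    # Illustrative values, not from Table S5
    print(enrichment_factor("6/120", "40/4000"))  # 5.0
```

Using `Fraction` keeps the division exact until the final conversion, so ratios with large denominators are not rounded twice.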
Details for Table S6. List of enriched GO terms enriched in the Nasonia head based on time-course clusters
- Description: this file contains GO analysis for each time series cluster
- Format: .csv
- Size: 8KB
- Dimensions: 101 rows x 9 columns
- Variables:
- GO.ID: the ID of that GO term
- Term: the name of the GO term
- Annotated: the number of annotated genes belonging to this GO term
- Significant: the number of significant genes found in this GO term
- Percentage: significant genes/annotated genes
- Expected: the number of significant genes expected in this GO term by chance
- elimFisher: the p value from the GO overrepresentation analysis (elim algorithm with Fisher's exact test)
- Module: the time-series cluster number
- category: the type of GO terms, BP (biological process), MF (molecular function), CC (cellular component)
- Missing data codes: there is no missing data in this file
- Other information: this file was created by an R script available on GitHub.
Details for Table S7. List of enriched motifs in each time-course cluster
- Description: this file contains information for enriched motifs in each time-course cluster
- Format: .csv
- Size: 55KB
- Dimensions: 506 rows x 10 columns
- Variables:
- Motif Name: the name of the motif
- Consensus: the consensus sequence of the motif
- P-value: the p value from the motif analysis
- Log P-value: the log-transformed p value
- q-value (Benjamini): the Benjamini-corrected p value
- # of Target Sequences with Motif: number of target sequences with this motif
- % of Target Sequences with Motif: the percentage of target sequences with this motif
- # of Background Sequences with Motif: the number of background sequences with this motif
- % of Background Sequences with Motif: the percentage of background sequences with this motif
- cluster: the cluster number from time course clustering
- Missing data codes: there is no missing data in this file
- Other information: this file was created by scripts available on GitHub.
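The column layout resembles the known-motif output of HOMER, which stores the percentage columns as text such as `12.50%`; this is an assumption, since the README only says the file was created by scripts on GitHub. A minimal sketch that keeps the motifs of one cluster below a q-value cut-off (0.05 here, an arbitrary choice) and ranks them, with hypothetical function names:

```python
import csv


def load_rows(path):
    """Read the comma-separated table into a list of dicts."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def parse_percent(text: str) -> float:
    """Convert "12.50%" (or plain "12.50") to 12.5."""
    return float(text.strip().rstrip("%"))


def significant_motifs(rows, cluster: str, q_max: float = 0.05):
    """(motif, q value, target/background ratio) for one cluster, sorted by q value."""
    hits = []
    for row in rows:
        if row["cluster"] != cluster:
            continue
        q = float(row["q-value (Benjamini)"])
        if q > q_max:
            continue
        target = parse_percent(row["% of Target Sequences with Motif"])
        background = parse_percent(row["% of Background Sequences with Motif"])
        ratio = target / background if background > 0 else float("inf")
        hits.append((row["Motif Name"], q, ratio))
    return sorted(hits, key=lambda hit: hit[1])
```

Cluster numbers are compared as strings, as `csv.DictReader` returns them, so pass e.g. `"1"` rather than `1`.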
Details for sampleinfo.csv
- Description: this file contains information about the RNAseq samples
- Format: .csv
- Size: 2KB
- Dimensions: 21 rows x 12 columns
- Variables:
- FileName: the name of the RNAseq file
- Rawsamplename: the name of the raw sample
- Sample: sample name that is used in analysis
- Treatment: the kind of treatment
- TreatmentDuration: the length of the treatment
- LaneID: the sequencing lane
- Batch: the sequencing batch
- RNAextractiontime: the date of RNA extraction
- RNAextractionround: the round of RNA extraction
- Generation: the generation of Nasonia
- Generationnumber: the generation number of Nasonia
- Biological_variables: biological variable name of the samples
- Missing data codes: there is no missing data in this file
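Before any differential expression analysis it helps to check the design encoded in sampleinfo.csv, for example whether treatment groups are spread across sequencing batches. A minimal sketch, assuming the headers match the variable names above; the function names are hypothetical:

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read sampleinfo.csv into a list of dicts keyed by the header row."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def design_table(rows):
    """Map (Treatment, TreatmentDuration) to the sorted sample names in that group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Treatment"], row["TreatmentDuration"])].append(row["Sample"])
    return {key: sorted(samples) for key, samples in groups.items()}


def batches_per_group(rows):
    """Map each (Treatment, TreatmentDuration) group to the sequencing batches it was run in."""
    batches = defaultdict(set)
    for row in rows:
        batches[(row["Treatment"], row["TreatmentDuration"])].add(row["Batch"])
    return dict(batches)
```

A group whose samples all come from a single batch that no other group shares would make batch and treatment effects hard to separate.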
Details for GOTERMS14_10122021.csv
- Description: this file contains GO annotations used in the analysis
- Format: .csv
- Size: 700KB
- Dimensions: 9818 rows x 2 columns
- Variables:
- ref_gene_id: the gene name
- GO: the GO IDs for the gene
- Missing data codes: there is no missing data in this file
- Other information: this file was created by scripts available on GitHub.
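Enrichment tools usually need the gene-to-GO mapping in both directions. A minimal sketch that builds both, assuming GO IDs follow the standard format of `GO:` plus seven digits; IDs are extracted with a regular expression, so it works whether the GO column holds one ID or several separated by commas or other characters (the separator used in this file is not documented). Function names and example IDs are illustrative:

```python
import csv
import re
from collections import defaultdict

# GO identifiers are "GO:" followed by exactly seven digits
GO_ID = re.compile(r"GO:\d{7}")


def load_rows(path):
    """Read the annotation table into a list of dicts keyed by the header row."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def gene_to_go(rows):
    """Map each ref_gene_id to the set of GO IDs found in its GO column."""
    mapping = defaultdict(set)
    for row in rows:
        mapping[row["ref_gene_id"]].update(GO_ID.findall(row["GO"]))
    return dict(mapping)


def go_to_genes(gene_map):
    """Invert a gene -> GO mapping into GO -> genes."""
    inverse = defaultdict(set)
    for gene, terms in gene_map.items():
        for term in terms:
            inverse[term].add(gene)
    return dict(inverse)
```

Usage: `gene_to_go(load_rows("GOTERMS14_10122021.csv"))`. Merging into sets also handles a layout with one row per gene-GO pair.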
Details for gene_count_matrix_annotatedstringtie.csv
- Description: this file contains the final gene counts matrix by StringTie
- Format: .csv
- Size: 700KB
- Dimensions: 15905 rows x 21 columns
- Variables:
- gene_id: the gene name
- columns 2-21: one column per RNAseq sample, headed by the sample name, containing the raw counts for each gene
- Missing data codes: there is no missing data in this file
- Other information: this file was created by scripts available on GitHub.
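The 21 columns correspond to gene_id plus one column per sample. A minimal sketch that reads the matrix, computes library sizes, and scales counts to counts per million (CPM) for quick exploration; CPM is a simple normalisation chosen here for illustration and is not the rlog transformation reported in Table S3. Function names are hypothetical:

```python
import csv
import io


def read_counts(handle):
    """Return (sample names, {gene_id: [count per sample]}) from a count-matrix CSV."""
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]  # first column is gene_id
    counts = {row[0]: [float(x) for x in row[1:]] for row in reader if row}
    return samples, counts


def library_sizes(samples, counts):
    """Total raw count per sample (column sums)."""
    totals = [0.0] * len(samples)
    for values in counts.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(samples, totals))


def counts_per_million(samples, counts):
    """Scale each sample's counts to counts per million (CPM)."""
    sizes = library_sizes(samples, counts)
    return {
        gene: [1e6 * value / sizes[name] for name, value in zip(samples, values)]
        for gene, values in counts.items()
    }


if __name__ == "__main__":
    demo = io.StringIO("gene_id,s1,s2\ng1,10,30\ng2,90,70\n")  # toy matrix, not real data
    samples, counts = read_counts(demo)
    print(library_sizes(samples, counts))
    print(counts_per_million(samples, counts)["g1"])
```

For the deposited file, open `gene_count_matrix_annotatedstringtie.csv` with `newline=""` and pass the handle to `read_counts`.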
Details for novel_sites_allhits.stringtie.gtf
- Description: this file contains the final transcriptome assembly (transcript annotation) produced by StringTie
- Format: .gtf
- Size: 97374KB
- Metadata: the transcriptome .gtf file produced by StringTie assembly (see scripts on GitHub for details), after preprocessing and functional annotation; novel genes without BLAST hits were removed.
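GTF is a tab-separated format with nine columns, the last holding `key "value";` attribute pairs; StringTie writes transcript and exon features carrying gene_id and transcript_id attributes. A minimal sketch that streams the file and counts transcripts per gene, with a hypothetical function name:

```python
import re
from collections import defaultdict

# Matches attribute pairs such as: gene_id "MSTRG.1";
ATTR = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(lines):
    """Count distinct transcript_ids per gene_id over the 'transcript' features of a GTF."""
    genes = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue  # exon lines repeat the same IDs
        attrs = dict(ATTR.findall(fields[8]))
        genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return {gene: len(ids) for gene, ids in genes.items()}
```

For the deposited file: `with open("novel_sites_allhits.stringtie.gtf") as fh: n = transcripts_per_gene(fh)`. Reading line by line avoids loading the roughly 100 MB file into memory at once.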
Sharing/access Information
This dataset has not yet been deposited anywhere else.
END OF THE README
This dataset consists of all processed data needed to reproduce the analysis of RNAseq data from Nasonia vitripennis, published in Biology under the same title.
Data included here: the final transcriptome including final gene counts matrix, sample information file, functional annotation, GO annotation table, and other supplementary data described in the manuscript.
The raw RNAseq reads can be found on the European Nucleotide Archive (ENA) under accession no. PRJEB57723.
All the scripts needed to process the data and reproduce the analysis can be found on GitHub at https://github.com/YFWang-YvH/Transcriptomic-analysis-of-light-induced-genes-in-Nasonia-vitripennis-implications-for-circadian.
Briefly: raw RNAseq reads were preprocessed and trimmed following the 'new Tuxedo' pipeline (see Pertea et al. 2016, Nature Protocols 11(9)). This pipeline includes mapping the reads to the newest Nasonia vitripennis reference genome “GCF_009193385.2_Nvit_psr_1.1_genomic.fna” and transcript assembly and quantification with StringTie. The final gene count matrix from StringTie is deposited here. The gene count matrix and sample information file were used for further statistical analysis and to produce the downstream data files and final figures shown in the manuscript. Please see the scripts on GitHub for pipeline and analysis details.
File extensions of the deposited data files include .gtf, .stats, .txt, and .csv. These are all text-based formats that can be opened with any text editor.