Replicated differential expression analysis in a green-brown polymorphic grasshopper reveals role of beta-carotene-binding protein in body coloration
Data files
Oct 21, 2025 version files (187.42 MB)
- Gsib_color_transcriptome.fa (184.21 MB)
- meta.confirmatory_dataset.txt (1.10 KB)
- meta.discovery_dataset.txt (601 B)
- README.md (2.16 KB)
- script.txt (14.74 KB)
- tx2gene.dta.tsv (3.19 MB)
Abstract
This dataset accompanies a study on the genetic basis of green–brown color polymorphism in the club-legged grasshopper (Gomphocerus sibiricus). It includes RNA-seq data from fourth-instar nymphs (N4) of green and brown morphs, along with associated metadata (e.g., morph, sex) and analysis scripts. Transcriptomes were sequenced to investigate differential gene expression linked to coloration, and a reference-guided transcriptome assembly was generated. Six genes consistently upregulated in green individuals across two datasets were identified as beta-carotene-binding proteins (βCBPs), suggesting a role in carotenoid-mediated pigmentation. This dataset enables further functional and evolutionary analyses of βCBP genes in Orthoptera and may be reused in studies of gene expression, pigmentation, and evolutionary developmental biology.
Dataset DOI: 10.5061/dryad.qz612jmv2
Description of the data and file structure
This dataset was generated as part of a study investigating the genetic basis of green–brown color polymorphism in the club-legged grasshopper (Gomphocerus sibiricus). We collected early-instar nymphs from the wild and reared them under laboratory conditions until the fourth instar (N4), when they were phenotyped and snap-frozen for RNA extraction. Transcriptome sequencing (RNA-seq) was performed on green and brown morphs to identify differentially expressed genes associated with coloration.
The data include:
- A reference-guided transcriptome assembly (Gsib_color_transcriptome.fa).
- Metadata for two datasets: the discovery and confirmatory datasets (meta.discovery_dataset.txt, meta.confirmatory_dataset.txt).
- A transcript-to-gene mapping file used in downstream gene-level expression analysis (tx2gene.dta.tsv).
- The full analysis pipeline and commands used to process and analyze the data (script.txt).
These files support reproducibility of the study and are intended for reuse in related research on pigmentation, gene expression, and evolutionary biology in Orthoptera.
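For readers who want a quick sanity check of the assembly before reuse, the following is a minimal sketch of how a FASTA file such as Gsib_color_transcriptome.fa could be summarized. It assumes only the standard FASTA layout (a header line starting with ">" followed by one or more sequence lines). The function names and the toy sequences are hypothetical and are not taken from script.txt.

```python
import io


def iter_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_lengths(handle):
    """Return the number of sequences, total bases, and longest sequence length."""
    lengths = [len(seq) for _, seq in iter_fasta(handle)]
    return {
        "n_sequences": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
    }


# Toy input with made-up transcript IDs (not from the real assembly)
toy = io.StringIO(">tx1 example\nACGT\nAC\n>tx2\nGGG\n")
toy_summary = summarize_lengths(toy)
```

To apply the same summary to the real assembly, one could pass an open file handle, for example `summarize_lengths(open("Gsib_color_transcriptome.fa"))`.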
Files and variables
File: Gsib_color_transcriptome.fa
Description: The reference-guided transcriptome assembly
File: tx2gene.dta.tsv
Description: Mapping between transcript IDs and gene IDs
File: meta.confirmatory_dataset.txt
Description: Metadata for the individuals in the confirmatory dataset
File: script.txt
Description: All commands used during the analysis
File: meta.discovery_dataset.txt
Description: Metadata for the individuals in the discovery dataset
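As a rough illustration of how a transcript-to-gene mapping such as tx2gene.dta.tsv can support gene-level summarization, the sketch below sums transcript-level values per gene. It assumes a headerless, tab-separated table with the transcript ID in the first column and the gene ID in the second. The actual layout of tx2gene.dta.tsv should be checked before use. The function names and example IDs are hypothetical, and this is not necessarily the procedure used in script.txt.

```python
import csv
import io
from collections import defaultdict


def read_tx2gene(handle):
    """Read a two-column, tab-separated transcript-to-gene table.

    Assumption: column 1 is the transcript ID, column 2 is the gene ID,
    and there is no header row.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        mapping[row[0]] = row[1]
    return mapping


def sum_to_gene_level(transcript_values, tx2gene):
    """Sum transcript-level values (e.g. estimated counts) per gene."""
    gene_totals = defaultdict(float)
    for transcript_id, value in transcript_values.items():
        gene_id = tx2gene.get(transcript_id)
        if gene_id is None:
            continue  # transcript not present in the mapping
        gene_totals[gene_id] += value
    return dict(gene_totals)


# Toy example with made-up IDs (not from the real dataset)
example_table = io.StringIO("tx1\tgeneA\ntx2\tgeneA\ntx3\tgeneB\n")
example_mapping = read_tx2gene(example_table)
example_totals = sum_to_gene_level(
    {"tx1": 10.0, "tx2": 5.0, "tx3": 2.0, "tx4": 1.0},
    example_mapping,
)
```

In this toy example, transcripts that are missing from the mapping (here `tx4`) are silently dropped. A stricter version could raise an error instead, which may be preferable when the mapping is expected to be complete.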
Code/software
The file script.txt contains all the commands we used during the analysis.
Access information
Other publicly accessible locations of the data:
The raw reads have been submitted to NCBI under BioProject PRJNA1241690.
