Data from: Cryptocercus genomes expand knowledge of adaptations to xylophagy and termite sociality
Data files
Jan 25, 2026 version files 59.05 MB
- bger_IRs.fa (565.66 KB)
- cfor_IRs.fa (102.72 KB)
- cmer_IRs.fa (130.76 KB)
- cpun_IRs.fa (131.37 KB)
- cryptocercus_punctulatus.scaffolded.gff (43.62 MB)
- CryptocercusBranchesContractions_BP_table.txt (836 B)
- CryptocercusBranchesExpansions_BP_table.txt (861 B)
- CryptocercusRootContractions_BP_table.txt (166 B)
- CryptocercusRootExpansions_BP_table.txt (666 B)
- csec_IRs.fa (95.30 KB)
- dpun_IRs.fa (247.86 KB)
- elan_IRs.fa (18.80 KB)
- focc_IRs.fa (15.88 KB)
- IR_db.fasta (493.29 KB)
- IR_db.hmm (300.94 KB)
- IR_db.msa (12.79 MB)
- mnat_IRs.fa (68.95 KB)
- pame_IRs.fa (269.66 KB)
- README.md (5.30 KB)
- rspe_IRs.fa (66.42 KB)
- Significant_GOterms_selection.csv (2.35 KB)
- znev_IRs.fa (124.21 KB)
Abstract
Subsociality and xylophagy (wood-eating) are understood as key drivers in the evolution of eusociality in Blattodea (cockroaches and termites). Both features are observed in the cockroach genus Cryptocercus, the sister group of all termites. We present and analyse two new high-quality genomes from this genus, C. punctulatus from North America and C. meridianus from Southeast Asia, to explore the evolutionary transitions to xylophagy and subsociality within Blattodea. Our analyses reveal evidence of relaxed selection in both Cryptocercus and termites, indicating that a reduction in effective population size may have occurred in their subsocial ancestors. These findings challenge the expected positive correlation between dN/dS ratios and social complexity, as Cryptocercus exhibits elevated dN/dS values that may exceed those of eusocial termites. Additionally, we identify positive selection on mitochondrial ribosomal proteins and components of the NADH dehydrogenase complex, suggesting significant evolutionary changes in energy production. Future studies incorporating additional genomic data from diverse Blattodea species are essential to elucidate the molecular mechanisms driving transitions to xylophagy and eusociality.
https://doi.org/10.5061/dryad.np5hqc04p
Description of the data and file structure
Genomic DNA was extracted from single snap-frozen legs of Cryptocercus punctulatus individuals collected in Virginia and reared in the laboratory. The tissue was homogenised on ice using a TissueRuptor and lysed with buffer CT and Proteinase K. RNA contamination was removed by RNase A treatment. High-molecular-weight DNA was captured using Circulomics Nanodisks and eluted in elution buffer. The extraction followed the Circulomics Insect Big DNA kit protocol (v0.20a). The resulting DNA was of high quality and high molecular weight, with fragment sizes of approximately 170 kb. Quality and size distribution were confirmed using an Agilent Femto Pulse system.
Genome Annotations:
cryptocercus_punctulatus.scaffolded.gff - Cryptocercus punctulatus annotations based on the genome that is stored on NCBI within the project PRJNA1188519
- Annotations of IRs with BitaCora:
*_IRs.fa: protein sequences of Ionotropic Receptors
IR_db.*: sequence database files (IR_db.fasta, IR_db.hmm, IR_db.msa) used as input for BitaCora
- Enriched GO terms, results of TopGo analyses:
*_BP_table.txt (from CAFE results)
Significant_GOterms_selection.csv (from selection analyses)
Files and variables
File: cryptocercus_punctulatus.scaffolded.gff
Description: genome annotations for C. punctulatus
The following FASTA files, named xx_IRs.fa, contain protein sequences of annotated Ionotropic Receptors (IRs):
File: pame_IRs.fa
Description: IR sequences of P. americana
File: rspe_IRs.fa
Description: IR sequences of R. speratus
File: cmer_IRs.fa
Description: IR sequences of C. meridianus
File: znev_IRs.fa
Description: IR sequences of Z. nevadensis
File: csec_IRs.fa
Description: IR sequences of C. secundus
File: elan_IRs.fa
Description: IR sequences of E. langierum
File: dpun_IRs.fa
Description: IR sequences of D. punctata
File: cpun_IRs.fa
Description: IR sequences of C. punctulatus
File: focc_IRs.fa
Description: IR sequences of F. occidentalis
File: mnat_IRs.fa
Description: IR sequences of M. natalensis
File: cfor_IRs.fa
Description: IR sequences of C. formosanus
File: bger_IRs.fa
Description: IR sequences of B. germanica
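As a quick sanity check after downloading, the number of IR sequences per species can be tallied by counting FASTA header lines. The following is a minimal sketch, not part of the deposited code: the function names `count_fasta_records` and `summarise` are hypothetical, and it assumes the `*_IRs.fa` files sit in the current directory as standard FASTA with one `>` header per sequence.

```python
from pathlib import Path

# File prefix -> species, exactly as listed in this README.
SPECIES = {
    "pame": "P. americana",
    "rspe": "R. speratus",
    "cmer": "C. meridianus",
    "znev": "Z. nevadensis",
    "csec": "C. secundus",
    "elan": "E. langierum",
    "dpun": "D. punctata",
    "cpun": "C. punctulatus",
    "focc": "F. occidentalis",
    "mnat": "M. natalensis",
    "cfor": "C. formosanus",
    "bger": "B. germanica",
}


def count_fasta_records(lines):
    """Count sequences by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def summarise(directory="."):
    """Return {species: number of IR sequences} for every *_IRs.fa file found."""
    counts = {}
    for path in sorted(Path(directory).glob("*_IRs.fa")):
        prefix = path.name.split("_")[0]
        with path.open() as handle:
            # Fall back to the raw prefix if it is not in the table above.
            counts[SPECIES.get(prefix, prefix)] = count_fasta_records(handle)
    return counts


if __name__ == "__main__":
    for species, n in summarise().items():
        print(f"{species}\t{n}")
```

Run from the directory holding the FASTA files, this prints one tab-separated line per species.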
File: CryptocercusRootContractions_BP_table.txt
Description: TopGo results for contractions at the Cryptocercus root
Variables
- GO.ID: GO term ID
- Term: GO term description
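- Annotated: Number of genes in the background set annotated with the GO term
- Significant: Number of genes of interest annotated with the GO term
- Expected: Number of genes of interest expected to carry the GO term by chance
- pvalue: p-value of the enrichment test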
File: CryptocercusRootExpansions_BP_table.txt
Description: TopGo results for expansions at the Cryptocercus root
Variables
- GO.ID: GO term ID
- Term: GO term description
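- Annotated: Number of genes in the background set annotated with the GO term
- Significant: Number of genes of interest annotated with the GO term
- Expected: Number of genes of interest expected to carry the GO term by chance
- pvalue: p-value of the enrichment test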
File: CryptocercusBranchesContractions_BP_table.txt
Description: TopGo results for contractions on all Cryptocercus branches
Variables
- GO.ID: GO term ID
- Term: GO term description
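- Annotated: Number of genes in the background set annotated with the GO term
- Significant: Number of genes of interest annotated with the GO term
- Expected: Number of genes of interest expected to carry the GO term by chance
- pvalue: p-value of the enrichment test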
File: CryptocercusBranchesExpansions_BP_table.txt
Description: TopGo results for expansions on all Cryptocercus branches
Variables
- GO.ID: GO term ID
- Term: GO term description
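- Annotated: Number of genes in the background set annotated with the GO term
- Significant: Number of genes of interest annotated with the GO term
- Expected: Number of genes of interest expected to carry the GO term by chance
- pvalue: p-value of the enrichment test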
File: IR_db.msa
Description: IR database alignment file
File: IR_db.fasta
Description: IR database fasta file
File: IR_db.hmm
Description: HMMER output, specifically a Hidden Markov Model (HMM) database related to ionotropic receptors (IR).
File: Significant_GOterms_selection.csv
Description: TopGo results for selection analyses
Variables
- GO.ID: GO term ID
- Term: GO term description
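- Annotated: Number of genes in the background set annotated with the GO term
- Significant: Number of genes of interest annotated with the GO term
- Expected: Number of genes of interest expected to carry the GO term by chance
- pvalue: p-value of the enrichment test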
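The GO tables can be loaded with the Python standard library. The sketch below is illustrative only and rests on assumptions not confirmed by this README: that the file is comma-separated, that its header row uses exactly the variable names listed above, and that very small p-values may be written with a leading `<` (for example `< 1e-30`). The function names `load_go_table` and `enriched`, and the 0.05 threshold, are hypothetical choices. For the `_BP_table.txt` files a different `delimiter` may be needed.

```python
import csv
import sys


def load_go_table(lines, delimiter=","):
    """Parse a GO enrichment table into a list of typed dicts.

    `lines` is any iterable of text lines (an open file works).
    ASSUMPTION: the header row contains the column names used below.
    """
    rows = []
    for row in csv.DictReader(lines, delimiter=delimiter):
        rows.append({
            "GO.ID": row["GO.ID"],
            "Term": row["Term"],
            "Annotated": int(row["Annotated"]),
            "Significant": int(row["Significant"]),
            "Expected": float(row["Expected"]),
            # ASSUMPTION: values such as "< 1e-30" are reduced to their bound.
            "pvalue": float(row["pvalue"].lstrip("< ")),
        })
    return rows


def enriched(rows, alpha=0.05):
    """Return rows with pvalue below alpha, most significant first."""
    return sorted((r for r in rows if r["pvalue"] < alpha),
                  key=lambda r: r["pvalue"])


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], newline="") as handle:
            for r in enriched(load_go_table(handle)):
                print(f'{r["GO.ID"]}\t{r["Term"]}\t{r["pvalue"]}')
    else:
        print("usage: python go_table.py TABLE.csv", file=sys.stderr)
```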
Code/software
Python script for calculating the CpG observed/expected ratio (CpG O/E)
Access information
Other publicly accessible locations of the data:
- NCBI - PRJNA1188519
Data was derived from the following sources:
- C. punctulatus genome is novel
- C. meridianus genome here: https://doi.org/10.1101/2025.01.20.633303
Code/Software
Cpg.py: This Python script calculates the CpG observed/expected (CpG O/E) ratio for each gene sequence in a FASTA file. It reads the input file from the command line and stores each gene’s DNA sequence in a dictionary. For every sequence, it counts the number of cytosines (C), guanines (G), and CpG dinucleotides (CG). The CpG O/E value is computed as the observed number of CGs divided by the expected number based on C and G frequencies in the sequence. If no CpG sites are present, the value is set to zero. Finally, the script outputs each gene name and its CpG O/E ratio in a tab-separated format.
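The calculation described above can be sketched as follows. This is not the deposited Cpg.py but a minimal reimplementation under stated assumptions: the expected CpG count is taken as (number of C × number of G) / sequence length, which is one common normalisation and may differ from the one the script uses, and the gene name is taken as the first whitespace-delimited token of each FASTA header. The function names `read_fasta` and `cpg_oe` are hypothetical.

```python
import sys


def read_fasta(lines):
    """Parse FASTA lines into a dict of gene name -> uppercase sequence."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else "unnamed"
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {gene: "".join(chunks) for gene, chunks in parts.items()}


def cpg_oe(seq):
    """Return the CpG observed/expected ratio of one sequence.

    ASSUMPTION: expected CpG = (count of C * count of G) / length.
    Returns 0.0 when the sequence has no CpG sites, as the README describes.
    """
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cg = seq.count("CG")
    if n_cg == 0:
        return 0.0
    # n_cg > 0 implies n_c > 0 and n_g > 0, so the division is safe.
    return n_cg * len(seq) / (n_c * n_g)


def main(path):
    """Print gene name and CpG O/E, tab-separated, for each FASTA record."""
    with open(path) as handle:
        for gene, seq in read_fasta(handle).items():
            print(f"{gene}\t{cpg_oe(seq):.4f}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python cpg_sketch.py GENES.fasta", file=sys.stderr)
```

For example, the sequence `ACGT` has one C, one G, one CG and length 4, so under this normalisation its ratio is 1 × 4 / (1 × 1) = 4.0.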
