Data from: DNA methylation reflects tissue and caste identity but not parasitism-induced changes in a social insect
Data files
Dec 04, 2025 version files 31.93 MB
- README.md (7.91 KB)
- Supplementary_File_1-Table_1.csv (663 B)
- Supplementary_File_1-Table_2.csv (5.33 KB)
- Supplementary_File_1-Table_3.csv (1.77 KB)
- Supplementary_File_1-Table_4.csv (5.42 KB)
- Supplementary_File_2-Table_1.csv (1.62 KB)
- Supplementary_File_2-Table_2.csv (33.86 KB)
- Supplementary_File_2-Table_3.csv (66.09 KB)
- Supplementary_File_2-Table_4.csv (637.40 KB)
- Supplementary_File_2-Table_5.csv (460.43 KB)
- Supplementary_File_2-Table_6.csv (6.04 MB)
- Supplementary_File_2-Table_7.csv (349.72 KB)
- Supplementary_File_3-Table_1.csv (2.66 MB)
- Supplementary_File_3-Table_2.csv (6.25 MB)
- Supplementary_File_4.zip (10.43 MB)
- Supplementary_File_5-Table_1.csv (828.78 KB)
- Supplementary_File_5-Table_2.csv (1.35 MB)
- Supplementary_File_6.html (2.81 MB)
Abstract
Dataset DOI: 10.5061/dryad.7h44j107x
Description of the data and file structure
The data were generated to quantify genome-wide DNA methylation patterns and gene expression differences across castes, tissues, and infection states in the ant Temnothorax nylanderi. Whole-genome bisulfite sequencing and RNA-seq were performed on brain and fat body samples from queens, uninfected workers, and parasite-infected workers to assess how caste identity and parasite manipulation influence epigenetic and transcriptional profiles.
Files and variables
File: Supplementary_File_1-Table_1.csv
Description: Overview of colony-level information for samples used in Whole-Genome Bisulfite Sequencing (WGBS), Whole-Genome Sequencing (WGS), and RNA sequencing (RNA-seq). For each colony, the table reports the sample collection date, count date, colony ID, and the number of individuals of each caste or developmental stage, including queens, workers (infected and uninfected), pupae, larvae, and males. Queens from colonies whose IDs start with "Q" are included only in the RNA-seq analysis.
File: Supplementary_File_1-Table_2.csv
Description: Summary of sequencing and alignment quality metrics for all samples from WGBS, including duplication rate, GC content, number of methylation-informative reads, and alignment performance. Note: “NA” indicates that the metric was not applicable for a given sample (e.g., sequencing-only metrics vs alignment-only metrics). NA does not indicate missing data.
File: Supplementary_File_1-Table_3.csv
Description: Summary of sequencing and alignment quality metrics for all samples from WGS, including duplication rate, GC content, number of informative reads, and alignment performance. Note: “NA” indicates that the metric was not applicable for a given sample (e.g., sequencing-only metrics vs alignment-only metrics). NA does not indicate missing data.
File: Supplementary_File_1-Table_4.csv
Description: Summary of sequencing and alignment quality metrics for all samples from RNA-seq, including duplication rate, GC content, number of uniquely mapped reads, and alignment performance. Note: “NA” indicates that the metric was not applicable for a given sample (e.g., sequencing-only metrics vs alignment-only metrics). NA does not indicate missing data.
File: Supplementary_File_2-Table_1.csv
Description: Methylation quality metrics per sample. Lambda phage bisulfite conversion rates and mean methylation levels for each sample, with experimental group labels and relevant metadata for quality assessment.
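As a rough illustration of how a bisulfite conversion rate can be derived from an unmethylated lambda phage spike-in, the sketch below takes the number of lambda cytosine calls read as methylated and as unmethylated. The function name and the example counts are hypothetical and are not taken from this table; the exact calculation used in the study is documented in Supplementary_File_6.html.

```python
def conversion_rate(methylated_calls: int, unmethylated_calls: int) -> float:
    """Fraction of lambda cytosine calls read as unmethylated (i.e. converted)."""
    total = methylated_calls + unmethylated_calls
    if total == 0:
        raise ValueError("no cytosine calls on the lambda genome")
    return unmethylated_calls / total


# Made-up counts for illustration only.
rate = conversion_rate(methylated_calls=50, unmethylated_calls=9950)
print(f"conversion rate: {rate:.3%}")
```

Because the spike-in is expected to be unmethylated, any cytosine still read as methylated is treated here as a conversion failure.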
File: Supplementary_File_2-Table_2.csv
Description: CpG colocalization data. Summary of CpG sites and their genomic colocalization using two annotation levels: "Basic" (exon level) and "Detailed" (CDS, 5′UTR, and 3′UTR level). Each row provides the region identity and the corresponding counts of methylated CpGs and of all CpGs.
File: Supplementary_File_2-Table_3.csv
Description: Classification of genes based on DNA methylation consistency across samples.
For each gene, the table reports the number of CpG sites that were classified as methylated in every sample.
File: Supplementary_File_2-Table_4.csv
Description: Classification of genes based on DNA methylation consistency across samples. Gene-level expression values were used to compare the expression profiles of consistently methylated genes (Housekeeping) versus other genes in the genome. For each gene, the raw median expression across samples was length-normalized (per kilobase) and log₂-transformed.
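The transformation described above can be sketched as follows. This is a minimal example, not the authors' code: the function name is hypothetical, the counts and gene length are made up, and the pseudocount of 1 (added so that zero counts do not produce an undefined logarithm) is an assumption that this description does not state.

```python
import math
from statistics import median


def log2_median_per_kb(counts, gene_length_bp, pseudocount=1.0):
    """Median count across samples, per kilobase of gene length, log2-transformed.

    The pseudocount is an assumption made for this sketch.
    """
    per_kb = median(counts) / (gene_length_bp / 1000.0)
    return math.log2(per_kb + pseudocount)


# Three hypothetical samples for one gene of 2,000 bp.
value = log2_median_per_kb([120, 150, 90], gene_length_bp=2000)
print(round(value, 3))
```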
File: Supplementary_File_2-Table_5.csv
Description: Classification of genes based on DNA methylation consistency across samples. For each gene, expression values were log₂-transformed and used to compute the median absolute deviation (MAD) and standard deviation (SD), allowing comparison between consistently methylated (Housekeeping) genes and all other genes.
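A minimal sketch of the two variability measures for a single gene is shown below. The function name, the pseudocount of 1, and the example values are assumptions made for illustration. The MAD here is the unscaled median of absolute deviations from the median; R's `mad()` multiplies by 1.4826 by default, and this description does not say which convention was used.

```python
import math
from statistics import median, stdev


def log2_variability(values, pseudocount=1.0):
    """Return (MAD, SD) of log2-transformed expression values for one gene."""
    logged = [math.log2(v + pseudocount) for v in values]
    centre = median(logged)
    # Unscaled MAD: median absolute deviation from the median.
    mad = median([abs(x - centre) for x in logged])
    # Sample standard deviation (n - 1 in the denominator).
    sd = stdev(logged)
    return mad, sd


# Hypothetical expression values across three samples.
mad, sd = log2_variability([1, 3, 7])
print(mad, sd)
```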
File: Supplementary_File_2-Table_6.csv
Description: Differentially methylated loci (DMLs) with genomic position, q-value, methylation difference, and classification (hypermethylated or hypomethylated) for each tissue/caste comparison. A locus was classified as hypomethylated or hypermethylated if its methylation difference between the compared tissues or castes exceeded 10% and its q-value was below 0.01.
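The thresholds given for this table (methylation difference of more than 10% and q-value below 0.01) can be expressed as a small classifier. This is a sketch under stated assumptions: the function name and labels are hypothetical, and it assumes a positive difference means higher methylation in the first group of the comparison, which the description does not specify.

```python
def classify_dml(meth_diff_pct, qvalue, diff_cutoff=10.0, q_cutoff=0.01):
    """Label one CpG site as 'hyper', 'hypo', or 'not significant'.

    meth_diff_pct is the methylation difference in percentage points;
    its sign convention is an assumption of this sketch.
    """
    if qvalue >= q_cutoff or abs(meth_diff_pct) <= diff_cutoff:
        return "not significant"
    return "hyper" if meth_diff_pct > 0 else "hypo"


# Made-up values for illustration.
for diff, q in [(25.0, 0.001), (-12.5, 0.005), (8.0, 0.0001), (30.0, 0.05)]:
    print(diff, q, classify_dml(diff, q))
```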
File: Supplementary_File_2-Table_7.csv
Description: Summary of differential methylation metrics (average difference, number of DMLs, weighted scores) by tissue and caste comparison.
File: Supplementary_File_3-Table_1.csv
Description: Raw gene-level read counts obtained from featureCounts, with genes listed in rows and samples in columns. These counts represent the input data for DESeq2 differential expression analysis and include all samples used in the study.
File: Supplementary_File_3-Table_2.csv
Description: Differential gene expression results from multiple pairwise RNA-seq comparisons generated with DESeq2 are presented in this table. For each gene and comparison, the table reports the gene identifier (GeneID), average normalised expression (baseMean), log2 fold change (log2FoldChange), standard error of the fold change estimate (lfcSE), Wald test statistic (stat), raw p-value (pvalue), and adjusted p-value (padj).
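As an example of working with these columns, the sketch below keeps genes whose adjusted p-value falls under a cutoff. The toy rows, the function name, and the 0.05 cutoff are assumptions made for illustration; the real file holds several comparisons, so its exact layout may differ from this single-comparison toy table. DESeq2 can report `NA` for `padj`, so such rows are skipped.

```python
import csv
import io

# Made-up rows using the column names listed in the description.
TOY_CSV = """GeneID,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
geneA,150.2,1.8,0.4,4.5,0.00001,0.0004
geneB,20.1,-0.3,0.5,-0.6,0.55,0.80
geneC,3.0,2.1,1.9,1.1,0.27,NA
"""


def significant_genes(handle, alpha=0.05):
    """Return (GeneID, log2FoldChange) pairs with padj below alpha."""
    hits = []
    for row in csv.DictReader(handle):
        if row["padj"] in ("", "NA"):
            continue  # no adjusted p-value available for this gene
        if float(row["padj"]) < alpha:
            hits.append((row["GeneID"], float(row["log2FoldChange"])))
    return hits


print(significant_genes(io.StringIO(TOY_CSV)))
```

To apply the same function to a file on disk, pass an open file handle in place of the `io.StringIO` object.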
File: Supplementary_File_4.zip
Description: GO Enrichment Background Set
This file provides the complete list of Gene Ontology (GO) terms used as the background universe for GO enrichment analyses.
File: Supplementary_File_5-Table_1.csv
Description: GO enrichment terms for differentially expressed genes processed with REVIGO to reduce semantic redundancy based on GO term similarity. The table includes results from comparisons across tissue types, castes, and infection status.
File: Supplementary_File_5-Table_2.csv
Description: REVIGO-processed GO enrichment terms for genes grouped by DNA methylation status, including differentially methylated genes and genes with balanced methylation between brain and fat body. The table includes results from comparisons across tissue types, castes, and infection status.
File: Supplementary_File_6.html
Description: Full Analysis Workflow (HTML Format):
A complete, reproducible report of the analysis pipeline, including both Bash and R code. The workflow spans all steps from raw data processing to figure generation: quality control, alignment, methylation calling, preprocessing, differential analysis, normalisation, dataset integration, GO enrichment, and visualisation.
File: Supplementary_Figures.pdf
Description: Supplementary histograms and density plots showing genome-wide distributions of weighted gene-level methylation differences across tissues and castes.
Code/software
File 6. Full analysis workflow (HTML).
This HTML report contains the complete, reproducible workflow used in the study. It documents all Bash and R steps executed on a Linux HPC environment, including quality control, alignment, methylation calling, preprocessing, differential analysis, normalisation, dataset integration, GO enrichment, and visualisation. All software versions (R, Bismark/Bowtie2, SAMtools, BEDTools, and R packages) match those reported in the Methods section of the manuscript. The file can be opened in any standard web browser.
Access information
Other publicly accessible locations of the data:
- None. All data associated with this study are deposited exclusively in Dryad, Zenodo, and NCBI SRA as described. BioProject accession: PRJNA1297301
Data was derived from the following sources:
- All datasets were generated specifically for this study; no external datasets were used.
