Data from: Inferences from epigenetic information in an ecological context: A case study of DNA methylation and early-life environmental effects on zebra finches
Data files
Feb 25, 2026 version, files total 15.92 GB
- Cov_ZF_2023_MalesnoW.rds (15.92 GB)
- PhilTrans_DNAm_BroodSize_R1.R (86.04 KB)
- README.md (5.90 KB)
- ZF_SampleInfo_Mass_R1.xlsx (22.50 KB)
Abstract
DNA methylation (DNAm) is known to affect gene expression and has been suggested as a putative mechanism through which environmental factors can continuously shape phenotypic variation. DNAm data can be assessed using a variety of approaches, ranging from single-nucleotide resolution to the development of composite indexes, each providing unique insights. Utilizing whole-genome, longitudinal DNAm data from adult zebra finches raised in either small or large broods, we present a case study exploring how DNAm data can be used not only to assess environmental effects on the epigenome but also to develop tools that measure those effects directly. Specifically, we (i) identified CpG sites where DNAm differed significantly between adult zebra finches raised in small and large broods, (ii) developed a phenotypic index using the methylation of the differentially methylated sites, and (iii) predicted brood size from methylation using elastic net regression. Our findings suggest that early-life environment can lead to long-term differences in the DNAm of specific CpG sites and generate phenotypic variation. These methylation signatures can be leveraged to develop scalable tools to predict phenotypic quality and fitness outcomes as well as retroactively quantify stress.
Dataset DOI: 10.5061/dryad.nk98sf85p
Description of the data and file structure
DNA was extracted from zebra finch blood samples. Enzymatic Methyl-sequencing was used to quantify whole-genome methylation.
Files and variables
The raw data are available at https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1108628. The bioinformatics pipeline from raw sequencing data to .cov.gz methylation files is available at https://zenodo.org/records/12074859
File descriptions
R-Script: PhilTrans_DNAm_BroodSize_R1.R
R code, including all plots and statistical analyses presented in the case study
Cov_ZF_2023_MalesnoW.rds
- StartPosition: position of the site
- Endposition: position of the site (“StartPosition” and “Endposition” should be the same)
- MethylationPercentage: percentage of DNA methylation per site
- CountMethylated: count of methylated reads per site
- CountNonMethylated: count of unmethylated reads per site
- Chromosome: the chromosome in which the site is located
- Coverage: coverage per site (CountMethylated+CountNonMethylated)
- Sample: sample ID
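The column definitions above imply a few internal consistency checks, sketched here in base R on a small invented table rather than on the 15.92 GB file. The toy values, the `check_cov` helper, and the tolerance `tol` are hypothetical. The sketch also assumes that MethylationPercentage equals 100 × CountMethylated / Coverage and that the .rds file holds a data frame with these column names; neither is stated explicitly in this description.

```r
# Toy rows mimicking the documented columns; all values are invented.
cov_toy <- data.frame(
  Chromosome         = c("1", "1", "2"),
  StartPosition      = c(101L, 250L, 77L),
  Endposition        = c(101L, 250L, 77L),
  CountMethylated    = c(8L, 0L, 15L),
  CountNonMethylated = c(2L, 10L, 5L),
  Sample             = c("S1", "S1", "S2"),
  stringsAsFactors   = FALSE
)

# Derived columns, following the definitions in the list above.
cov_toy$Coverage <- cov_toy$CountMethylated + cov_toy$CountNonMethylated
# Assumption: percentage = 100 * methylated reads / coverage.
cov_toy$MethylationPercentage <- 100 * cov_toy$CountMethylated / cov_toy$Coverage

# Hypothetical helper: returns one TRUE/FALSE per documented relationship.
check_cov <- function(d, tol = 1e-6) {
  expected_pct <- 100 * d$CountMethylated / d$Coverage
  c(
    positions_match    = all(d$StartPosition == d$Endposition),
    coverage_is_sum    = all(d$Coverage == d$CountMethylated + d$CountNonMethylated),
    percentage_matches = all(abs(d$MethylationPercentage - expected_pct) < tol)
  )
}

print(check_cov(cov_toy))
```

If the stored percentages are rounded, a looser `tol` would be needed when applying the same checks to a subset of the real data.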
ZF_SampleInfo_Mass_R1.xlsx
Sheet = Info
- BirdID: individual ID
- ZV_ID: blood sample ID
- Seq#: sequencing ID
- Sample: Sample ID
- Sexcode: sex (M or F)
- StandRearNestSize: brood size (2=small, 6=large)
- AgeYO: age at sampling (Young or Old, two observations per BirdID)
- CollectionYear: year the blood sample was collected
Sheet = Mass
- BirdID: individual ID
- HatchDate: hatching date
- MassDate1st: date the first mass measurement was taken
- AgeDays1st: age at which the first mass measurement was taken
- Mass1st: first mass measurement (g)
- MassDate15: date the day 15 mass measurement was taken
- AgeDays15: age at which the day 15 mass measurement was taken
- Mass15: day 15 mass measurement (g)
"NA" indicates missing mass data
Sheet = BirdID
- Sample: arbitrary Sample ID for methylKit
- BirdID: individual ID
- StandRearNestSize: brood size (2=small, 6=large)
Sheet = Pos_per_chrom
- Chromosome: chromosome
- Unique_Pos_Count: number of CpG sites per chromosome after filtering
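As a rough illustration of how the Info sheet coding might be used, the sketch below recodes StandRearNestSize into a labelled factor and checks that each BirdID has one Young and one Old observation. It runs on invented rows; the `info_toy` table, the `BroodSize` column, and the bird and sample IDs are hypothetical. The commented-out line shows how the real sheet could presumably be read with `readxl::read_excel` (readxl is among the attached packages listed under Code/software).

```r
# With the real file, the sheet could presumably be read like this:
# info <- readxl::read_excel("ZF_SampleInfo_Mass_R1.xlsx", sheet = "Info")

# Toy stand-in for the Info sheet; IDs and values are invented.
info_toy <- data.frame(
  BirdID            = c("B01", "B01", "B02", "B02"),
  Sample            = c("S1", "S2", "S3", "S4"),
  StandRearNestSize = c(2, 2, 6, 6),
  AgeYO             = c("Young", "Old", "Young", "Old"),
  stringsAsFactors  = FALSE
)

# Recode brood size using the documented coding (2 = small, 6 = large).
info_toy$BroodSize <- factor(
  info_toy$StandRearNestSize,
  levels = c(2, 6),
  labels = c("small", "large")
)

# Longitudinal design check: two observations per bird, one per age class.
obs_per_bird <- table(info_toy$BirdID)
both_ages <- tapply(
  info_toy$AgeYO, info_toy$BirdID,
  function(x) setequal(x, c("Young", "Old"))
)

print(obs_per_bird)
print(both_ages)
```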
Due to the size of the datasets and the computational demands of these analyses, we strongly recommend running the R workflows on a high-performance computing (HPC) system rather than a standard desktop environment. Working with the "Cov_ZF_2023_MalesnoW.rds" file is resource-intensive and, in our tests, required approximately 400 GB of RAM and about 29 hours of runtime on an HPC system.
Code/software
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_Netherlands.utf8 LC_CTYPE=English_Netherlands.utf8 LC_MONETARY=English_Netherlands.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Netherlands.utf8
time zone: Europe/Amsterdam
tzcode source: internal
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggVennDiagram_1.5.4 scales_1.4.0 ggrepel_0.9.6 rptR_0.9.23 sjPlot_2.9.0 tidyr_1.3.1 UpSetR_1.4.0 VennDiagram_1.7.3
[9] futile.logger_1.4.3 AICcmodavg_2.3-4 ppcor_1.1 MASS_7.3-60 purrr_1.0.4 janitor_2.2.1 ggpubr_0.6.2 ggplot2_4.0.0
[17] lmerTest_3.1-3 lme4_1.1-37 Matrix_1.6-1.1 dplyr_1.1.4 readxl_1.4.5 readr_2.1.5
loaded via a namespace (and not attached):
[1] psych_2.5.6 sjlabelled_1.2.0 tidyselect_1.2.1 farver_2.1.2 S7_0.2.0 timechange_0.3.0 lifecycle_1.0.4 survival_3.5-7
[9] magrittr_2.0.3 compiler_4.3.2 rlang_1.1.5 tools_4.3.2 utf8_1.2.4 knitr_1.50 ggsignif_0.6.4 lambda.r_1.2.4
[17] labeling_0.4.3 mnormt_2.1.1 bit_4.6.0 plyr_1.8.9 RColorBrewer_1.1-3 abind_1.4-8 withr_3.0.2 numDeriv_2016.8-1.1
[25] datawizard_1.3.0 stats4_4.3.2 unmarked_1.5.0 xtable_1.8-4 insight_1.4.3 cli_3.6.4 crayon_1.5.3 reformulas_0.4.2
[33] generics_0.1.4 tzdb_0.5.0 pbapply_1.7-4 minqa_1.2.8 stringr_1.6.0 splines_4.3.2 parallel_4.3.2 formatR_1.14
[41] cellranger_1.1.0 vctrs_0.6.5 boot_1.3-28.1 VGAM_1.1-13 carData_3.0-5 car_3.1-3 hms_1.1.4 bit64_4.6.0-1
[49] rstatix_0.7.3 Formula_1.2-5 glue_1.8.0 nloptr_2.2.1 cowplot_1.2.0 lubridate_1.9.4 stringi_1.8.7 gtable_0.3.6
[57] ggeffects_2.3.1 tibble_3.2.1 pillar_1.11.1 R6_2.6.1 Rdpack_2.6.4 evaluate_1.0.5 vroom_1.6.6 lattice_0.21-9
[65] futile.options_1.0.1 rbibutils_2.3 backports_1.5.0 broom_1.0.10 snakecase_0.11.1 Rcpp_1.1.0 gridExtra_2.3 nlme_3.1-163
[73] xfun_0.52 mgcv_1.9-0 sjmisc_2.8.11 pkgconfig_2.0.3
