Data from: Endocrine and epigenetic flexibility in an African starling
Data files
Oct 21, 2025 version files 288.02 MB
README.md (5.13 KB)
Rubenstein_Solomon_dryad.zip (288.02 MB)
Oct 21, 2025 version files 288.02 MB
README.md (5.42 KB)
Rubenstein_Solomon_dryad_v2.zip (288.02 MB)
Abstract
This dataset for the manuscript "Endocrine and epigenetic flexibility in an African starling" includes methylation input files derived from Superb starling reduced representation bisulfite sequencing (RRBS) libraries, along with sample collection variables. It also includes code for Bismark alignment of bisulfite data, Metilene differentially methylated region (DMR) analysis (run on the actual data and on randomized sample headers), CGmapTools variant calling, GEMMA relatedness matrix generation, an R lme4qtl generalized linear mixed model for CpG methylation sites, and bedtools intersection of significant DMRs and CpG sites with genome regions of interest.
Dataset DOI: 10.5061/dryad.t4b8gtjff
Description of the data and file structure
Superb starlings were captured at three sites across a rainfall gradient in central Kenya and kept in aviaries at the moderate-rainfall savanna site. Baseline and stress-induced corticosterone levels were measured by ELISA at initial capture and at the 3- and 6-month timepoints in captivity (Table S1 of associated manuscript RSTB-2025-0027.R1). Reduced representation bisulfite sequencing (RRBS) libraries were sequenced for each individual. This dataset covers sequence processing, methylation analysis, variant calling, and relatedness matrix generation.
Files and variables
File: Rubenstein_Solomon_dryad_v2.zip
Zip file with subfolders for each analysis.
1) alignments_metilene_lme4qtl_run_code_SNP_calling
Includes code for trimming raw sequence files (trimming_raw_sequencer_fq_files.txt) and for Bismark bisulfite read alignment (bismark_bisulfite_alignments.txt) to the Lamprotornis superbus reference genome. Sequence files are hosted on the National Center for Biotechnology Information (NCBI) Sequence Read Archive under BioProject accession PRJNA1252789, and the genome reference is available under GenBank accession GCA_015883425.2.
The folder also includes sorting_alignments_and_SNV_calling.txt for CGmapTools v0.1.3 SNP calling, indexing_and_merging_SNV_vcfs_and_matrix_gen.txt for GEMMA v0.98.5 relatedness matrix generation, bismark_cov_processing_and_metilene_runs_intersects.txt for Metilene v0.2-8 DMR analysis and bedtools v2.31.1 intersects, and lme4qtl_R_code.txt (based on Lindner et al. 2021) for lme4qtl v0.2.2/lme4 v1.1.37 CpG site analysis.
fdrtool_R_lme4qtl_pvalues_to_qvalues.txt contains fdrtool v1.2.18 code for converting lme4qtl p-values to q-values.
header_for_lme4qtl_input_files.txt is the header for the lme4qtl input files. cpg_cluster_test.txt is a check for promoter clustering of significant CpG sites.
GCA_015883425.2_CU_Lasu_v2_genomic_genes_and_pseudogenes_sorted.bed is a genome BED track for genes. GCA_015883425.2_CU_Lasu_v2_genomic_just_10kb_promoters_sorted_intersect_withinfonlyfrom_v2_genes_and_pseudogenes_10kbpromoter.bedgraph is a BED track for 10 kb promoters.
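As a rough illustration of how a 10 kb promoter track can relate to a gene track, the sketch below derives strand-aware upstream windows from BED-style gene records. This README does not state how the promoter track was actually built, so the window definition (10 kb directly upstream of the transcription start site), the function name promoter_window, and the scaffold names and coordinates are all assumptions made for the example.

```python
# Hypothetical sketch: derive 10 kb upstream promoter windows from gene records.
# The window definition is an assumption, not the dataset's documented method.

PROMOTER_LENGTH = 10_000  # 10 kb, matching the promoter track name


def promoter_window(chrom, start, end, strand, length=PROMOTER_LENGTH):
    """Return (chrom, start, end) for the window upstream of the gene start.

    BED coordinates are 0-based and half-open, so a plus-strand gene begins
    at `start` and a minus-strand gene begins at `end`.
    """
    if strand == "+":
        # Clamp at 0 so the window never runs off the start of the scaffold.
        return chrom, max(0, start - length), start
    if strand == "-":
        return chrom, end, end + length
    raise ValueError(f"unknown strand: {strand!r}")


# Made-up gene records: (chrom, start, end, strand)
genes = [
    ("scaffold_1", 25_000, 30_000, "+"),
    ("scaffold_1", 4_000, 9_000, "+"),
    ("scaffold_2", 50_000, 58_000, "-"),
]

promoters = [promoter_window(*gene) for gene in genes]
for chrom, start, end in promoters:
    print(chrom, start, end, sep="\t")
```

The sketch does not clamp minus-strand windows at the scaffold end, because that requires scaffold lengths that are not part of the gene records.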
2) metilene_runs
Includes bismark_cov_processing_and_metilene_runs_intersects.txt (described above), the Metilene input and output files, and the promoter and gene body tracks described above.
3) randomized_metilene_runs
metilene_runs_randomized_header_wild_3_and_3_6.txt contains code for 10 Metilene runs with randomized sample headers. Each run uses the same input file, with only the headers changed.
randomized_header_wild_3.csv, randomized_header_wild_6.csv, and randomized_header_3_6.csv contain the 10 randomized headers for each group comparison.
3_vs_6_metilene_genebodies.txt, wild_vs_6_metilene_genebodies.txt, wild_vs_3_metilene_genebodies.txt, wild_vs_6_metilene_promoter.txt, wild_vs_3_metilene_promoter.txt, and 3_vs_6_metilene_promoter.txt are bedtools intersect tracks of significant DMRs from the original Metilene run on the real, non-randomized data.
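The idea behind the randomized-header runs can be sketched as follows: the methylation values stay in place while the sample labels in the header are shuffled, so group membership is reassigned at random. This is a minimal illustration only. The function name randomized_headers, the sample labels, the leading "chr" and "pos" columns, and the fixed seed are assumptions for the example and are not taken from the dataset's code.

```python
import random


def randomized_headers(sample_columns, n_runs=10, seed=0):
    """Return n_runs header rows with the sample labels shuffled.

    Only the labels move; the data columns beneath them are untouched,
    which reassigns samples to groups at random.
    """
    rng = random.Random(seed)  # fixed seed so the shuffles are reproducible
    headers = []
    for _ in range(n_runs):
        shuffled = list(sample_columns)
        rng.shuffle(shuffled)
        headers.append(["chr", "pos"] + shuffled)
    return headers


# Hypothetical sample labels carrying a group prefix.
samples = ["wild_s1", "wild_s2", "wild_s3", "3mo_s1", "3mo_s2", "3mo_s3"]

headers = randomized_headers(samples, n_runs=10)
for header in headers:
    print("\t".join(header))
```

Each shuffled header keeps the same set of labels, so group sizes are preserved across the randomized runs.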
4) R_lme4qtl_input_files
All input files for the lme4qtl analysis described above.
5) lme4qtl_metilene_intersects_code
Includes 61_sampling_site_intersects.txt, with code for further bedtools intersects of CpG sites.
wild_vs_3_metilene_genebodies.txt, 3_vs_6_metilene_promoter.txt, 3_vs_6_metilene_genebodies.txt, wild_vs_6_metilene_promoter.txt, wild_vs_6_metilene_genebodies.txt, and wild_vs_3_metilene_promoter.txt are Metilene DMR BED tracks. promoters_for_promoters_DMRs.txt is a BED track for the promoters associated with Metilene DMRs.
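For readers unfamiliar with interval intersection, the sketch below shows the overlap test that underlies matching DMRs to promoters or gene bodies. It is a naive all-pairs comparison written for clarity, not the algorithm bedtools uses, and the function names and coordinates are made up for the example.

```python
def overlaps(a, b):
    """True if two BED-style intervals share at least one base.

    Intervals are (chrom, start, end, ...) with 0-based, half-open
    coordinates, so intervals that merely touch do not overlap.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(features, regions):
    """Return every (feature, region) pair that overlaps (naive O(n*m))."""
    return [(f, r) for f in features for r in regions if overlaps(f, r)]


# Made-up DMRs: (chrom, start, end)
dmrs = [
    ("scaffold_1", 100, 200),
    ("scaffold_1", 900, 950),
    ("scaffold_2", 100, 200),
]

# Made-up promoter regions: (chrom, start, end, gene name)
promoter_regions = [
    ("scaffold_1", 150, 400, "geneA"),
    ("scaffold_2", 200, 300, "geneB"),
]

hits = intersect(dmrs, promoter_regions)
for dmr, region in hits:
    print(*dmr, region[3], sep="\t")
```

Only the first DMR is reported here: the second lies outside any promoter, and the third ends exactly where geneB's promoter begins, which does not count as overlap under half-open coordinates.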
References:
Guo, W., Zhu, P., Pellegrini, M., Zhang, M. Q., Wang, X., Ni, Z. 2018. CGmapTools improves the precision of heterozygous SNV calls and supports allele-specific methylation detection and visualization in bisulfite-sequencing data. Bioinformatics 34, 381–387. (doi:10.1093/bioinformatics/btx595)
Jühling, F., Kretzmer, H., Bernhart, S. H., Otto, C., Stadler, P. F., Hoffmann, S. 2016. Metilene: fast and sensitive calling of differentially methylated regions from bisulfite sequencing data. Genome Res. 26, 256–262. (doi:10.1101/gr.196394.115)
Krueger, F., Andrews, S. R. 2011. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572. (doi:10.1093/bioinformatics/btr167)
Lindner, M., Laine, V. N., Verhagen, I., Viitaniemi, H. M., Visser, M. E., van Oers, K., Husby, A. 2021. Rapid changes in DNA methylation associated with the initiation of reproduction in a small songbird. Mol. Ecol. 30, 3645–3659. (doi:10.1111/mec.15803)
Quinlan, A. R., Hall, I. M. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842. (doi:10.1093/bioinformatics/btq033)
Strimmer, K. 2008. fdrtool: a versatile R package for estimating local and tail area-based false discovery rates. Bioinformatics 24, 1461–1462. (doi:10.1093/bioinformatics/btn209)
Zhou, X., Stephens, M. 2012. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824. (doi:10.1038/ng.2310)
Ziyatdinov, A., Vázquez-Santiago, M., Brunel, H., Martinez-Perez, A., Aschard, H., Soria, J. M. 2018. lme4qtl: linear mixed models with flexible covariance structure for genetic studies of related individuals. BMC Bioinformatics 19, 68. (doi:10.1186/s12859-018-2057-x)
Changes after Oct 21, 2025:
Added the Lindner et al. citation from the in-press manuscript associated with this dataset to the README and to the specific portions of lme4qtl_R_code.txt that were adapted and modified.
