CALERIE trial molecular data summary: DNA methylation, mRNA, smRNA for blood, adipose, and muscle
Abstract
Caloric restriction (CR) slows biological aging and prolongs healthy lifespan in model organisms. Findings from the CALERIE randomized, controlled trial of long-term CR in healthy, non-obese humans (NCT00427193) broadly support a similar pattern of effects in humans. To expand our understanding of the molecular pathways and biological processes underpinning CR effects in humans, we generated a series of genomic datasets from stored biospecimens collected from n=218 participants during the trial. These data constitute the first genomic data resource for a randomized controlled trial of an intervention targeting the biology of aging. Datasets include whole-genome SNP genotypes, and three-timepoint-longitudinal DNA methylation, mRNA, and small RNA datasets generated from blood, skeletal muscle, and adipose tissue samples (total sample n=2327). The CALERIE Genomic Data Resource described in this article is available from the Aging Research Biobank. This multi-tissue, multi-omic, longitudinal data resource has great potential to advance translational geroscience.
https://doi.org/10.5061/dryad.pzgmsbcxh
Description of the data and file structure
Principal Investigator Contact Information
Name: Calen Patrick Ryan, PhD
Institution: Columbia Aging Center Geroscience Computational Core, Columbia University
Email: cpr2139@cumc.columbia.edu
Alternate Contact Information
Name: Daniel W. Belsky, PhD
Institution: Department of Epidemiology, Columbia Aging Center Geroscience Computational Core, Columbia University
Email: db3275@cumc.columbia.edu
Funding
This research utilized the FlowSorted.BloodExtended.EPIC software packages developed at Dartmouth College which are governed by the licensing terms provided by Dartmouth Technology Transfer (https://github.com/immunomethylomics/FlowSorted.BloodExtended.EPIC/blob/main/SoftwareLicense.FlowSorted.BloodExtended.EPIC%20to%20sign.pdf). Figures 1 and 2 and Extended Figures 1 and 2 include images created in BioRender in the Columbia Aging Center Geroscience Computational Core (2024) https://biorender.com/t12h125. CALERIE is a registered trademark.
The authors confirm all summary data in this repository can be published under a CC0 license waiver.
Recommended Citation
CP Ryan, DL Corcoran, N Banskota, C Eckstein Indik, A Floratos, R Friedman, MS Kobor, VB Kraus, WE Kraus, JL MacIsaac, MC Orenduff, CF Pieper, JP White, L Ferrucci, S Horvath, KM Huffman, DW Belsky. (2024) CALERIE trial molecular data summary: DNA methylation, mRNA, smRNA for blood, adipose, and muscle. Dryad Data Repository. https://doi.org/10.5061/dryad.pzgmsbcxh
Description of the data and file structure
This repository contains two main folders: Data and Code. Data contains summary data (not raw data) for each dataset. Code contains the code used to process the data and to create the summaries and figures in the paper.
Files and Folders
Data/DNAm
Contains all DNA methylation summary files.
blood_dnam_summary
Blood_DNAm_all_baseline_probes.csv
Blood_DNAm_all_12_month_probes.csv
Blood_DNAm_all_24_month_probes.csv
Blood_DNAm_all_samples_probes.csv
These files contain a list of probes with complete data for samples at baseline only, samples at 12-month follow-up only, samples at 24-month follow-up only, or all samples at all time points in the study, respectively.
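As a rough illustration of how these probe lists might be used, the sketch below reads probe IDs from CSV files and intersects the per-time-point lists. The file layout (a header row followed by one probe ID per row in the first column), the helper name `read_probe_ids`, and the placeholder probe IDs are all assumptions, not a description of the actual files. The demo uses small in-memory stand-ins rather than the real data.

```python
import csv
import io


def read_probe_ids(handle):
    """Return the set of probe IDs in the first column of a CSV file.

    Assumes (hypothetically) a header row followed by one probe ID per row.
    """
    reader = csv.reader(handle)
    next(reader, None)  # skip the assumed header row
    return {row[0] for row in reader if row}


# Tiny in-memory stand-ins for the baseline, 12-month, and 24-month files.
baseline = io.StringIO("probe\ncg00000029\ncg00000108\ncg00000109\n")
month_12 = io.StringIO("probe\ncg00000029\ncg00000109\n")
month_24 = io.StringIO("probe\ncg00000029\ncg00000108\ncg00000109\n")

probe_sets = [read_probe_ids(f) for f in (baseline, month_12, month_24)]

# Probes with complete data at every time point.
complete_everywhere = set.intersection(*probe_sets)
print(sorted(complete_everywhere))
```

With the real files, each stand-in would be replaced by something like `open("Blood_DNAm_all_baseline_probes.csv", newline="")`. The resulting intersection could then be compared against the corresponding all_samples probe list, although that correspondence has not been verified here.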
Blood_DNAm_all_baseline_summary.csv
Blood_DNAm_all_12_month_summary.csv
Blood_DNAm_all_24_month_summary.csv
Blood_DNAm_all_samples_summary.csv
These files contain a list of probes, one row per probe, with columns for the number of samples missing that probe and the mean, standard deviation, median, minimum, and maximum beta values.
muscle_dnam_summary
Muscle_DNAm_all_baseline_probes.csv
Muscle_DNAm_all_12_month_probes.csv
Muscle_DNAm_all_24_month_probes.csv
Muscle_DNAm_all_samples_probes.csv
These files contain a list of probes with complete data for samples at baseline only, samples at 12-month follow-up only, samples at 24-month follow-up only, or all samples at all time points in the study, respectively.
Muscle_DNAm_all_baseline_summary.csv
Muscle_DNAm_all_12_month_summary.csv
Muscle_DNAm_all_24_month_summary.csv
Muscle_DNAm_all_samples_summary.csv
These files contain a list of probes, one row per probe, with columns for the number of samples missing that probe and the mean, standard deviation, median, minimum, and maximum beta values.
adipose_dnam_summary
Adipose_DNAm_all_baseline_probes.csv
Adipose_DNAm_all_12_month_probes.csv
Adipose_DNAm_all_24_month_probes.csv
Adipose_DNAm_all_samples_probes.csv
These files contain a list of probes with complete data for samples at baseline only, samples at 12-month follow-up only, samples at 24-month follow-up only, or all samples at all time points in the study, respectively.
Adipose_DNAm_all_baseline_summary.csv
Adipose_DNAm_all_12_month_summary.csv
Adipose_DNAm_all_24_month_summary.csv
Adipose_DNAm_all_samples_summary.csv
These files contain a list of probes, one row per probe, with columns for the number of samples missing that probe and the mean, standard deviation, median, minimum, and maximum beta values.
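The per-probe summary tables lend themselves to simple filtering, for example selecting probes with no missing samples and appreciable variability. The following is a minimal sketch under stated assumptions: the column names (`probe`, `n_missing`, `mean`, `sd`, `median`, `min`, `max`), the function name `variable_complete_probes`, and the demo values are hypothetical, so check the header of the real file before adapting it.

```python
import csv
import io

# Hypothetical column names and made-up values; the real files may differ.
demo = io.StringIO(
    "probe,n_missing,mean,sd,median,min,max\n"
    "cg00000029,0,0.52,0.08,0.53,0.31,0.70\n"
    "cg00000108,3,0.91,0.01,0.91,0.88,0.94\n"
    "cg00000109,0,0.85,0.02,0.85,0.79,0.90\n"
)


def variable_complete_probes(handle, min_sd):
    """Return IDs of probes with no missing samples and SD of beta >= min_sd."""
    selected = []
    for row in csv.DictReader(handle):
        if int(row["n_missing"]) == 0 and float(row["sd"]) >= min_sd:
            selected.append(row["probe"])
    return selected


print(variable_complete_probes(demo, min_sd=0.05))
```

The standard-deviation threshold here is arbitrary and only serves to show the pattern. Because these files hold summary statistics rather than sample-level data, they are best suited to this kind of probe screening and planning rather than to association analyses.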
Data/mRNA
Contains all mRNA summary files.
muscle_rna_summary
Muscle_mRNA_all_baseline_summary.csv
Muscle_mRNA_all_12_month_summary.csv
Muscle_mRNA_all_24_month_summary.csv
Muscle_mRNA_all_samples_summary.csv
These files contain a list of transcripts, one row per transcript, with columns for the number of samples missing that transcript and the mean, standard deviation, median, minimum, and maximum expression values. Transcripts missing in any sample were removed.
adipose_rna_summary
Adipose_mRNA_all_baseline_summary.csv
Adipose_mRNA_all_12_month_summary.csv
Adipose_mRNA_all_24_month_summary.csv
Adipose_mRNA_all_samples_summary.csv
These files contain a list of transcripts, one row per transcript, with columns for the number of samples missing that transcript and the mean, standard deviation, median, minimum, and maximum expression values. Transcripts missing in any sample were removed.
Data/smallRNA
Contains all small RNA summary files.
Plasma
Plasma_smRNA_all_baseline_summary.xlsx
Plasma_smRNA_all_12_month_summary.xlsx
Plasma_smRNA_all_24_month_summary.xlsx
Plasma_smRNA_all_samples_summary.xlsx
These files contain a list of transcripts, one row per transcript, with columns for the number of samples missing that transcript and the mean, standard deviation, median, minimum, and maximum expression values. Transcripts missing in any sample were removed.
Muscle
Muscle_smRNA_all_baseline_summary.xlsx
Muscle_smRNA_all_12_month_summary.xlsx
Muscle_smRNA_all_24_month_summary.xlsx
Muscle_smRNA_all_samples_summary.xlsx
These files contain a list of transcripts, one row per transcript, with columns for the number of samples missing that transcript and the mean, standard deviation, median, minimum, and maximum expression values. Transcripts missing in any sample were removed.
Adipose
Adipose_smRNA_all_baseline_summary.xlsx
Adipose_smRNA_all_12_month_summary.xlsx
Adipose_smRNA_all_24_month_summary.xlsx
Adipose_smRNA_all_samples_summary.xlsx
These files contain a list of transcripts, one row per transcript, with columns for the number of samples missing that transcript and the mean, standard deviation, median, minimum, and maximum expression values. Transcripts missing in any sample were removed.
Code/Software
README.md
Basic README file for the GitHub repository that hosts the code: https://github.com/CPRyan/CALERIE_Genomic_Data_Resource
Phenotypes_by_Tissue_Molecular_Datatype.R
Code used to produce a simple table showing key anthropometric and demographic information on participants for each molecular dataset.
Sample_Overlap_and_Matrix_Clean.R
Code used to produce the UpSet plots and the sample overlap matrix file provided with the paper (Figure 1C). This file allows researchers to determine, for a given sample, which molecular data types are available at which time points.
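To illustrate the kind of lookup the sample overlap matrix supports, here is a minimal sketch. The layout is an assumption made for this example only (one row per participant and one 0/1 column per tissue, assay, and time point combination), as are the column names, participant IDs, and the function name `available_data`. The real matrix file may be organized differently.

```python
import csv
import io

# Hypothetical layout: one row per participant, one 0/1 column per
# tissue/assay/time-point combination. The real file may differ.
demo = io.StringIO(
    "participant,blood_DNAm_baseline,blood_DNAm_12mo,"
    "muscle_mRNA_baseline,muscle_mRNA_12mo\n"
    "P001,1,1,1,0\n"
    "P002,1,0,0,0\n"
)


def available_data(handle, participant):
    """List the columns flagged 1 for the given participant."""
    for row in csv.DictReader(handle):
        if row["participant"] == participant:
            return [col for col, flag in row.items()
                    if col != "participant" and flag == "1"]
    return []  # participant not found in the matrix


print(available_data(demo, "P001"))
```

A lookup like this can help decide, before requesting data, whether enough participants have the combination of tissues and time points a planned analysis needs.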
Blood_DNAm
Processing/processRawData.R
Processing pipeline for blood DNAm.
Derived_Variables/CALERIE_Biorepository_age_scatterplot_matrix.R
Code used to produce age correlations between clocks (residualized) and between clocks and age, as shown in Figure 4.
Derived_Variables/cell_counts_for_biorepository.R
Code used to produce cell count boxplots in Figure 5 in the paper.
Summary_Tables/Blood_Summary_Tables.R
Code used to generate blood DNAm summary tables.
Muscle_DNAm
Processing/Muscle_Corcoran_Pipeline.R
Processing pipeline for muscle DNAm.
Summary_Tables/Muscle_Summary_Tables.R
Code used to generate muscle DNAm summary tables.
Adipose_DNAm
Processing/Adipose_Corcoran_Pipeline.R
Processing pipeline for adipose DNAm.
Summary_Tables/Muscle_Summary_Tables.R
Code used to generate adipose DNAm summary tables.
Muscle_RNA
Muscle_RNAseq_Summary.R
Code used to generate muscle RNA summary tables.
Adipose_RNA
Adipose_RNAseq_Summary.R
Code used to generate adipose RNA summary tables.
Adipose_RNA_volcano.R
Code used to generate volcano plots in Figure 6 from limma output (Tables S4 and S5) for 12-month and 24-month time points.
GSEA_Function.R
Function called by Run_GSEA_on_Adipose_RNAseq.R to generate the enrichment figures in Figure 6.
Run_GSEA_on_Adipose_RNAseq.R
Code used to run GSEA for Figure 6 and Tables S2 and S3.
Access Information/Data and Code Availability
Other publicly accessible locations of the data:
Processed data can be accessed through the Aging Research Biobank (https://agingresearchbiobank.nia.nih.gov/studies/calerie/). Data use is restricted to non-commercial use in studies to determine factors that affect age-related conditions. Applications for data access include a brief summary of the research question and intended analysis and proof of IRB approval for the project.
Original raw data may be obtained from the Belsky Lab (cac_geroscience@cumc.columbia.edu). Code used in the production of summary data and figures is available at https://github.com/CPRyan/CALERIE_Genomic_Data_Resource.
CALERIE Phase 2 was a multi-center, randomized controlled trial conducted at three clinical centers in the United States (ClinicalTrials.gov Identifier: NCT00427193). It aimed to evaluate the time-course effects of 25% CR (that is, intake 25% below the individual’s baseline level) over a 2-year period in healthy adults (men aged 21–50 years, premenopausal women aged 21–47 years) with BMI in the normal weight or slightly overweight range (BMI 22.0–27.9 kg/m²). The study protocol was approved by Institutional Review Boards at three clinical centers (Washington University School of Medicine, St Louis, MO, USA; Pennington Biomedical Research Center, Baton Rouge, LA, USA; Tufts University, Boston, MA, USA) and the coordinating center at Duke University (Durham, NC, USA). All study participants provided written informed consent. Nongenomic data were obtained from the CALERIE Biorepository (https://calerie.duke.edu/apply-samples-and-data-analysis). Oversight of our study was performed by the Institutional Review Board of Columbia University Irving Medical Center (AAAS2948).
Extending CALERIE Phase 1 in both scale and duration, CALERIE Phase 2 recruited a total of 220 subjects and assigned them in a 2:1 allocation to a CR treatment group or an ad libitum (AL) control group. Randomization was stratified by study site, sex, and body mass index. Participants in the CR group were assigned to a protocol designed to result in a 25% reduction in caloric intake relative to estimated energy requirements at enrollment. CR participants received an intensive behavioral intervention that included individual and group sessions, a meal provision phase, digital assistants to monitor caloric intake, and training in portion estimation and other nutrition and behavioral topics. Adherence was assessed from energy expenditure measured by the doubly labeled water method, as well as from expected changes in body composition. The duration of the study for both CR and AL participants was 2 years.
Throughout the 2-year study, starting at baseline prior to randomization and recurring at months 1, 3, 6, 9, 12, 18, and 24, participants were evaluated for a range of pre-specified anthropometric, psychological, behavioral, and physiological outcomes. Blood samples were collected every six months. At baseline, 12 months, and 24 months, whole blood and samples were collected and banked. In addition, a subset of participants agreed to biopsies of adipose and muscle tissue at baseline, 12 months, and 24 months. From these samples, SNP-based genotypes, DNA methylation, mRNA, and small RNAs were assayed. Here, we describe these datasets and provide an overview of this data resource (Fig. 1).
More details about the CALERIE trial, including study protocols and ongoing and published research, are available at https://calerie.duke.edu. Data can be accessed through the Aging Research Biobank (https://agingresearchbiobank.nia.nih.gov/studies/calerie/). Biospecimens are available, but limited to research on the effects that caloric restriction may have on aging and aging-related diseases.