Dryad

The evolutionary dynamics and fitness landscape of clonal hematopoiesis

Cite this dataset

Watson, Caroline J. et al. (2020). The evolutionary dynamics and fitness landscape of clonal hematopoiesis [Dataset]. Dryad. https://doi.org/10.5061/dryad.83bk3j9mw

Abstract

Somatic mutations acquired in healthy tissues as we age are major determinants of cancer risk. Whether variants confer a fitness advantage or rise to detectable frequencies by chance remains largely unknown. Blood sequencing data from ∼50,000 individuals reveals how mutation, genetic drift and fitness shape the genetic diversity of healthy blood (clonal hematopoiesis). We show that positive selection, not drift, is the major force shaping clonal hematopoiesis, provide bounds on the number of hematopoietic stem cells, and quantify the fitness advantages of key pathogenic variants, at single-nucleotide resolution, as well as the distribution of fitness effects (fitness landscape) within commonly mutated driver genes. These data are consistent with clonal hematopoiesis being driven by a continuing risk of mutations and clonal expansions that become increasingly detectable with age.
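To make the idea of a "fitness advantage" concrete, the sketch below computes the variant allele frequency (VAF) that a single mutant stem cell would reach if it grew deterministically with a per-year fitness advantage s. It is a simplified illustration, not the model used in the manuscript: it assumes a fixed pool of n_hsc stem cells, ignores genetic drift (which the abstract treats as a separate force), and assumes a heterozygous variant so that VAF is half the clone fraction. The function name and all parameter values are hypothetical.

```python
import math


def expected_vaf(s: float, t: float, n_hsc: int) -> float:
    """Deterministic VAF of a heterozygous clone founded by one stem cell.

    Illustrative assumptions (not taken from the dataset or manuscript):
    - at time 0 the clone is 1 of n_hsc stem cells,
    - it grows as exp(s * t) relative to the n_hsc - 1 wild-type cells,
    - drift is ignored and the total stem-cell pool stays constant,
    - VAF is half the clone fraction because the variant is heterozygous.
    """
    growth = math.exp(s * t)
    clone_fraction = growth / (n_hsc - 1 + growth)  # logistic saturation
    return clone_fraction / 2


if __name__ == "__main__":
    # Hypothetical parameters purely for illustration (40 years, 100,000 HSCs)
    for s in (0.0, 0.1, 0.2):
        print(s, expected_vaf(s, t=40, n_hsc=100_000))
```

With s = 0 the clone stays at its founding frequency of 1/(2 × n_hsc), whereas a positive s lets it expand toward detectable VAFs with age, which is the qualitative behaviour the abstract attributes to positive selection.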

Usage notes

This repository contains the data files that accompany the analysis code for the manuscript "The Evolutionary Dynamics and Fitness Landscape of Clonal Hematopoiesis". The accompanying code is organized in correspondingly named folders on the Blundell Lab GitHub page (Zenodo DOI: 10.5281/zenodo.3706791).


Data used to generate the figures and analyses in the main text:

Figure 1:

  • Figure 1b (distribution of variants across DNMT3A):
    • DRYAD data folder: Distribution of variants across DNMT3A - data files
  • Figure 1e (VAF-density histogram):
    • DRYAD data folder: Distribution of variants across DNMT3A.
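A VAF-density histogram like the one in Figure 1e can be sketched as a histogram of variant allele frequencies on logarithmically spaced bins. The normalization below (variants per unit natural-log VAF per individual), the bin range, and the function name `vaf_density` are assumptions made for illustration; the exact definition used for Figure 1e is in the accompanying code, not reproduced here.

```python
import numpy as np


def vaf_density(vafs, n_individuals, n_bins=20, vaf_min=1e-3, vaf_max=0.5):
    """Variants per unit log(VAF) per individual, on log-spaced bins.

    Hypothetical normalization for illustration only. VAFs outside
    [vaf_min, vaf_max] are ignored by np.histogram.
    """
    vafs = np.asarray(vafs, dtype=float)
    edges = np.logspace(np.log10(vaf_min), np.log10(vaf_max), n_bins + 1)
    counts, _ = np.histogram(vafs, bins=edges)
    widths = np.diff(np.log(edges))  # bin widths in natural-log units
    density = counts / (widths * n_individuals)
    centres = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centres
    return centres, density
```

Log-spaced bins are a natural choice here because clone sizes span several orders of magnitude; dividing by the log-bin width keeps the density comparable if the number of bins is changed.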

Figure 2:

  • Figure 2a (fitness effects & mutation rates for top 20 most commonly observed CH variants):
    • DRYAD data folder: Maximum likelihood estimations/ Fitness landscape of top 20 most commonly observed variants in CH.
  • Figure 2b (distribution of fitness effects in key CH driver genes):
    • DRYAD data folder: Maximum likelihood estimations/ Fitness landscape of key CH driver genes (DNMT3A, TET2, ASXL1, TP53).

Figure 3 (predicted prevalence of CH as a function of age):

  • DRYAD data folder: Predicted prevalence of CH and double mutations.


Code & data for generation of the figures and analyses in the supplement:

Supplementary Methods 1 (data trimming):

  • DRYAD data folder: Trimming the data.

Supplementary Methods 5 (study-specific mutation rates):

  • DRYAD data folder: Mutation rate calculations.

Supplementary Methods 6 (emergence of clones with multiple mutations):

  • DRYAD data folder: Predicted prevalence of CH and double mutations.

Supplementary Methods 7:

  • Figure S9 (DNMT3A R882H):
    • DRYAD data folder: Maximum likelihood estimations/ DNMT3A R882H, DNMT3A nonsynonymous, all synonymous.
  • Figure S10 (top 20 most commonly observed variants):
    • DRYAD data folder: Maximum likelihood estimations/ Fitness landscape of top 20 most commonly observed variants.
  • Figure S11 (R882H vs R882C):
    • DRYAD data folder: Maximum likelihood estimations/ Fitness of DNMT3A R882H vs R882C.
  • Figure S12 (nonsynonymous variants parameter estimation):
    • DNMT3A:
      • DRYAD data folder: Maximum likelihood estimations/ DNMT3A R882H, DNMT3A nonsynonymous, all synonymous.
    • TET2, ASXL1, TP53:
      • DRYAD data folder: Maximum likelihood estimations/ Fitness landscape of key CH driver genes (DNMT3A, TET2, ASXL1, TP53).
  • Figure S13 (synonymous variants parameter estimation):
    • DRYAD data folder: Maximum likelihood estimations/ DNMT3A R882H, DNMT3A nonsynonymous, all synonymous.

Supplementary Methods 8:

  • Figure S16 (hitchhiker variants parameter estimation):
    • DRYAD data folder: Maximum likelihood estimations/ DNMT3A R882H, DNMT3A nonsynonymous, all synonymous.

Supplementary Methods 9 (odds ratio of AML stratified by variant fitness):

  • DRYAD data folder: Highly fit variants are enriched in pre-AML blood samples.
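The odds ratio in Supplementary Methods 9 compares how often a class of variants appears in pre-AML samples versus controls. A minimal sketch for a single 2×2 table is shown below, using the standard Woolf (log) confidence interval and a Haldane-Anscombe correction when any cell is zero; these choices, the function name, and the example counts are assumptions, not the manuscript's documented procedure. Stratifying by variant fitness would mean computing one such table per fitness stratum.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table with a Woolf (log) confidence interval.

    a = pre-AML cases carrying the variant class, b = cases without it
    c = controls carrying the variant class,      d = controls without it
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell if any is zero
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(odds ratio)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper
```

For example, with hypothetical counts `odds_ratio(10, 20, 5, 40)` the point estimate is (10 × 40) / (20 × 5) = 4.0.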

Supplementary Methods 10 (age prevalence of R882H and R882C mutations):

  • DRYAD data folder: Age prevalence of DNMT3A R882H and R882C variants.

Supplementary Methods 11 (parameter estimation for 10 commonly mutated CH genes):

  • DRYAD data folder: Predicted prevalence of CH and double mutants.

Supplementary Methods 12 (estimating fitness effects of infrequently mutated sites):

  • DRYAD data folder: Estimating fitness effects of infrequently mutated sites.

Supplementary Methods 13 (limitations of study size and sequencing limit):

  • DRYAD data folder: Limitations of study size and sequencing limit.


Funding

National Science Foundation

National Cancer Institute

Entertainment Industry Foundation

UK Research and Innovation

Cancer Research UK

Cancer Research UK Cambridge Centre

Bei Shan Tang Foundation