
DNA methylation signatures of duplicate gene evolution in angiosperms

Cite this dataset

Niederhuth, Chad (2023). DNA methylation signatures of duplicate gene evolution in angiosperms [Dataset]. Dryad.


Gene duplication is a source of evolutionary novelty. DNA methylation may play a role in the evolution of duplicate genes through its association with gene expression. While this relationship has been examined to varying extents in a few individual species, it remains unknown whether these results generalize, either at a broad phylogenetic scale across species with differing duplication histories or across a population. We apply a comparative epigenomics approach to 43 angiosperm species spanning the phylogeny and to a population of 928 Arabidopsis thaliana accessions, examining the association of DNA methylation with paralog evolution. Genic DNA methylation is differentially associated with duplication type, the age of duplication, sequence evolution, and gene expression. Whole-genome duplicates are typically enriched for CG-only gene-body methylated or unmethylated genes, while single-gene duplicates are typically enriched for non-CG methylated or unmethylated genes. Non-CG methylation, in particular, was characteristic of more recent single-gene duplicates. Core angiosperm gene families are differentiated into those that preferentially retain paralogs and ‘duplication-resistant’ families, which convergently revert to singletons following duplication. Duplication-resistant families that still have paralogous copies are, uncharacteristically for core angiosperm genes, enriched for non-CG methylation. Non-CG methylated paralogs have higher rates of sequence evolution, a higher frequency of presence-absence variation, and more limited expression. This suggests that silencing by non-CG methylation may be important for maintaining dosage following duplication and may be a precursor to fractionation. Our results indicate that genic methylation marks differing evolutionary trajectories and fates of paralogous genes and has a role in maintaining dosage following duplication.

Usage notes

This dataset includes formatted genomes and annotations for use with code from:

Funding

National Science Foundation, Award: IOS-2029959

United States Department of Agriculture, Award: MICL02572

National Science Foundation, Award: DBI-1757043