Data from: Revisiting ancient whole-genome duplications in the seed and flowering plants through the lens of dosage-sensitive genes
Data files
Abstract
Whole-genome duplication (WGD) has been proposed as a catalyst for evolutionary innovation in seed plants and angiosperms, yet its occurrence remains contentious. By integrating gene dosage balance principles with phylogenomic reconciliation and probabilistic modeling, we revisit the debated ancestral seed and angiosperm WGDs. Leveraging dosage-sensitive orthologous gene groups (OGs) as evolutionary markers across representative plants for gene-tree/species-tree reconciliation, we demonstrate that gene retention patterns in Amborella and Aristolochia (early-diverging plants lacking post-angiosperm-origin WGDs) reveal a single gene duplication peak predating the seed plant diversification, with no signal of an ancestral angiosperm WGD. Correlation analyses of observed and expected OG copy numbers given the proposed WGD(s) further refute an angiosperm WGD. Probabilistic retention-modeling analysis corroborates these findings and shows that retention rates of dosage-sensitive genes from the putative angiosperm WGD are extremely low. In addition, our study establishes that genes inferred to have higher dosage sensitivity, based on their sequential retention following WGD events, may have increased utility in resolving ancestral polyploidy.
Dryad DOI: https://doi.org/10.5061/dryad.gb5mkkx3z
This repository is associated with Shi, T. & Van de Peer, Y. (2025). Revisiting ancient whole-genome duplications in the seed and flowering plants through the lens of dosage-sensitive genes. Science Advances. In Press.
In this study, orthologous groups (OGs) were clustered with OrthoFinder2 from the plants listed in Table S1 of the manuscript (Shi & Van de Peer, 2025). Gene trees for the OGs were built with FastTree v2 and then reconciled with the species tree using Notung v2.9. During reconciliation, we applied a tree rearrangement threshold of 0.9 on the SH-like score (tree node support), so that weakly supported branches could be rearranged to minimize the duplication/loss cost.
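To illustrate what the 0.9 support threshold applies to, the following minimal Python sketch extracts internal-node support values from a Newick string and lists those that fall below the threshold. It is not part of the deposited scripts: the function names and the example tree are hypothetical, and it assumes that support values are written as numeric labels directly after a closing parenthesis, as in FastTree output.

```python
import re

# Assumption: support values are numeric internal-node labels that directly
# follow a closing parenthesis, e.g. "(A:0.1,B:0.2)0.95:0.05".
SUPPORT_RE = re.compile(r"\)(\d+(?:\.\d+)?)")


def support_values(newick: str) -> list[float]:
    """Return all internal-node support values found in a Newick string."""
    return [float(value) for value in SUPPORT_RE.findall(newick)]


def weak_supports(newick: str, threshold: float = 0.9) -> list[float]:
    """Return the support values that fall below the given threshold."""
    return [s for s in support_values(newick) if s < threshold]


# Hypothetical toy tree, not taken from the repository.
example_tree = "((A:0.1,B:0.2)0.95:0.05,(C:0.1,D:0.3)0.62:0.07,E:0.4);"

print(support_values(example_tree))
print(weak_supports(example_tree))
```

In this toy tree, only the branch labeled 0.62 falls below the threshold and would be a candidate for rearrangement; the actual rearrangement is carried out by Notung, not by this sketch.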
This repository contains the scripts, OG files, and gene tree files needed to (1) assess the dosage sensitivity of each OG by correlating observed and expected OG copy numbers, and (2) perform gene-tree/species-tree reconciliation for the phylogenetic placement of Amborella and Aristolochia gene duplication events. The scripts rely on the following software: R, Notung, Perl, and WGDgc. All software is open source and free to access.
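The idea of correlating observed and expected OG copy numbers can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure implemented in 'WGDdetection.txt' (which uses R): it assumes that the expected copy number of a species is 2 raised to the number of WGDs in its lineage, it uses a Pearson correlation, and all species names and counts are made-up toy values.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def expected_copies(wgd_counts: dict[str, int]) -> dict[str, int]:
    """Assumption: every WGD doubles the copy number and no copy is lost."""
    return {species: 2 ** k for species, k in wgd_counts.items()}


# Hypothetical number of WGDs per lineage under one WGD scenario.
wgd_counts = {"sp1": 0, "sp2": 1, "sp3": 2, "sp4": 1}
# Hypothetical observed copy numbers for two OGs.
og_balanced = {"sp1": 1, "sp2": 2, "sp3": 4, "sp4": 2}
og_unbalanced = {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 2}

species = sorted(wgd_counts)
expected = expected_copies(wgd_counts)
exp_vec = [expected[s] for s in species]

r_balanced = pearson([og_balanced[s] for s in species], exp_vec)
r_unbalanced = pearson([og_unbalanced[s] for s in species], exp_vec)
print(r_balanced, r_unbalanced)
```

Under these assumptions, an OG whose copy numbers track the proposed WGDs yields a higher correlation than one whose copy numbers do not, which is the sense in which the correlation serves as a dosage-sensitivity indicator.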
Files and variables
File: data.zip
Description: After uncompressing, 'data.zip' contains: 1) folder 'OG_nonseedPlants', with the list of ortholog groups (OGs) related to Fig. 4 of the manuscript; 2) folder 'geneTrees_LandPlants', with gene tree files in .nwk format related to Fig. 5 of the manuscript; 3) folder 'OG_LandPlants', with the OGs related to Fig. 5; 4) folder 'OG_LandPlantsSmall', with the OGs related to Fig. 6.
Code/software
File: codes.zip
Description: 'codes.zip' contains the code for WGD detection: 1) 'WGDdetection.txt', with the code for dosage-sensitivity assessment of OGs and for WGD detection using gene-tree/species-tree reconciliation and WGDgc; 2) 'z_all_childnodes1.pl' and 'z_bulk_childnodes.pl', the Perl scripts used in 'WGDdetection.txt'.
Access information
Other publicly accessible locations of the data: N/A
Data was derived from the following sources: N/A
