Reproductive complexity, whole genome duplication, and genome size data across vascular plants
Data files (version of Nov 22, 2024; 2.99 MB total)
- Analysis_Scripts.R (75.01 KB)
- Angio_reorder.tree (5.30 KB)
- ComplexityWGD_data.csv (292.01 KB)
- Data_Citations.txt (175.63 KB)
- Forest2018.tree (40.84 KB)
- Janssens2020.tree (2.12 MB)
- Nitta2022.tree (255.20 KB)
- README.md (10.88 KB)
- Testo2018.tree (16.02 KB)
Abstract
Whole genome duplication (WGD) may be an important factor in plant macroevolution, implicated in diversification rate shifts, structural innovations, and increased disparity. But general effects of WGD on plant evolution are challenging to evaluate, in part due to the difficulty of directly comparing morphological patterns across clades. We explored relationships between WGD and the evolution of complexity across vascular plants using a metric based on the number of reproductive part types. We used multiple regression models to evaluate the relative importance of inferred WGD events, genome size, and a suite of additional variables relating to growth habit and reproductive biology in explaining part type complexity. WGD was a consistent predictor of reproductive complexity only among angiosperms. Across vascular plants more generally, reproductive biology, clade identity, and the presence of bisexual strobili (those that produce both microsporangiate and megasporangiate organs) were better predictors of complexity. Angiosperms are unique among vascular plants in combining frequent polyploidy with high reproductive complexity. Whether WGD is mechanistically linked to floral complexity is unclear, but we suggest widespread polyploidy and increased complexity were ultimately facilitated by the evolution of herbaceous growth habits in early angiosperms.
README: Reproductive complexity, whole genome duplication, and genome size data across vascular plants
GENERAL INFORMATION
- Title of Dataset: Whole Genome Duplications and Reproductive Complexity
- Author Information
A. Principal Investigator Contact Information
Name: Andrew Leslie
Institution: Stanford University
Address: 450 Jane Stanford Way, Building 320, Room 118, Stanford, CA 94305, USA.
Email: aleslieb@stanford.edu
B. Associate or Co-investigator Contact Information
Name: Luke Mander
Institution: The Open University
Address: Walton Hall, Milton Keynes, MK7 6AA, UK.
Email: luke.mander@gmail.com
DATA & FILE OVERVIEW
File List:
- ComplexityWGD_data.csv
- Analysis_Scripts.R
- Data_Citations.txt
- Forest2018.tree
- Janssens2020.tree
- Nitta2022.tree
- Testo2018.tree
- Angio_reorder.tree
"ComplexityWGD_data.csv" contains the data and character scorings used in our analyses. The .tree files contain the phylogenetic trees used to build the phylogeny in this study, or subtrees used in figures.
The data and tree files can be imported into the provided R script ("Analysis_Scripts.R") to reproduce the results, main figures, and supplemental figures reported in the manuscript.
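For readers working outside R, the following is a minimal Python sketch (not part of the deposited materials) that reads ComplexityWGD_data.csv with the standard `csv` module, maps the "NA" missing-data code to `None`, and compares the result against the 44 variables and 1291 rows stated in the variable list. It assumes the file has a single header row whose names match that list; the function names are hypothetical.

```python
import csv
from typing import Iterable, Optional

# Missing or inapplicable values are coded "NA" (see "Missing data codes").
MISSING = "NA"


def load_complexity_rows(lines: Iterable[str]) -> list[dict[str, Optional[str]]]:
    """Read CSV lines into dicts keyed by header name, turning "NA" into None.

    Hypothetical helper; values stay strings, so numeric columns such as
    Cvalue or total.parts still need explicit conversion.
    """
    reader = csv.DictReader(lines)
    return [
        {name: (None if value == MISSING else value) for name, value in row.items()}
        for row in reader
    ]


def matches_readme_shape(rows, n_rows: int = 1291, n_cols: int = 44) -> bool:
    """Check row and column counts against those stated in the README."""
    return len(rows) == n_rows and all(len(row) == n_cols for row in rows)
```

Pass an open file handle (opened with `newline=""`, as the `csv` documentation recommends); the file's encoding is not stated in this README, so opening it as UTF-8 is an assumption.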
DATA-SPECIFIC INFORMATION FOR: ComplexityWGD_data.csv
- Number of variables: 44
- Number of cases/rows: 1291
- Variable List:
- Taxon: species to which each reproductive structure belongs.
- in_tree: whether (1) or not (0) taxon is included in a phylogenetic tree used in this analysis.
- phyloname: taxonomic name used to represent each taxon in the phylogenetic trees. In some cases, the species we scored was not the same as that included in the phylogeny. We substituted our species into the phylogeny in cases where this did not change the topology.
- group1-5: nested taxonomic affiliations for each reproductive structure.
- group6: a binary character used in some analyses to identify all angiosperms that are not monocots.
- WGDmincon: the "consensus" low number of possible WGD events in the history of a taxon. This number is based on WGD events that are either widely regarded as having occurred (such as at the base of the angiosperms), are recovered in many separate studies, or were recovered by multiple analytical methods (typically Ks plots plus MAPS analysis). This is the number of WGD events used throughout the manuscript as the low number.
- WGDmin: the low number of possible WGD events in the history of a taxon after removing a debated WGD event at the base of angiosperms.
- WGDmax: the high number of possible WGD events in the history of a taxon. This number is based on WGD events recovered by any analysis method in any study. It is the number used as the high estimate of WGD events in the manuscript.
- KPmin: the low number of possible WGD events recovered in the One Thousand Plant Transcriptomes Initiative (1KP) and shown on the phylogeny in Supplemental Figure 8. This analysis is used as a consistent baseline for identifying WGD events across all vascular plants. The low number represents only WGD events found by both analytical methods (Ks plots and MAPS).
- KPmax: the high number of possible WGD events recovered in the One Thousand Plant Transcriptomes Initiative and shown on the phylogeny in Supplemental Figure 8. This analysis is used as a consistent baseline for identifying WGD events across all vascular plants. The high number represents WGD events found by either or both of the analytical methods (Ks plots, MAPS).
- Huang2020: WGD events among leptosporangiate ferns recovered by Huang et al. 2020.
- Huang2020+KP: WGD events among leptosporangiate ferns recovered by Huang et al. 2020, with added potential WGD events recovered in the 1KP analysis. These added events correspond to two putative WGD events in the lineage leading to euphyllophytes, which lies outside the coverage of the Huang et al. 2020 study.
- Pelosi2020min: the low number of WGD events among leptosporangiate ferns recovered by Pelosi et al. 2020. The low number represents only WGD events found by both analytical methods (Ks plots and MAPS).
- Pelosi2020max: the high number of WGD events among leptosporangiate ferns recovered by Pelosi et al. 2020. The high number represents WGD events found by Ks plots alone.
- Pelosi2020min+KP: the low number of WGD events among leptosporangiate ferns recovered by Pelosi et al. 2020, with added WGD events recovered in the 1KP analysis. These added events correspond to two putative WGD events in the lineage leading to euphyllophytes, which lies outside the coverage of the Pelosi et al. 2020 study.
- Pelosi2020max+KP: the high number of WGD events among leptosporangiate ferns recovered by Pelosi et al. 2020, with added WGD events recovered in the 1KP analysis. These added events correspond to two putative WGD events in the lineage leading to euphyllophytes, which lies outside the coverage of the Pelosi et al. 2020 study.
- Stull2021: WGD events among acrogymnosperms recovered by Stull et al. 2021.
- Stull2021+KP: WGD events among acrogymnosperms recovered by Stull et al. 2021, with added WGD events recovered in the 1KP analysis. These added events correspond to two putative WGD events in the lineage leading to euphyllophytes, which lies outside the coverage of the Stull et al. 2021 study.
- Cvalue: DNA 1C-value (in pg) from the Kew C-Value Database.
- Cvalue_avg: whether (1) or not (0) the Cvalue is based on a genus average rather than measured from the particular species in our data set.
- fossil: whether reproductive structure was produced by an extinct (fossil) or extant taxon.
- period: geologic age of the reproductive structure; recorded to the most finely resolved time interval.
- epoch: geologic age of the reproductive structure; recorded to the most finely resolved time interval.
- stage: geologic age of the reproductive structure; recorded to the most finely resolved time interval.
- maximum absolute age: oldest age (millions of years ago) of the reproductive structure, based on the age range of its most finely resolved geologic time interval or on radiometric dating of its locality.
- minimum absolute age: youngest age (millions of years ago) of the reproductive structure, based on the age range of its most finely resolved geologic time interval or on radiometric dating of its locality.
- midpoint absolute age: midpoint (millions of years ago) between the maximum and minimum absolute ages of the reproductive structure.
- reference: citation(s) used to score each reproductive structure; numbers correspond to those in "Data_Citations.txt".
- habit: whether a given reproductive structure was produced by a free-sporing vascular plant ('freesporing') or a seed plant ('seed').
- heterospory: whether (1) or not (0) the taxon that produced the reproductive structure was heterosporous.
- combined: a multistate character identifying how fertile organs are arranged in the reproductive structure. (0) unisexual: only microsporangiate or megasporangiate organs present, (1) bisexual: both microsporangiate and megasporangiate organs present, (2) artificially combined: separate unisexual reproductive structures produced by a single taxon were amalgamated into a single hypothetical structure, (3) artificially combined, interpreted: separate unisexual reproductive structures possibly produced by a single taxon were amalgamated into a single hypothetical structure, (4) bisexual structure, interpreted: both microsporangiate and megasporangiate organs are thought to be produced in a single structure but are not definitively known.
- analysis_group: basic groups used in analyses presented in the main text. Where possible, these groups are based on resolved clades; for groups with unknown affinities, they may also represent heterogeneous groupings used for convenience.
- mega: whether (1) or not (0) a given reproductive structure has female functionality; that is, whether it produces either megaspores or seeds. Bisexual reproductive structures that produce seeds are scored as “1” because they have female functionality.
- biotic.pollination: whether (1) or not (0) a reproductive structure is pollinated by animals. NA for free-sporing plants and fossil taxa where pollination is not known definitively.
- seed.plant: whether (1) or not (0) a reproductive structure is produced by a seed plant.
- angiosperm: whether (1) or not (0) a reproductive structure is produced by an angiosperm.
- total.parts: the total number of part types in a reproductive structure. This number includes all part types expressed over ontogeny.
- bisex.parts: part types directly associated with an additional reproductive organ. In structures that function as microsporangiate, this includes sterile ovule parts; in those that produce megaspores or seeds, it includes microsporangiate parts. Scored as "NA" for artificially combined reproductive structures and for fossil taxa, which were not used in analyses with bisexuality as a predictor variable.
- symmetry: floral symmetry scores for angiosperm taxa; 1=helical, 2=radial, 3=zygomorphic/disymmetric, 4=asymmetric, 5=reduced flower lacking symmetry, NA=not applicable.
- merism: floral organ arrangements; 1=tri/hexamerous, 2=di/tetramerous, 3=pentamerous, NA=not applicable
- Missing data codes: missing or inapplicable data given as "NA".
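Several variables above are integer codes. As a hedged illustration, the Python sketch below transcribes the `combined`, `symmetry`, and `merism` code tables from this list into dictionaries and decodes raw cell values, treating "NA" (or an empty cell) as missing. The helper names are hypothetical, and the sketch assumes codes are stored as whole numbers (a value such as "2.0" is also accepted).

```python
from typing import Optional

# Code tables transcribed from the variable descriptions in this README.
COMBINED = {
    0: "unisexual",
    1: "bisexual",
    2: "artificially combined",
    3: "artificially combined, interpreted",
    4: "bisexual, interpreted",
}
SYMMETRY = {
    1: "helical",
    2: "radial",
    3: "zygomorphic/disymmetric",
    4: "asymmetric",
    5: "reduced flower lacking symmetry",
}
MERISM = {1: "tri/hexamerous", 2: "di/tetramerous", 3: "pentamerous"}


def _code(value: Optional[str]) -> Optional[int]:
    """Return the integer code in a cell, or None for "NA"/empty cells."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return int(float(value))  # tolerate "2" as well as "2.0"


def decode(value: Optional[str], table: dict[int, str]) -> Optional[str]:
    """Map a raw cell to its label; raises KeyError for an unlisted code."""
    code = _code(value)
    return None if code is None else table[code]


def is_artificially_combined(value: Optional[str]) -> bool:
    """True for combined codes 2 and 3, whose bisex.parts is scored NA."""
    return _code(value) in (2, 3)
```

Raising `KeyError` on an unlisted code is a deliberate choice here: it surfaces any scoring outside the documented tables instead of silently mapping it to a label.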
SPECIFIC INFORMATION FOR: Analysis_Scripts.R
This annotated R script contains all commands necessary to reproduce the analyses and plots from Figures 1-4 and Table 1 in the main text, and Figure S1 and Tables S1-S2 in the Supplement. It uses "ComplexityWGD_data.csv", "Angio_reorder.tree", "Forest2018.tree", "Janssens2020.tree", "Nitta2022.tree", and "Testo2018.tree" as inputs.
DATA-SPECIFIC INFORMATION FOR: Data_Citations.txt
This text file contains a numbered list of all sources for morphological scorings. The numbers correspond to those associated with each reproductive structure in the "reference" column of the "ComplexityWGD_data.csv" file.
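The link between the `reference` column and Data_Citations.txt can be followed programmatically. The Python sketch below is a hedged example: it assumes each citation entry in Data_Citations.txt begins with its number (optionally followed by "." or ")"), and that a `reference` cell lists one or more citation numbers separated by non-digit characters such as commas or semicolons. Neither format is specified in this README, and the function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumed line shape: "<number>[.)] <citation text>"; other lines are skipped.
CITATION_LINE = re.compile(r"\s*(\d+)[.)]?\s+(.+)")


def parse_citation_list(lines: Iterable[str]) -> dict[int, str]:
    """Build {citation number: citation text} from Data_Citations.txt lines."""
    citations = {}
    for line in lines:
        match = CITATION_LINE.match(line)
        if match:
            citations[int(match.group(1))] = match.group(2).strip()
    return citations


def parse_reference_numbers(cell: Optional[str]) -> list[int]:
    """Extract every citation number from a reference cell ("NA"/empty -> [])."""
    if cell is None or cell.strip() in ("", "NA"):
        return []
    return [int(num) for num in re.findall(r"\d+", cell)]


def citations_for(cell: Optional[str], citations: dict[int, str]) -> list[str]:
    """Look up full citations for one reference cell, skipping unknown numbers."""
    return [citations[n] for n in parse_reference_numbers(cell) if n in citations]
```

If entries in the citation file wrap across lines, a continuation line that happens to start with a number (a year, say) would be misread as a new entry, so the line-matching rule may need tightening against the actual file.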
DATA-SPECIFIC INFORMATION FOR: Angio_reorder.tree, Forest2018.tree, Janssens2020.tree, Nitta2022.tree, Testo2018.tree
These .tree files are used to build the vascular plant phylogeny for the phylogenetic regression analyses in this study. Full citations for the studies from which these trees derive are given in the main text.