Data and scripts from: Analyzing ontogenetic changes in Cupressaceae seed cone size
Data files
Aug 13, 2024 version files 904 KB
- Cupressaceae_Disparity_Analyses.R
- Hypervolume_and_Overlap_Construction.R
- Ontogeny_data.csv
- Overlaps_pooled.csv
- Overlaps_randomized.csv
- Overlaps_taxa.csv
- README.md
- Sampling_Curve_Construction.R
Abstract
README: Data and scripts from: Analyzing ontogenetic changes in Cupressaceae seed cone size
[Access this dataset on Dryad](https://doi.org/10.5061/dryad.9zw3r22nb)
This dataset supports an analysis of how inter-taxon morphological disparity changes across ontogeny in Cupressaceae seed cone scales. It consists of measurements of the cone scales of 10 conifer species, with multiple cones per species.
The analysis primarily constructs hypervolumes from the three-dimensional measurements of cone scales and then compares them. Total morphospace occupancy is assessed via each taxon's overlap with the aggregate hypervolume of each developmental stage, and morphospace position via taxon-to-taxon overlaps. Overlap is scored with the Jaccard index.
Code is provided for three purposes: generating these hypervolumes and overlap scores; constructing three sampling curves that gauge how additional cones contribute to overall morphological disparity within a given species at a given developmental stage (Platycladus orientalis at pollination, and Metasequoia glyptostroboides at pollination and at maturity); and creating the plots and figures presented in NPH19482.
Also provided are processed datasets of overlap scores for ease of figure construction, without needing to rerun the full overlap code.
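The Jaccard index mentioned above is the volume of the intersection of two hypervolumes divided by the volume of their union. The following is a minimal sketch of that arithmetic in base R, not the code used in the provided scripts; the function name `jaccard_index` and its arguments are hypothetical, and the numbers are illustrative rather than taken from the dataset.

```r
# Jaccard index from the volumes of two hypervolumes and of their intersection.
# Function and argument names are hypothetical, for illustration only.
jaccard_index <- function(vol_a, vol_b, vol_intersection) {
  stopifnot(vol_a >= 0, vol_b >= 0, vol_intersection >= 0)
  stopifnot(vol_intersection <= min(vol_a, vol_b))
  vol_union <- vol_a + vol_b - vol_intersection
  if (vol_union == 0) return(NA_real_)  # undefined when both volumes are empty
  vol_intersection / vol_union
}

# Illustrative values only (not from the dataset).
partial  <- jaccard_index(2, 3, 1)  # 1 / (2 + 3 - 1)
complete <- jaccard_index(2, 2, 2)  # identical hypervolumes
disjoint <- jaccard_index(2, 3, 0)  # no shared volume
```

A score of 1 indicates identical hypervolumes and a score of 0 indicates no shared morphospace. The hypervolume package listed under Code/Software reports a Jaccard statistic through `hypervolume_overlap_statistics()`; see Hypervolume_and_Overlap_Construction.R for how the scores in this dataset were actually produced.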
Description of the data and file structure
The dataset consists of three R scripts and four CSV data files.
File list:
Cupressaceae_Disparity_Analyses.R
Hypervolume_and_Overlap_Construction.R
Ontogeny_data.csv
Overlaps_pooled.csv
Overlaps_randomized.csv
Overlaps_taxa.csv
Sampling_Curve_Construction.R
Raw measurement data are provided in Ontogeny_data.csv, a CSV spreadsheet of 4,896 rows and 13 columns. Each row represents a single cone scale.
The columns are as follows:
Taxon: Species as identified either by the botanical gardens or the authors.
Phylogenetic order: Order of appearance within the phylogenetic structure of each taxon, used for creating some figures.
Individual: The individual tree from which cones were collected, with numbers assigned to specific trees at the time of collection.
Cone: The cone to which each cone scale belongs; all scales sharing the same Individual and Cone number come from the same cone.
Cone_ID: A unique marker given to each cone scale in the form X_y where X represents the cone number and y the scale number from that cone.
Stage: Ontogenetic stage of the cone scale at the time of collection and measurement. Pollination is the point at which ovules were exposed to windborne pollen, and maturity is the point of seed dispersal.
Length: Length of the cone scale as measured in millimetres from the connection of a cone scale to the central axis outwards to the furthest point.
Width: Width of the cone scale in millimetres, measured as the longest distance within an adaxial/abaxial plane.
Thickness: Thickness of the cone scale in millimetres, measured as the longest line between the adaxial and abaxial surfaces.
Thickness_Bract.only: Thickness of the cone scale bract in millimetres, measured as the longest line between the adaxial and abaxial surfaces, excluding structures defined as belonging to an ovuliferous scale of a bract-scale complex. This is identical in most respects to Thickness except for Cryptomeria japonica specimens.
lLength: The base 10 log of cone scale length.
lWidth: The base 10 log of cone scale width.
lThickness: The base 10 log of cone scale thickness.
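As a sketch of how the derived columns relate to the raw ones, the following base R snippet rebuilds the log columns and splits Cone_ID into its two parts. It uses a small made-up data frame that only mimics the documented column names; the values are not from Ontogeny_data.csv, and the helper columns `cone_number` and `scale_number` are hypothetical names introduced here.

```r
# Toy rows mimicking the documented column names; all values are made up.
scales <- data.frame(
  Cone_ID   = c("1_1", "1_2", "2_1"),
  Length    = c(10, 1, 100),
  Width     = c(5, 2, 50),
  Thickness = c(1, 0.5, 10),
  stringsAsFactors = FALSE
)

# Derive the log columns as documented: base 10 logs of each measurement.
scales$lLength    <- log10(scales$Length)
scales$lWidth     <- log10(scales$Width)
scales$lThickness <- log10(scales$Thickness)

# Split Cone_ID ("X_y") into cone number X and scale number y.
id_parts <- do.call(rbind, strsplit(scales$Cone_ID, "_", fixed = TRUE))
scales$cone_number  <- as.integer(id_parts[, 1])  # hypothetical helper column
scales$scale_number <- as.integer(id_parts[, 2])  # hypothetical helper column
```

With the real file, the same steps would start from `read.csv("Ontogeny_data.csv")`, assuming the file sits in the working directory.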
The other datasheets contain analysed and pooled data on hypervolume overlaps (recorded as Jaccard scores), presented as CSV files. All scores were generated by the code in Hypervolume_and_Overlap_Construction.R.
These datasheets include:
Overlaps_taxa.csv: Taxon-to-taxon overlaps at both stages. Each column represents one taxon at one developmental stage, named with a shortened taxon name plus a stage suffix: .p for pollination or .m for maturity. For example, column Cun.p holds the overlaps between Cunninghamia lanceolata at pollination and all other taxa at pollination, with each row giving the Jaccard score of a different taxon at pollination overlapping with Cunninghamia lanceolata.
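The column naming convention just described can be decoded programmatically. The following is a small sketch in base R; the function `parse_overlap_column` is a hypothetical helper, and only the `Cun` abbreviation is taken from this README.

```r
# Split column names such as "Cun.p" into a taxon abbreviation and a stage.
# parse_overlap_column is a hypothetical helper, not part of the provided scripts.
parse_overlap_column <- function(col_names) {
  stage_lookup <- c(p = "pollination", m = "maturity")
  suffix <- sub("^.*\\.([pm])$", "\\1", col_names)
  data.frame(
    column = col_names,
    taxon  = sub("\\.[pm]$", "", col_names),
    stage  = unname(stage_lookup[suffix]),  # NA when no .p/.m suffix is present
    stringsAsFactors = FALSE
  )
}

parsed <- parse_overlap_column(c("Cun.p", "Cun.m"))
```

Names without a `.p` or `.m` suffix are returned with an `NA` stage rather than being guessed.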
Overlaps_pooled.csv: Overlap scores between taxa and the aggregate at each stage. Columns are the Taxon name, Jaccard score at Pollination, and Jaccard score at Maturity.
Overlaps_randomized.csv: The overlap scores generated by the bootstrapping process, presented as two columns, one for each stage.
The three coding documents include:
Hypervolume_and_Overlap_Construction.R: The main data processing code which constructs hypervolumes and compares overlaps as annotated within.
Cupressaceae_Disparity_Analyses.R: Additional data processing and figure construction for figures contained within NPH19482.
Sampling_Curve_Construction.R: Construction of the sampling curves shown in the Supplementary Information for NPH19482, specifically SF1.
All coding documents are annotated for ease of repeat use.
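To illustrate the general idea behind a sampling curve (how a disparity measure grows as cones are added), here is a self-contained sketch in base R. It is not the procedure in Sampling_Curve_Construction.R: it runs on simulated data and swaps in the volume of an axis-aligned bounding box as a stand-in for a hypervolume, purely so the example needs no extra packages. All object names are hypothetical.

```r
set.seed(1)  # reproducible toy data

# Simulated data: 6 cones with 8 scales each, three log-scaled measurements.
toy <- data.frame(
  Cone       = rep(1:6, each = 8),
  lLength    = rnorm(48),
  lWidth     = rnorm(48),
  lThickness = rnorm(48)
)

# Stand-in disparity measure: volume of the axis-aligned bounding box.
# This is NOT a hypervolume estimate; it only keeps the sketch self-contained.
box_volume <- function(df) {
  dims <- df[, c("lLength", "lWidth", "lThickness")]
  prod(vapply(dims, function(x) diff(range(x)), numeric(1)))
}

# Add cones one at a time in a random order and record the measure each time.
cone_order <- sample(unique(toy$Cone))
curve <- vapply(seq_along(cone_order), function(k) {
  box_volume(toy[toy$Cone %in% cone_order[1:k], ])
}, numeric(1))
```

Plotting `curve` against the number of cones included shows whether the measure levels off as sampling increases. Because each step adds to the previous set of cones, this particular stand-in can only stay flat or increase.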
Sharing/Access information
This is currently the most accessible form of this data.
Code/Software
Code is provided as R files (RStudio version information is given below). The packages each script loads are specified within the code documents; they include shape, dendextend, dplyr, hypervolume, ggplot2, and readxl.
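Before running the scripts, it may help to check that the packages named above are installed. The snippet below is a minimal sketch, assuming the packages are obtained from CRAN; the object names are hypothetical, and the individual scripts remain the authoritative list of what each one loads.

```r
# Packages named in this README.
required <- c("shape", "dendextend", "dplyr", "hypervolume", "ggplot2", "readxl")

# Identify any that are not installed, without attaching them.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install (assumes CRAN)
}
```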
The data used for figure construction in Cupressaceae_Disparity_Analyses.R were generated with the code in Hypervolume_and_Overlap_Construction.R. The results were then exported to an Excel/CSV file and reformatted into the versions provided here, with the positions of the scores slightly rearranged for ease of graphing. All other file transformations should take place within the code documents themselves.
RStudio 2023.06.0+421 "Mountain Hydrangea" Release (583b465ecc45e60ee9de085148cd2f9741cc5214, 2023-06-05) for windows
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) RStudio/2023.06.0+421 Chrome/110.0.5481.208 Electron/23.3.0 Safari/537.36