Data from: Assessing Bayesian phylogenetic information content of morphological data using knowledge from anatomy ontologies
Data files
Apr 29, 2024 version (5.36 MB): Dryad_Final.zip, README.md
Abstract
Morphology remains a primary source of phylogenetic information for many groups of organisms, and the only one for most fossil taxa. Organismal anatomy is not a collection of randomly assembled and independent ‘parts’, but instead a set of dependent and hierarchically nested entities resulting from ontogeny and phylogeny. How do we make sense of these dependent and at times redundant characters? One promising approach is using ontologies—structured controlled vocabularies that summarize knowledge about different properties of anatomical entities, including developmental and structural dependencies. Here we assess whether the proximity of ontology-annotated characters within an ontology predicts evolutionary patterns. To do so, we measure phylogenetic information across characters and evaluate if it is hierarchically structured by ontological knowledge—in much the same way as phylogeny structures across-species diversity. We implement an approach to evaluate the Bayesian phylogenetic information (BPI) content and phylogenetic dissonance among ontology-annotated anatomical data subsets. We applied this to datasets representing two disparate animal groups: bees (Hexapoda: Hymenoptera: Apoidea, 209 chars) and characiform fishes (Actinopterygii: Ostariophysi: Characiformes, 463 chars). For bees, we find that BPI is not substantially structured by anatomy since dissonance is often high among morphologically related anatomical entities. For fishes, we find substantial information for two clusters of anatomical entities instantiating concepts from the jaws and branchial arch bones, but among-subset information decreases and dissonance increases substantially moving to higher level subsets in the ontology. We further applied our approach to address particular evolutionary hypotheses with an example of morphological evolution in miniature fishes. 
While we show that ontology does indeed structure phylogenetic information, additional relationships and processes, such as convergence, likely play a substantial role in explaining BPI and dissonance, and merit future investigation. Our work demonstrates how complex morphological datasets can be interrogated with ontologies by allowing one to assess how information is spread hierarchically across anatomical concepts, how congruent this information is, and what sorts of processes may structure it: phylogeny, development, or convergence.
README: Data from: Assessing Bayesian Phylogenetic Information Content of Morphological Data Using Knowledge From Anatomy Ontologies
This dataset contains the Supplemental Material and all files needed to reproduce the empirical analyses from Porto et al. (2022).
In summary, the empirical analyses consisted of applying the newly developed ontobayes approach, as described in Porto et al. (2022), to three morphological datasets, as detailed below.
Description of the data and file structure
1. BEE.zip: This zipped folder contains the R scripts and data files used in the analyses of the BEE dataset. The BEE dataset consists of 209 morphological phylogenetic characters coded for 10 species of corbiculate bees and related taxa (Hexapoda: Hymenoptera: Apidae) modified from Porto et al. (2021). The phylogenetic characters were annotated with terms from the Hymenoptera Anatomy Ontology (HAO). The folder contains the following subfolders and files:
1.1. The subfolder ‘data’ with three files:
(a) ‘bees.nex’: the phylogenetic character matrix modified from Porto et al. (2021) in NEXUS format;
(b) ‘bee_data.csv’: a table with the ontology annotations to the phylogenetic characters; it contains four columns: ID: character IDs from Porto et al. (2021); ONTO_ID: ontology IDs from HAO; ONTO_TERM: corresponding ontology terms (labels) from HAO; CHAR_DESCRIPTION: brief description of the phylogenetic characters from Porto et al. (2021);
(c) ‘HAO.obo’: the HAO ontology raw file in OBO format.
These files are necessary to run the functions from the ontobayes workflow.
1.2. ‘BEE_analyses.Rmd’: The R script to execute the ontobayes workflow with the BEE dataset. It produces Table 1 (from Porto et al., 2022), Supplementary Table S1, and Supplementary Figures S1 and S2 (see the PDF file: ‘SupplementaryMaterial.pdf’).
1.3. ‘BEE.RData’: The R saved data from the analyses.
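The annotation table described in (1.1b) links each character to an ontology term, so characters can be grouped into anatomical subsets by their ONTO_ID. The following is a minimal Python sketch of that grouping, independent of the R workflow. It assumes a comma-delimited file whose header row uses the four column names listed above; the sample rows, IDs, and the function name `group_characters_by_term` are placeholders invented for illustration, not values or functions from the dataset or from ontobayes.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the four-column layout described for 'bee_data.csv'.
# The IDs, terms, and descriptions are placeholders, not dataset values.
SAMPLE_CSV = """ID,ONTO_ID,ONTO_TERM,CHAR_DESCRIPTION
1,HAO:9000001,term A,placeholder description 1
2,HAO:9000001,term A,placeholder description 2
3,HAO:9000002,term B,placeholder description 3
"""


def group_characters_by_term(handle):
    """Map each ontology ID to the list of character IDs annotated with it."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row["ONTO_ID"]].append(row["ID"])
    return dict(groups)


groups = group_characters_by_term(io.StringIO(SAMPLE_CSV))
for onto_id, char_ids in groups.items():
    print(onto_id, char_ids)
```

To try this on the real table, the `io.StringIO(...)` handle could be swapped for `open("data/bee_data.csv", newline="")`, provided the delimiter and header spelling match the assumptions above.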
2. FISH.zip: This zipped folder contains the R scripts and data files used in the analyses of the FISH dataset. The FISH dataset consists of 463 morphological phylogenetic characters coded for 10 species of characiform fishes (Actinopterygii: Ostariophysi: Characiformes) modified from [Dillman et al. (2016)](https://doi.org/10.1111/cla.12127). The phylogenetic characters were annotated with terms from the Uber-Anatomy Ontology (Uberon). The folder contains the following subfolders and files:
2.1. The subfolder ‘data’ with five files:
(a) ‘fishes.nex’: the phylogenetic character matrix modified from Dillman et al. (2016) in NEXUS format;
(b) ‘fish_data.csv’: a table with the ontology annotations to the phylogenetic characters; it contains four columns: ID: character IDs from Dillman et al. (2016); ONTO_ID: ontology IDs from Uberon; ONTO_TERM: corresponding ontology terms (labels) from Uberon; CHAR_DESCRIPTION: brief description of the phylogenetic characters from Dillman et al. (2016);
(c) ‘UBERON.obo’: the Uberon ontology raw file in OBO format;
(d) ‘fish_tree.nex’: the data and command blocks for MrBayes used in the Bayesian analyses to produce the reference species tree of the FISH dataset;
(e) ‘fish_tree.nex.con.tre’: the majority consensus tree from the Bayesian analysis of the FISH dataset.
These files are necessary to run the functions from the ontobayes workflow.
2.2. The subfolder ‘functions’, containing the file ‘plotTree_barplot_mod.R’, a modified wrapper function used to plot Figure 4 (from Porto et al., 2022).
2.3. ‘FISH_analyses.Rmd’: The R script to execute the ontobayes workflow with the FISH dataset. It produces Figures 4 and 5 (from Porto et al., 2022) and Supplementary Figures S3-S7 (see the PDF file: ‘SupplementaryMaterial.pdf’).
2.4. ‘FISH.RData’: The R saved data from the analyses.
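The ontology files listed above (‘HAO.obo’, ‘UBERON.obo’) are in OBO format, where each `[Term]` stanza carries an `id:` plus optional `is_a:` and `relationship:` lines pointing to parent terms. The sketch below shows one way to read those links and collect the higher-level terms under which an annotated term is nested. It is a simplified stdlib-only Python reader, not the parser used by ontobayes; the choice to follow `is_a` and `part_of` is an assumption, and the sample ontology, its `EX:` IDs, and both function names are made up for illustration.

```python
import io

# A tiny made-up ontology in OBO syntax; IDs and names are placeholders,
# not real Uberon or HAO terms.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: skeletal element

[Term]
id: EX:0000002
name: jaw region
is_a: EX:0000001 ! skeletal element

[Term]
id: EX:0000003
name: lower jaw bone
relationship: part_of EX:0000002 ! jaw region

[Typedef]
id: part_of
name: part of
"""


def parse_obo_parents(handle, relations=("part_of",)):
    """Return {term_id: set of parent IDs} from is_a links and chosen relations."""
    parents = {}
    current = None
    in_term = False
    for raw in handle:
        line = raw.split("!", 1)[0].strip()  # drop the trailing '! comment'
        if line.startswith("["):
            in_term = line == "[Term]"       # ignore [Typedef] and other stanzas
            current = None
        elif in_term and line.startswith("id:"):
            current = line[3:].strip()
            parents.setdefault(current, set())
        elif in_term and current and line.startswith("is_a:"):
            tokens = line[len("is_a:"):].split()
            if tokens:
                parents[current].add(tokens[0])
        elif in_term and current and line.startswith("relationship:"):
            tokens = line[len("relationship:"):].split()
            if len(tokens) >= 2 and tokens[0] in relations:
                parents[current].add(tokens[1])
    return parents


def ancestors(term_id, parents):
    """All terms reachable by following parent links upward from term_id."""
    seen = set()
    stack = [term_id]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


parents = parse_obo_parents(io.StringIO(SAMPLE_OBO))
print(sorted(ancestors("EX:0000003", parents)))
```

Combined with the ONTO_ID column of the annotation tables, a lookup like this indicates which characters fall under the same higher-level anatomical term.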
3. MINI.zip: This zipped folder contains the R scripts and data files used in the analyses of the MINI dataset. The MINI dataset consists of 453 morphological phylogenetic characters coded for 10 species of characiform fishes (Actinopterygii: Ostariophysi: Characiformes) modified from [Mirande (2019)](https://doi.org/10.1111/cla.12345). The phylogenetic characters were annotated with terms from the Uber-Anatomy Ontology (Uberon). The folder contains the following subfolders and files:
3.1. The subfolder ‘data’ containing four files:
(a) ‘mirande.nex’: the phylogenetic character matrix modified from Mirande (2019) in NEXUS format;
(b) ‘mirande.csv’: a table with the ontology annotations to the phylogenetic characters; it contains three columns: ID: character IDs from Mirande (2019); ONTO_ID: ontology IDs from Uberon; ONTO_TERM: corresponding ontology terms (labels) from Uberon;
(c) ‘mini_tree.nex’: the data and command blocks for MrBayes used in the Bayesian analyses to produce the reference species tree of the MINI dataset;
(d) ‘mini_tree.nex.con.tre’: the majority consensus tree from the Bayesian analysis of the MINI dataset.
These files are necessary to run the functions from the ontobayes workflow.
3.2. ‘MINI_analyses.Rmd’: The R script to execute the ontobayes workflow with the MINI dataset. It produces Supplementary Figure S8 (see the PDF file: ‘SupplementaryMaterial.pdf’).
3.3. ‘MINI.RData’: The R saved data from the analyses.
4. RESAMPLING.zip: This zipped folder contains the R scripts and data files used in the resampling analyses of the FISH dataset. The FISH dataset is the same as described above (2). The folder contains the following subfolders and files:
4.1. The subfolder ‘data’ containing three files, the same files (a-c) as described in (2.1).
These files are necessary to run the functions from the ontobayes workflow.
4.2. ‘RESAMPLING_analyses.Rmd’: The R script to execute the resampling analyses with the FISH dataset. It produces Figure 6 (from Porto et al., 2022) and Supplementary Figures S9-S10 (see the PDF file: ‘SupplementaryMaterial.pdf’).
4.3. ‘RESAMPLING.RData’: The R saved data from the analyses.
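Before running the scripts, it can help to confirm that the unzipped folders contain the files listed above. The Python sketch below transcribes that listing into a dictionary and reports anything missing. It assumes each archive unpacks into a folder named after the zip file (BEE, FISH, MINI, RESAMPLING) with the subfolder paths as described; the helper name `missing_files` is made up for this example.

```python
from pathlib import Path
import tempfile

# Expected files per unzipped folder, transcribed from the listing above.
# RESAMPLING's data files follow "the same files (a-c) as described in (2.1)".
EXPECTED = {
    "BEE": ["data/bees.nex", "data/bee_data.csv", "data/HAO.obo",
            "BEE_analyses.Rmd", "BEE.RData"],
    "FISH": ["data/fishes.nex", "data/fish_data.csv", "data/UBERON.obo",
             "data/fish_tree.nex", "data/fish_tree.nex.con.tre",
             "functions/plotTree_barplot_mod.R",
             "FISH_analyses.Rmd", "FISH.RData"],
    "MINI": ["data/mirande.nex", "data/mirande.csv",
             "data/mini_tree.nex", "data/mini_tree.nex.con.tre",
             "MINI_analyses.Rmd", "MINI.RData"],
    "RESAMPLING": ["data/fishes.nex", "data/fish_data.csv", "data/UBERON.obo",
                   "RESAMPLING_analyses.Rmd", "RESAMPLING.RData"],
}


def missing_files(root, expected=EXPECTED):
    """Return the sorted 'folder/relative/path' entries not found under root."""
    root = Path(root)
    return sorted(
        f"{folder}/{rel}"
        for folder, files in expected.items()
        for rel in files
        if not (root / folder / rel).is_file()
    )


# Demo on a throwaway directory that holds only empty stand-ins for the BEE files.
with tempfile.TemporaryDirectory() as tmp:
    for rel in EXPECTED["BEE"]:
        path = Path(tmp) / "BEE" / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    missing = missing_files(tmp)

print(len(missing), "files missing")
```

On a real copy of the data, `missing_files("path/to/unzipped")` should return an empty list if the layout assumptions hold.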
Sharing/Access information
The methodology used to analyse this dataset is implemented in the R package ontobayes.