The data synthesis required for large-scale macroevolutionary studies is difficult to achieve with currently available integration tools. Using a classic question, the frequency of paired fin loss in teleost fishes, as a case study, we sought to create automated methods that facilitate the integration of broad-scale trait data with a sizable species-level phylogeny. Similar to the evolutionary pattern previously described for limbs, pelvic and pectoral fin reduction and loss are thought to have occurred independently multiple times in the evolution of fishes. We developed a bioinformatics pipeline to identify the presence and absence of pectoral and pelvic fins in 12,582 species. To do this, we integrated a synthetic morphological supermatrix of phenotypic data for the pectoral and pelvic fins of teleost fishes from the Phenoscape Knowledgebase (two presence/absence characters for 3,047 taxa) with a species-level tree for teleost fishes from the Open Tree of Life project (38,419 species). The integration method detailed herein combines two approaches, ontological inference and phylogenetic propagation, to reduce overall data loss. Using inference enabled by ontology-based annotations, missing data were reduced from 98.0% to 85.9%, and phylogenetic data propagation reduced them further to 34.8%. These methods allowed us to extend the data to an additional 11,293 species, for a total of 12,582 species with trait data. The pectoral fin appears to have been independently lost in a minimum of 19 lineages and the pelvic fin in 48. Although interpretation is limited by the lack of phylogenetic resolution at the species level, it appears that, following loss, the pectoral fin was regained several (3) times and the pelvic fin many (14) times. Focused investigation of the putative regains of the pectoral fin, all within one clade (Anguilliformes), showed that the pectoral fin was regained at least twice following loss.
Overall, this study points to specific teleost clades where strategic phylogenetic resolution and genetic investigation will be necessary to understand the pattern and frequency of pectoral fin reversals.
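The propagation step summarized in the abstract can be illustrated with a minimal Python sketch. It assumes one plausible reading of propagation: a character state asserted for a higher taxon is inherited by every descendant species that lacks its own annotation. The taxon names, the `parent` mapping, and the functions `propagate_states` and `missing_fraction` are hypothetical and are not taken from the pipeline itself.

```python
from typing import Dict, List, Optional

def propagate_states(parent: Dict[str, Optional[str]],
                     states: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Give each taxon the state of its nearest annotated ancestor (or itself)."""
    resolved: Dict[str, Optional[str]] = {}
    for taxon in parent:
        node: Optional[str] = taxon
        # Walk up the hierarchy until an annotated taxon or the root is reached.
        while node is not None and node not in states:
            node = parent.get(node)
        resolved[taxon] = states[node] if node is not None else None
    return resolved

def missing_fraction(species: List[str],
                     states: Dict[str, Optional[str]]) -> float:
    """Share of species with no state for the character."""
    missing = sum(1 for s in species if states.get(s) is None)
    return missing / len(species)

# Hypothetical toy hierarchy: child -> parent (the root maps to None).
parent = {
    "Root": None,
    "FamilyA": "Root",
    "FamilyB": "Root",
    "sp_a1": "FamilyA",
    "sp_a2": "FamilyA",
    "sp_b1": "FamilyB",
    "sp_b2": "FamilyB",
}
species = ["sp_a1", "sp_a2", "sp_b1", "sp_b2"]

# Hypothetical assertions: one at family level, one at species level.
asserted = {"FamilyA": "absent", "sp_b1": "present"}

before = missing_fraction(species, asserted)
resolved = propagate_states(parent, asserted)
after = missing_fraction(species, resolved)
print(f"missing before: {before:.2f}, after: {after:.2f}")
```

In this toy case the family-level assertion fills both species of the hypothetical FamilyA, while sp_b2 stays missing because no ancestor carries a state. This mirrors, at a very small scale, how propagation lowers the proportion of missing data without inventing states for unannotated lineages.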
Supplementary Materials File 1
Teleostei species-level tree from Open Tree. Tree description file (Newick) from the Open Tree synthesis, used as input to Mesquite.
Supplementary Materials File 2
Output files of synthesis tree build from Open Tree. Archive files of the output directory for each step in the synthesis tree build.
Supplementary Materials File 3
Merged tree matrix. NEXUS-formatted translation of the final output matrix (Supplementary Materials Matrix 4) merged with the Teleostei species-level tree from Open Tree (Supplementary Materials File 1). This combined matrix and tree file was used for ancestral state reconstruction in Mesquite.
Supplementary Materials File 4
Anguilliformes species-level trees based on a subset of the Teleostei species-level tree from Open Tree (Supplementary Materials File 1), merged with the final output matrix (Supplementary Materials Matrix 4). The file includes 1,000 trees, each with polytomies randomly resolved using the APE package in R.
Supplementary Materials Matrix 1
OntoTrace-generated NeXML synthetic morphological supermatrix for pectoral fin and pelvic fin presence and absence, including metadata for supporting states and publication information.
Supplementary Materials Matrix 2
The tab-delimited character matrix generated after pre-processing the OntoTrace matrix (Supplementary Materials Matrix 1) by converting from NeXML format.
Supplementary Materials Matrix 3
Resulting matrix after the propagation step. The taxon names are based on the Vertebrate Taxonomy Ontology (VTO), and this matrix is the input for the taxon name reconciliation step.
Supplementary Materials Matrix 4
Final output matrix of the pipeline. This tab-delimited file was read into Mesquite v3.10 for mapping onto the Teleostei species-level tree (Supplementary Materials File 1).
Supplementary Materials Table 1
List of 87 publications used in constructing the synthetic supermatrix, including the number of taxa, pectoral and pelvic fin characters, and states. Studies that were specifically curated for the purpose of more fully representing the distribution of pelvic and pectoral fin conditions across teleosts are denoted by an asterisk.
Supplementary Materials Table 2
Comparison of valid teleost species among FishBase, Catalog of Fishes (CoF), and Open Tree.
Supplementary Materials Table 3
Comparison of teleost families among the Vertebrate Taxonomy Ontology (VTO), Open Tree, and Catalog of Fishes (CoF). A dash indicates a family name that is not recognized within a particular source.
Supplementary Materials Table 4
Statistics for reconciliation between Vertebrate Taxonomy Ontology (VTO) and Open Tree taxa. This file lists the species with data that were mismatched during the reconciliation step, grouped by the reason for the mismatch (species being extinct, unconventional naming, etc.).
Supplementary Materials Table 5
List of VTO teleost families that show pectoral fin absence, pelvic fin absence, or the absence of both paired fins. Families with pelvic fin absence were compared to previously documented families (Nelson 1990), and details are given in footnotes.
Supplementary Materials Table 6
Ancestral state reconstruction across Anguilliformes Open Tree phylogeny (Supplementary Materials File 4) for pectoral fin gain and loss (Fig. 7).
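The random resolution of polytomies described for Supplementary Materials File 4 was performed with the APE package in R. As an illustration of the general idea only, the following plain-Python sketch resolves multifurcations in a nested-tuple tree by repeatedly joining two randomly chosen children. The tree encoding, the taxon labels, and the function names are hypothetical, and the sketch is not claimed to reproduce APE's algorithm or its sampling distribution over topologies.

```python
import random

def resolve_polytomies(tree, rng):
    """Return a copy of `tree` in which every node with more than two
    children is randomly resolved into a series of bifurcations."""
    if isinstance(tree, str):  # a leaf is just its taxon label
        return tree
    children = [resolve_polytomies(child, rng) for child in tree]
    while len(children) > 2:
        # Pick two distinct children and replace them with a new cherry.
        i, j = sorted(rng.sample(range(len(children)), 2))
        right = children.pop(j)  # pop the larger index first
        left = children.pop(i)
        children.append((left, right))
    return tuple(children)

def leaves(tree):
    """List the leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]

def is_binary(tree):
    """True if every internal node has exactly two children."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)

# Hypothetical unresolved tree: a four-way polytomy containing a three-way one.
unresolved = ("sp1", "sp2", "sp3", ("sp4", "sp5", "sp6"))

rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [resolve_polytomies(unresolved, rng) for _ in range(1000)]
print(len(replicates), "fully bifurcating trees generated")
```

Each replicate keeps the same set of tips as the input and differs only in how the unresolved nodes are broken up, which is what allows an ancestral state reconstruction to be repeated across many equally arbitrary resolutions.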