Data from: Using supermatrices for phylogenetic inquiry: an example using the sedges
Data files
Nov 15, 2012 version files 72.92 MB
Appendix_I.pdf (37.14 KB)
Appendix_II.pdf (872.17 KB)
cyp_states.csv (113.35 KB)
filter_fasta.py (1.69 KB)
instability_multicore.py (7.83 KB)
makesamplingmatrix.py (2.95 KB)
phase1_at_nf.phy.reduced (30.58 MB)
phase2_at_rf.phy.reduced (26.56 MB)
phase3_sc_nf.phy.reduced (7.77 MB)
phase4_sc_rf.phy.reduced (6.98 MB)
Nov 15, 2012 version files 73.25 MB
Appendix_I.pdf (37.14 KB)
Appendix_II.pdf (872.17 KB)
cyp_states.csv (113.35 KB)
filter_fasta.py (1.69 KB)
instability_multicore.py (7.83 KB)
makesamplingmatrix.py (2.95 KB)
phase_1_contree_all_taxa_all_genes_all_tips.tre (107.58 KB)
phase_2_contree_all_genes_stable_tips.tre (100.41 KB)
phase_2.5_contree_has_ndhf_or_rbcl_all_tips.tre (37.15 KB)
phase_3_contree_has_ndhf_or_rbcl_no_bad_taxa.tre (37.31 KB)
phase_4_contree_has_ndhf_or_rbcl_stable_tips.tre (47.36 KB)
phase1_at_nf.phy.reduced (30.58 MB)
phase2_at_rf.phy.reduced (26.56 MB)
phase3_sc_nf.phy.reduced (7.77 MB)
phase4_sc_rf.phy.reduced (6.98 MB)
Abstract
In this article, we use supermatrix data-mining methods to reconstruct a large, highly inclusive phylogeny of Cyperaceae from nucleotide data available on GenBank. We explore the properties of these trees and their utility for phylogenetic inference, and show that even the highly incomplete alignments characteristic of supermatrix approaches may yield very good estimates of phylogeny. We present a novel pipeline for filtering sparse alignments to improve their phylogenetic utility by maximizing the partial decisiveness of the matrices themselves through a technique we call “phylogenetic scaffolding,” and we present a new method of scoring tip instability (i.e., identifying “rogue taxa”) based on the I statistic implemented in the software Mesquite. The modified statistic, which we call IS, is somewhat more straightforward to interpret than similar statistics, and our implementation of it may be applied to large sets of large trees. The largest sedge trees presented here contain more than 1500 tips (about one quarter of all sedge species) and are based on multigene alignments with more than 20 000 sites and more than 90% missing data. These trees match well with previously supported phylogenetic hypotheses, but have lower overall support values and less resolution than more heavily filtered trees. Our best-resolved trees are characterized by stronger support values than any previously published sedge phylogenies, and show some relationships that are incongruous with previous studies. Overall, we show that supermatrix methods offer a powerful means of pursuing phylogenetic study, and that these tools have high potential value for many systematic biologists.
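As a rough illustration of what a taxon-by-gene sampling matrix and a "has ndhF or rbcL" filter might look like, here is a minimal Python sketch. It is not the content of makesamplingmatrix.py or filter_fasta.py, which are not reproduced on this page. The function names, the input format (a dictionary mapping each gene to the set of taxa sampled for it), and the toy data are all assumptions made for the example. The filter rule is inferred only from the tree file names above. Missing data is counted per gene here, not per alignment site as in the abstract's 90% figure.

```python
def sampling_matrix(gene_to_taxa):
    """Build a presence/absence matrix: rows are taxa, columns are genes.

    gene_to_taxa is a hypothetical input format: {gene_name: set_of_taxon_names}.
    """
    genes = sorted(gene_to_taxa)
    taxa = sorted(set().union(*gene_to_taxa.values()))
    matrix = [[1 if t in gene_to_taxa[g] else 0 for g in genes] for t in taxa]
    return taxa, genes, matrix


def missing_fraction(matrix):
    """Fraction of empty taxon-by-gene cells (gene-level, not site-level)."""
    cells = sum(len(row) for row in matrix)
    if cells == 0:
        return 0.0
    filled = sum(sum(row) for row in matrix)
    return 1 - filled / cells


def keep_taxa_with_any(gene_to_taxa, required_genes):
    """Drop taxa that are not sampled for at least one of required_genes."""
    keep = set().union(*(gene_to_taxa.get(g, set()) for g in required_genes))
    return {g: taxa & keep for g, taxa in gene_to_taxa.items()}


# Toy data (invented for illustration, not taken from the dataset).
example = {
    "ndhF": {"A", "B"},
    "rbcL": {"B", "C"},
    "ITS": {"A", "D"},
}

taxa, genes, matrix = sampling_matrix(example)
before = missing_fraction(matrix)

filtered = keep_taxa_with_any(example, ["ndhF", "rbcL"])
f_taxa, f_genes, f_matrix = sampling_matrix(filtered)
after = missing_fraction(f_matrix)
```

In this toy case, taxon D is sampled only for ITS, so the filter removes it and the share of empty cells falls. A real pipeline would need to read the sampling from the alignment files themselves.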