| dc.contributor.author | Hinchliff, Cody E. | |
| dc.contributor.author | Roalson, Eric H. | |
| dc.coverage.spatial | North America | |
| dc.coverage.spatial | South America | |
| dc.coverage.spatial | Eurasia | |
| dc.coverage.spatial | Oceania | |
| dc.coverage.spatial | Africa | |
| dc.coverage.temporal | Miocene | |
| dc.coverage.temporal | Paleocene | |
| dc.coverage.temporal | Eocene | |
| dc.coverage.temporal | Oligocene | |
| dc.coverage.temporal | Pleistocene | |
| dc.coverage.temporal | Holocene | |
| dc.coverage.temporal | Pliocene | |
| dc.coverage.temporal | Cretaceous | |
| dc.date.accessioned | 2012-11-15T18:36:03Z | |
| dc.date.available | 2012-11-15T18:36:03Z | |
| dc.date.issued | 2012-10-26 | |
| dc.identifier | doi:10.5061/dryad.6p76c3pb | |
| dc.identifier.citation | Hinchliff CE, Roalson EH (2012) Using supermatrices for phylogenetic inquiry: an example using the sedges. Systematic Biology 62(2): 205-219. | |
| dc.identifier.uri | http://hdl.handle.net/10255/dryad.38181 | |
| dc.description | In this article, we use supermatrix data-mining methods to reconstruct a large, highly inclusive phylogeny of Cyperaceae from nucleotide data available on GenBank. We explore the properties of these trees and their utility for phylogenetic inference, and show that even the highly incomplete alignments characteristic of supermatrix approaches may yield very good estimates of phylogeny. We present a novel pipeline for filtering sparse alignments to improve their phylogenetic utility by maximizing the partial decisiveness of the matrices themselves through a technique we call “phylogenetic scaffolding,” and we present a new method of scoring tip instability (i.e. “rogue taxa”) based on the I statistic implemented in the software Mesquite. The modified statistic, which we call I^s, is somewhat more straightforward to interpret than similar statistics, and our implementation of it may be applied to large sets of large trees. The largest sedge trees presented here contain more than 1500 tips (about one quarter of all sedge species) and are based on multigene alignments with more than 20 000 sites and more than 90% missing data. These trees match well with previously supported phylogenetic hypotheses, but have lower overall support values and less resolution than more heavily filtered trees. Our best-resolved trees are characterized by stronger support values than any previously published sedge phylogenies, and show some relationships that are incongruous with previous studies. Overall, we show that supermatrix methods offer powerful means of pursuing phylogenetic study and these tools have high potential value for many systematic biologists. | |
| dc.relation.haspart | doi:10.5061/dryad.6p76c3pb/1 | |
| dc.relation.haspart | doi:10.5061/dryad.6p76c3pb/2 | |
| dc.relation.haspart | doi:10.5061/dryad.6p76c3pb/3 | |
| dc.relation.haspart | doi:10.5061/dryad.6p76c3pb/4 | |
| dc.relation.haspart | doi:10.5061/dryad.6p76c3pb/5 | |
| dc.relation.haspart | doi:10.5061/dryad.6p76c3pb/6 | |
| dc.relation.haspart | doi:10.5061/dryad.6p76c3pb/7 | |
| dc.relation.haspart | doi:10.5061/dryad.6p76c3pb/8 | |
| dc.relation.haspart | doi:10.5061/dryad.6p76c3pb/9 | |
| dc.relation.haspart | doi:10.5061/dryad.6p76c3pb/10 | |
| dc.relation.isreferencedby | doi:10.1093/sysbio/sys088 | |
| dc.relation.isreferencedby | PMID:23103590 | |
| dc.subject | phlawd | |
| dc.subject | mega-phylogeny | |
| dc.subject | supermatrix | |
| dc.subject | decisiveness | |
| dc.subject | BiSSE | |
| dc.subject | diversitree | |
| dc.subject | latitudinal diversity gradient | |
| dc.title | Data from: Using supermatrices for phylogenetic inquiry: an example using the sedges | |
| dc.type | Article | * |
| dwc.ScientificName | Cyperaceae | |
| dc.contributor.correspondingAuthor | Hinchliff, Cody E. | |
| prism.publicationName | Systematic Biology |
To the extent possible under law, the authors
have waived all copyright and related or neighboring rights to this data.
| Title | phase1_at_nf.phy |
|---|---|
| Downloaded | 17462 times |
| Description | Nucleotide data for Cyperaceae species from various markers, collected from GenBank using the software tool "phlawd". This alignment corresponds to the fully unfiltered alignment--labeled AT/NF in our manuscript. |
| Download | phase1_at_nf.phy.reduced (30.57Mb) |
| Title | phase2_at_rf.phy |
|---|---|
| Downloaded | 66 times |
| Description | Nucleotide data for Cyperaceae species from various markers, collected from GenBank using the software tool "phlawd". This alignment corresponds to the unscaffolded, rogues-filtered alignment--labeled AT/RF in our manuscript. |
| Download | phase2_at_rf.phy.reduced (26.56Mb) |
| Title | phase3_sc_nf.phy |
|---|---|
| Downloaded | 76 times |
| Description | Nucleotide data for Cyperaceae species from various markers, collected from GenBank using the software tool "phlawd". This alignment corresponds to the scaffolded alignment with rogues unfiltered--labeled SC/NF in our manuscript. |
| Download | phase3_sc_nf.phy.reduced (7.766Mb) |
| Title | phase4_sc_rf.phy |
|---|---|
| Downloaded | 75 times |
| Description | Nucleotide data for Cyperaceae species from various markers, collected from GenBank using the software tool "phlawd". This alignment corresponds to the maximally filtered alignment: scaffolded and having had rogues removed--labeled SC/RF in our manuscript. |
| Download | phase4_sc_rf.phy.reduced (6.976Mb) |
| Title | cyp_states |
|---|---|
| Downloaded | 59 times |
| Description | Latitudinal range data for all currently recognized species of Cyperaceae from Govaerts et al. (2007), World Checklist of Cyperaceae. Range data are encoded as tropical (state 1) or extratropical (state 0), and represent the position of the latitudinal midpoint of each species range, as estimated based on the geographic distribution data encoded within the World Checklist of Cyperaceae referenced above. |
| Download | cyp_states.csv (113.3Kb) |
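
The tropical/extratropical coding described for cyp_states can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `latitudinal_state`, its arguments, and the cutoff of 23.44 degrees are assumptions made here for illustration, since the description does not state the boundary used or how each midpoint was estimated from the World Checklist distribution data.

```python
# Assumed cutoff for the tropics (approximate latitude of the Tropics of
# Cancer and Capricorn); the record itself does not state the boundary used.
TROPIC_LIMIT_DEG = 23.44


def latitudinal_state(min_lat, max_lat, limit=TROPIC_LIMIT_DEG):
    """Return 1 (tropical) or 0 (extratropical) for a species range.

    min_lat and max_lat are the southern and northern limits of the range
    in decimal degrees (negative values are south of the equator).
    """
    midpoint = (min_lat + max_lat) / 2.0
    # A species is scored as tropical when its range midpoint lies between
    # the two assumed tropic lines.
    return 1 if abs(midpoint) <= limit else 0
```

Under these assumptions, a range spanning 10°S to 20°N has a midpoint of 5°N and is scored 1, while a range spanning 30°N to 60°N has a midpoint of 45°N and is scored 0.
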
| Title | Appendix_I |
|---|---|
| Downloaded | 67 times |
| Description | Formulas for summary statistics used in the creation of Figure 1 from the text. |
| Download | Appendix_I.pdf (37.14Kb) |
| Title | Appendix_II |
|---|---|
| Downloaded | 106 times |
| Description | ML bootstrap majority rule consensus tree topologies from 300-replicate RAxML bootstrap searches using alignments 1-3 described in the text. Branch labels are bootstrap proportions. |
| Download | Appendix_II.pdf (872.1Kb) |
| Title | filter_fasta.py |
|---|---|
| Downloaded | 101 times |
| Description | Usage: ./filter_fasta.py [path to input dir] [path to accepted taxon list]. Input files are expected to be in fasta format. The script will traverse all files in the input dir, so the input dir should contain only fasta files. The taxon list should be a line-delimited text file containing the names of tips as they correspond to those in the fasta alignments. |
| Download | filter_fasta.py (1.686Kb) |
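
The filtering step described for filter_fasta.py might look roughly like the sketch below. This is an independent illustration using only the Python standard library, not the deposited script. The function names, the `.filtered` output suffix, and the rule that a tip name is the first whitespace-delimited token after `>` are all assumptions.

```python
import os
import sys


def read_taxon_list(path):
    """Return the set of accepted tip names, one per line, ignoring blanks."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_fasta_lines(lines, accepted):
    """Yield only the lines of FASTA records whose name is in `accepted`."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Assumption: the tip name is the first token after '>'.
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in accepted
        if keep:
            yield line


def filter_directory(input_dir, taxon_list_path, suffix=".filtered"):
    """Write a filtered copy of every file in input_dir (hypothetical naming)."""
    accepted = read_taxon_list(taxon_list_path)
    for name in sorted(os.listdir(input_dir)):
        in_path = os.path.join(input_dir, name)
        # Skip subdirectories and output from an earlier run.
        if not os.path.isfile(in_path) or name.endswith(suffix):
            continue
        with open(in_path) as src:
            kept = list(filter_fasta_lines(src, accepted))
        with open(in_path + suffix, "w") as dst:
            dst.writelines(kept)


if __name__ == "__main__":
    # Mirrors the documented usage: [path to input dir] [path to taxon list]
    if len(sys.argv) == 3:
        filter_directory(sys.argv[1], sys.argv[2])
```

As in the description above, every file in the input directory is treated as FASTA, so the directory should contain nothing else.
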
| Title | instability_multicore.py |
|---|---|
| Downloaded | 59 times |
| Description | This script will calculate I^s scores, as described in (Hinchliff, C. E. and E. H. Roalson. 2012. Using supermatrices for phylogenetic inquiry: an example using the sedges. Systematic Biology). It requires a set of trees sharing a common set of tips, to be input as a newick file (though any format readable by dendropy should be trivial to use, just change the format in the appropriate line). It outputs a comma-delimited table containing the raw instability scores (the numerator from the right side of the equation in the referenced paper), as well as the scaled I^s scores. Taxa that move more have higher scores. |
| Download | instability_multicore.py (7.832Kb) |
| Title | makesamplingmatrix |
|---|---|
| Downloaded | 59 times |
| Description | This script accesses a directory, and traverses all FASTA files in it, recording the names of all taxa present in each file. Then it creates a tab-delimited file containing a matrix where the rows represent the taxa and the columns the FASTA files. The intended use is for a directory containing a set of FASTA files each corresponding to a single locus, and containing homologous sequences of that locus for different taxa. The script will record a 1 in the resulting matrix if a taxon is present in a locus file, or a 0 if not. Key point: the script does not intelligently differentiate FASTA files from other types, and it will attempt to parse any file in the directory. For this reason, you should remove all other files before you run the script. It will create (or overwrite!) a file in the passed directory called 'sampling_matrix.txt' that may be opened in any conventional spreadsheet or text-editor app. This file should be in the proper format for use in the Decisivator application. This script requires BioPython to be installed. |
| Download | makesamplingmatrix.py (2.951Kb) |
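
The presence/absence matrix described for makesamplingmatrix could be built along the following lines. This is a sketch under stated assumptions, not the deposited script: it parses FASTA headers with the standard library instead of BioPython, the helper names are hypothetical, and the header row (`taxon` followed by file names) is a guess, since the exact layout Decisivator expects is not given here.

```python
import os


def taxa_in_fasta(path):
    """Return the taxon names found in one FASTA file.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    names.append(tokens[0])
    return names


def build_matrix(taxa_by_file):
    """Build rows of a taxon-by-locus matrix: '1' if present, '0' if absent."""
    files = sorted(taxa_by_file)
    taxa = sorted({t for names in taxa_by_file.values() for t in names})
    rows = [["taxon"] + files]  # assumed header layout
    for taxon in taxa:
        rows.append(
            [taxon] + ["1" if taxon in taxa_by_file[f] else "0" for f in files]
        )
    return rows


def write_sampling_matrix(directory, out_name="sampling_matrix.txt"):
    """Scan every file in `directory` and write the tab-delimited matrix."""
    taxa_by_file = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and a matrix left over from a previous run.
        if os.path.isfile(path) and name != out_name:
            taxa_by_file[name] = set(taxa_in_fasta(path))
    with open(os.path.join(directory, out_name), "w") as out:
        for row in build_matrix(taxa_by_file):
            out.write("\t".join(row) + "\n")
```

Like the script it illustrates, `write_sampling_matrix` overwrites any existing `sampling_matrix.txt` in the directory and tries to parse every other file it finds there.
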