README file for Dryad package

This file describes the contents of the supplementary data package for Stoltzfus, et al., Sharing Phylogenetic Trees. The package includes this README file, a PDF file with user stories, and 4 spreadsheets (for 3 literature samples plus 1 quick analysis of Dryad content):

* LitSample1_Apr2011_AmJBot_Evol.csv - all articles from the April 2011 issues of 2 journals
* LitSample2_40RecentPhylogenInDepth.csv - sample of 40 recent phylogen* pubs
* LitSample3_100RandomPhylogen2010.csv - random sample of 2010 phylogen* pubs
* ArchiveSample_AllDryad_2010_Phylogen.csv - 2010 Dryad entries matching "phylogen"
* UserStories_BarriersToReUse.pdf - user stories and taxonomy of barriers
* README - this file

This README file is Unicode (UTF-8) with Unix/Linux line endings. The .csv files are also in Unicode (UTF-8), with field delimiter symbol , (comma) and text delimiter symbol " (double quote mark).

Below is a brief description of the methodology. Please ask the authors if you have questions.

1. LitSample1. To get a sense of current practices, AS and BO picked 2 journals, Evolution and Am J Bot, and looked at every one of the 32 regular articles in the April 2011 issues. Evolution is the premier trade journal for organismal evolutionary biologists. American Journal of Botany is a frequent venue for phylogenetic systematics. The spreadsheet entitled "LitSample1_Apr2011_AmJBot_Evol.csv" is a record of our superficial comments on these articles.

2. LitSample2. We searched Web of Science (WoS) in May of 2011 for articles matching 'phylogen*' in title or 'topic'. WoS sorted the results by 'relevance', and we picked 40 articles from the top of the list. We deliberately chose this approach to select articles likely to focus on phylogeny, rather than to mention it peripherally. However, because we do not know exactly what 'topic' and 'relevance' mean in this case (and WoS does not make its methodology clear to users), we cannot be certain what kind of a sample this represents.
Of the 40 articles, 38 report new trees, considerably more than the 27/40 expected by chance for an article that matches 'phylogen*' anywhere (see LitSample3 below). The file "LitSample2_40RecentPhylogenInDepth.csv" contains extensive notes on the 40 articles. This spreadsheet was populated by an online fillable form that is available from the authors on request (in case any reader would like to analyze their own literature sample).

3. LitSample3. The sole purpose of this survey was to estimate the frequency of reports of new trees among 2010 publications. We first searched Web of Science for 2010 papers that matched 'phylogen*' in any field. Many of the 11,664 matching publications might be false positives, i.e., papers that refer to 'phylogen*' in some way, but do not report a new tree. To estimate this fraction, we picked 100 papers at random. Each paper was assigned to BO, AS or RM for individual evaluation, with the result that 66 of the 100 papers reported a new tree. The file "LitSample3_100RandomPhylogen2010.csv" contains the results of the analysis of the sample of 100 publications. There is not much in this spreadsheet other than a determination of whether each paper reports a new tree or not. This spreadsheet was populated by an online fillable form that is available from the authors on request (in case any reader would like to analyze their own literature sample).

We also considered false negatives due to papers that report a new phylogeny, but avoid the term 'phylogen*', using instead some term such as 'dendrogram', 'cladogram' or 'tree'. Because 'tree' has many non-phylogenetic uses, we used a restricted search methodology based on other terms associated with phylogenies, such as 'SSU' or 'cytb' and so on. By comparing matches to 'SSU + tree -phylogeny' with those for 'SSU + phylogeny', we can estimate how often authors use 'tree' as a synonym while avoiding 'phylogeny'. The first search returned only about 1/100 as many hits as the second, and many of these referred to "trees" that were not phylogenetic trees.
Thus, the results suggest that also searching on synonyms for 'phylogeny' would increase the yield by less than 1%.

We did not estimate false negatives due to poor indexing, or non-indexing, in Web of Science. Web of Science may contain information on articles that are indexed very incompletely, e.g., articles for which only the citation information is available, without keywords or abstract. A poorly indexed article that reports a phylogeny will only be found if 'phylogen*' appears in the title.

We also did not estimate the number of false negatives due to phylogeny reports that are not indexed at all in Web of Science. This is difficult to measure directly. One way to do it would be to take a very carefully researched review article, e.g., on phylogeny of major reptile groups, and then assess what fraction of cited phylogeny articles can be found in WoS. Apropos, TimeTree has nearly a thousand articles in its database, and a substantial fraction are not indexed in PubMed.

4. Archive sample. All TreeBASE entries have trees, but not all Dryad packages for phylogeny papers have decodable (i.e., not graphic) trees. Using the Dryad search interface in August 2011, AS found 32 entries for 2010 studies in Dryad that match "phylogen". In this group, AS found one server error:

http://datadryad.org/handle/10255/dryad.1786

Among the remaining 31, there were 24 packages without any phylogeny in decodable form, and 7 packages with one or more phylogenies in decodable form. Note that most of the NEXUS files do not have trees, and that there are trees in non-NEXUS formats, e.g., some are just Newick strings in text files (e.g., http://datadryad.org/handle/10255/dryad.1965). The file "ArchiveSample_AllDryad_2010_Phylogen.csv" is a spreadsheet with the results of this very brief analysis.

5. User stories. We gathered and analyzed stories of phylogeny use and re-use, based on our own experiences and those of colleagues who are sharing this information as a personal communication.
This material provides a basis for many aspects of the taxonomy of barriers to re-use presented in the text, and for individual comments about problems that users experience, such as inconsistent names, the need to re-do analyses, etc.
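
Example (not part of the original analysis). For readers who want to re-check the numbers above, the following Python sketch shows one way to load the .csv files using the format stated in this README (UTF-8, comma field delimiter, double-quote text delimiter) and to redo the arithmetic for the LitSample3 and archive samples. The function names, the constant names and the choice of a Wilson score interval are our assumptions for illustration only; the README does not say how, or whether, the authors computed any interval. The counts are copied from sections 3 and 4.

```python
import csv
import math


def read_rows(path):
    """Read one of the package's .csv files as a list of dicts.

    Assumes the format stated in the README: UTF-8 text, comma as the
    field delimiter, double quote as the text delimiter, header in row 1.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=",", quotechar='"'))


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    This interval is an illustrative choice, not the authors' method.
    """
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Counts copied from sections 3 and 4 of this README.
NEW_TREE_PAPERS = 66   # LitSample3 papers that report a new tree
SAMPLE_SIZE = 100      # size of the LitSample3 random sample
WOS_MATCHES = 11664    # 2010 WoS records matching 'phylogen*'
DRYAD_DECODABLE = 7    # Dryad packages with a decodable phylogeny
DRYAD_CHECKED = 31     # 32 entries found minus 1 server error

rate = NEW_TREE_PAPERS / SAMPLE_SIZE
low, high = wilson_interval(NEW_TREE_PAPERS, SAMPLE_SIZE)

# Extrapolate the sample rate to all matching 2010 records, and to a
# hypothetical batch of 40 articles drawn at random (compare section 2).
estimated_tree_papers = rate * WOS_MATCHES
expected_in_40 = rate * 40

# Fraction of checked Dryad packages with a decodable phylogeny.
dryad_rate = DRYAD_DECODABLE / DRYAD_CHECKED

if __name__ == "__main__":
    print(f"new-tree rate: {rate:.2f} (interval {low:.2f} to {high:.2f})")
    print(f"estimated 2010 papers with new trees: {estimated_tree_papers:.0f}")
    print(f"expected new-tree papers among 40: {expected_in_40:.1f}")
    print(f"Dryad packages with decodable trees: {dryad_rate:.1%}")
```

To inspect a spreadsheet, call read_rows() with the path to one of the files listed at the top of this README, e.g., read_rows("LitSample3_100RandomPhylogen2010.csv"), and look at the keys of the first row to see the column names, which are not documented here. The extrapolation treats the 100 papers as a simple random sample of the 11,664 matches, as described in section 3.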