O'Leary, Maureen A.; Alphonse, Kenzley; Mariangeles, Arce H.; Cavaliere, Dario; Cirranello, Andrea; Dietterich, Thomas G.; Julius, Mattew; Kaufman, Seth; Law, Edith; Passarotti, Maria; Reft, Abigail; Robalino, Javier; Simmons, Nancy B.; Smith, Selena Y.; Stevenson, Dennis W.; Theriot, Ed; Velazco, Paúl M.; Walls, Ramona L.; Yu, Mengjie; Daly, Marymegan

Published May 26, 2017 on Dryad. https://doi.org/10.5061/dryad.766cp

Abstract

Scientists building the Tree of Life face an overwhelming challenge to categorize phenotypes (e.g., anatomy, physiology) from millions of living and fossil species. This biodiversity challenge far outstrips the capacities of trained scientific experts. Here we explore whether crowdsourcing can be used to collect matrix data on a large scale with the participation of non-expert students, or “citizen scientists.” Crowdsourcing, or data collection by non-experts, frequently via the internet, has enabled scientists to tackle some large-scale data collection challenges too massive for individuals or scientific teams alone. The quality of work by non-expert crowds is, however, often questioned, and little data has been collected on how such crowds perform on complex tasks such as phylogenetic character coding. We studied a crowd of over 600 non-experts and found that they could use images to identify anatomical similarity (hypotheses of homology) with an average accuracy of 82% compared to scores provided by experts in the field. This performance pattern held across the Tree of Life, from protists to vertebrates. We introduce a procedure that predicts the difficulty of each character and that can be used to assign harder characters to experts and easier characters to a non-expert crowd for scoring. We test this procedure in a controlled experiment comparing crowd scores to those of experts and show that crowds can produce matrices with over 90% of cells scored correctly while reducing the number of cells to be scored by experts by 50%. Preparation time for a crowdsourcing experiment, including image collection and processing, is significant, and the method does not currently save scientific experts time overall. However, if innovations in automation or robotics can reduce such effort, then large-scale implementation of our method could greatly increase the collective scientific knowledge of species phenotypes for phylogenetic tree building.
For the field of crowdsourcing, we provide a rare study with ground truth, or an experimental control that many studies lack, and contribute new methods on how to coordinate the work of experts and non-experts. We show that there are important instances in which crowd consensus is not a good proxy for correctness.

Appendix 1 - List of six matrices developed for this study.

Online Appendix 1. List of six matrices developed for this study, including character names, types of characters, and assessment of character difficulty by experts. See also Morphobank (www.morphobank.org) projects P950, P2463, P2502, P2490, P2491, and P2577.

Appendix 1 Run3onlyVer5.xlsx

Appendix 2 characterDifficultyMP_3-15-16_MAH edited

Online Appendix 2. List of characters evaluated and their simplified anatomical descriptions used in the Evolution Project. The latter were presented to non-experts in the crowd.

Appendix 3 Anemones-character-taxon-results

Online Appendix 3. Sea anemones character scores. For each character and taxon in the anemones matrix, we show the probability (“Estimate”) that a crowd member’s score would agree with the majority vote of the crowd. We also show the lower confidence interval on this probability (ci.lower), which is the crowd confidence score. Finally, we indicate whether the majority vote was correct, and compute an ROC curve for the crowd’s scores. The Threshold Plot worksheet provides a visualization of this information.
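The majority vote and the agreement "Estimate" described above can be illustrated with a small sketch. This is not the authors' R code (see Appendix 19 for that). It assumes the estimate is simply the observed fraction of crowd scores that match the majority score for one character-taxon cell; the function name `majority_agreement` and the example state labels are hypothetical.

```python
from collections import Counter


def majority_agreement(votes):
    """Return (majority_score, fraction of votes that match it).

    Assumption: the agreement estimate is the plain empirical fraction.
    Ties are broken by whichever tied score was seen first.
    """
    counts = Counter(votes)
    majority_score, n_majority = counts.most_common(1)[0]
    return majority_score, n_majority / len(votes)


# Hypothetical votes for one character-taxon cell (state labels are invented).
votes = ["0", "1", "1", "1", "0", "1", "1", "1"]
score, estimate = majority_agreement(votes)
print(score, estimate)  # 1 0.75
```

Comparing `score` with the expert score for the same cell would give the "majority vote was correct" flag reported in the worksheet.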

Appendix 4 Anemones-user-results

Online Appendix 4. Sea anemones user scores. For each crowd member, we report the number of scores they provided and the number that were correct. The “Estimate” column is the probability that this crowd member voted correctly, and the “ci.lower” column gives the 95% lower confidence bound on this probability. These scores are for all characters (evaluation and test).
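As a rough illustration of how an "Estimate" and a "ci.lower" value could be derived from the two counts reported per crowd member, the sketch below computes the observed proportion correct and a one-sided 95% lower bound. The record does not state which interval method the authors used; the Wilson score bound here is an assumption chosen for illustration, and `wilson_lower_bound` is a hypothetical helper, not a function from the study's scripts.

```python
from math import sqrt
from statistics import NormalDist


def wilson_lower_bound(n_correct, n_total, confidence=0.95):
    """One-sided Wilson score lower bound on a binomial proportion.

    Assumption: this is an illustrative choice of interval; the study's
    own method is defined in its R scripts, not here.
    """
    if n_total == 0:
        return 0.0
    z = NormalDist().inv_cdf(confidence)  # one-sided normal quantile
    p = n_correct / n_total               # observed proportion correct
    denom = 1 + z * z / n_total
    centre = p + z * z / (2 * n_total)
    margin = z * sqrt(p * (1 - p) / n_total + z * z / (4 * n_total * n_total))
    return (centre - margin) / denom


# Hypothetical crowd member: 41 correct scores out of 50 provided.
n_correct, n_total = 41, 50
print("Estimate:", n_correct / n_total)
print("ci.lower:", wilson_lower_bound(n_correct, n_total))
```

A lower bound of this kind penalizes crowd members who provided only a few scores, since the bound sits further below the observed proportion when `n_total` is small.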

Appendix 5 Bats-character-taxon-results

Online Appendix 5. Bats character scores.

Appendix 6 Bats-users-results

Online Appendix 6. Bats user scores.

Appendix 7 Catfish-character-taxon-results

Online Appendix 7. Catfish character scores.

Appendix 8 Catfish-user-results

Online Appendix 8. Catfish user scores.

Appendix 9 Diatoms-character-taxon-results

Online Appendix 9. Diatom character scores.

Appendix 10 Diatoms-user-results

Online Appendix 10. Diatom user scores.

Appendix 11 Lilies-character-taxon-results

Online Appendix 11. Lilies character scores.

Appendix 12 Lilies-user-results

Online Appendix 12. Lilies user scores.

Appendix 13 Marine-Shrimp-character-taxon-results

Online Appendix 13. Marine shrimp character scores.

Appendix 14 Marine-Shrimp-user-results

Online Appendix 14. Marine shrimp user scores.

Appendix 15 all-users-results

Online Appendix 15. Combined results for all users. Details for Figure 3.

Appendix 16 joint-predicted-difficulty

Online Appendix 16. Joint predicted difficulty. Predicted and Observed difficulty of each character based on a linear model fit to all of the character score data. Details for Figure 5.

Appendix 17 Diatoms-user-thresh-curve

Online Appendix 17. Results of the parameter tuning experiment on the Diatoms. Details for Figure 6.

Appendix 18 final-results-summary_MAH edited

Online Appendix 18. Final results. Details for Figure 3.

Appendix 19 Data and R Scripts_MAH modified

Online Appendix 19. R scripts for analysis as a zipped folder.

Appendix 20 Instructions to Crowd

Online Appendix 20. Instruction sheets for the study participants (undergraduate students at Ohio State University).

Data from: Crowds replicate performance of scientific experts scoring phylogenetic matrices of phenotypes
