Data from: Improper data practices erode the quality of global ecological databases and impede the progress of ecological research
Data files
Jan 02, 2024 version files 4.32 MB
-
README.md
404 B
-
TRY_Data_29.12.2023.xlsx
4.32 MB
Abstract
The scientific community has entered an era of big data. However, with big data comes big responsibilities, and best practices for how data are contributed to databases have not kept pace with the collection, aggregation, and analysis of big data. Here, we rigorously assess the quantity of data for specific leaf area (SLA) available within the largest and most frequently used global plant trait database, the TRY Plant Trait Database, exploring how much of the data were applicable (i.e., original, representative, logical, and comparable) and traceable (i.e., published, cited, and consistent). Over three-quarters of the SLA data in TRY either lacked applicability or traceability, leaving only 22.9% of the original data usable compared to the 64.9% typically deemed usable by standard data cleaning protocols. The remaining usable data differed markedly from the original for many species, which led to altered interpretation of ecological analyses. Though the data we consider here make up only 4.5% of SLA data within TRY, similar issues of applicability and traceability likely apply to SLA data for other species as well as other commonly measured, uploaded, and downloaded plant traits. We end with suggested steps forward for global ecological databases, including suggestions for both uploaders to and curators of databases with the hope that, through addressing the issues raised here, we can increase data quality and integrity within the ecological community.
README
There are two additional documents associated with this publication. One is a word document that includes a description of each of the 120 datasets that contained SLA data for the four plant groups within the study (conifers, Plantago, Poa, and Quercus). The second is an excel document that contains the SLA data that was downloaded from TRY and all associated metadata.
Missing data codes: NA and N/A
Methods
SLA data was downlaoded from TRY (traits 3115, 3116, and 3117) for all conifer (Araucariaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae, and Taxaceae), Plantago, Poa, and Quercus species. The data has not been processed in any way, but additional columns have been added to the datset that provide the viewer with information about where each data point came from, how it was cited, how it was measured, whether it was uploaded correctly, whether it had already been uploaded to TRY, and whether it was uploaded by the individual who collected the data.