
Data from: Who shares? Who doesn’t? Factors associated with openly archiving raw research data

When using this data, please cite the original article:

Piwowar HA (2011) Who shares? Who doesn’t? Factors associated with openly archiving raw research data. PLoS ONE 6(7): e18657. doi:10.1371/journal.pone.0018657

Additionally, please cite the Dryad data package:

Piwowar HA (2011) Data from: Who shares? Who doesn’t? Factors associated with openly archiving raw research data. Dryad Digital Repository. doi:10.5061/dryad.mf1sd

Dryad Package Identifier doi:10.5061/dryad.mf1sd
Abstract Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn’t, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let’s learn from those with high rates of sharing to embrace the full potential of our research output.
Keywords data sharing, bibliometrics, gene expression microarray, science communication, policy
Date Deposited 2011-05-26T16:28:03Z

Microarray publications and publication attributes
157 columns of attributes for 11,603 publications identified as creating gene expression microarray data. Tab delimited. Key: PubMed identifier (pmid). See stats.R for data cleaning steps and more details on variables. Data collected in January 2010 using code available at http://github.com/hpiwowar/pypub.
Download: rawdata.txt ( 7.639Mb )
To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data.  
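
For readers who want to inspect rawdata.txt outside of R, the following is a minimal sketch of reading a tab-delimited file into records keyed by PubMed identifier. It is not taken from stats.R or the pypub code. It assumes the file has a header row, that the key column is literally named `pmid`, and that the file is UTF-8 encoded; none of these details are confirmed by the description above, and the function name `load_by_pmid` is hypothetical.

```python
import csv


def load_by_pmid(path, key="pmid", encoding="utf-8"):
    """Read a tab-delimited file with a header row into a dict keyed by `key`.

    Assumptions (not confirmed by the data package description):
    the first line is a header, and the key column is named "pmid".
    """
    records = {}
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Fail early if the assumed key column is missing from the header.
        if reader.fieldnames is None or key not in reader.fieldnames:
            raise KeyError(f"column {key!r} not found in header")
        for row in reader:
            # All values are kept as strings; later rows overwrite
            # earlier ones if a key value repeats.
            records[row[key]] = row
    return records
```

Calling `load_by_pmid("rawdata.txt")` on a downloaded copy and comparing `len(...)` of the result with the 11,603 publications stated above is a quick sanity check. Values are left as strings here; stats.R remains the reference for how each variable is cleaned and typed.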



Journal policy details for microarray data
Data sharing policy details for journals that publish a lot of gene expression microarray data. Policy links, excerpts, and classifications (24 columns) for 156 journals. Some of these classifications are included as columns in rawdata.txt as journal policy attributes.
Download: journal_policies_microarray_data.csv ( 345.8Kb )
To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data.  



Statistical analysis R script
R script for data cleaning, statistical analysis, and graphics as presented in the paper. Takes rawdata.txt as input and loads helper_functions.R source.
Download: stats.R ( 54.34Kb )
To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data.  



Helper R script functions
Helper functions loaded by stats.R for analysis and graphical output.
Download: helper_functions.R ( 39.70Kb )
To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data.  

