Data from: Data reuse and the open data citation advantage

 

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data (CC0, Open Data).

Title manuscript source file (knitr)
Downloaded 46 times
Description The source file for running the statistics and inserting the results into the manuscript text. The manuscript does not include final wording revisions made in the peer review process. See the "about this doc" section for instructions on running this R knitr markdown file to produce stats.md and stats.R files.
Download stats_knit_.md (76.26 Kb)
Title manuscript compiled file, with stats
Downloaded 54 times
Description This file is the result of running stats_knitr_.md through knitr. It does not include changes made during the peer review process.
Download stats.md (87.40 Kb)
Title citations
Downloaded 22 times
Description Citations used by stats_knitr_.md to build the references list.
Download citation11k.bib (60.39 Kb)
Title helpers.r
Downloaded 56 times
Description Helper R functions used by stats_knitr_.md to run the statistics.
Download helpers.R (5.328 Kb)
Title preprocess_raw_data.r
Downloaded 23 times
Description Helper R functions used by stats_knitr_.md to preprocess data before running the statistics.
Download preprocess_raw_data.R (16.23 Kb)
Title pubmed_gse_count.csv
Downloaded 20 times
Description The number of GSE data sets added to the NCBI's GEO repository each year, 2000-2011.
Download pubmed_gse_count.csv (141 bytes)
Title pubmed_pmc_ratios.csv
Downloaded 12 times
Description The fraction of PubMed articles in PMC that are indexed with the MeSH term “gene expression profiling”, by year of publication, 2000-2011, as measured in 2012.
Download pubmed_pmc_ratios.csv (286 bytes)
Title PLoSONE2011_rawdata.txt
Downloaded 255 times
Description Data from Piwowar HA (2011) Data from: Who shares? Who doesn’t? Factors associated with openly archiving raw research data. Dryad Digital Repository. doi:10.5061/dryad.mf1sd. Reproduced here to make it easy to rerun scripts.
Download PLoSONE2011_rawdata.txt (7.639 Mb)
Title scopus_all.csv
Downloaded 29 times
Description Scopus citation data for publications in the cohort.
Download scopus_all.csv (148.5 Mb)
Title GEO_dataset_attributes.csv
Downloaded 30 times
Description One row for every GEO dataset reuse detected by searching PMC for GEO accession numbers. Columns list the GEO accession, gse number, gds number, related submit_pmids, identified reuse_pmcid, the reuse_pmids_for_pmc, the submission authors, the reuse authors, whether the submission authors and the reuse authors overlap, the submission affiliation, the release date, columns to detect reuse and data creation keywords, excerpts around the accession numbers when available (that is, when the reuse paper was OA), the reuse journal, year, and date_published, the medline_status, whether the reuse was listed on the NCBI GEO reuse webpage, whether the reuse is OA, and whether it is listed as a meta-analysis by MEDLINE.
Download GEO_dataset_attributes.csv (16.45 Mb)
Title Mendeley_annotated_250_of_11k.csv
Downloaded 62 times
Description Manual annotation of a random sample of 250 papers from the 10,555 papers in the study. The manual examination determined whether each study did indeed generate gene expression microarray data. Rows with "created-microarray-data" were identified as generating microarray data in the manual review; rows with "created-microarray-data-not" were identified as not actually generating gene expression microarray data despite being identified as such by our automated filter.
Download Mendeley_annotated_250_of_11k.csv (702.7 Kb)
Title tracking1k_20111008.csv
Downloaded 14 times
Description Manually annotated instances of citation context for papers that created publicly available datasets. This study explores the subset of the dataset related to GEO data: "dataset reused" means the citation was determined to be in the context of data reuse.
Download tracking1k_20111008.csv (1.874 Mb)

When using this data, please cite the original publication:

Piwowar HA, Vision TJ (2013) Data reuse and the open data citation advantage. PeerJ 1: e175. http://dx.doi.org/10.7717/peerj.175

Additionally, please cite the Dryad data package:

Piwowar HA, Vision TJ (2013) Data from: Data reuse and the open data citation advantage. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.781pv