Dryad

COVID information commons archive

Cite this dataset

Hudson, Florence et al. (2024). COVID information commons archive [Dataset]. Dryad. https://doi.org/10.5061/dryad.37pvmcvqp

Abstract

The COVID Information Commons (CIC) is an open website portal and community to facilitate knowledge-sharing and collaboration across various COVID research efforts, funded by the NSF Convergence Accelerator and the NSF Technology, Innovation and Partnerships Directorate. The CIC serves as an open resource for researchers, students, and decision-makers from academia, government, not-for-profits and industry to identify collaboration opportunities, to leverage each other's research findings, and to accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic.

The CIC was developed as a collaborative proposal led by the Northeast Big Data Innovation Hub, hosted by Columbia University, in collaboration with the Midwest Big Data Innovation Hub, South Big Data Innovation Hub, and West Big Data Innovation Hub. It was funded by the NSF Convergence Accelerator (NSF #2028999) in May 2020 and launched in July 2020. The initial focus of the CIC website was on the 723 NSF-funded COVID Rapid Response Research (RAPID) projects funded in 2020. The CIC-E: COVID Information Commons Extension for Pandemic Recovery project was proposed by the CIC project team and funded in 2021 (NSF #2139391). Its goal is to increase researcher collaboration across NSF and NIH awardees and with global collaborators as we continue to combat the novel coronavirus, and to glean lessons for future uses of innovations developed for COVID response and recovery, including insights that can be leveraged for future pandemics.

The CIC extension launched on June 30, 2022, expanding the corpus of awards from NSF-only to include NIH-funded COVID-related awards, both present and past, through all funding vehicles, in pertinent areas of COVID research, response, and recovery. The CIC extension provides more opportunity for multi-agency and multidisciplinary research collaboration, as all the Principal Investigators (PIs) for awards in the CIC are invited to present their research and collaborate in CIC Research Lightning Talk Webinars and Collaboration Sessions.

README: COVID Information Commons Archive

https://doi.org/10.5061/dryad.37pvmcvqp

This archive is a snapshot of the COVID Information Commons (CIC). The CIC is a live database that records information about COVID-19 researchers and their projects.

Description of the data and file structure

The snapshot of the CIC contains the following files, each listed with a description of the fields it contains:

cic_people_export.json -- Researchers who have studied aspects of COVID-19. The file contains all information the CIC holds about each researcher except email addresses, which have been filtered out for privacy. Some researchers have minimal information, as the CIC may know only their name via a reference in a grant description. Others have more complete records because they have provided additional information to the CIC.

  • affiliations -- organizational affiliations of the researcher (as described for cic_orgs_export.json)
  • first_name -- researcher's first name
  • last_name -- researcher's last name
  • orcid -- researcher's identifier in the ORCID identifier system
  • keywords -- subject keywords that are applicable to the researcher's focus
  • website -- URLs for sites associated with the researcher's work
  • comments -- clarifying comments that the researcher has submitted about their work, including preferred methods of contact
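
The fields above could be read with a few lines of Python. This is a minimal sketch, not part of the CIC software: it assumes the export's top level is a JSON array of objects and that keywords is a list of strings, neither of which is stated in this README. The helper names and the sample records are made up for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(path):
    """Load an export file, ASSUMING its top level is a JSON array of objects."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array at the top level")
    return data


def full_name(person):
    """Join first_name and last_name, tolerating missing or null values."""
    parts = [person.get("first_name"), person.get("last_name")]
    return " ".join(p.strip() for p in parts if isinstance(p, str) and p.strip())


def keyword_counts(people):
    """Count keywords across researchers (ASSUMES keywords is a list of strings)."""
    counts = Counter()
    for person in people:
        for keyword in person.get("keywords") or []:
            if isinstance(keyword, str):
                counts[keyword.strip().lower()] += 1
    return counts


# Made-up records for illustration only; they are not taken from the archive.
sample_people = [
    {"first_name": "Ada", "last_name": "Example", "orcid": None,
     "keywords": ["Epidemiology", "modeling"]},
    {"first_name": "Lin", "last_name": None, "keywords": ["epidemiology"]},
]
```

With the real file, `keyword_counts(load_records("cic_people_export.json"))` would be the equivalent call, provided the assumptions above hold. Because many records are minimal, every field is read with `.get()` rather than direct indexing.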

cic_orgs_export.json -- Organization names, along with their location and ROR identifiers when available.

  • ror -- identifier of the organization in the Research Organization Registry
  • name -- name of the organization
  • state -- US State of the organization
  • country -- home country of the organization
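
Since ROR identifiers are present only "when available", a lookup keyed on them needs to skip organizations without one. The sketch below is an illustration under assumptions: it accepts the ror value either as a full URL or as a bare ID, because this README does not say which form the export uses. The function names and the placeholder identifier are hypothetical.

```python
def ror_key(value):
    """Reduce a ROR value to a bare lowercase ID, accepting a URL or a bare ID."""
    if not isinstance(value, str) or not value.strip():
        return None
    return value.strip().rstrip("/").rsplit("/", 1)[-1].lower()


def index_by_ror(orgs):
    """Map bare ROR ID -> organization record, skipping records without a ROR."""
    index = {}
    for org in orgs:
        key = ror_key(org.get("ror"))
        if key is not None:
            index.setdefault(key, org)  # keep the first record seen for each ID
    return index


# Made-up records; "0abc12345" is a placeholder, not a real ROR identifier.
sample_orgs = [
    {"ror": "https://ror.org/0abc12345", "name": "Example University",
     "state": "NY", "country": "United States"},
    {"ror": None, "name": "Unregistered Lab", "state": None, "country": None},
]
```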

cic_grants_export.json -- Grant objects that have been harvested from their respective funding agencies and augmented with other information gathered by CIC.

  • award_id -- identifier for this grant as given by the funding agency
  • title -- title of the grant
  • funder -- funding agency (includes the funding agency's name and identifier from the Research Organization Registry)
  • funder_divisions -- divisions of the funding agency responsible for the grant
  • program_officials -- administrative official at the funding agency who oversees the grant (as described in cic_people_export.json)
  • start_date -- starting date for the grant
  • end_date -- ending date for the grant
  • award_amount -- total amount of the grant, in US dollars
  • principal_investigator -- lead researcher for the grant (as described in cic_people_export.json)
  • other_investigators -- researchers who worked on the grant (as described in cic_people_export.json)
  • awardee_organization -- organization that received the grant (as described in cic_orgs_export.json)
  • abstract -- description of the grant
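
As a rough illustration of how the grant fields might be combined, the sketch below totals award_amount per funder. It assumes funder is an object with a "name" key and that award_amount is a number or a numeric string. The README says only that funder includes a name and a ROR identifier, so the exact key is a guess. The sample grants are made up.

```python
from collections import defaultdict


def funder_name(grant):
    """Return the funder's name (ASSUMES funder is a dict with a 'name' key)."""
    funder = grant.get("funder")
    if isinstance(funder, dict):
        return funder.get("name") or "unknown"
    if isinstance(funder, str) and funder:
        return funder
    return "unknown"


def total_by_funder(grants):
    """Sum award_amount (US dollars) per funder, skipping unparsable amounts."""
    totals = defaultdict(float)
    for grant in grants:
        try:
            amount = float(grant.get("award_amount"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric amount
        totals[funder_name(grant)] += amount
    return dict(totals)


# Made-up grants for illustration only; they are not taken from the archive.
sample_grants = [
    {"award_id": "X-1", "funder": {"name": "Funder A"}, "award_amount": 100},
    {"award_id": "X-2", "funder": {"name": "Funder A"}, "award_amount": "200"},
    {"award_id": "X-3", "funder": {"name": "Funder B"}, "award_amount": 50},
    {"award_id": "X-4", "funder": {"name": "Funder B"}, "award_amount": None},
]
```

Grants with a missing amount are skipped rather than counted as zero, so a total reflects only the awards whose amount was recorded.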

cic_assets_export.json -- Images and videos associated with the CIC researchers and grants. CIC does not store the media files directly, so they are presented as URLs without persistent identifiers. However, all of the URLs were valid at the time the snapshot was made.

  • filename -- name of the asset file. There are two special values that indicate how particular assets are used within the CIC. "cic_video" indicates a video of a presentation the researcher gave to the CIC community. "profile_image" indicates an image that may be used on the researcher's profile.
  • download_path -- URL for accessing the asset file
  • author -- CIC researcher associated with the asset (as described in cic_people_export.json)
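
The two special filename values make it straightforward to separate presentation videos from profile images. A minimal sketch, assuming each asset is a JSON object with the fields listed above; the function name and the example.org URLs are placeholders, not real asset locations.

```python
SPECIAL_FILENAMES = ("cic_video", "profile_image")


def assets_by_kind(assets):
    """Group download_path URLs by the special filename values, else 'other'."""
    kinds = {"cic_video": [], "profile_image": [], "other": []}
    for asset in assets:
        name = asset.get("filename")
        key = name if name in SPECIAL_FILENAMES else "other"
        kinds[key].append(asset.get("download_path"))
    return kinds


# Made-up assets for illustration only; the URLs are placeholders.
sample_assets = [
    {"filename": "cic_video", "download_path": "https://example.org/talk"},
    {"filename": "profile_image", "download_path": "https://example.org/me.png"},
    {"filename": "poster.pdf", "download_path": "https://example.org/poster.pdf"},
]
```

Because the URLs carry no persistent identifiers, any script that fetches them should expect some to have gone stale since the snapshot.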

cic_clusters.csv -- Clustering of grant data, produced by analysis in the Lingo4g tool.

  • labels -- subject labels assigned to this grant during the clustering process
  • parent_exemplar -- award_id of the grant that serves as exemplar for the cluster hierarchically above the current cluster
  • exemplar -- award_id of the grant that serves as exemplar for the current cluster
  • similarity_to_exemplar -- similarity between the current grant and the exemplar
  • cluster_labels -- summary subject labels for the current cluster
  • title -- title of the grant
  • award_id -- award_id of the grant
  • state -- US State of the organization that received the grant
  • principal_investigator -- lead researcher for the grant
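
One way to use the clustering file is to collect the grants that share an exemplar. The sketch below assumes the CSV is comma-delimited with a header row whose column names match the field names listed above, and that similarity_to_exemplar parses as a number; none of this is confirmed by the README. The sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def cluster_members(csv_file):
    """Map exemplar award_id -> [(award_id, similarity)], most similar first."""
    clusters = defaultdict(list)
    for row in csv.DictReader(csv_file):
        try:
            similarity = float(row["similarity_to_exemplar"])
        except (KeyError, TypeError, ValueError):
            similarity = None  # keep the grant even if the score is unusable
        clusters[row["exemplar"]].append((row["award_id"], similarity))
    for members in clusters.values():
        # Scored members first, in descending similarity; unscored members last.
        members.sort(key=lambda m: (m[1] is None, -(m[1] or 0.0)))
    return dict(clusters)


# Made-up rows for illustration only; they are not taken from cic_clusters.csv.
SAMPLE_CSV = """award_id,title,exemplar,similarity_to_exemplar
A1,Title one,A1,1.0
A2,Title two,A1,0.42
A3,Title three,A3,1.0
A4,Title four,A1,0.77
"""

sample_clusters = cluster_members(io.StringIO(SAMPLE_CSV))
```

For the real file, pass a handle opened with `open("cic_clusters.csv", newline="", encoding="utf-8")`, as the `csv` module documentation recommends.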

Sharing/Access information

In addition to the snapshots of data available here, data may also be downloaded from the COVID Information Commons site, which includes an open API.

Data was derived from the following sources:

Code/Software

The COVID Information Commons software repository contains both the code for running the CIC system and the code used to generate these data files.

Methods

The corpus of NSF- and NIH-funded COVID-related awards in the CIC was collected primarily from NSF and NIH via their APIs. Further information was collected directly from researchers, who filled out an online form to enhance the descriptions.

The dataset has been cleaned and enhanced by automated processing, using custom scripts to remove invalid characters and standardize the names of funding agency divisions.
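
The CIC's actual cleaning scripts are not reproduced here. As a hedged sketch of what such cleaning might look like, the code below treats "invalid characters" as Unicode control characters plus the replacement character U+FFFD, and standardizes division names through a caller-supplied alias table. Both choices, the function names, and the sample alias are assumptions for illustration.

```python
import unicodedata


def remove_invalid_characters(text):
    """Drop control characters (category Cc) except newline/tab, and U+FFFD."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or (unicodedata.category(ch) != "Cc" and ch != "\ufffd")
    )


def standardize_division(name, aliases):
    """Collapse whitespace, then map known variants to one canonical name.

    `aliases` maps a casefolded variant to its canonical spelling; names not
    found in the table are returned with whitespace collapsed only.
    """
    collapsed = " ".join(name.split())
    return aliases.get(collapsed.casefold(), collapsed)


# Hypothetical alias table; the real division names and variants may differ.
sample_aliases = {"div. of example": "Division of Example"}
```

Keeping the alias table as data rather than code means new variants can be added without touching the cleaning logic.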

Funding

National Science Foundation, Award: 2028999

National Science Foundation, Award: 2139391