COVID Information Commons archive
Data files
Feb 07, 2024 version files (84.24 MB total):
- cic_assets_export.json (106.10 KB)
- cic_clusters.csv (9.42 MB)
- cic_grants_export.json (68.59 MB)
- cic_orgs_export.json (450.42 KB)
- cic_people_export.json (5.67 MB)
- README.md (4.95 KB)
Sep 11, 2025 version files (91.94 MB total):
- cic_assets_export.json (114.58 KB)
- cic_clusters.csv (9.42 MB)
- cic_datasets_export.json (285.39 KB)
- cic_grants_export.json (76.09 MB)
- cic_orgs_export.json (475.04 KB)
- cic_people_export.json (5.32 MB)
- cic_publications_export.json (220.64 KB)
- README.md (6.16 KB)
Abstract
The COVID Information Commons (CIC), funded by the NSF Convergence Accelerator and the NSF Technology, Innovation and Partnerships Directorate, is an open website portal and community that facilitates knowledge-sharing and collaboration across COVID research efforts. The CIC serves as an open resource for researchers, students, and decision-makers from academia, government, not-for-profits, and industry to identify collaboration opportunities, leverage each other's research findings, and accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic.
The CIC was developed as a collaborative proposal led by the Northeast Big Data Innovation Hub, hosted by Columbia University, in collaboration with the Midwest, South, and West Big Data Innovation Hubs. It was funded by the NSF Convergence Accelerator (NSF #2028999) in May 2020 and launched in July 2020. The website initially focused on the 723 NSF-funded COVID Rapid Response Research (RAPID) projects awarded in 2020. In 2021, the CIC project team proposed the CIC-E: COVID Information Commons Extension for Pandemic Recovery project, which NSF funded (NSF #2139391). Its goals are to increase researcher collaboration across NSF and NIH awardees and with global collaborators as we continue to combat the novel coronavirus, and to draw lessons about future uses of innovations developed for COVID response and recovery, including insights that can be leveraged in future pandemics.
The CIC extension launched on June 30, 2022, expanding the corpus of awards beyond NSF to include past and present NIH-funded COVID-related awards, across all funding vehicles, in pertinent areas of COVID research, response, and recovery. The extension creates more opportunities for multi-agency and multidisciplinary research collaboration: all Principal Investigators (PIs) for awards in the CIC are invited to present their research and collaborate in CIC Research Lightning Talk Webinars and Collaboration Sessions.
https://doi.org/10.5061/dryad.37pvmcvqp
This archive is a snapshot of the COVID Information Commons (CIC). The CIC is a live database that records information about COVID-19 researchers and their projects.
Description of the data and file structure
The snapshot of the CIC contains the following files, each listed with a description of the fields it contains:
cic_people_export.json -- Researchers who have studied aspects of COVID-19. Contains all information the CIC holds about each researcher except email addresses, which have been removed for privacy. Some researchers have minimal records, because the CIC may know only their name from a reference in a grant description. Others have more complete records because they have provided additional information to the CIC.
- affiliations -- organizational affiliations of the researcher (as described for cic_orgs_export.json)
- first_name -- researcher's first name
- last_name -- researcher's last name
- orcid -- researcher's identifier in the ORCID identifier system
- keywords -- subject keywords that are applicable to the researcher's focus
- website -- URLs for sites associated with the researcher's work
- comments -- clarifying comments that the researcher has submitted about their work, including preferred methods of contact
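A minimal Python sketch of reading the people export. It assumes the file holds a top-level JSON array of researcher objects using the field names above, and that each entry in affiliations is an organization object with a name field; neither layout is stated in this README. The helper names load_records and researchers_with_orcid are hypothetical. Because many records are sparse, every field is read defensively.

```python
import json
from typing import Any


def load_records(path: str) -> list[dict[str, Any]]:
    """Load an export file, assuming (hypothetically) a top-level JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def researchers_with_orcid(people: list[dict[str, Any]]) -> list[tuple[str, str, list[str]]]:
    """Return (full name, ORCID, affiliation names) for researchers that have an ORCID."""
    result = []
    for person in people:
        orcid = person.get("orcid")
        if not orcid:
            continue  # skip researchers known only by name, e.g. from a grant description
        first = person.get("first_name") or ""
        last = person.get("last_name") or ""
        name = f"{first} {last}".strip()
        # Affiliations are assumed to be organization objects shaped like cic_orgs_export.json.
        orgs = [a.get("name", "") for a in person.get("affiliations") or []]
        result.append((name, orcid, orgs))
    return result
```

For example, researchers_with_orcid(load_records("cic_people_export.json")) lists the researchers who can be matched to ORCID records.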
cic_orgs_export.json -- Organization names, along with their location and ROR identifiers when available.
- ror -- identifier of the organization in the Research Organization Registry
- name -- name of the organization
- state -- US state of the organization
- country -- home country of the organization
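Because ROR identifiers are present only when available, it can help to check coverage before joining on them. The sketch below counts organizations per country and lists those without a ROR identifier. It assumes a top-level JSON array of organization objects (an assumption, not stated here), and the helper names are hypothetical.

```python
import json
from collections import Counter
from typing import Any


def load_records(path: str) -> list[dict[str, Any]]:
    """Load an export file, assuming (hypothetically) a top-level JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def orgs_per_country(orgs: list[dict[str, Any]]) -> Counter:
    """Count organizations by their country field; missing values are grouped as 'unknown'."""
    return Counter(org.get("country") or "unknown" for org in orgs)


def orgs_without_ror(orgs: list[dict[str, Any]]) -> list[str]:
    """Names of organizations that have no ROR identifier."""
    return [org.get("name", "") for org in orgs if not org.get("ror")]
```

For example, orgs_without_ror(load_records("cic_orgs_export.json")) shows which organizations would need to be matched by name rather than by ROR.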
cic_grants_export.json -- Grant objects that have been harvested from their respective funding agencies and augmented with other information gathered by CIC.
- award_id -- identifier for this grant as given by the funding agency
- title -- title of the grant
- funder -- funding agency (includes the funding agency's name and identifier from the Research Organization Registry)
- funder_divisions -- divisions of the funding agency responsible for the grant
- program_officials -- administrative officials at the funding agency who oversee the grant (as described in cic_people_export.json)
- start_date -- starting date for the grant
- end_date -- ending date for the grant
- award_amount -- total amount of the grant, in US dollars
- principal_investigator -- lead researcher for the grant (as described in cic_people_export.json)
- other_investigators -- researchers who worked on the grant (as described in cic_people_export.json)
- awardee_organization -- organization that received the grant (as described in cic_orgs_export.json)
- abstract -- description of the grant
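As an illustration of working with the grant objects, the sketch below sums award_amount (US dollars) per funding agency. It assumes a top-level JSON array of grants, that funder is an object with a name key, and that award_amount is a number or numeric string; these shapes are assumptions, and the helper names are hypothetical. Grants without an amount are skipped rather than counted as zero.

```python
import json
from collections import defaultdict
from typing import Any


def load_records(path: str) -> list[dict[str, Any]]:
    """Load an export file, assuming (hypothetically) a top-level JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def total_awards_by_funder(grants: list[dict[str, Any]]) -> dict[str, float]:
    """Sum award_amount (US dollars) per funding agency name."""
    totals: dict[str, float] = defaultdict(float)
    for grant in grants:
        amount = grant.get("award_amount")
        if amount in (None, ""):
            continue  # no amount recorded for this grant
        # funder is assumed to be an object like {"name": ..., "ror": ...}.
        funder = (grant.get("funder") or {}).get("name") or "unknown"
        totals[funder] += float(amount)
    return dict(totals)
```

For example, total_awards_by_funder(load_records("cic_grants_export.json")) gives a rough NSF vs. NIH funding comparison across the corpus.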
cic_assets_export.json -- Images and videos associated with the CIC researchers and grants. The CIC does not store the media files directly, so they are presented as URLs without persistent identifiers. All of the URLs were valid when this snapshot was made.
- filename -- name of the asset file. There are two special values that indicate how particular assets are used within the CIC. "cic_video" indicates a video of a presentation the researcher gave to the CIC community. "profile_image" indicates an image that may be used on the researcher's profile.
- download_path -- URL for accessing the asset file
- author -- CIC researcher associated with the asset (as described in cic_people_export.json)
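The special filename values make it easy to select presentation videos. This sketch maps each researcher to the download URLs of their "cic_video" assets. It assumes a top-level JSON array of asset objects and an author object with first_name and last_name (both assumptions); the helper names are hypothetical. Since these URLs are not persistent, links may need to be re-checked before use.

```python
import json
from collections import defaultdict
from typing import Any


def load_records(path: str) -> list[dict[str, Any]]:
    """Load an export file, assuming (hypothetically) a top-level JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def video_links_by_author(assets: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map researcher name -> download URLs for assets whose filename is "cic_video"."""
    links: dict[str, list[str]] = defaultdict(list)
    for asset in assets:
        if asset.get("filename") != "cic_video":
            continue  # ignore profile images and other assets
        author = asset.get("author") or {}
        first = author.get("first_name") or ""
        last = author.get("last_name") or ""
        name = f"{first} {last}".strip() or "unknown"
        links[name].append(asset.get("download_path"))
    return dict(links)
```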
cic_datasets_export.json -- Dataset descriptions that have been harvested from DataCite and connected to relevant objects in the CIC.
- doi -- Digital Object Identifier of the dataset
- title -- title of the dataset
- size -- file size of the dataset, in bytes (the sum of all files in the dataset)
- authors -- CIC researchers associated with the dataset (as described in cic_people_export.json)
- grants -- CIC grants associated with the dataset (as described in cic_grants_export.json)
- publications -- CIC publications associated with the dataset (as described in cic_publications_export.json)
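A short sketch of following the dataset-to-grant links and aggregating the size field. It assumes a top-level JSON array of dataset objects whose grants entries carry an award_id (an assumption based on the description above); the helper names are hypothetical. Sizes are converted from bytes to decimal gigabytes (10^9 bytes).

```python
import json
from typing import Any


def load_records(path: str) -> list[dict[str, Any]]:
    """Load an export file, assuming (hypothetically) a top-level JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def datasets_for_grant(datasets: list[dict[str, Any]], award_id: str) -> list[str]:
    """DOIs of datasets linked to a grant with the given award_id."""
    return [
        d.get("doi")
        for d in datasets
        if any(g.get("award_id") == award_id for g in d.get("grants") or [])
    ]


def total_size_gb(datasets: list[dict[str, Any]]) -> float:
    """Sum the size field (bytes) over all datasets, in gigabytes; missing sizes count as 0."""
    return sum(int(d.get("size") or 0) for d in datasets) / 1e9
```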
cic_publications_export.json -- Descriptions of publications that have been harvested from CrossRef and connected to relevant objects in the CIC.
- doi -- Digital Object Identifier of the publication
- title -- title of the publication
- authors -- CIC researchers associated with the publication (as described in cic_people_export.json)
- grants -- CIC grants associated with the publication (as described in cic_grants_export.json)
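Publications reference grants, but a grant-centric view is often more convenient. This sketch inverts the links into an index from award_id to publication DOIs. It assumes a top-level JSON array of publication objects whose grants entries carry an award_id (an assumption); the helper names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Any


def load_records(path: str) -> list[dict[str, Any]]:
    """Load an export file, assuming (hypothetically) a top-level JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def publications_by_grant(publications: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Invert publication -> grants links into award_id -> list of publication DOIs."""
    index: dict[str, list[str]] = defaultdict(list)
    for pub in publications:
        for grant in pub.get("grants") or []:
            award_id = grant.get("award_id")
            if award_id:
                index[award_id].append(pub.get("doi"))
    return dict(index)
```

The resulting index can be joined with cic_grants_export.json on award_id.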
cic_clusters.csv -- Clustering of grant data, produced by analysis in the Lingo4G tool.
- labels -- subject labels assigned to this grant during the clustering process
- parent_exemplar -- award_id of the grant that serves as exemplar for the cluster hierarchically above the current cluster
- exemplar -- award_id of the grant that serves as an exemplar for the current cluster
- similarity_to_exemplar -- similarity between the current grant and the exemplar
- cluster_labels -- summary subject labels for the current cluster
- title -- title of the grant
- award_id -- award_id of the grant
- state -- US state of the organization that received the grant
- principal_investigator -- lead researcher for the grant
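Each row of the clusters file describes one grant and the exemplar of its cluster, so grouping rows by exemplar recovers cluster membership. This sketch uses the standard csv module and orders members by similarity_to_exemplar, highest first. It assumes a UTF-8 file with a header row using the column names above and numeric similarity values (empty values are treated as 0.0); the helper names are hypothetical.

```python
import csv
from collections import defaultdict


def load_clusters(path: str) -> list[dict[str, str]]:
    """Read cic_clusters.csv into a list of row dicts keyed by the header names."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def cluster_members(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group award_ids by their exemplar, most similar to the exemplar first."""
    groups: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for row in rows:
        sim = row.get("similarity_to_exemplar")
        groups[row["exemplar"]].append((float(sim) if sim else 0.0, row["award_id"]))
    return {
        exemplar: [award_id for _, award_id in sorted(members, reverse=True)]
        for exemplar, members in groups.items()
    }
```

For example, cluster_members(load_clusters("cic_clusters.csv")) returns each cluster keyed by the award_id of its exemplar grant.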
Sharing/Access information
In addition to the snapshots of data available here, data may also be downloaded from the COVID Information Commons site, which includes an open API.
Data was derived from the following sources:
Code/Software
The COVID Information Commons software repository contains both the code for running the CIC system and the code used to generate these data files.
Version changes
2025-09-11: Updated existing JSON files to include content harvested and modified since the previous snapshot. Added new JSON files for datasets and publications.
Methods
The corpus of NSF- and NIH-funded COVID-related awards in the CIC was collected primarily from NSF and NIH via their APIs. Further information has been collected directly from researchers, who filled out an online form to enhance the descriptions.
The dataset has been cleaned and enhanced by automated processing, using custom scripts to remove invalid characters and standardize the names of funding agency divisions.