COVID information commons archive
Data files
Feb 07, 2024 version files 84.24 MB
- cic_assets_export.json (106.10 KB)
- cic_clusters.csv (9.42 MB)
- cic_grants_export.json (68.59 MB)
- cic_orgs_export.json (450.42 KB)
- cic_people_export.json (5.67 MB)
- README.md (4.95 KB)
Abstract
The COVID Information Commons (CIC) is an open website portal and community to facilitate knowledge-sharing and collaboration across various COVID research efforts, funded by the NSF Convergence Accelerator and the NSF Technology, Innovation and Partnerships Directorate. The CIC serves as an open resource for researchers, students, and decision-makers from academia, government, not-for-profits and industry to identify collaboration opportunities, to leverage each other's research findings, and to accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic.
The CIC was developed as a collaborative proposal led by the Northeast Big Data Innovation Hub, hosted by Columbia University, in collaboration with the Midwest Big Data Innovation Hub, South Big Data Innovation Hub, and West Big Data Innovation Hub. It was funded by the NSF Convergence Accelerator (NSF #2028999) in May 2020 and launched in July 2020. The initial focus of the CIC website was on the 723 NSF-funded COVID Rapid Response Research (RAPID) projects funded in 2020. The CIC-E: COVID Information Commons Extension for Pandemic Recovery project was proposed by the CIC project team and funded in 2021 (NSF #2139391). Its goal is to increase researcher collaboration across NSF and NIH awardees and with global collaborators as we continue to combat the novel coronavirus, and to draw lessons for future uses of innovations developed for COVID response and recovery, including insights that can be applied to future pandemics.
The CIC extension launched on June 30, 2022, expanding the corpus of awards from NSF alone to include NIH-funded COVID-related awards, both present and past, through all funding vehicles, in pertinent areas of COVID research, response and recovery. The CIC extension provides more opportunity for multi-agency and multidisciplinary research collaboration, as all the Principal Investigators (PIs) for awards in the CIC are invited to present their research and collaborate in CIC Research Lightning Talk Webinars and Collaboration Sessions.
README: COVID Information Commons Archive
https://doi.org/10.5061/dryad.37pvmcvqp
This archive is a snapshot of the COVID Information Commons (CIC). The CIC is a live database that records information about COVID-19 researchers and their projects.
Description of the data and file structure
The snapshot of the CIC contains the following files, each listed with a description of the fields it contains:
cic_people_export.json -- Researchers who have studied aspects of COVID-19. The file includes all information known about the researchers in CIC except email addresses, which have been filtered out for privacy purposes. Some researchers have minimal information, as CIC may only know their name via a reference in a grant description. Others have more complete records because they have provided additional information to the CIC.
- affiliations -- organizational affiliations of the researcher (as described for cic_orgs_export.json)
- first_name -- researcher's first name
- last_name -- researcher's last name
- orcid -- researcher's identifier in the ORCID identifier system
- keywords -- subject keywords that are applicable to the researcher's focus
- website -- URLs for sites associated with the researcher's work
- comments -- clarifying comments that the researcher has submitted about their work, including preferred methods of contact
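As a rough illustration of how the researcher fields above might be queried, the sketch below finds researchers whose keywords mention a given term. It assumes the parsed file yields a list of person objects and that keywords is either a list of strings or a single string; this README does not confirm either, so inspect the file first. The function name find_by_keyword is hypothetical.

```python
def find_by_keyword(people, term):
    """Return (first_name, last_name) pairs for people with a matching keyword."""
    term = term.lower()
    matches = []
    for person in people:
        # Assumption: "keywords" is a list of strings; tolerate a bare string
        # or a missing value as well.
        keywords = person.get("keywords") or []
        if isinstance(keywords, str):
            keywords = [keywords]
        if any(term in str(keyword).lower() for keyword in keywords):
            matches.append((person.get("first_name"), person.get("last_name")))
    return matches
```

To try it on the archive, parse cic_people_export.json with json.load and pass the result as people, adjusting for the actual top-level structure of the file.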
cic_orgs_export.json -- Organization names, along with their location and ROR identifiers when available.
- ror -- identifier of the organization in the Research Organization Registry
- name -- name of the organization
- state -- US State of the organization
- country -- home country of the organization
cic_grants_export.json -- Grant objects that have been harvested from their respective funding agencies and augmented with other information gathered by CIC.
- award_id -- identifier for this grant as given by the funding agency
- title -- title of the grant
- funder -- funding agency (includes the funding agency's name and identifier from the Research Organization Registry)
- funder_divisions -- divisions of the funding agency responsible for the grant
- program_officials -- administrative official at the funding agency who oversees the grant (as described in cic_people_export.json)
- start_date -- starting date for the grant
- end_date -- ending date for the grant
- award_amount -- total amount of the grant, in US dollars
- principal_investigator -- lead researcher for the grant (as described in cic_people_export.json)
- other_investigators -- researchers who worked on the grant (as described in cic_people_export.json)
- awardee_organization -- organization that received the grant (as described in cic_orgs_export.json)
- abstract -- description of the grant
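One possible use of the grant fields above is totalling award_amount per funding agency. The sketch below assumes the parsed file yields a list of grant objects and that funder is an object with a name key; the exact JSON layout is not specified in this README, so treat both as assumptions. The function name total_by_funder is hypothetical.

```python
from collections import defaultdict


def total_by_funder(grants):
    """Sum award_amount (US dollars) per funder name over a list of grant dicts."""
    totals = defaultdict(float)
    for grant in grants:
        funder = grant.get("funder")
        # Assumption: funder is a dict with a "name" key; fall back to str().
        if isinstance(funder, dict):
            name = funder.get("name", "unknown")
        elif funder:
            name = str(funder)
        else:
            name = "unknown"
        try:
            amount = float(grant.get("award_amount"))
        except (TypeError, ValueError):
            # Skip grants whose amount is missing or not numeric.
            continue
        totals[name] += amount
    return dict(totals)
```

To run it against the archive, parse cic_grants_export.json with json.load and pass the result as grants. The file is about 68 MB, so loading it in one call should be feasible on most machines.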
cic_assets_export.json -- Images and videos associated with the CIC researchers and grants. CIC does not store the media files directly, so they are presented as URLs without persistent identifiers. However, all of the URLs were valid at the time the snapshot was made.
- filename -- name of the asset file. There are two special values that indicate how particular assets are used within the CIC. "cic_video" indicates a video of a presentation the researcher gave to the CIC community. "profile_image" indicates an image that may be used on the researcher's profile.
- download_path -- URL for accessing the asset file
- author -- CIC researcher associated with the asset (as described in cic_people_export.json)
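The two special filename values described above can be used to pick out presentation videos. The following is a minimal sketch, assuming the parsed file yields a list of asset objects whose filename and download_path are plain strings; the function name presentation_videos is hypothetical.

```python
def presentation_videos(assets):
    """Return download URLs for assets whose filename is the special value "cic_video"."""
    return [
        asset["download_path"]
        for asset in assets
        # Skip entries without a URL rather than raising a KeyError.
        if asset.get("filename") == "cic_video" and asset.get("download_path")
    ]
```

Swapping "cic_video" for "profile_image" would select profile images in the same way. Because the URLs carry no persistent identifiers, some may no longer resolve.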
cic_clusters.csv -- Clustering of grant data, produced by analysis in the Lingo4g tool.
- labels -- subject labels assigned to this grant during the clustering process
- parent_exemplar -- award_id of the grant that serves as exemplar for the cluster hierarchically above the current cluster
- exemplar -- award_id of the grant that serves as exemplar for the current cluster
- similarity_to_exemplar -- similarity between the current grant and the exemplar
- cluster_labels -- summary subject labels for the current cluster
- title -- title of the grant
- award_id -- award_id of the grant
- state -- US State of the organization that received the grant
- principal_investigator -- lead researcher for the grant
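To show how the cluster columns relate to one another, the sketch below groups grants by the exemplar of their cluster. It assumes that cic_clusters.csv has a header row whose column names match the field names listed above, which this README does not state explicitly. The function name grants_by_exemplar is hypothetical.

```python
import csv
from collections import defaultdict


def grants_by_exemplar(lines):
    """Map each exemplar award_id to the award_ids of the grants in its cluster.

    `lines` is any iterable of CSV text lines, such as an open text file.
    """
    clusters = defaultdict(list)
    # Assumption: the first line is a header containing "exemplar" and "award_id".
    for row in csv.DictReader(lines):
        clusters[row["exemplar"]].append(row["award_id"])
    return dict(clusters)
```

When reading from disk, open the file with newline="" and an explicit encoding, as the csv module documentation recommends, and pass the file object as lines.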
Sharing/Access information
In addition to the snapshots of data available here, data may also be downloaded from the COVID Information Commons site, which includes an open API.
Data was derived from the following sources:
Code/Software
The COVID Information Commons software repository contains both the code for running the CIC system and the code used to generate these data files.
Methods
The corpus of NSF- and NIH-funded COVID-related awards in the CIC was collected primarily from NSF and NIH via their APIs. Further information has been collected directly from researchers, who filled out an online form to enhance the descriptions.
The dataset has been cleaned and enhanced by automated processing, using custom scripts to remove invalid characters and standardize the names of funding agency divisions.
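The cleaning step described above might look something like the following sketch. This is not the CIC project's actual script. It treats "invalid characters" as Unicode control characters, which is an assumption, and the alias table contains a made-up placeholder entry because the real mapping of division names is not given here. All names in the block are hypothetical.

```python
import unicodedata

# Hypothetical alias table; the mapping used by the CIC scripts is not documented here.
DIVISION_ALIASES = {
    "div of example research": "Division of Example Research",
}


def strip_invalid_chars(text):
    """Drop control characters (Unicode category Cc) other than newline and tab."""
    return "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )


def standardize_division(name, aliases=DIVISION_ALIASES):
    """Clean a division name, collapse whitespace, and apply a known alias if any."""
    cleaned = " ".join(strip_invalid_chars(name).split())
    # Lookup is case-insensitive; names without an alias are returned cleaned.
    return aliases.get(cleaned.lower(), cleaned)
```

Keeping the alias table as data rather than code means new spelling variants can be added without changing the functions.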