Daily United States COVID-19 Testing and Outcomes Data By State, March 7, 2020 to March 7, 2021
Data files
Jul 28, 2021 version files 2.69 MB
Abstract
The COVID Tracking Project was a volunteer organization launched from The Atlantic and dedicated to collecting and publishing the data required to understand the COVID-19 outbreak in the United States. Our dataset was in use by national and local news organizations across the United States and by research projects and agencies worldwide.
Every day, we collected data on COVID-19 testing and patient outcomes from all 50 states, 5 territories, and the District of Columbia by visiting official public health websites for those jurisdictions and entering reported values in a spreadsheet. The files in this dataset represent the entirety of our COVID-19 testing and outcomes data collection from March 7, 2020 to March 7, 2021. For each state on each day, this dataset includes the officially reported values for antigen, antibody, and PCR test result totals; the total number of probable and confirmed cases of COVID-19; the number of people currently hospitalized, in intensive care, and on a ventilator; the total number of confirmed and probable COVID-19 deaths; and more.
Methods
This dataset was compiled by about 300 volunteers with The COVID Tracking Project from official sources of state-level COVID-19 data such as websites and press conferences. Every day, a team of about a dozen available volunteers visited these official sources and recorded the publicly reported values in a shared Google Sheet, which was used as a data source to publish the full dataset each day between about 5:30pm and 7pm Eastern time. All our data came from state and territory public health authorities or official statements from state officials. We did not automatically scrape data or attempt to offer a live feed. Our data was gathered and double-checked by humans, and we emphasized accuracy and context over speed. Some data was corrected or backfilled from structured data provided by public health authorities. Additional information about our methods can be found in a series of posts at http://covidtracking.com/analysis-updates.
We offer thanks and heartfelt gratitude for the labor and sacrifice of our volunteers. Volunteers on the Data Entry, Data Quality, and Data Infrastructure teams who granted us permission to use their name publicly are listed in `VOLUNTEERS.md`.
Usage notes
The State Metadata file `state-metadata.csv` contains information about each jurisdiction for which The COVID Tracking Project collected data. These jurisdictions are usually informally termed "states" in the dataset even in the case of commonwealths, territories, and the District of Columbia. Information in the State Metadata file includes names and codes for each state, URLs for that state's official COVID data, and indicators of which fields from the `history.csv` file were used for each state's "Total Tests" and "New Tests" displays on covidtracking.com pages and charts. Read more about the tremendous variation in how states counted and reported tests at https://covidtracking.com/analysis-updates/counting-covid-19-tests.
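As a rough illustration of how the per-state "Total Tests" indicator might be applied, the sketch below looks up which test field to read for each state and then pulls that field from the history rows. The column names (`state`, `total_tests_field`, `tests_specimens`, `tests_people`), the state codes `AA` and `BB`, and all values are hypothetical placeholders, not the real headers or data; consult the README for the actual field names before adapting it.

```python
import csv
import io

# Hypothetical stand-in for state-metadata.csv (column names are assumptions).
METADATA = """\
state,total_tests_field
AA,tests_specimens
BB,tests_people
"""

# Hypothetical stand-in for history.csv (column names and values are made up).
HISTORY = """\
date,state,tests_specimens,tests_people
2021-03-07,AA,1000,
2021-03-07,BB,,400
"""


def total_tests_field_by_state(handle):
    """Map each state code to the history field used for its total tests."""
    return {row["state"]: row["total_tests_field"] for row in csv.DictReader(handle)}


def total_tests(history_handle, field_by_state):
    """Read each state's total tests from the field its metadata points to."""
    totals = {}
    for row in csv.DictReader(history_handle):
        field = field_by_state[row["state"]]
        raw = row[field].strip()
        # A blank cell means the metric was not reported, so keep it as None.
        totals[(row["state"], row["date"])] = int(raw) if raw else None
    return totals


field_by_state = total_tests_field_by_state(io.StringIO(METADATA))
totals = total_tests(io.StringIO(HISTORY), field_by_state)
```

Selecting the field per state, rather than one field for all states, reflects the point that states counted tests in different units.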
The State Notes file `state-notes.md` contains information about data anomalies and individual peculiarities for each state arranged alphabetically by state and subsequently in reverse chronological order. These notes were written by experienced volunteers and staffers -- usually at the time of data collection when anomalies were identified -- and were published daily to state web pages on covidtracking.com along with the data.
The History file `history.csv` contains all values for case data, test data, hospitalization data, and death data collected from 56 US jurisdictions by The COVID Tracking Project from March 7, 2020 through March 7, 2021. Not every state reported all metrics, and not all metrics were reported by every state for the entire period: a blank cell means that the state did not report that metric on that date. Data definitions for each field are given in the README file. Please note that since states did not adopt a common standard for COVID data definitions, this file contains many fields -- especially for test data -- that attempt to capture the various definitions in use so that accurate comparisons can be made among values within a single field. States differed in whether they lumped together or reported separately "probable" and "confirmed" cases and deaths; whether they lumped together or reported separately antigen, antibody, and PCR tests for COVID infection; whether they used "specimens" versus "people" as test units; and whether they recorded each "test encounter" for each person tested. Moreover, states often defined common terms such as "probable" or "recovered" differently and reported in different time frames. Read more about these data issues at https://covidtracking.com/about-data and https://covidtracking.com/analysis-updates before analyzing this data.
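Because a blank cell means "not reported" rather than zero, it may help to keep blanks distinct from numeric values when reading the file. The following is a minimal sketch of one way to do that with Python's standard `csv` module. The column names (`date`, `state`, `positive`, `hospitalizedCurrently`), the state code `AA`, and the sample values are assumptions for illustration only, and the sketch assumes metric values are integers; the README gives the actual field definitions.

```python
import csv
import io

# Hypothetical sample standing in for history.csv; names and values are made up.
SAMPLE = """\
date,state,positive,hospitalizedCurrently
2020-03-07,AA,10,
2020-03-08,AA,15,3
"""


def parse_cell(value):
    """Return None for a blank cell (metric not reported), else an int."""
    value = value.strip()
    return int(value) if value else None


def load_history(handle, metric_columns):
    """Read history rows, converting the named metric columns to int or None."""
    rows = []
    for row in csv.DictReader(handle):
        for column in metric_columns:
            row[column] = parse_cell(row[column])
        rows.append(row)
    return rows


rows = load_history(io.StringIO(SAMPLE), ["positive", "hospitalizedCurrently"])

# Only dates on which the metric was actually reported.
reported = [
    r["hospitalizedCurrently"]
    for r in rows
    if r["hospitalizedCurrently"] is not None
]
```

Keeping unreported values as `None` avoids silently treating a missing metric as a count of zero in sums or averages.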