This dataset contains census-tract-level indicators of social vulnerability and community resources across North Carolina, used to study their impact on economic outcomes following the COVID-19 pandemic. It compares variables from 2018 and 2022 to measure changes in unemployment and poverty rates across 1,698 census tracts.
Data Sources and Variables:
- Social and Economic Vulnerability: Derived from the 2020 American Community Survey (ACS) and the CDC's Social Vulnerability Index (SVI), including metrics for education level, disability, minority status, and housing cost burden.
- Infrastructure and Institutional Capacity: Includes variables such as broadband access (FCC), hospital capacity (NC OneMap), public and non-public school proximity, and annual vehicle hours of public transit service (U.S. DOT).
- Environmental Context: Includes the EPA Walkability Index and Census Bureau diversity indices.
The data were processed using ArcGIS to map geographical overlaps between census tract definitions across different time periods. Summary statistics and correlation matrices are provided to describe the relationship between these vulnerability factors and economic resilience in rural and non-rural areas. This data supports an empirical analysis using linear regression, logistic regression, and random forest models to identify how community resources moderate the effects of social vulnerability on economic recovery.
General Information
- Dataset Title: Data from: Social Vulnerability, Capacity, and Economic Outcomes in North Carolina Populations
- Date of Data Collection: 2018–2022
- Geographic location of data collection: North Carolina, USA (Census Tract level)
Data Overview
This dataset contains a processed merge of census-tract-level indicators used to analyze how community resources moderate the impact of social vulnerability on economic outcomes (specifically change in poverty and unemployment) between 2018 and 2022.
File List
refined_data_20250625.csv: The primary dataset containing all vulnerability indices, capacity variables, and outcome differences for 3,124 North Carolina census tracts. Note: 1,698 tracts have complete SVI data across all three years (2018, 2020, and 2022); the remaining rows contain NaN in at least one SVI year's variables and are excluded from the primary analyses.
regression_dryad.R: R script for linear regression, stepwise AIC selection, and rural/non-rural interaction models.
logistic_regression_dryad.R: R script for logistic regression and quartile-based outcome analysis.
- Python Notebooks:
CDC_Poverty_analysis-7-23-2025-NonRural.ipynb, CDC_Unemployment_analysis-7-23-2025-NonRural.ipynb, CDC_Poverty_analysis-7-23-2025-Rural.ipynb, and CDC_Unemployment_analysis-7-23-2025-Rural.ipynb. These notebooks perform the Random Forest ensemble modeling.
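The complete-case restriction described for refined_data_20250625.csv (keeping only the 1,698 tracts with SVI data in all three years) could be sketched as follows. This is a minimal illustration, not the authors' code: it checks only the overall ranking columns as a stand-in for "all SVI variables", and the three sample rows and their FIPS codes are hypothetical.

```python
import csv
import io
import math

SVI_YEARS = ("2018", "2020", "2022")

def is_missing(value):
    """Treat empty strings, 'NA'/'NaN' text, and float NaN as missing."""
    text = str(value).strip()
    if text == "" or text.upper() in {"NA", "NAN"}:
        return True
    try:
        return math.isnan(float(text))
    except ValueError:
        return False

def complete_svi_rows(rows, columns):
    """Keep only rows where every listed SVI column has a value."""
    return [row for row in rows if not any(is_missing(row[c]) for c in columns)]

# Hypothetical three-row extract standing in for the real CSV file.
sample = io.StringIO(
    "FIPS,RPL_THEMES_2018,RPL_THEMES_2020,RPL_THEMES_2022\n"
    "37001020100,0.41,0.45,0.47\n"
    "37001020200,NaN,0.52,0.55\n"
    "37001020300,0.10,,0.12\n"
)
rows = list(csv.DictReader(sample))

# Assumption: one ranking column per year is enough to flag an incomplete tract.
svi_columns = [f"RPL_THEMES_{year}" for year in SVI_YEARS]
kept = complete_svi_rows(rows, svi_columns)
print([row["FIPS"] for row in kept])
```

To apply the same filter to the real file, replace the `io.StringIO` sample with an opened file handle and extend `svi_columns` to every SVI column used in the analysis.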
Data Dictionary
Tract Identifiers
| Variable Name | Description | Source |
|---|---|---|
| FIPS | 11-digit tract-level FIPS code uniquely identifying each census tract | U.S. Census Bureau |
| COUNTY | County name for each census tract | U.S. Census Bureau |
| category | Classification of tract as "rural", "suburban" or "urban", from the NC Rural Center county-level designations | NC Rural Center |
CDC/ATSDR Social Vulnerability Index (SVI) — 2020
All variables in this section are drawn from the CDC/ATSDR Social Vulnerability Index 2020 for North Carolina census tracts. They are based on 2016–2020 American Community Survey (ACS) 5-year estimates. Percentile rankings range from 0 to 1, with higher values indicating greater vulnerability. Full documentation available at: https://www.atsdr.cdc.gov/place-health/php/svi/svi-data-documentation-download.html
SVI 2020 — Theme 1: Socioeconomic Status
| Variable Name | SVI Field | Description |
|---|---|---|
| EP_POV150_2020 | EP_POV150 | % of persons below 150% of the federal poverty line |
| EP_UNEMP_2020 | EP_UNEMP | Unemployment rate: % of civilian population age 16+ in the labor force who are unemployed |
| EP_HBURD_2020 | EP_HBURD | % of occupied housing units with annual income < $75,000 where 30%+ of income is spent on housing costs |
| EP_NOHSDP_2020 | EP_NOHSDP | % of persons age 25+ with no high school diploma |
| EP_UNINSUR_2020 | EP_UNINSUR | % of civilian noninstitutionalized population who are uninsured |
| RPL_THEME1_2020 | RPL_THEME1 | Percentile ranking for the Socioeconomic Status theme (sum of EPL_POV150, EPL_UNEMP, EPL_HBURD, EPL_NOHSDP, EPL_UNINSUR, ranked) |
SVI 2020 — Theme 2: Household Characteristics
| Variable Name | SVI Field | Description |
|---|---|---|
| EP_AGE65_2020 | EP_AGE65 | % of persons aged 65 and older |
| EP_AGE17_2020 | EP_AGE17 | % of persons aged 17 and younger |
| EP_DISABL_2020 | EP_DISABL | % of civilian noninstitutionalized population with a disability |
| EP_SNGPNT_2020 | EP_SNGPNT | % of single-parent households with children under 18 |
| EP_LIMENG_2020 | EP_LIMENG | % of persons age 5+ who speak English "less than well" |
| RPL_THEME2_2020 | RPL_THEME2 | Percentile ranking for the Household Characteristics theme (sum of EPL_AGE65, EPL_AGE17, EPL_DISABL, EPL_SNGPNT, EPL_LIMENG, ranked) |
SVI 2020 — Theme 3: Racial & Ethnic Minority Status
| Variable Name | SVI Field | Description |
|---|---|---|
| EP_MINRTY_2020 | EP_MINRTY | % of population identifying as a racial or ethnic minority, i.e., all non-white-non-Hispanic groups combined (Hispanic/Latino of any race; Black/African American; American Indian/Alaska Native; Asian; Native Hawaiian/Pacific Islander; Two or more races; Other races) |
| RPL_THEME3_2020 | RPL_THEME3 | Percentile ranking for the Racial & Ethnic Minority Status theme |
SVI 2020 — Theme 4: Housing Type & Transportation
| Variable Name | SVI Field | Description |
|---|---|---|
| EP_MUNIT_2020 | EP_MUNIT | % of housing units in structures with 10 or more units |
| EP_MOBILE_2020 | EP_MOBILE | % of housing units that are mobile homes |
| EP_CROWD_2020 | EP_CROWD | % of occupied housing units with more people than rooms |
| EP_NOVEH_2020 | EP_NOVEH | % of households with no vehicle available |
| EP_GROUPQ_2020 | EP_GROUPQ | % of persons living in group quarters |
| RPL_THEME4_2020 | RPL_THEME4 | Percentile ranking for the Housing Type & Transportation theme (sum of EPL_MUNIT, EPL_MOBILE, EPL_CROWD, EPL_NOVEH, EPL_GROUPQ, ranked) |
SVI 2020 — Overall Ranking
| Variable Name | SVI Field | Description |
|---|---|---|
| RPL_THEMES_2020 | RPL_THEMES | Overall SVI percentile ranking across all four themes combined |
SVI 2020 — Internet Access (Adjunct Variable)
| Variable Name | SVI Field | Description | Source |
|---|---|---|---|
| PERC_NOINT_2020 | EP_NOINT (derived) | % of the total population in households without a computer with a broadband internet subscription. Calculated as: 100 * E_NOINT_2020 / E_TOTPOP_2020, where E_NOINT is the SVI 2020 adjunct estimate (drawn from 2016–2020 ACS, table S2802) of households lacking broadband. Note: the standard SVI field EP_NOINT normalizes by number of households; this variable normalizes by total population. | CDC SVI / ACS (S2802) |
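The PERC_NOINT_2020 formula above can be written as a small helper. This is a sketch rather than the authors' code: the function name and the sample counts are hypothetical, and treating -999 as missing follows the CDC SVI convention rather than anything stated for this specific variable.

```python
SVI_MISSING = -999  # CDC SVI sentinel for missing or unavailable data

def perc_noint(e_noint, e_totpop):
    """Return 100 * E_NOINT / E_TOTPOP, or None when it cannot be computed."""
    if e_noint is None or e_totpop is None:
        return None
    if e_noint == SVI_MISSING or e_totpop == SVI_MISSING:
        return None
    if e_totpop <= 0:
        # Unpopulated tracts have no defined percentage.
        return None
    return 100 * e_noint / e_totpop

# Hypothetical tract: 250 people without broadband out of 4,000 residents.
print(perc_noint(250, 4000))   # 6.25
print(perc_noint(-999, 4000))  # None
```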
CDC/ATSDR Social Vulnerability Index (SVI) — 2018
All variables in this section are drawn from CDC/ATSDR SVI 2018 for North Carolina. They are based on 2014–2018 ACS 5-year estimates. Note: SVI 2018 used a different set of Theme 1 indicators than SVI 2020. Specifically, the poverty variable (EP_POV_2018) reflects the 100% federal poverty threshold (not 150%), per capita income was used instead of housing cost burden, and uninsured status was an adjunct variable rather than a Theme 1 indicator. As a result, EP_HBURD does not appear in the 2018 variable set.
SVI 2018 — Theme 1: Socioeconomic Status
| Variable Name | Description |
|---|---|
| EP_POV_2018 | % of persons below 100% of the federal poverty line (2018 threshold; note: differs from the 150% threshold used in 2020 and 2022) |
| EP_UNEMP_2018 | Unemployment rate: % of civilian population age 16+ in the labor force who are unemployed |
| EP_NOHSDP_2018 | % of persons age 25+ with no high school diploma |
| EP_UNINSUR_2018 | % of civilian noninstitutionalized population who are uninsured (adjunct variable in 2018 SVI; included in Theme 1 beginning with SVI 2020) |
| RPL_THEME1_2018 | Percentile ranking for the Socioeconomic Status theme (2018 definition) |
SVI 2018 — Theme 2: Household Characteristics
| Variable Name | Description |
|---|---|
| EP_AGE65_2018 | % of persons aged 65 and older |
| EP_AGE17_2018 | % of persons aged 17 and younger |
| EP_DISABL_2018 | % of civilian noninstitutionalized population with a disability |
| EP_SNGPNT_2018 | % of single-parent households with children under 18 |
| EP_LIMENG_2018 | % of persons age 5+ who speak English "less than well" (moved from Theme 3 to Theme 2 in SVI 2020) |
| RPL_THEME2_2018 | Percentile ranking for the Household Characteristics theme (2018 definition) |
SVI 2018 — Theme 3: Racial & Ethnic Minority Status
| Variable Name | Description |
|---|---|
| EP_MINRTY_2018 | % of population identifying as a racial or ethnic minority |
| RPL_THEME3_2018 | Percentile ranking for the Racial & Ethnic Minority Status theme (2018 definition; included EP_LIMENG, which was moved to Theme 2 in 2020) |
SVI 2018 — Theme 4: Housing Type & Transportation
| Variable Name | Description |
|---|---|
| EP_MUNIT_2018 | % of housing units in structures with 10 or more units |
| EP_MOBILE_2018 | % of housing units that are mobile homes |
| EP_CROWD_2018 | % of occupied housing units with more people than rooms |
| EP_NOVEH_2018 | % of households with no vehicle available |
| EP_GROUPQ_2018 | % of persons living in group quarters |
| RPL_THEME4_2018 | Percentile ranking for the Housing Type & Transportation theme (2018 definition) |
SVI 2018 — Overall Ranking
| Variable Name | Description |
|---|---|
| RPL_THEMES_2018 | Overall SVI percentile ranking across all four themes combined (2018) |
CDC/ATSDR Social Vulnerability Index (SVI) — 2022
All variables in this section are drawn from CDC/ATSDR SVI 2022 for North Carolina. They are based on 2018–2022 ACS 5-year estimates. The 2022 SVI uses the same theme structure and indicator definitions as SVI 2020 (including the 150% poverty threshold and housing cost burden).
SVI 2022 — Theme 1: Socioeconomic Status
| Variable Name | Description |
|---|---|
| EP_POV150_2022 | % of persons below 150% of the federal poverty line |
| EP_UNEMP_2022 | Unemployment rate: % of civilian population age 16+ in the labor force who are unemployed |
| EP_HBURD_2022 | % of occupied housing units with annual income < $75,000 where 30%+ of income is spent on housing costs |
| EP_NOHSDP_2022 | % of persons age 25+ with no high school diploma |
| EP_UNINSUR_2022 | % of civilian noninstitutionalized population who are uninsured |
| RPL_THEME1_2022 | Percentile ranking for the Socioeconomic Status theme (2022) |
SVI 2022 — Theme 2: Household Characteristics
| Variable Name | Description |
|---|---|
| EP_AGE65_2022 | % of persons aged 65 and older |
| EP_AGE17_2022 | % of persons aged 17 and younger |
| EP_DISABL_2022 | % of civilian noninstitutionalized population with a disability |
| EP_SNGPNT_2022 | % of single-parent households with children under 18 |
| EP_LIMENG_2022 | % of persons age 5+ who speak English "less than well" |
| RPL_THEME2_2022 | Percentile ranking for the Household Characteristics theme (2022) |
SVI 2022 — Theme 3: Racial & Ethnic Minority Status
| Variable Name | Description |
|---|---|
| EP_MINRTY_2022 | % of population identifying as a racial or ethnic minority |
| RPL_THEME3_2022 | Percentile ranking for the Racial & Ethnic Minority Status theme (2022) |
SVI 2022 — Theme 4: Housing Type & Transportation
| Variable Name | Description |
|---|---|
| EP_MUNIT_2022 | % of housing units in structures with 10 or more units |
| EP_MOBILE_2022 | % of housing units that are mobile homes |
| EP_CROWD_2022 | % of occupied housing units with more people than rooms |
| EP_NOVEH_2022 | % of households with no vehicle available |
| EP_GROUPQ_2022 | % of persons living in group quarters |
| RPL_THEME4_2022 | Percentile ranking for the Housing Type & Transportation theme (2022) |
SVI 2022 — Overall Ranking
| Variable Name | Description |
|---|---|
| RPL_THEMES_2022 | Overall SVI percentile ranking across all four themes combined (2022) |
Community Capacity & Infrastructure Variables
Variables in this section were collected or derived from sources outside the CDC SVI. All variables reflect 2020 conditions unless otherwise noted.
Social Diversity
| Variable Name | Description | Source |
|---|---|---|
| census_diversity_index_2020 | Probability (0–100) that two randomly selected individuals from the same census tract belong to different racial or ethnic groups. Higher values indicate greater racial/ethnic diversity. Extracted from a shapefile using ArcGIS. | 2020 Census Demographic and Housing Characteristics (DHC): https://www.census.gov/data/tables/2023/dec/2020-census-dhc.html; via Esri ArcGIS Living Atlas: https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/mapping/2020-census-dhc/ |
| census_diffusion_score_2020 | % of the population in a census tract belonging to a racial or ethnic group that is not one of the three largest groups statewide. Derived from the same shapefile and method as census_diversity_index_2020. | 2020 Census DHC (same sources as above) |
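For readers unfamiliar with the diversity index, the probability interpretation above corresponds to one minus the sum of squared group shares, scaled to 0–100. The sketch below illustrates that arithmetic only; the values in this dataset were extracted from a published shapefile, not recomputed, and the group counts shown are hypothetical.

```python
def diversity_index(group_counts):
    """Chance (0-100) that two randomly chosen residents are in different groups.

    Computed as 100 * (1 - sum of squared group shares).
    """
    total = sum(group_counts)
    if total <= 0:
        return None  # undefined for an unpopulated tract
    shares = [count / total for count in group_counts]
    return 100 * (1 - sum(share ** 2 for share in shares))

# Hypothetical tracts: one with two equal-sized groups, one with a single group.
print(diversity_index([50, 50]))  # 50.0
print(diversity_index([100]))     # 0.0
```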
Transportation & Walkability
| Variable Name | Description | Source |
|---|---|---|
| Vehicle_hours_2020 | Annual vehicle hours of service for local public transportation serving the census tract. Extracted from PDF transit agency profile reports using a Python-based PDF parser. Note: many tracts have a value of 0, reflecting absence of public transit service. | U.S. Department of Transportation, National Transit Database (NTD) agency profiles: https://www.transit.dot.gov/ntd/transit-agency-profiles/ |
| EPA_walkablity_index | EPA National Walkability Index score, capturing land-use diversity, street connectivity, and proximity to transit. Originally measured at the block group level; aggregated to the census tract level using ArcGIS. Scores range from 1 (least walkable) to 20 (most walkable). | U.S. Environmental Protection Agency: https://www.epa.gov/smartgrowth/national-walkability-index-user-guide-and-methodology |
Access to Institutions (Distance in Miles)
Distances were calculated from each census tract centroid to the nearest respective institution using ArcGIS and institution shapefiles from NC OneMap. All values are in miles.
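The distances in this dataset were produced in ArcGIS. For users who want to sanity-check a value without GIS software, a great-circle (haversine) approximation can stand in. It is not the authors' method and may differ slightly from ArcGIS output depending on projection. The coordinates below are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def nearest_distance_miles(centroid, institutions):
    """Distance from a tract centroid to the closest institution, or None if none exist."""
    lat, lon = centroid
    return min(
        (haversine_miles(lat, lon, i_lat, i_lon) for i_lat, i_lon in institutions),
        default=None,
    )

# Hypothetical centroid and two hypothetical hospital locations.
centroid = (35.0, -79.0)
hospitals = [(36.0, -79.0), (35.5, -79.0)]
print(round(nearest_distance_miles(centroid, hospitals), 1))
```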
Institutional Capacity
Derived Outcome and Difference Variables
| Variable Name | Description | Source |
|---|---|---|
| POV_diff | Change in % of population in poverty (2018 to 2022) | ACS |
| UNEMP_diff | Change in % of population unemployed (2018 to 2022) | ACS |
| NOHSDP_diff | Change in % of persons with no high school diploma (2018 to 2022) | ACS |
| UNINSUR_diff | Change in % uninsured (2018 to 2022) | ACS |
| SVI_Overall_diff | Change in overall SVI percentile ranking (RPL_THEMES_2022 − RPL_THEMES_2018) | CDC SVI |
| SVI_Theme1_diff | Change in SVI Theme 1 (Socioeconomic Status) percentile ranking (2018 to 2022) | CDC SVI |
| SVI_Theme2_diff | Change in SVI Theme 2 (Household Characteristics) percentile ranking (2018 to 2022) | CDC SVI |
| SVI_Theme3_diff | Change in SVI Theme 3 (Racial & Ethnic Minority Status) percentile ranking (2018 to 2022) | CDC SVI |
| SVI_Theme4_diff | Change in SVI Theme 4 (Housing Type & Transportation) percentile ranking (2018 to 2022) | CDC SVI |
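The difference variables are later-year minus earlier-year values. The sketch below reproduces SVI_Overall_diff from its stated formula. The UNEMP_diff line assumes the inputs are EP_UNEMP_2022 and EP_UNEMP_2018, which the table does not state explicitly, and the tract values are hypothetical. POV_diff is left out because the table does not say which poverty columns enter it.

```python
def change(later, earlier):
    """Later-year value minus earlier-year value; None if either is missing."""
    if later is None or earlier is None:
        return None
    return later - earlier

# Hypothetical values for a single tract.
tract = {
    "RPL_THEMES_2018": 0.40,
    "RPL_THEMES_2022": 0.55,
    "EP_UNEMP_2018": 6.0,
    "EP_UNEMP_2022": 4.5,
}

# Stated in the data dictionary: RPL_THEMES_2022 - RPL_THEMES_2018.
svi_overall_diff = change(tract["RPL_THEMES_2022"], tract["RPL_THEMES_2018"])

# Assumed input columns for UNEMP_diff (not confirmed by the data dictionary).
unemp_diff = change(tract["EP_UNEMP_2022"], tract["EP_UNEMP_2018"])

print(round(svi_overall_diff, 2), unemp_diff)
```

A positive SVI difference means the tract ranked as more vulnerable in 2022 than in 2018; a negative unemployment difference means unemployment fell.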
Methodological Information
- Analytical Framework: Independent variables are organized into four conceptual vulnerability categories: Social (household characteristics and demographic vulnerability), Economic (financial strain indicators), Infrastructure (housing conditions and physical access), and Institutional (strength and accessibility of organized systems and services).
- Difference-in-Differences Design: The study uses a difference-in-differences approach, treating the COVID-19 pandemic (most intensely felt around 2020) as an external shock. Pre-pandemic (2018) and post-pandemic (2022) ACS data are compared to identify census tract characteristics associated with better or worse economic recovery.
- Spatial Processing: ArcGIS was used to map census tracts across different years to account for definition changes between 2018 and 2022. Distances to institutions were calculated from tract centroids. Block-group-level variables (e.g., EPA Walkability Index) were aggregated to the tract level using ArcGIS.
- Transformations: In the provided R scripts, skewed capacity variables (e.g., distance, vehicle hours, and hospital beds) are log-transformed to normalize distributions for regression analysis.
- Statistical Approach: Analysis includes OLS regression, bidirectional stepwise selection (AIC), logistic regression comparing the top and bottom quartiles of economic recovery, and Random Forest ensemble models evaluated using AUC and precision-recall curves.
- Rural/Non-Rural Classification: County-level rural/non-rural designations were obtained from the NC Rural Center (https://www.ncruralcenter.org/) and joined to tract-level data by county name. All analyses are stratified by rural vs. non-rural classification.
- SVI Year Comparability Note: Due to methodological changes between SVI 2018 and SVI 2020/2022 (see variable descriptions above), direct comparison of theme rankings across years should be interpreted with caution, particularly for Theme 1 (poverty threshold changed from 100% to 150%; housing cost burden replaced per capita income) and Theme 3 (English language proficiency moved to Theme 2 in 2020).
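Two of the preprocessing steps listed above, the log transformation of skewed capacity variables and the top-versus-bottom quartile outcome used for logistic regression, can be illustrated in a few lines. The actual analysis lives in the provided R scripts; this Python sketch makes its own assumptions: it uses log(1 + x) so that zero-valued tracts stay defined, it relies on the default quartile method of `statistics.quantiles`, and it labels quartiles by raw value without deciding which direction counts as better recovery. The input values are hypothetical.

```python
import math
import statistics

def log_transform(values):
    """Apply log(1 + x); an assumed variant that tolerates zeros."""
    return [math.log1p(value) for value in values]

def quartile_labels(outcomes):
    """Label the bottom quartile 0, the top quartile 1, and the middle half None."""
    q1, _median, q3 = statistics.quantiles(outcomes, n=4)
    labels = []
    for value in outcomes:
        if value <= q1:
            labels.append(0)
        elif value >= q3:
            labels.append(1)
        else:
            labels.append(None)  # middle tracts are dropped from the comparison
    return labels

# Hypothetical vehicle-hours values, including a tract with no transit service.
vehicle_hours = [0.0, 1200.0, 56000.0]
print([round(value, 3) for value in log_transform(vehicle_hours)])

# Hypothetical outcome differences for eight tracts.
pov_diff = [1, 2, 3, 4, 5, 6, 7, 8]
print(quartile_labels(pov_diff))
```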
Usage Notes
- File Structure: To replicate findings, ensure the data file is placed in a folder named /Data/ relative to the R scripts.
- Outputs: Results from the R scripts are exported as .html (stargazer tables) and .xlsx files into a /Results/ folder.
- Missing Data: Per CDC SVI documentation, values of -999 indicate missing or unavailable data and were excluded from ranking calculations.
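When reading SVI columns outside the provided scripts, the -999 sentinel should be converted to a missing value before any averaging or modeling. A minimal sketch, assuming values arrive as CSV text; the helper name and the sample rows are hypothetical.

```python
import csv
import io

SVI_MISSING = -999  # CDC SVI sentinel for missing or unavailable data

def parse_svi_value(text):
    """Convert a CSV cell to float, mapping blanks, NaN, and -999 to None."""
    text = text.strip()
    if text == "" or text.lower() == "nan":
        return None
    value = float(text)
    return None if value == SVI_MISSING else value

# Hypothetical two-row extract.
sample = io.StringIO(
    "FIPS,EP_UNEMP_2020\n"
    "37001020100,5.2\n"
    "37001020200,-999\n"
)
values = [parse_svi_value(row["EP_UNEMP_2020"]) for row in csv.DictReader(sample)]
print(values)  # [5.2, None]
```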
Statistical analysis was performed using R (for linear/logistic regressions and stepwise selection) and Python (for data extraction via PDF parsing and Random Forest classification). Geographic data processing, including census tract mapping and distance calculations, was conducted in ArcGIS.