A tale of two coasts: Unveiling U.S. Gulf and Atlantic coastal cities at high flood risk
Data files
Apr 06, 2026 version, files total 329.59 MB:
- Damage_Points_FRFs_Values.zip (7.62 MB)
- FRM_Data.zip (114.17 MB)
- Python_Scripts.zip (4.80 MB)
- README.md (4.71 KB)
- Shapefiles.zip (202.99 MB)
Abstract
Flood impacts in coastal cities can be mitigated through proactive risk management, which requires comprehensive risk assessment. This study develops a data-driven risk assessment framework to identify high-risk coastal cities, estimate the exposed population and infrastructure, and reveal the underlying flood risk factors under two scenarios, General Flood Damage (GFD) and Extreme Flood Damage (EFD), along the U.S. Gulf and Atlantic Coasts (USGAC). Using historical flood damage data and 16 factors representing hazard, exposure, and vulnerability, three machine learning models (Support Vector Machine, Random Forest, and Multi-Layer Perceptron) are used to assess flood risk. Results indicate that under GFD, 1.14 % of the area, containing 16.67 % of the population, is at very high risk, while under EFD, 0.53 % of the area, containing 4.08 % of the population, is at very high risk. Eight coastal cities are identified as high-risk. New York City has the largest population at risk (GFD: 4.75 M, EFD: 4.40 M), while New Orleans has the highest relative exposure (~99 % under both scenarios). Low elevation is the most influential factor for GFD, and high drainage density is the dominant factor for EFD. This scalable framework offers actionable insights for policymakers to reduce flood risk.
Dataset DOI: 10.5061/dryad.bzkh189q7
Description of the data and file structure
This study develops a data-driven framework for flood risk modelling to identify flood-prone cities, estimate the exposed population, and identify the underlying factors of flood risk under two scenarios, General Flood Damage (GFD) and Extreme Flood Damage (EFD), along the U.S. Gulf and Atlantic Coasts (USGAC).
This repository contains the associated data and code used in the study.
Files and variables
File: Damage_Points_FRFs_Values.zip
Description: This study used direct flood damage data for human property, obtained from the Federal Emergency Management Agency (FEMA) website. The csv files (All_FL_dmg_63k_GFD and Destroyed_FL_dmg19k_EFD) contain, for each flood damage point, the values extracted from the flood risk factors (hazard, exposure, and vulnerability).
Metadata of csv files:
| Variable name | Description | Unit |
| --- | --- | --- |
| CID/class | Label of each point (flood damage = 1, non-damage = 0). | - |
| Geometry | Geographic coordinates of each flood damage point. | Latitude and longitude |
| Elevation | Height of a location relative to mean sea level. | m |
| Slope | Topographic slope. | degree |
| Soil_Moisture | Soil water content at 0 cm depth. | % |
| Precip91 | 30-year average of monthly total precipitation. | mm |
| DisRiv | Distance from rivers, wetlands, and coastlines. | km |
| NDVI | Normalized difference vegetation index. | - |
| TWI_Fil2 | Topographic wetness index (TWI), which describes the spatial distribution of wetness across a region. | - |
| DD | Drainage density, the ratio of total channel length to basin area. | km/km² |
| PopDen | Population density, the number of people living per unit area. | persons/km² |
| BuildingHeight | Average height of buildings in a block group, categorized from low to high (1-5). | - |
| Dis_Road | Distance from roads. | km |
| POV | Percentage of the population below 150% of the poverty level. | % |
| Minority | Percentage of the population belonging to minority groups (Hispanic, African American, and American Indian). | % |
| NoHSD | Percentage of the population with no high school diploma. | % |
| Age65 | Percentage of the population aged above 65 years. | % |
| Age5 | Percentage of the population aged under 5 years. | % |
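As a starting point for working with these csv files, the sketch below separates the label from the numeric flood risk factors using only the Python standard library. It is not taken from the authors' scripts. It assumes the label column is named `class`, that the files use comma delimiters with a header row, and that the geometry column holds a text value. The sample rows are invented for illustration and only show a subset of the columns.

```python
import csv
import io

# Hypothetical sample rows; the values are made up for illustration only.
SAMPLE_CSV = """class,Geometry,Elevation,Slope,DD,PopDen
1,POINT (-90.07 29.95),1.2,0.4,2.1,3500
0,POINT (-81.38 28.54),14.8,2.3,0.6,420
"""

LABEL_COLUMN = "class"  # assumption: the label column is named "class"


def load_points(handle, label_column=LABEL_COLUMN):
    """Return (features, labels) parsed from a CSV file-like object."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        # Labels are 1 (flood damage) or 0 (non-damage).
        labels.append(int(float(row.pop(label_column))))
        # Keep only columns that parse as numbers (skips the geometry text).
        numeric = {}
        for name, value in row.items():
            try:
                numeric[name] = float(value)
            except (TypeError, ValueError):
                continue
        features.append(numeric)
    return features, labels


features, labels = load_points(io.StringIO(SAMPLE_CSV))
print(labels)
print(sorted(features[0]))
```

To read one of the unzipped files instead of the in-memory sample, pass an open file handle, for example `open(path, newline="")`, where `path` points to the csv file on your machine.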
File: FRM_Data.zip
Description: The raster files of the flood risk maps under both the GFD and EFD scenarios are included here.
File: Python_Scripts.zip
Description: All code, including the scripts for flood risk factor processing, flood risk modelling, SHAP analysis, and uncertainty tests, is included here.
File: Shapefiles.zip
Description: All relevant shapefiles are included:
- The boundary of the study area
- Flood damage points
- Very high and high flood risk zones across all at-risk coastal cities
- Population data at the block group level along the USGAC
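The abstract reports both the absolute population at risk and the relative exposure of each city. A minimal sketch of how the two measures relate is given below, assuming each block group has already been flagged as inside or outside a high or very high risk zone. The city names, populations, and flags are hypothetical, and the authors' scripts may derive exposure differently, for example by area-weighted overlay.

```python
# Hypothetical block groups: (city, population, inside a high/very high risk zone)
BLOCK_GROUPS = [
    ("City A", 1200, True),
    ("City A", 800, False),
    ("City B", 500, True),
    ("City B", 1500, True),
]


def exposure_by_city(block_groups):
    """Return {city: (exposed_population, exposed_share_percent)}."""
    totals, exposed = {}, {}
    for city, population, at_risk in block_groups:
        totals[city] = totals.get(city, 0) + population
        if at_risk:
            exposed[city] = exposed.get(city, 0) + population
    # Skip cities with zero total population to avoid division by zero.
    return {
        city: (exposed.get(city, 0), 100.0 * exposed.get(city, 0) / total)
        for city, total in totals.items()
        if total > 0
    }


result = exposure_by_city(BLOCK_GROUPS)
for city, (people, share) in result.items():
    print(f"{city}: {people} people exposed ({share:.1f} % of population)")
```

A city can rank high on one measure and not the other: a large city may have the most people exposed while a smaller city has nearly all of its residents in risk zones.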
Code/software
Google Earth Engine, Python, and ArcGIS Pro are used to process and analyze the data.
Access information
The original datasets used in this study were obtained from publicly available sources, including FEMA, NASA/USGS, PRISM Climate Group, Landsat, Esri, CDC/ATSDR, Open Land Map, USGS, U.S. Census Bureau/TIGER data, and Microsoft Building Footprints.
These datasets are not redistributed in this repository due to licensing restrictions.
All data processing steps are documented in the original manuscript, and the data processing code is provided so that users can independently access the source data and reproduce the datasets used in this study.
