Data from: Predicting the composition of solid waste at the county scale
Data files
Jul 01, 2025 version (37.92 KB total):
- Compiled_Characterization.csv (33.94 KB)
- README.md (3.98 KB)
Abstract
This repository contains data compiled from published waste characterization studies, intended for training waste composition prediction models. The data was first used in a study published in the journal Waste Management titled "Predicting the composition of solid waste at the county scale". The source reports, which we refer to as waste characterization studies, are often titled similarly. Each data point extracted from these reports is an estimated percentage breakdown of the waste stream into various material categories for a particular geographic area and year. Some reports provide multiple data points, such as data for each county within a state. The project team manually extracted the data from the publicly available report PDFs. Because different reports often use their own sets of material categories, the team translated the data into a standardized characterization; this process is detailed in the aforementioned paper.
Dataset DOI: 10.5061/dryad.k3j9kd5kq
Files and variables
File: Compiled_Characterization.csv
Description: The data table has 58 columns: 3 description columns, 43 detailed waste material category columns, and 12 aggregated waste category columns. Each row (i.e., one waste characterization data point) contains the following:
- Notes: The area covered by the data, often matching the verbiage of the original report.
- YEAR: The reported year the data was gathered (note: this may be earlier than the year that the data was published).
- FIPS: Comma-separated list of Federal Information Processing Standards (FIPS) codes (unique identifiers of geographic areas) for the counties represented by the data.
- Detailed Material Categories (Columns 4-46): Percentage of the waste stream made up by the material category that each column represents (note: material categories that could not be matched to the standardized categorization were left as missing values).
- Newspaper
- Corrugated Cardboard
- Office
- Magazine/Glossy
- Paperboard
- Mixed Paper
- Other Paper
- #1 PET Bottles
- #2 HDPE Bottles
- #3-#7 Bottles
- Expanded Polystyrene
- Film Plastic
- Other Rigid Plastic
- Clear Glass
- Green Glass
- Amber Glass
- Other Glass
- Steel Cans
- Aluminum Cans
- Other Ferrous
- Other Non-Ferrous
- Appliances/White Goods
- Yard Waste
- Wood
- Food Waste
- Textiles
- Other Organics
- Wood (C&D)
- Drywall
- Inerts
- Fines/Dirt
- Carpet (inc padding)
- Asphalt Roofing
- Other C&D
- Vehicle Fluids
- Lightbulbs
- Pharmaceutical/Medical
- Other HHW
- Electronics
- Batteries
- Tires
- Bulky Items/Furniture
- Other Inorganics
- Aggregated Material Categories (Columns 47-58): Detailed material categories combined into fewer, more general material categories.
- Paper
- Plastic
- Glass
- Metal
- Yard Waste (Major)
- Food
- Textiles (Major)
- Other Organics (Major)
- Wood (Major)
- C&D Excluding Wood
- HHW
- Other
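The relationship between detailed and aggregated columns can be sketched as a simple roll-up. This README does not list which detailed columns feed each aggregated column, so the `ASSUMED_ROLLUP` mapping below (covering only Paper and Glass) is a hypothetical illustration, as are the helper names and the 0.5 percentage-point tolerance used to absorb rounding. The sketch returns `None` whenever a detailed component is missing, because an aggregate cannot be rebuilt from incomplete detail.

```python
from typing import Dict, Mapping, Optional, Sequence

# HYPOTHETICAL mapping for illustration only: the README does not state which
# detailed columns make up each aggregated column.
ASSUMED_ROLLUP: Dict[str, Sequence[str]] = {
    "Paper": ("Newspaper", "Corrugated Cardboard", "Office", "Magazine/Glossy",
              "Paperboard", "Mixed Paper", "Other Paper"),
    "Glass": ("Clear Glass", "Green Glass", "Amber Glass", "Other Glass"),
}


def roll_up(row: Mapping[str, Optional[float]],
            components: Sequence[str]) -> Optional[float]:
    """Sum detailed percentages; return None if any component is missing."""
    values = [row.get(column) for column in components]
    if any(v is None for v in values):
        return None
    return sum(values)


def check_rollup(row: Mapping[str, Optional[float]], aggregate: str,
                 components: Sequence[str], tol: float = 0.5) -> Optional[bool]:
    """Compare a recomputed aggregate with the reported one.

    Returns None when the check is impossible (missing detail or aggregate),
    otherwise whether the two agree within `tol` percentage points.
    """
    recomputed = roll_up(row, components)
    reported = row.get(aggregate)
    if recomputed is None or reported is None:
        return None
    return abs(recomputed - reported) <= tol
```

Returning `None` rather than `False` keeps "could not be checked" distinct from "does not match", which matters for rows whose source reports only gave aggregate values.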
Missing Data: All missing data, entered as 'N/A', comes from sources that reported data in an aggregated form that is not compatible with the chosen categories. For example, if a report includes only an aggregate "Glass" value, we cannot fill in "Clear Glass" or "Green Glass". In those cases, the aggregated value is reported and the more detailed categories are left missing.
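A minimal loading sketch using Python's standard `csv` module. It assumes the CSV header names match the column names listed above exactly and that percentage cells hold plain numbers. It maps the 'N/A' marker (and empty cells) to `None`, splits the FIPS list, and left-pads each code to five digits in case leading zeros were lost when the file was saved; that padding is an assumption about how the codes are stored. The function names are hypothetical.

```python
import csv
from typing import Dict, IO, List, Optional

DESCRIPTION_COLUMNS = ("Notes", "YEAR", "FIPS")


def parse_percentage(value: Optional[str]) -> Optional[float]:
    """Return a float, or None for the 'N/A' marker, an empty cell, or a short row."""
    if value is None:
        return None
    value = value.strip()
    if value in ("", "N/A"):
        return None
    return float(value.rstrip("%"))  # tolerate a trailing '%' if present


def parse_fips(value: str) -> List[str]:
    """Split the comma-separated FIPS list, keeping codes as 5-digit strings."""
    return [code.strip().zfill(5) for code in value.split(",") if code.strip()]


def load_characterization(source: IO[str]) -> List[Dict[str, object]]:
    """Read the characterization table into a list of per-row dictionaries."""
    rows: List[Dict[str, object]] = []
    for raw in csv.DictReader(source):
        year = raw["YEAR"].strip()
        row: Dict[str, object] = {
            "Notes": raw["Notes"],
            "YEAR": int(year) if year.isdigit() else year,
            "FIPS": parse_fips(raw["FIPS"]),
        }
        for column, value in raw.items():
            if column is not None and column not in DESCRIPTION_COLUMNS:
                row[column] = parse_percentage(value)
        rows.append(row)
    return rows
```

Usage: `with open("Compiled_Characterization.csv", newline="", encoding="utf-8-sig") as f: rows = load_characterization(f)`. The `utf-8-sig` encoding also handles a byte-order mark, if the file has one. FIPS codes are kept as strings so they can be joined to county-level predictor data without losing leading zeros.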
Abbreviations: PET - Polyethylene terephthalate (a type of plastic). HDPE - High-density polyethylene (a type of plastic). C&D - Construction and Demolition. HHW - Household Hazardous Waste.
Predictor Data
The data used to train and test the composition prediction model is publicly available. It was compiled from three sources: the Bureau of Economic Analysis (BEA), the U.S. Census Bureau, and the National Cancer Institute (NCI). Note: this data was not used for prediction in its raw, as-available form; the composition prediction model described in the paper used transformations of it. The data was made scale invariant (e.g., population/area to obtain population density and GDP/population to obtain per-capita GDP), and the natural logarithm was then taken so that the predictors correlate more linearly with the isometric log-ratio (ILR) transformed compositions. Details can be found in the paper.
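The predictor and composition transformations described above can be sketched as follows. The helper names are hypothetical, and the ILR transform uses pivot coordinates, one common choice of orthonormal basis; the paper may use a different basis, so this is an illustration of the technique rather than the authors' exact pipeline. For D strictly positive parts, the i-th pivot coordinate is sqrt((D - i) / (D - i + 1)) times the log of part i divided by the geometric mean of parts i+1 through D. Zero parts must be replaced before the transform (the logarithm of zero is undefined); the sketch rejects them instead of choosing a replacement strategy.

```python
import math
from typing import Dict, List, Sequence


def scale_invariant_log_features(gdp: float, population: float,
                                 area: float) -> Dict[str, float]:
    """Hypothetical helper: ratio features followed by the natural log.

    Assumes strictly positive inputs in consistent units.
    """
    return {
        "log_population_density": math.log(population / area),
        "log_gdp_per_capita": math.log(gdp / population),
    }


def ilr_pivot(composition: Sequence[float]) -> List[float]:
    """Isometric log-ratio transform using pivot coordinates.

    composition: D >= 2 strictly positive parts (percentages or fractions;
    the result does not depend on the total). Returns D - 1 coordinates.
    """
    parts = [float(p) for p in composition]
    if len(parts) < 2:
        raise ValueError("need at least two parts")
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive; replace zeros first")
    d = len(parts)
    logs = [math.log(p) for p in parts]
    coords: List[float] = []
    for i in range(d - 1):
        rest = logs[i + 1:]
        n_rest = len(rest)                   # equals d - i - 1
        mean_log_rest = sum(rest) / n_rest   # log of the geometric mean of the remaining parts
        coords.append(math.sqrt(n_rest / (n_rest + 1)) * (logs[i] - mean_log_rest))
    return coords
```

Because every coordinate is a log-ratio, rescaling a composition (for example, percentages versus fractions) leaves the ILR coordinates unchanged, which is why ratio-based, log-transformed predictors are a natural match for them.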
Access information
- The BEA data was retrieved in 2024 from the interactive tables hosted on the BEA website: https://www.bea.gov/itable/
- The U.S. Census Bureau data was retrieved in 2024 from the TIGERweb application: https://tigerweb.geo.census.gov/tigerwebmain/TIGERweb_main.html
- The NCI data was retrieved from the SEER county population data posted in 2022: https://seer.cancer.gov/popdata/
