Digital media visibility of US National Parks
Data files
Nov 11, 2025 version files (78.56 MB total):
- Data_Code.zip (77.99 MB)
- ReadMe_DataDescription.pdf (562.29 KB)
- README.md (1.73 KB)
Abstract
The US National Park (NP) system currently comprises 63 distinct protected areas encompassing ∼212,000 km² (2% of US land area). For over a century, the US National Park Service has been charged with managing the NP system under a paradoxical dual mandate: to preserve invaluable national park environmental and cultural resources for future generations, and to ensure their public accessibility for recreational enjoyment. With >124 million visitors in 2019, the NPs are at risk of being ‘loved to death’, as are protected areas the world over. Against this backdrop, we analyzed the structure and dynamics of this coupled human-natural system through a public lens constructed from >426,000 digital media articles mentioning at least one NP. Whereas NP visitation increased by 29% and federal budget levels decreased by 15% over the last decade, NP media visibility increased >3900%, with media articles featuring >1 NP largely associated with tourism messaging. Hence, although NPs foster the public appreciation necessary to protect natural capital, the emergence of ecotourism marketing coupled with a growing global tourist population has rendered visitation growth unsustainable.
https://doi.org/10.5061/dryad.fttdz091w
This document describes the contents of the folder Data_Code/, which contains tabular data and code notebooks generated with Mathematica (v12/13) for reproducing the analysis associated with this data deposit. Primary source records consisting of metadata for individual digital media articles mentioning each US National Park (NP) by its official name at least once were obtained from the Media Cloud project via its open-source API, and integrated with other metadata about National Parks obtained from the US NPS and Wikipedia.
Description of the data and file structure
All data files in Data_Code.zip are provided in CSV (comma-separated values) format. See ReadMe_DataDescription.pdf for details on which Mathematica notebook generates each figure in the associated study, and for a description of the input files and the output figures and data files produced by each notebook. To run a notebook, press Shift+Enter to evaluate the commands in a given cell. The initial cells load the data files; from there, evaluate the remaining cells in order from start to end.
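For readers who want to inspect the CSV files before opening the notebooks, the following is a minimal sketch in Python, which is not part of this deposit. It assumes Data_Code.zip has been extracted to a folder named Data_Code; the helper name `summarize_csv_files` is hypothetical, and no particular file or column names are assumed.

```python
import csv
from pathlib import Path


def summarize_csv_files(folder):
    """Return {relative path: (column names, number of data rows)} for each CSV under folder."""
    root = Path(folder)
    summary = {}
    for path in sorted(root.rglob("*.csv")):
        # utf-8-sig drops a byte-order mark if the file starts with one
        with path.open(newline="", encoding="utf-8-sig") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])        # first row is taken to be the header
            n_rows = sum(1 for _ in reader)  # remaining rows are data rows
        summary[path.relative_to(root).as_posix()] = (header, n_rows)
    return summary


if __name__ == "__main__":
    # Assumption: "Data_Code" is the folder extracted from Data_Code.zip.
    for name, (header, n_rows) in summarize_csv_files("Data_Code").items():
        print(f"{name}: {n_rows} rows, columns = {header}")
```

This only reports file names, column headers, and row counts; ReadMe_DataDescription.pdf remains the reference for what each file contains.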
Sharing/Access information
Primary source data were derived from the following sources:
- https://www.mediacloud.org/
- https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States
- https://irma.nps.gov/Portal/
Code/Software
Notebooks were developed with Mathematica version 13.
The US NP system currently comprises 63 distinct protected areas encompassing ∼212,000 km² (2% of US land area). To analyze this "system of systems", we assembled a dataset of 426,069 unique web-based digital media articles (news articles, blog posts, and other web content) that specifically mention any of the 63 US national parks by official name, e.g., “Yosemite National Park”. These media articles were originally produced by 16,030 unique media sources, according to primary source data obtained from the Media Cloud project (MC) database, https://www.mediacloud.org/. In the associated report, we use these data to develop a co-occurrence framework that defines park-park relationships based on media article co-visibility over the period 2000–2020, applying concepts and methods from network science, machine learning (NLP), and organizational science.
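To make the idea of park-park co-visibility concrete, here is a minimal sketch in Python that counts, for each pair of parks, the number of articles mentioning both. It illustrates a plain pairwise count only; it is not the framework developed in the associated report, which may weight or normalize co-occurrences differently. The function name `park_cooccurrence` and the toy input are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def park_cooccurrence(article_parks):
    """Count, for each unordered pair of parks, the articles that mention both."""
    pair_counts = Counter()
    for parks in article_parks:
        # Deduplicate and sort so each pair is counted once per article,
        # always in the same (alphabetical) order.
        unique = sorted(set(parks))
        for pair in combinations(unique, 2):
            pair_counts[pair] += 1
    return pair_counts


# Toy input (hypothetical): the parks named in each of four articles.
articles = [
    ["Yosemite National Park", "Zion National Park"],
    ["Yosemite National Park"],
    ["Zion National Park", "Yosemite National Park", "Arches National Park"],
    ["Arches National Park", "Zion National Park"],
]

counts = park_cooccurrence(articles)
for (park_a, park_b), n in sorted(counts.items()):
    print(f"{park_a} + {park_b}: {n} article(s)")
```

The resulting counts can be read as edge weights of an undirected network whose nodes are parks; articles that mention a single park add no edges.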
