Plain-text conversions of U.S. Final Environmental Impact Statements from 2013-2020


Administrative procedures are intended to increase transparency and help agencies make better decisions. However, these requirements also increase agency workload. Understanding how public agencies satisfy procedural requirements is a critical facet of agency performance. This analysis focuses on the language agencies use in Environmental Impact Statements (EISs) required by the U.S. National Environmental Policy Act (NEPA) – specifically, the reuse of similar text within and between assessments. We synthesize theories of institutional isomorphism and bureaucratic coping to understand why and how text is reused, and consider the tradeoffs associated with this behavior. Using a national dataset of 1014 EISs published by 22 U.S. agencies from 2013 to 2020, we explore how boilerplate language varies by agency, authors, project type, location, and consulting firm involvement. We find that text reuse primarily occurs where there is a clear substantive rationale for boilerplate language or where studies share authors or contract consulting firms. This indicates: (1) that agencies largely do not merely engage in pro forma compliance efforts; and (2) that while NEPA procedures are oriented around individual projects and decisions, cross-project learning and the narrowness – or breadth – of agencies’ project portfolios shape analytical routines and the relative tradeoffs of boilerplate text in policy analysis. This paper adds to our theoretical understanding of agencies’ coping strategies in response to institutional pressures and makes a methodological contribution by demonstrating the application of text reuse measurement and information extraction methods in public administration research.


These data are full-text representations of EIS documents and associated metadata stored in the US EPA’s e-NEPA repository. The e-NEPA page provides a nearly comprehensive record of final EISs (FEISs) published since October 2012, although a small proportion of files are not posted, and some file links are corrupted or broken. We used web-scraping to record available metadata and download available documents. These raw data include all FEISs obtained from the e-NEPA page. The subsequent analysis concerns a subsample of FEISs published between 2013 and 2020 by agencies that completed at least 5 EISs during that time. We exclude Adopted FEISs, which are cases where one agency uses an EIS already prepared by another agency in lieu of a separate analysis, as well as Withdrawn FEISs. There are also several cases where two agencies (often the Bureau of Land Management [BLM] and the Forest Service [FS]) file EISs referring to the same project, using almost identical documentation. In these cases, we keep the EIS published first.
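
The subsample rules above can be sketched in a few lines of Python. This is only an illustration, not the code used for the paper: the field names (`agency`, `project`, `status`, `published`) are hypothetical and do not correspond to the actual e-NEPA metadata columns, matching duplicates on a shared `project` key is an assumption, and so is the order of the steps (exclusions and de-duplication first, then the per-agency count).

```python
from collections import Counter
from datetime import date  # record["published"] is assumed to be a datetime.date


def select_subsample(records, start_year=2013, end_year=2020, min_per_agency=5):
    """Apply the subsample rules to a list of metadata dicts.

    Each record is assumed (hypothetically) to carry the keys
    "agency", "project", "status", and "published".
    """
    # 1. Drop Adopted and Withdrawn FEISs and keep the publication window.
    kept = [
        r for r in records
        if r["status"] not in {"Adopted", "Withdrawn"}
        and start_year <= r["published"].year <= end_year
    ]

    # 2. Where several records refer to the same project, keep the one
    #    published first (assumed here to share a "project" key).
    first_by_project = {}
    for r in sorted(kept, key=lambda r: r["published"]):
        first_by_project.setdefault(r["project"], r)
    deduped = list(first_by_project.values())

    # 3. Keep only agencies with at least `min_per_agency` remaining EISs.
    counts = Counter(r["agency"] for r in deduped)
    return [r for r in deduped if counts[r["agency"]] >= min_per_agency]
```

Counting EISs per agency before rather than after the exclusions would be an equally plausible reading of the description, and the two readings can give slightly different subsamples.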

After download, the documents were converted to plain text using the pdftools package in R and stored by page in data.table format. Finally, the aggregated text is stored as an .RDS file for compression (the equivalent csv is ~10GB).

Usage Notes

*All code used to obtain these data and to generate the analyses and results shown in the associated publication is available at

*An earlier version of this data set included a text corpus that contained both draft EIS (DEIS) and FEIS records. Since only FEISs are analyzed in the paper (and thus are the only records needed to replicate the work), we have replaced the original file with an FEIS-only file that is half the file size and thus much easier to download and manipulate.