C-CAP digital collections and exhibitions research project DPLA contributor data
Data files
Apr 20, 2026 version files, 1.32 MB
- DPLA-all-contributors-final.csv (1.14 MB)
- DPLA-provider-table.csv (19.81 KB)
- DPLA-queries.zip (146.90 KB)
- list-of-field-types.csv (1.48 KB)
- README.md (9.92 KB)
Abstract
This data was gathered as part of a research assessment with the goal of identifying the types of institutions represented in the national aggregation of the Digital Public Library of America (DPLA). DPLA is a non-profit organization that brings together digital primary sources from libraries, archives, museums, and other cultural heritage organizations across the US into a single website, which initially launched in 2013. The work was conducted as part of the University of California Irvine Libraries’ Mellon-funded project “Community-Centered Archives Practice: Transforming education, archives, and community history,” abbreviated as “C-CAP TEACH.” C-CAP TEACH is a 4-year project that cultivates commitment among higher education institutions to community-centered archives approaches, and aims to encourage academic libraries to engage critically and responsibly, to contribute to social justice-focused scholarship, training, pedagogy, and partnerships in their communities. The initial data was retrieved via API query of the DPLA website in April 2023, with additional queries conducted from September 2023 through May 2024.
This data was gathered as part of a research assessment conducted within University of California Irvine Libraries’ broader Mellon-funded project called “Community-Centered Archives Practice: Transforming education, archives, and community history,” abbreviated as “C-CAP TEACH,” with the goal of identifying the types of institutions represented in the national aggregation of the Digital Public Library of America (DPLA).
The research assessment was developed collaboratively between the University of California Irvine Libraries and the California Digital Library, which stewards the Calisphere digital aggregation platform.
This initiative is generously funded by The Andrew W. Mellon Foundation from 2022 to 2026.
Description of the data and file structure
DPLA-queries.zip includes two text files with the initial queries and the resulting data retrieved from the DPLA API in April 2023. The queries were conducted for each DPLA Hub to retrieve a list of contributing institutions (see DPLA-provider-table.csv for a list of all DPLA Hubs as of April 2023).
Disambiguating contributing institutions from intermediaries
We observed several contributing institutions participating in the DPLA aggregation by way of another organization (e.g., an intermediary organization providing local services, and subsequently contributing to the DPLA Hub). We include the intermediary information when available. We conducted additional queries to the DPLA API in order to identify the intermediary organizations indicated within the DPLA index. The query and results are included in the DPLA-queries.zip, and were conducted September 2023 to May 2024.
Normalizing contributing institution data
To identify the names of contributing institutions, we relied on the data returned by our DPLA queries. DPLA harvests metadata from its Hubs, while the Hubs offer varying services to either harvest or host digital materials from contributing institutions. Institution names are provided by the contributing institutions through a metadata field. As we reviewed the initial data, we observed that some metadata fields, such as the contributor name, occasionally contained errors. For example, the “contributor” field could contain author, creator, or other information about the object itself instead of the expected contributor name. To ensure as much accuracy as possible, we manually normalized the data provided in this field. DPLA-queries.zip provides the original data, and the final dataset, DPLA-all-contributors-final.csv, reflects how the project team normalized this field.
Vocabulary to define institution types
The list-of-field-types.csv file contains the vocabulary set applied to define each contributor participating in the DPLA aggregation. DPLA-all-contributors-final.csv is the resulting list of all contributors alongside the selected vocabulary terms. To identify and apply the vocabulary, the project team researched each organization, including visiting organization websites, reading mission statements, and investigating tax-exempt status. We recognize that some level of interpretation was involved in order to define organizations according to the information publicly available at the time, and we acknowledge that the classification of many of these fields is subjective.
Acronyms list
- DPLA: Digital Public Library of America
- C-CAP TEACH: Community-Centered Archives Practice: Transforming education, archives, and community history
Data and file contents listing
DPLA-queries.zip
The initial data was retrieved from the DPLA API in April 2023, following instructions from the DPLA API Codex. DPLA-queries.zip contains the queries and the data they returned. Initial queries, conducted in April 2023, retrieved contributing institution (“dataProvider”) names for each DPLA Hub (“provider”). Additional queries, conducted from September 2023 to May 2024, identified any intermediary institutions (“intermediateProvider”) facilitating the contribution process.
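As a rough illustration of the kind of facet query described above, the following Python sketch builds a request URL for the contributing institutions of one Hub and reads term counts from a parsed response. It is not a record of the exact queries stored in DPLA-queries.zip. The endpoint URL, the `facets`, `facet_size`, and `page_size` parameters, and the response layout are assumptions that should be checked against the DPLA API Codex; the Hub name, API key, and sample response are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed base URL for the DPLA items endpoint; verify against the DPLA API Codex.
DPLA_ITEMS_URL = "https://api.dp.la/v2/items"


def build_facet_query(hub_name, api_key, facet="dataProvider", facet_size=2000):
    """Build a URL requesting facet counts for one field within a single Hub."""
    params = {
        "provider.name": hub_name,   # the Hub ("provider")
        "facets": facet,             # e.g. "dataProvider" or "intermediateProvider"
        "facet_size": facet_size,    # assumed upper bound on returned terms
        "page_size": 0,              # assumed: skip item records, return facets only
        "api_key": api_key,
    }
    return f"{DPLA_ITEMS_URL}?{urlencode(params)}"


def facet_counts(response, facet="dataProvider"):
    """Return {term: count} from a parsed JSON response (layout assumed)."""
    terms = response.get("facets", {}).get(facet, {}).get("terms", [])
    return {entry["term"]: entry["count"] for entry in terms}


# Hypothetical placeholder values, not real DPLA data.
url = build_facet_query("Example Hub", "YOUR_API_KEY")
sample_response = {
    "facets": {
        "dataProvider": {
            "terms": [
                {"term": "Example County Historical Society", "count": 120},
                {"term": "Sample University Library", "count": 45},
            ]
        }
    }
}
counts = facet_counts(sample_response)
```

Passing `facet="intermediateProvider"` to both functions would, under the same assumptions, sketch the later queries for intermediary institutions.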
list-of-field-types.csv
The list-of-field-types.csv builds on the initial query results to identify the following set of information for each institution: “Contributor: Name,” “Contributor: Number of item records,” “Contributor: Organizational Structure,” “Contributor: If government, level,” “Contributor: Is library or archive,” “Contributor: If library or archive, type,” “Contributor: Is museum,” “Contributor: Is cultural heritage site,” “Contributor: Is historical society or commission,” “Contributor: Is professional organization,” “Contributor: Is religious organization,” “Contributor: Is HBCU,” “Contributor: State (USA),” “Contributor: Country,” “Partner Institution: Name,” “Partner Institution: Organizational Structure,” “Partner Institution: Is library or archive,” “Partner Institution: State (USA),” “Hub: Name,” “Hub: Organizational Structure,” “Hub: Contribution Scope,” “Hub: State (USA).” This list includes the vocabulary set for each data field. Cells that are blank or marked “empty” indicate that the information was either unavailable or not applicable.
DPLA-provider-table.csv
DPLA-provider-table.csv is a summary of the DPLA Hubs (“provider”) that participated in DPLA in April 2023, when the data was initially retrieved. This spreadsheet identifies the following information: “Hub: Name,” “Hub: Number of Contributors (API),” “Hub: Number of Contributors (deduplicated),” “Hub: Number of item records,” “Hub: Contribution Scope,” “Hub: State (USA),” “Hub: Organizational structure,” “Note: Institution names,” “Note: Institution description,” “Note: Institution description source URL.” Cells that are blank or marked “empty” indicate that the information was either unavailable or not applicable.
DPLA-all-contributors-final.csv
DPLA-all-contributors-final.csv contains the reviewed, normalized, and augmented data from DPLA-queries.zip, the initial data retrieved from the DPLA API in 2023. This data has been reviewed for duplication of contributing institution (“dataProvider”) names (e.g., spelling or abbreviation inconsistencies). Any duplicated contributing institution name was de-duplicated and merged. For data mapping anomalies, in which the “dataProvider” field, for example, outputs an author or publishing company name (as indicated by the source data) instead of the contributing institution, we made an effort to identify the contributing institution and merge the record counts.
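The de-duplication and merging of record counts described above was done through manual review. As a hedged illustration only, the Python sketch below shows how a first pass might group names that differ in case, punctuation, or spacing and sum their record counts. The `name_key` and `merge_counts` helpers and the sample rows are hypothetical and are not the project team's method. Abbreviation differences and mapping anomalies (such as an author name in the “dataProvider” field) would still need manual review.

```python
import re
from collections import defaultdict


def name_key(name):
    """Collapse case, punctuation, and spacing so near-identical names match."""
    key = name.casefold().replace("&", " and ")
    key = re.sub(r"[^\w\s]", " ", key)   # replace punctuation with spaces
    return " ".join(key.split())          # normalize runs of whitespace


def merge_counts(rows):
    """Group (name, count) pairs by name_key and sum their record counts."""
    merged = defaultdict(lambda: {"names": [], "count": 0})
    for name, count in rows:
        group = merged[name_key(name)]
        if name not in group["names"]:
            group["names"].append(name)   # keep each original spelling for review
        group["count"] += count
    return dict(merged)


# Hypothetical sample rows, not taken from the dataset.
rows = [
    ("Example County Historical Society", 120),
    ("Example County Historical Society.", 30),
    ("example county  historical society", 5),
    ("Sample University Library & Archives", 40),
    ("Sample University Library and Archives", 10),
]
merged = merge_counts(rows)
```

Keeping every original spelling alongside the merged count lets a reviewer confirm each grouping before choosing a final normalized name.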
With the resulting normalized dataset, we classified each contributor according to the vocabulary set described in list-of-field-types.csv. We used the contributing institution’s website to identify further information, when available. We also used the data available in the Internal Revenue Service’s Tax Exempt Organization Search tool to help identify non-profit organizations. We acknowledge that the classification of many of these fields is subjective, and our analysis is based on the information publicly available at the time. Cells that are blank or marked “empty” indicate that the information was either unavailable or not applicable.
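When reading the CSV files, blank cells and cells marked “empty” can be treated the same way. The Python sketch below shows one way to do this with the standard `csv` module. The column names are taken from this README; the sample values (including “Yes”) and the helper names are hypothetical, and the actual vocabulary should be taken from list-of-field-types.csv.

```python
import csv
import io

# Cell contents treated as "unavailable or not applicable".
MISSING_MARKERS = {"", "empty"}


def clean_row(row):
    """Map blank or 'empty' cells to None; trim whitespace on other values."""
    cleaned = {}
    for column, value in row.items():
        text = (value or "").strip()
        cleaned[column] = None if text.casefold() in MISSING_MARKERS else text
    return cleaned


def read_rows(handle):
    """Read a CSV file handle into a list of cleaned dictionaries."""
    return [clean_row(row) for row in csv.DictReader(handle)]


# Hypothetical two-row sample standing in for the real file.
sample_csv = io.StringIO(
    "Contributor: Name,Contributor: Is museum,Contributor: State (USA)\n"
    "Example Museum,Yes,empty\n"
    "Sample Archive,,CA\n"
)
rows = read_rows(sample_csv)
```

To apply this to the published file, the same `read_rows` function could be given a handle from `open("DPLA-all-contributors-final.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.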
Sharing/Access information
The data in these datasets were derived from the following sources:
- Digital Public Library of America
- DPLA API Codex: Includes documentation and resources to query the DPLA API.
- DPLA Terms & Conditions: States that “all metadata has been dedicated to the public pursuant to Creative Commons’ CC0 public domain dedication, and is available for download through DPLA’s Metadata API”
- DPLA Metadata Application Profile: Documents how metadata is structured in DPLA; also includes a policy statement on metadata and re-use principles.
The “Community-Centered Archives Practice: Transforming education, archives, and community history” (C-CAP TEACH) project features several interconnected components:
- Training and compensating students for intensive educational experiences in archival stewardship and working on community-centered archives projects;
- Providing direct payments to community-based organizations in the Orange County, California region that participate in community-centered archives partnerships;
- Implementing an assessment project to identify actionable strategies to support ethical and responsible representation of marginalized histories in digital collections;
- Redistributing funding to academic libraries to apply Community-Centered Archives Practice (C-CAP) models to collaborative community history work;
- Developing a comprehensive Community-Centered Archives Practice (C-CAP) Hub website; and
- Building a coalition of community-centered archives practitioners in the United States, including a National Summit.
Additional documents related to these components are available online.
Acknowledgments
We recognize the leadership and contributions of our colleagues in this research, including: Audra Eagle Yun, Krystal Tribbett, Julia Huynh, Hanako Ishizuka-Gunderson, Rivka Arbetter, Audrey Altman, Jolene Beiser, Sharon Mizota, Adrian Turner, and Thuy Vo Dang.
