University ecosystem analytics: Case study of regional integration and competitiveness in California and Texas
Data files
Jan 15, 2025 version files (18.38 MB total):
- Data_Code.zip (18 MB)
- ReadMe_DataDescription.pdf (371.22 KB)
- README.md (1.93 KB)
Abstract
Despite substantial policy efforts aimed at developing regional innovation systems (RIS), our understanding of the institutional factors that promote synergy and integration at the regional scale is limited. To address this gap, we constructed two representations of research university ecosystems in California (CA) and Texas (TX) that identify institutional co-occurrences in research and news media, within and across these regions. We selected these regions because they host the University of California and the University of Texas, two multi-campus university systems (MUS) that feature distinct configurations of institutional specialization. We exploit these differences to analyze four institutional assortativity channels that foster system-level synergies: institutional proximity, prestige, homophily, and specialization.
The first representation we constructed is based upon ~3 million publications collected from Clarivate Analytics Web of Science Core Collection (WOS) that are affiliated with at least one of the 28 institutions in our sample, which together represent >5% of publications indexed by WOS over the sample period 1970-2020. The 28 institutions consist of 10 institutions belonging to the University of California (UC) system and 12 institutions belonging to the University of Texas (UT) system; we complement these two public multi-campus university systems (MUS) by including six prominent private universities, which represent a non-MUS comparison group.
As universities increasingly compete for visibility to attract student enrollment and build scientific reputation, managing the brand of an institution of higher education (IHE) has emerged as an important strategic endeavor. Hence, the second representation we constructed is based upon ~2 million digital news media articles published between 2000 and 2020 that specifically mention at least one of these universities. As with the first representation, mapping the rates of digital media co-visibility among IHEs facilitates a systems-level understanding of the factors that condition the structure and dynamics of brand stratification within research university ecosystems, and fosters the development of novel measures for two dimensions of brand equity, namely visibility and association.
README: University ecosystem analytics - Case study of regional integration and competitiveness in CA and TX:
Brief summary
This document describes the contents of the folder Data_Code/, which is organized into two subfolders containing tabular data and code notebooks generated with Mathematica (v13) for reproducing the university ecosystem analytics described in the two studies associated with this data deposit. Primary source records consist of: (i) metadata for individual research articles published by scholars affiliated with at least one of the focal universities in CA and TX, obtained from the Clarivate Analytics Web of Science Core Collection; (ii) metadata for individual digital media articles mentioning each university (from the same set) by its official name at least once, obtained from the Media Cloud project via its open-source API. Data files contained in this repository are summary statistics derived from preliminary calculations applied to the primary source data.
Description of the Data and file structure
Source data files are provided in CSV (comma-separated values), TSV (tab-separated values), and DTA (the format used by STATA software). Programs required: Mathematica 13 (or later version) and STATA 13 (or later version). To run a Mathematica notebook, press Shift+Enter to execute the commands contained in a given cell; the initial cells load the data files, and from there the notebook cells should be executed from start to end in linear order. The workflow for STATA files is defined in the provided .do files. See ReadMe_DataDescription.pdf for additional details on which data files generate the corresponding figures in each associated study.
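For readers who only want to inspect the CSV and TSV tables outside Mathematica or STATA, the following is a minimal Python sketch using the standard library. It is not part of the deposit. The function names `parse_table` and `read_table` are hypothetical, and the sketch assumes each file has a header row and UTF-8 encoding, which should be checked against ReadMe_DataDescription.pdf. DTA files are not handled here; they need STATA or a third-party reader such as `pandas.read_stata`.

```python
import csv
import io
from pathlib import Path

# Delimiters for the two plain-text formats named in the README.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def parse_table(text, delimiter):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_table(path):
    """Read a CSV or TSV file, choosing the delimiter from the extension.

    Assumes (not confirmed by the README) a header row and UTF-8 encoding.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in DELIMITERS:
        raise ValueError(f"unsupported extension: {suffix}")
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=DELIMITERS[suffix]))
```

Every value is returned as a string, so numeric columns need explicit conversion before any calculation.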
Sharing/access Information
Source data locations:
https://www.webofscience.com/wos/woscc/basic-search
https://www.mediacloud.org/
Methods
1) Research affiliated with a particular institution. We collected 2,965,198 records published between 1970 and 2020 from the Clarivate Analytics WOS Core Collection, using its in-house institutional disambiguation tool to identify publications with at least one author from a particular campus.
2) Digital media affiliated with a particular institution. We assembled a dataset of 1,947,349 unique web-based digital media articles representing news articles, blog posts and other web content specifically mentioning any of the institutions by their official name, e.g. “University of California Los Angeles” or “UCLA”, accounting for the official abbreviations. These media articles were originally produced by 57,947 unique media sources, according to primary source data obtained from the Media Cloud project (MC) database, https://www.mediacloud.org/ .
We use both data sources to develop a co-occurrence framework for defining university-university relationships based upon research co-production (via collaboration among scholars affiliated with each university) and media article co-visibility over the period 2000-2020, by applying concepts and methods from network science, machine learning (NLP) and organizational science.
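The co-occurrence idea can be illustrated with a short Python sketch. This is not the authors' implementation (their analysis is in the Mathematica notebooks). It only shows pairwise co-occurrence counting, assuming each article has already been reduced to the list of institutions it is affiliated with or mentions. The function name `cooccurrence_counts` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count, for every unordered pair of institutions, the number of
    records (articles) in which both institutions appear."""
    counts = Counter()
    for institutions in records:
        # De-duplicate within a record and sort so that each pair
        # has a single canonical (alphabetical) key.
        for pair in combinations(sorted(set(institutions)), 2):
            counts[pair] += 1
    return counts


# Toy records (hypothetical): each entry lists the institutions
# linked to one article.
records = [
    ["UCLA", "UC Berkeley"],
    ["UCLA", "UT Austin", "UC Berkeley"],
    ["UT Austin"],
    ["UCLA", "UCLA", "UT Austin"],
]
counts = cooccurrence_counts(records)
```

The resulting counts can be read as edge weights in a university-university network. Records naming a single institution contribute no pairs.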