A decade of digital competency: Emerging themes and trends
Data files
Mar 06, 2026 version, 6.26 MB in total
- README.md (3.45 KB)
- S1_Dataset.csv (6.25 MB)
- S2_Code.py (9.78 KB)
Abstract
This study applies Latent Dirichlet Allocation (LDA) to analyze 2,288 articles on Digital Competency (DC) from Scopus (2015–2024). It identifies key research themes, tracks their evolution, and highlights shifts in research priorities. Fourteen topics emerged, with "Teacher Education" being the most prominent. Post-COVID-19, DC research surged, with notable growth in "Scales for Digital Competency Measurement" and "Higher Education and COVID-19 Digital Transformation." "Business and Industry 4.0 Transformation" showed the highest inter-topic acceleration. Conversely, interest in "Social Media Use" and "Psychological Well-Being" declined. Although smaller in volume, the "Healthcare" theme gained momentum. Visualization revealed thematic clusters around education, business transformation, and assessment methods. The findings demonstrate a shift toward emerging fields, reflecting technological advances and evolving societal needs.
Title of Dataset
Replication Data and Code for “A Decade of Digital Competency: Emerging Themes and Trends”
Dataset DOI: https://doi.org/10.5061/dryad.2547d7x4x
Dataset Summary
This dataset contains bibliometric text data and analysis code used in the study “A Decade of Digital Competency: Emerging Themes and Trends”. The dataset includes article titles, abstracts, and author keywords collected from Scopus-indexed publications between 2015 and 2024. The accompanying Python script performs text preprocessing, Latent Dirichlet Allocation (LDA) topic modeling, coherence analysis, and visualization of topic structures. The dataset and code enable full reproducibility of the topic modeling analysis reported in the associated publication.
Authors: Seda AKTI ASLAN, Ahmet AYAZ, Erdem ÇEKMEZ
Associated publication: A Decade of Digital Competency: Emerging Themes and Trends (RSOS-251782)
Description of the data and file structure
File 1: S1_Dataset.csv
Content: This file contains bibliometric text data extracted from Scopus records.
Columns
- Title – Article title
- Abstract – Article abstract
- Author Keywords – Keywords provided by the authors
- concat_column – Combined text field created by merging Title, Abstract, and Author Keywords. This column was used as the primary input for text preprocessing and topic modeling.
Format: CSV file encoded in UTF-8.
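The relationship between concat_column and its three source columns can be sketched as follows. This is a minimal illustration, not the authors' code: the README does not state the separator or how missing values were handled, so the single-space separator and the empty-string treatment of missing fields are assumptions, and `build_concat_column` is a hypothetical helper name.

```python
import pandas as pd

TEXT_COLUMNS = ["Title", "Abstract", "Author Keywords"]


def build_concat_column(df: pd.DataFrame, sep: str = " ") -> pd.Series:
    """Join Title, Abstract, and Author Keywords into one text field per row."""
    # Assumption: missing values are treated as empty strings before joining.
    parts = df[TEXT_COLUMNS].fillna("").astype(str)
    # Skip empty pieces so a missing field leaves no stray separator.
    return parts.apply(
        lambda row: sep.join(p.strip() for p in row if p.strip()), axis=1
    )


# Tiny in-memory example with made-up text. For the real file, use
# pd.read_csv("S1_Dataset.csv", encoding="utf-8") instead.
demo = pd.DataFrame(
    {
        "Title": ["Digital competence of teachers"],
        "Abstract": ["We survey pre-service teachers."],
        "Author Keywords": [None],
    }
)
demo["concat_column"] = build_concat_column(demo)
```

Comparing the rebuilt column with the concat_column shipped in S1_Dataset.csv is a quick way to check whether these assumptions match the published file.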
File 2: S2_Code.py
Content: Python script used to conduct the text mining and topic modeling analysis.
Main functions of the script
- Text preprocessing: tokenization, stopword removal, lemmatization
- Topic modeling: training Latent Dirichlet Allocation (LDA) models and calculating coherence scores to determine the optimal number of topics
- Visualization: word cloud generation and interactive topic visualization using pyLDAvis
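The preprocessing step listed above (tokenization, stopword removal, lemmatization) might look roughly like the sketch below. It is written to run without downloading nltk corpora, so the stopword list is a tiny stand-in and the lemmatizer defaults to the identity function. The function name `preprocess`, the regular-expression tokenizer, and the `min_length` cutoff are all assumptions, not details taken from S2_Code.py.

```python
import re
from typing import Callable, Iterable, List

# Assumption: a tiny illustrative stopword list. With nltk and its corpora
# available, set(nltk.corpus.stopwords.words("english")) is the usual choice.
DEMO_STOPWORDS = frozenset(
    {"a", "an", "and", "the", "of", "in", "on", "to", "for", "is", "are", "we"}
)

# Assumption: tokens are runs of ASCII letters after lowercasing.
TOKEN_PATTERN = re.compile(r"[a-z]+")


def preprocess(
    text: str,
    stopwords: Iterable[str] = DEMO_STOPWORDS,
    lemmatize: Callable[[str], str] = lambda token: token,
    min_length: int = 3,
) -> List[str]:
    """Lowercase, tokenize, drop stopwords and short tokens, then lemmatize."""
    stop = set(stopwords)
    tokens = TOKEN_PATTERN.findall(text.lower())
    kept = [t for t in tokens if t not in stop and len(t) >= min_length]
    return [lemmatize(t) for t in kept]


tokens = preprocess("The digital competencies of teachers in higher education")
# tokens == ['digital', 'competencies', 'teachers', 'higher', 'education']
```

To use nltk lemmatization, pass `lemmatize=nltk.stem.WordNetLemmatizer().lemmatize` once the WordNet data has been downloaded.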
Outputs generated by the script
- Topic-term distributions (CSV)
- Document-topic assignments (Excel)
- Word cloud visualizations (PNG)
- Interactive topic model visualization (HTML)
Code / Software
The analysis was performed using Python (version 3.10 or higher).
Required Python packages
- pandas (v1.5+) – data handling and preprocessing
- nltk (v3.8+) – text preprocessing (stopwords and lemmatization)
- gensim (v4.3+) – Latent Dirichlet Allocation (LDA) topic modeling
- matplotlib (v3.6+) – visualizations
- wordcloud (v1.9+) – word cloud generation
- pyLDAvis (v3.4+) – interactive topic model visualization
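Before running the script, the installed package versions can be compared against the minimums listed above. The sketch below uses only the standard library; `check_environment` and `version_tuple` are hypothetical helper names, and the version comparison is deliberately simple (leading integers only), which is an assumption that suffices for the version strings listed here.

```python
import re
from importlib import metadata

# Minimum versions as listed in this README.
MINIMUM_VERSIONS = {
    "pandas": "1.5",
    "nltk": "3.8",
    "gensim": "4.3",
    "matplotlib": "3.6",
    "wordcloud": "1.9",
    "pyLDAvis": "3.4",
}


def version_tuple(version: str) -> tuple:
    """Turn '4.3.2' or '3.4.1rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(minimums=MINIMUM_VERSIONS):
    """Return {package: 'ok' | 'too old (<installed>)' | 'missing'}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

Any package reported as missing or too old can be installed or upgraded with pip before running S2_Code.py.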
Workflow
- Load the dataset (S1_Dataset.csv) into a Python environment using pandas.
- Run the analysis script (S2_Code.py) to perform text preprocessing and train the LDA models.
- The script then writes its outputs automatically: topic-term distributions (CSV), document-topic assignments (Excel), word cloud visualizations (PNG), and an interactive pyLDAvis topic visualization (HTML).
Sharing / Access Information
Other publicly accessible locations of the data
No other public repository currently hosts this version of the dataset.
Data source
The bibliographic records were originally obtained from Scopus. Due to copyright restrictions, raw Scopus records cannot be publicly shared. The dataset provided here (S1_Dataset.csv) is a processed and anonymized version that complies with open data policies.
License
This dataset and accompanying code are released under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.
