A decade of digital competency: Emerging themes and trends

Akti Aslan, Seda¹; Ayaz, Ahmet²; Çekmez, Erdem³

Published Mar 06, 2026 on Dryad. https://doi.org/10.5061/dryad.2547d7x4x

Data files

Mar 06, 2026 version files (6.26 MB total)

README.md (3.45 KB)

S1_Dataset.csv (6.25 MB)

S2_Code.py (9.78 KB)

Abstract

This study applies Latent Dirichlet Allocation (LDA) to analyze 2,288 articles on Digital Competency (DC) from Scopus (2015–2024). It identifies key research themes, tracks their evolution, and highlights shifts in research priorities. Fourteen topics emerged, with "Teacher Education" being the most prominent. Post-COVID-19, DC research surged, with notable growth in "Scales for Digital Competency Measurement" and "Higher Education and COVID-19 Digital Transformation." "Business and Industry 4.0 Transformation" showed the highest inter-topic acceleration. Conversely, interest in "Social Media Use" and "Psychological Well-Being" declined. Although smaller in volume, the "Healthcare" theme gained momentum. Visualization revealed thematic clusters around education, business transformation, and assessment methods. The findings demonstrate a shift toward emerging fields, reflecting technological advances and evolving societal needs.

Title of Dataset

Replication Data and Code for “A Decade of Digital Competency: Emerging Themes and Trends”

Dataset DOI: https://doi.org/10.5061/dryad.2547d7x4x

Dataset Summary

This dataset contains bibliometric text data and analysis code used in the study “A Decade of Digital Competency: Emerging Themes and Trends”. The dataset includes article titles, abstracts, and author keywords collected from Scopus-indexed publications between 2015 and 2024. The accompanying Python script performs text preprocessing, Latent Dirichlet Allocation (LDA) topic modeling, coherence analysis, and visualization of topic structures. The dataset and code enable full reproducibility of the topic modeling analysis reported in the associated publication.

Authors: Seda AKTI ASLAN, Ahmet AYAZ, Erdem ÇEKMEZ

Associated publication: A Decade of Digital Competency: Emerging Themes and Trends (RSOS-251782)

Description of the data and file structure

File 1: S1_Dataset.csv

Content: This file contains bibliometric text data extracted from Scopus records.

Columns

Title – Article title

Abstract – Article abstract

Author Keywords – Keywords provided by the authors

concat_column – Combined text field created by merging Title, Abstract, and Author Keywords. This column was used as the primary input for text preprocessing and topic modeling.

Format: CSV file encoded in UTF-8.
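The concat_column field can be rebuilt from the other three columns along the lines of the sketch below. It assumes the fields were joined with single spaces and that missing or empty fields were skipped. The separator actually used for the released column is not documented here, so rebuilt text may differ slightly from the shipped values. The helper name build_concat_column is hypothetical.

```python
import pandas as pd

# Source columns described above; concat_column merges these three.
TEXT_COLUMNS = ["Title", "Abstract", "Author Keywords"]


def build_concat_column(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: merge Title, Abstract and Author Keywords.

    Assumptions (not confirmed by the dataset documentation): fields are
    joined with a single space, and missing or empty fields are skipped.
    """
    parts = df[TEXT_COLUMNS].fillna("").astype(str)
    # Join the non-empty fields of each row into one string.
    return parts.apply(lambda row: " ".join(v for v in row if v), axis=1)


if __name__ == "__main__":
    # With the released file in the working directory:
    # df = pd.read_csv("S1_Dataset.csv", encoding="utf-8")
    df = pd.DataFrame(
        {
            "Title": ["Digital competence of teachers"],
            "Abstract": ["A survey of pre-service teachers."],
            "Author Keywords": [None],
        }
    )
    df["concat_rebuilt"] = build_concat_column(df)
    print(df["concat_rebuilt"].iloc[0])
```

Comparing the rebuilt text against the shipped concat_column for a few rows is a quick way to check whether the assumed separator matches.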

File 2: S2_Code.py

Content: Python script used to conduct the text mining and topic modeling analysis.

Main functions of the script

Text preprocessing: tokenization, stopword removal, lemmatization

Topic modeling: training Latent Dirichlet Allocation (LDA) models and calculating coherence scores to determine the optimal number of topics

Visualization: word cloud generation and interactive topic visualization using pyLDAvis
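The three preprocessing steps can be illustrated with the dependency-free sketch below. It is a stand-in, not the code in S2_Code.py. The script relies on nltk for its stopword list and lemmatizer, whereas here a tiny hypothetical stopword set and a crude plural-stripping rule replace them so the example runs without downloading nltk data. The names STOPWORDS, naive_lemma and preprocess are illustrative.

```python
import re

# Hypothetical, deliberately tiny stopword list; the actual script uses
# nltk's English stopword list, which is much larger.
STOPWORDS = {
    "a", "an", "and", "are", "for", "in", "is", "of", "on",
    "the", "their", "this", "to", "we", "with",
}

# Lowercase word tokens, allowing internal hyphens (e.g. "covid-19").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def naive_lemma(token: str) -> str:
    """Crude plural stripping; a stand-in for nltk lemmatization."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith(("ss", "is", "us")) and len(token) > 3:
        return token[:-1]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase and tokenize, drop stopwords, then lemmatize each token."""
    tokens = TOKEN_RE.findall(text.lower())
    return [naive_lemma(t) for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("Digital competencies of teachers in the COVID-19 era"))
```

On the sample title the call returns ['digital', 'competency', 'teacher', 'covid-19', 'era']. A real lemmatizer also handles irregular forms that this suffix rule cannot.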

Outputs generated by the script

Topic-term distributions (CSV)

Document-topic assignments (Excel)

Word cloud visualizations (PNG)

Interactive topic model visualization (HTML)
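The two tabular outputs amount to a per-topic ranking of terms and a per-document choice of topic. The stdlib-only sketch below shows one way to derive them; it is not taken from S2_Code.py. It assumes topics arrive as (word, weight) lists and documents as (topic_id, probability) lists, which are the shapes returned by gensim's LdaModel.show_topic and get_document_topics. The column names are hypothetical, and both tables are written as CSV text; the script's Excel export would need a writer such as pandas.DataFrame.to_excel.

```python
import csv
import io

# Assumed input shapes, mirroring what gensim's LdaModel.show_topic and
# get_document_topics return: (word, weight) and (topic_id, probability).
TopicTerms = dict[int, list[tuple[str, float]]]
DocTopics = list[list[tuple[int, float]]]


def topic_terms_csv(topics: TopicTerms, topn: int = 10) -> str:
    """Render the top-`topn` terms of every topic as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["topic", "rank", "term", "weight"])  # hypothetical header
    for topic_id in sorted(topics):
        ranked = sorted(topics[topic_id], key=lambda tw: tw[1], reverse=True)
        for rank, (term, weight) in enumerate(ranked[:topn], start=1):
            writer.writerow([topic_id, rank, term, f"{weight:.4f}"])
    return buf.getvalue()


def dominant_topics(doc_topics: DocTopics) -> list[tuple[int, int, float]]:
    """Assign each document the topic with the highest probability."""
    rows = []
    for doc_id, dist in enumerate(doc_topics):
        topic_id, prob = max(dist, key=lambda tp: tp[1])
        rows.append((doc_id, topic_id, prob))
    return rows


if __name__ == "__main__":
    topics = {
        0: [("teacher", 0.05), ("education", 0.04), ("student", 0.03)],
        1: [("health", 0.06), ("care", 0.02), ("nurse", 0.01)],
    }
    print(topic_terms_csv(topics, topn=2), end="")
    print(dominant_topics([[(0, 0.8), (1, 0.2)], [(0, 0.3), (1, 0.7)]]))
```

Ties in dominant_topics resolve to the first listed topic, since max returns the first maximal element.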

Code / Software

The analysis was performed using Python (version 3.10 or higher).

Required Python packages

pandas (v1.5+) – data handling and preprocessing

nltk (v3.8+) – text preprocessing (stopwords and lemmatization)

gensim (v4.3+) – Latent Dirichlet Allocation (LDA) topic modeling

matplotlib (v3.6+) – visualizations

wordcloud (v1.9+) – word cloud generation

pyLDAvis (v3.4+) – interactive topic model visualization
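Before running S2_Code.py, it may help to confirm that an environment meets these minimums. The stdlib-only sketch below is not part of the replication package. It reads installed versions with importlib.metadata, compares only the major and minor components (pre-release suffixes are ignored), and assumes the packages are installed under their usual PyPI distribution names.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum (major, minor) versions taken from the list above.
MINIMUMS = {
    "pandas": (1, 5),
    "nltk": (3, 8),
    "gensim": (4, 3),
    "matplotlib": (3, 6),
    "wordcloud": (1, 9),
    "pyLDAvis": (3, 4),
}
PYTHON_MINIMUM = (3, 10)


def major_minor(version_string: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.1.4' or '3.4.1rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment(minimums: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Report 'ok', 'too old: <version>' or 'missing' for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if major_minor(installed) >= minimum:
            report[name] = "ok"
        else:
            report[name] = f"too old: {installed}"
    return report


if __name__ == "__main__":
    if sys.version_info[:2] < PYTHON_MINIMUM:
        print(f"Python {sys.version.split()[0]} is older than 3.10")
    for pkg, status in check_environment(MINIMUMS).items():
        print(f"{pkg:<12} {status}")
```

Comparing (major, minor) tuples avoids the string-comparison pitfall where "1.10" would sort before "1.5".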

Workflow

Load the dataset (S1_Dataset.csv) into a Python environment using pandas.
Run the analysis script (S2_Code.py) to perform text preprocessing and train the LDA models.
The script automatically generates the following outputs: topic-term distributions (CSV), document-topic assignments (Excel), word cloud visualizations (PNG), and interactive topic visualization (HTML using pyLDAvis).
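The choice of topic count by coherence can be illustrated as follows. Which coherence measure S2_Code.py uses is not stated in this README, so the sketch swaps in the UMass measure (document co-occurrence of a topic's top words), implemented here in plain Python, averaged over word pairs and then over topics. It then selects the candidate topic count with the highest score. In practice the top-word lists would come from LDA models trained for each candidate count, and a library routine such as gensim's CoherenceModel (which offers u_mass, c_v and other measures) would compute the scores. The toy corpus and top-word lists are made up.

```python
import math
from itertools import combinations


def doc_sets(docs: list[list[str]]) -> list[set[str]]:
    """Represent each tokenized document by its set of distinct words."""
    return [set(doc) for doc in docs]


def umass_topic(top_words: list[str], docs: list[set[str]]) -> float:
    """UMass coherence of one topic (at least two words), averaged over pairs.

    top_words must be ordered from most to least probable. Each pair
    (w_l, w_m) with w_l ranked higher contributes
    log((D(w_l, w_m) + 1) / D(w_l)), where D counts documents.
    """
    scores = []
    for w_l, w_m in combinations(top_words, 2):
        d_l = sum(1 for d in docs if w_l in d)
        if d_l == 0:
            raise ValueError(f"{w_l!r} does not occur in any document")
        d_lm = sum(1 for d in docs if w_l in d and w_m in d)
        scores.append(math.log((d_lm + 1) / d_l))
    return sum(scores) / len(scores)


def model_coherence(topics: list[list[str]], docs: list[set[str]]) -> float:
    """Average the per-topic UMass scores of one fitted model."""
    return sum(umass_topic(t, docs) for t in topics) / len(topics)


def best_num_topics(
    candidates: dict[int, list[list[str]]], docs: list[set[str]]
) -> tuple[int, dict[int, float]]:
    """Pick the topic count whose model has the highest mean coherence."""
    scores = {k: model_coherence(topics, docs) for k, topics in candidates.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up toy corpus and top-word lists, for illustration only.
    docs = doc_sets([
        ["teacher", "digital", "competence"],
        ["teacher", "education", "digital"],
        ["health", "nurse", "digital"],
        ["health", "nurse", "care"],
    ])
    candidates = {
        1: [["digital", "teacher", "health"]],
        2: [["teacher", "digital"], ["health", "nurse"]],
    }
    best, scores = best_num_topics(candidates, docs)
    print(best, {k: round(v, 3) for k, v in scores.items()})
```

On the toy corpus the two-topic candidate scores higher (about 0.405 against -0.366), so 2 is selected. With real data the scores are compared across a range of topic counts in the same way.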

Sharing / Access Information

Other publicly accessible locations of the data

No other public repository currently hosts this version of the dataset.

Data source

The bibliographic records were originally obtained from Scopus. Due to copyright restrictions, raw Scopus records cannot be publicly shared. The dataset provided here (S1_Dataset.csv) is a processed and anonymized version that complies with open data policies.

License

This dataset and accompanying code are released under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.
