Data from: Assessing ChatGPT for taxonomic and floristic studies
Data files
Apr 02, 2026 version files (94.45 MB)
- Chat_Analys.xlsx (485.64 KB)
- Chats_2.zip (26.17 MB)
- README.md (1.47 KB)
- Script_Similarity_Matrix.txt (694 B)
- Supp._Mat._1-7.zip (67.79 MB)
Abstract
The advancement of biological sciences has long been linked to technological progress. ChatGPT, a generative artificial intelligence chatbot, produces human-like conversational responses and offers potential support for research across diverse scientific disciplines. However, its utility in natural sciences, particularly botanical research, remains largely unexplored. This study systematically evaluates ChatGPT’s performance in plant identification, creation of taxonomic keys, morphological description analysis, distribution mapping, and image-based species recognition. Across multiple tests involving taxa from different regions, the chatbot frequently produced inconsistent or incorrect outputs, including misidentification of species, erroneous synonymy assignments, fabricated references, and non-functional visualisations. Performance was especially limited for closely related species, hybrids, and microphotographs, while genus-level identification and text refinement were comparatively more reliable. These findings highlight the risks of relying on ChatGPT for botanical research, particularly for students and early-career researchers, emphasising the necessity of critical verification. Despite current limitations, ongoing improvements in AI suggest that future versions may offer more consistent and accurate support in biodiversity studies.
Dataset DOI: 10.5061/dryad.6t1g1jxdf
Description of the data and file structure
These are supplementary materials to the article "Assessing ChatGPT for taxonomic and floristic studies" in the Nordic Journal of Botany.
Code/software
Excel, PDF, R
Access information
Other publicly accessible locations of the data:
Data was derived from the following sources:
- ChatGPT chats conducted to test the AI for botanical research (30 questions were asked in 8 countries), together with their analysis.
Files
Script_Similarity_Matrix.txt (694 B)
An R script for determining the text similarity (%) of the analysed ChatGPT chats using TF-IDF (Term Frequency – Inverse Document Frequency).
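For readers unfamiliar with TF-IDF similarity, the following is a minimal Python sketch of how a pairwise similarity matrix in percent could be computed from chat texts. It is not the R script included in this dataset and may differ from it in tokenisation, weighting, and the similarity measure. The function names, the smoothed IDF formula, the use of cosine similarity, and the sample strings are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; the dataset's script may differ.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Assumption: smoothed IDF, so terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty texts
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Pairwise TF-IDF cosine similarity, expressed in percent."""
    vectors = tfidf_vectors(docs)
    return [[round(100 * cosine(a, b), 1) for b in vectors] for a in vectors]


if __name__ == "__main__":
    # Hypothetical stand-ins for chat texts; real input would be text extracted from the PDFs.
    chats = [
        "The leaves are lobed and the acorns are stalked.",
        "The leaves are lobed but the acorns are sessile.",
        "Needles occur in pairs on short shoots.",
    ]
    for row in similarity_matrix(chats):
        print(row)
```

Each cell of the resulting matrix compares one chat with another, so the diagonal is 100 and the matrix is symmetric. Texts that share no terms score 0.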
Supp._Mat._1-7.zip (67.79 MB)
This .zip archive contains 7 PDF files documenting the pre-testing of ChatGPT in 2024 and the methodology, including the 30 questions.
Chats_2.zip (26.17 MB)
This .zip archive contains 8 PDF files documenting the testing of ChatGPT in 2025 in 8 countries, using the 30 questions from the methodology.
Chat_Analys.xlsx (485.64 KB)
This Excel file contains the manually conducted analysis and comparison of the responses to the 30 questions across the 8 chats.
Human subjects data
Some files may contain the names of co-authors of the dataset.
