A comparative evaluation of ChatGPT 3.5 and ChatGPT 4 in responses to selected genetics questions - Full study data

McGrath, Scott

Research facility: University of California, Berkeley

Published Jun 04, 2024 on Dryad. https://doi.org/10.5061/dryad.s4mw6m9cv

Data files

Jun 04, 2024 version files 122.70 KB

Full_ChatGPT_study_December_17__2023.xlsx (121.12 KB)

README.md (1.58 KB)

Abstract

Objective:

Our objective is to evaluate the efficacy of ChatGPT 4 in accurately and effectively delivering genetic information, building on previous findings with ChatGPT 3.5. We focus on assessing the utility, limitations, and ethical implications of using ChatGPT in medical settings.

Materials and Methods:

A structured questionnaire, including the Brief User Survey (BUS-15) and custom questions, was developed to assess ChatGPT 4's clinical value. An expert panel of genetic counselors and clinical geneticists independently evaluated ChatGPT 4's responses to these questions. We also compared the results with those for ChatGPT 3.5, using descriptive statistics and R for data analysis.

Results:

ChatGPT 4 demonstrated improvements over ChatGPT 3.5 in context recognition, relevance, and informativeness. However, performance variability and concerns about the naturalness of the output were noted. No significant difference in accuracy was found between ChatGPT 3.5 and ChatGPT 4. Notably, the efficacy of ChatGPT 4 varied significantly across genetic conditions, with specific differences identified between responses related to BRCA1 and HFE.

Discussion and Conclusion:

This study highlights ChatGPT 4's potential in genomics, noting significant advancements over its predecessor. Despite these improvements, challenges remain, including the risk of outdated information and the necessity of ongoing refinement. The variability in performance across different genetic conditions underscores the need for expert oversight and continuous AI training. While ChatGPT 4 shows promise, these findings emphasize the importance of balancing technological innovation with ethical responsibility in healthcare information delivery.

Study Design

This study was conducted to evaluate the performance of ChatGPT 4 (March 23rd, 2023 model) in the context of genetic counseling and education. The evaluation involved a structured questionnaire, which included questions selected from the Brief User Survey (BUS-15) and additional custom questions designed to assess the clinical value of ChatGPT 4's responses.

Questionnaire Development

The questionnaire was built in Qualtrics and comprised twelve questions: seven selected from the BUS-15, preceded by two additional questions that we designed.

The initial questions focused on quality and answer relevancy:

1. The overall quality of the Chatbot’s response is: (5-point Likert: Very poor to Very Good)

2. The Chatbot delivered an answer that provided the relevant information you would include if asked the question. (5-point Likert: Strongly disagree to Strongly agree)

The BUS-15 questions (7-point Likert: Strongly disagree to Strongly agree) focused on:

1. Recognition and facilitation of users’ goal and intent: Chatbot seems able to recognize the user’s intent and guide the user to its goals.

2. Relevance of information: The chatbot provides relevant and appropriate information/answer to people at each stage to make them closer to their goal.

3. Maxim of quantity: The chatbot responds in an informative way without adding too much information.

4. Resilience to failure: Chatbot seems able to find ways to respond appropriately even when it encounters situations or arguments it is not equipped to handle.

5. Understandability and politeness: The chatbot seems able to understand input and convey correct statements and answers without ambiguity and with acceptable manners.

6. Perceived conversational credibility: The chatbot responds in a credible and informative way without adding too much information.

7. Meet the neurodiverse needs: Chatbot seems able to meet needs and be used by users independently from their health conditions, well-being, age, etc.
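Because the two custom questions use a 5-point scale and the BUS-15 items use a 7-point scale, raw scores are not directly comparable across the two groups. The following is a minimal Python sketch of one way to record each item's scale and map scores onto a common 0 to 1 range. The item keys and the `rescale` helper are hypothetical names chosen for illustration; they are not column names from the dataset, and the study does not state that any rescaling was applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    key: str     # hypothetical short identifier, not a column name from the dataset
    points: int  # number of Likert points on the item's scale


# Two custom items on 5-point scales, then the seven BUS-15 items on 7-point scales.
ITEMS = {
    item.key: item
    for item in [
        Item("overall_quality", 5),
        Item("relevant_information", 5),
        Item("intent_recognition", 7),
        Item("relevance", 7),
        Item("maxim_of_quantity", 7),
        Item("resilience_to_failure", 7),
        Item("understandability", 7),
        Item("credibility", 7),
        Item("neurodiverse_needs", 7),
    ]
}


def rescale(key: str, score: int) -> float:
    """Map a 1..points Likert score onto 0..1 (an illustrative convention only)."""
    item = ITEMS[key]
    if not 1 <= score <= item.points:
        raise ValueError(f"{key}: score {score} is outside 1..{item.points}")
    return (score - 1) / (item.points - 1)
```

Under this convention the midpoint of either scale (3 of 5, or 4 of 7) maps to 0.5, and a score outside the item's range raises an error instead of being silently accepted.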

Expert Panel and Data Collection

A panel of experts (two genetic counselors and two clinical geneticists) was provided with a link to the survey containing the questions. They independently evaluated the responses from ChatGPT 4 and did not discuss the questions or answers among themselves until after submitting the survey. This approach was intended to keep the evaluations independent and to limit bias.
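With four independent raters per response, each item can be summarized with simple descriptive statistics. The study's analysis was done in R; the sketch below is a Python illustration only, assuming ratings have already been read into a mapping from item name to the list of scores given by the raters. The `summarize` function, the item names, and the scores are all made up for illustration and are not taken from the dataset.

```python
from statistics import mean, median


def summarize(ratings: dict[str, list[int]]) -> dict[str, dict[str, float]]:
    """Return count, mean, median, min and max of the rater scores for each item."""
    return {
        item: {
            "n": len(scores),
            "mean": mean(scores),
            "median": median(scores),
            "min": min(scores),
            "max": max(scores),
        }
        for item, scores in ratings.items()
    }


# Made-up scores from four hypothetical raters, one list per item.
example = {
    "intent_recognition": [6, 7, 5, 6],  # 7-point item
    "overall_quality": [4, 5, 4, 3],     # 5-point item
}

summary = summarize(example)
for item, stats in summary.items():
    print(item, stats)
```

Reporting the median and range alongside the mean is a common choice for ordinal Likert data, where the mean alone can hide disagreement between raters.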
