Robustness of large language models in moral judgments
Data files
Feb 27, 2025 version (301.73 MB)
Abstract
Large language models (LLMs) are used for an increasing variety of tasks, some of which may affect decision making. There has therefore been growing interest in understanding how societal norms and moral judgments may be reflected in the output of LLMs. Recent work has tested LLMs on various moral judgment tasks and drawn conclusions regarding the similarities between LLMs and humans. The present contribution critically assesses the validity of the methods employed in previous work for eliciting moral judgments from LLMs, and of the results obtained. We find that previous results are confounded by biases in the presentation of the options in moral judgment tasks, and that LLM responses are highly sensitive to prompt formulation variants as simple as changing "Case 1" and "Case 2" to "(A)" and "(B)". Our results hence indicate that previous conclusions on moral judgments of LLMs cannot be upheld. We make recommendations for methodologically sounder setups in future studies.
This repository covers three parts:
- A revised data generation procedure that produces a balanced dataset.
- Prompt variations for evaluating the robustness of LLMs in dilemma situations.
- Prompt variations for evaluating the robustness of LLMs in non-dilemma situations.
Details are as follows.
We revised the data generation code to produce a balanced dataset and added prompt variations for evaluating the robustness of the LLMs. Moreover, we ran further prompt variations to evaluate the robustness of the models on value-laden tasks. We found limitations of LLMs in performing complex moral reasoning, particularly when required to simultaneously process multiple moral values (e.g., young (versus old) AND female (versus male) AND fit (versus large)).
Logically, there could be different reasons for the inconsistency in model responses. It could be that the models are simply unable to follow the task instructions properly and therefore behave somewhat randomly, which is a more basic failure than an inability to do moral reasoning. Alternatively, the models may in principle be able to follow instructions of the form used in our study, but fail due to the difficulty of the dilemma and their inability either to learn moral values or to weigh moral values against one another.
To tease these two situations apart, we additionally conducted experiments using a non-dilemma choice: choosing between killing and sparing the characters. Here, LLMs should behave consistently and always choose the sparing option.
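The non-dilemma condition works as a control: a minimal sketch of the consistency check it enables is shown below. All names here are illustrative, not the repository's actual API.

```python
# Sketch of a consistency check for the non-dilemma control condition.
# Function and response names are hypothetical.

def spare_rate(responses):
    """Fraction of responses that choose the 'spare' option.

    A model that follows the instructions should score at or near 1.0;
    values near 0.5 suggest near-random choice rather than moral reasoning.
    """
    spared = sum(1 for r in responses if r == "spare")
    return spared / len(responses)

# Toy example: a model that mostly, but not always, picks "spare".
responses = ["spare"] * 9 + ["kill"]
print(spare_rate(responses))  # 0.9
```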
Description of the data and file structure
File structure
The experiment code is contained in the file below, stored on Zenodo:
robust-mmllm.zip
Please unzip this file to access the code used in the experiments.
The key files and their descriptions for the experiments are as follows.
generate_moral_machine_scenarios_revise.py - Generates the balanced dataset with the characters specified in the config.py file.
chatapi.py - Script for loading proprietary language models (e.g., GPT-3.5)
chatmodel.py - Script for loading Hugging Face language models (e.g., Llama 3)
run-prompt.py - Script for running the moral dilemma experiment with prompt variations
run-prompt-const.py - Script for running the non-dilemma experiment with prompt variations
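The idea behind the balanced dataset is that every contrast of character attributes appears equally often. A minimal sketch of such a generator follows; the attribute values are examples only, not the exact contents of config.py.

```python
import itertools
import random

# Illustrative attribute space; the real config.py may differ.
ATTRIBUTES = {
    "age": ["young", "old"],
    "gender": ["female", "male"],
    "fitness": ["fit", "large"],
}

def balanced_scenarios(seed=123):
    """Enumerate every ordered pair of distinct character profiles,
    so each attribute contrast appears equally often, then shuffle
    reproducibly by seed."""
    rng = random.Random(seed)
    values = [ATTRIBUTES[k] for k in ATTRIBUTES]
    scenarios = []
    for left in itertools.product(*values):
        for right in itertools.product(*values):
            if left != right:  # skip identical groups
                scenarios.append({"left": left, "right": right})
    rng.shuffle(scenarios)  # presentation order is randomized but reproducible
    return scenarios

scenarios = balanced_scenarios()
print(len(scenarios))  # 8 * 8 - 8 = 56 ordered pairs
```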
Example results are uploaded as files in this Dryad publication:
results_CC-original_50000_scenarios_seed123_Llama-3.1-70B-Instruct.pickle
results_AB-reverse_50000_scenarios_seed123_Llama-3.1-70B-Instruct.pickle
The results_CC-original file contains the results for the original prompt formulation on the balanced dataset.
The results_AB-reverse file contains the results for the variant with (A)/(B) labels and reversed order of the option contents, on the balanced dataset.
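The two conditions differ only in labels and option order, which can be sketched as follows. The prompt wording is illustrative, not the exact template used in run-prompt.py.

```python
# Hypothetical sketch of the two prompt variants compared in the results files.

def build_prompt(option1, option2, variant="CC-original"):
    if variant == "CC-original":
        labels, first, second = ("Case 1", "Case 2"), option1, option2
    elif variant == "AB-reverse":
        # (A)/(B) labels AND reversed order of the option contents
        labels, first, second = ("(A)", "(B)"), option2, option1
    else:
        raise ValueError(f"unknown variant: {variant}")
    return (f"{labels[0]} {first}\n{labels[1]} {second}\n"
            "Which case should the self-driving car choose?")

print(build_prompt("spare the pedestrians", "spare the passengers",
                   variant="AB-reverse"))
```

Comparing model responses across such minimally different variants is what reveals the sensitivity to presentation reported above.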
Open Data
The two example files (results_CC-original and results_AB-reverse) are pickled pandas DataFrames and can be opened with the pandas Python package:
import pandas as pd
file_path = "results_CC-original_50000_scenarios_seed123_Llama-3.1-70B-Instruct.pickle"
df = pd.read_pickle(file_path)
# Displays first few rows
print(df.head())
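Once loaded, standard pandas operations apply, for instance tallying model choices. The column names below are hypothetical and stand in for whatever the real files contain; a toy DataFrame is used here in place of the pickle.

```python
import pandas as pd

# Hypothetical illustration; the real result files may use different columns.
df = pd.DataFrame({
    "scenario_id": [0, 1, 2, 3],
    "response": ["Case 1", "Case 2", "Case 1", "Case 1"],
})
print(df["response"].value_counts())
```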
Sharing/Access information
This repository is based on the code used in: Takemoto K (2024) The Moral Machine experiment on large language models. R. Soc. Open Sci. 11, 231393.
Data were derived from the following source:
- The automated data generation process of Takemoto (2024), which we adapted.
