Generative AI enhances individual creativity but reduces the collective diversity of novel content
Data files
Jun 14, 2024 version files 2.48 MB
Abstract
Creativity is core to being human. Generative AI—made readily available by powerful large language models (LLMs)—holds promise for humans to be more creative by offering new ideas, or less creative by anchoring on generative AI ideas. We study the causal impact of generative AI ideas on the production of short stories in an online experiment where some writers obtained story ideas from an LLM. We find that access to generative AI ideas causes stories to be evaluated as more creative, better written, and more enjoyable, especially among less creative writers. However, generative AI-enabled stories are more similar to each other than stories by humans alone. These results point to an increase in individual creativity at the risk of losing collective novelty. This dynamic resembles a social dilemma: with generative AI, writers are individually better off, but collectively a narrower scope of novel content is produced. Our results have implications for researchers, policy-makers, and practitioners interested in bolstering creativity.
README: Dataset and Code for "Generative artificial intelligence enhances creativity but reduces the diversity of novel content"
by Anil R. Doshi and Oliver P. Hauser
Introduction
We recommend downloading the file "GenAI_creativity_data_and_scripts.zip" which contains all data (raw and processed) as well as the analysis code. Then please follow the steps below.
We provide two methods to perform the data analysis.
- Compile all files. This method processes the raw csv files and performs the analysis. It requires some knowledge of Python and an OpenAI API key.
- Processed file analysis. This allows you to run the analysis on the already processed dta files.
1. Compile all files method
If you would like to compile all files, please follow these steps to ensure your machine is set up to run all the necessary scripts.
Machine setup
- Ensure Python is installed (tested with Python 3)
- Install the following packages in Python
  - numpy
  - scipy
  - openai

  It may be necessary to install these packages from within Stata. Do so by first entering the Python environment in Stata using the python command and then pip install numpy and so forth.
- Ensure Stata is able to call Python (see help python Stata help documentation).
- Download dat.py from https://github.com/jayolson/divergent-association-task. Place file in scripts folder.
- Download words.txt from https://github.com/jayolson/divergent-association-task. Place file in scripts/words_glove folder.
- Download and extract glove.840B.300d.zip from https://nlp.stanford.edu/projects/glove/. Place file in scripts/words_glove folder.
- Obtain an OpenAI API key (see https://platform.openai.com/docs/api-reference/introduction).
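The Python side of the checklist above can be verified with a short script before starting a full compile. This is only a sketch: the helper names are hypothetical, the extracted GloVe file name (glove.840B.300d.txt) is an assumption about what the archive contains, and the script should be run with the same Python installation that Stata calls, from the folder that contains scripts.

```python
from importlib.util import find_spec
from pathlib import Path

# Packages named in the Machine setup list.
REQUIRED_PACKAGES = ["numpy", "scipy", "openai"]

# Expected locations relative to the project folder. The extracted GloVe
# file name below is an assumption, not something stated in the README.
REQUIRED_FILES = [
    "scripts/dat.py",
    "scripts/words_glove/words.txt",
    "scripts/words_glove/glove.840B.300d.txt",
]


def missing_packages(packages):
    """Return the packages the current interpreter cannot locate."""
    return [name for name in packages if find_spec(name) is None]


def missing_files(project_folder, relative_paths):
    """Return the relative paths that are not files under project_folder."""
    root = Path(project_folder)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


def report(project_folder="."):
    """Print anything from the checklist that still appears to be missing."""
    for name in missing_packages(REQUIRED_PACKAGES):
        print(f"Python package not found: {name}")
    for rel in missing_files(project_folder, REQUIRED_FILES):
        print(f"File not found: {rel}")


report(".")
```

If nothing is printed, the packages and files were found. The script does not check the Stata packages or the API key.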
- Install the following packages in Stata:
  - coefplot
  - colorpalette
  - dstat
  - estout
  - grc1leg2
  - marginsplot
  - moremata (version 2.0.1 or newer)
  - violinplot
Files setup
- In scripts/00_control_center.do, change the following:
  - Change local macro mywd to the filepath of the GenAI_creativity_scripts folder.
  - Set local macro compile_type to "all" (local compile_type = "all").
- In scripts/ai_human_story_similarity.py, change the following:
  - Change the openai.api_key variable by replacing 'XXX' with your API key.
- In compute_dats.py, change the following:
  - Change the genai_folder variable to the filepath of the GenAI_creativity_scripts folder.
- Run the scripts/00_control_center.do file.
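If you prefer not to store the API key in the script itself, one possible alternative is to read it from an environment variable. The sketch below is not part of the provided scripts: the helper name load_api_key is hypothetical, the variable name OPENAI_API_KEY is an assumption, and the variable must be visible to the Python process that Stata launches.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the key stored in env_var, or raise a clear error if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Hypothetical use inside scripts/ai_human_story_similarity.py, in place of
# the hard-coded 'XXX' placeholder:
#     openai.api_key = load_api_key()
```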
Important note: Sometimes the call to the OpenAI API will fail. If that occurs, the do file will quit with an error. It will be necessary to rerun the script.
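Rerunning the do file is the documented remedy. If reruns become frequent, one option is to wrap the failing call in a small retry helper inside the Python script. The following is a generic sketch, not the authors' code: call_with_retries is a hypothetical name, and the function passed to it stands in for whatever call to the API fails.

```python
import time


def call_with_retries(func, attempts=3, wait_seconds=2.0):
    """Call func() up to `attempts` times, pausing between failed tries."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:  # broad on purpose; narrow it in real use
            last_error = error
            if attempt < attempts:
                # Linear backoff: wait longer after each failed attempt.
                time.sleep(wait_seconds * attempt)
    raise RuntimeError(f"All {attempts} attempts failed") from last_error
```

A call would then look like call_with_retries(lambda: some_api_call(text)), where some_api_call is a placeholder for the existing request in the script.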
2. Processed file analysis
If you would like to use the already processed dta files (found in the already_processed_data folder) and perform the analysis, please follow these steps to ensure your machine is set up to run all the necessary scripts.
Machine setup
- Install the following packages in Stata:
  - coefplot
  - colorpalette
  - dstat
  - estout
  - grc1leg2
  - marginsplot
  - moremata (version 2.0.1 or newer)
  - violinplot
Files setup
- In scripts/00_control_center.do, change the following:
  - Change local macro mywd to the filepath of the GenAI_creativity_scripts folder.
  - Set local macro compile_type to "analysis" (local compile_type = "analysis").
- Run the scripts/00_control_center.do file.
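Before running the analysis, it can help to confirm that the processed dta files were extracted. The sketch below only lists the .dta files in a folder. The relative location of already_processed_data is an assumption, so adjust the path to match where the archive was unpacked.

```python
from pathlib import Path


def list_dta_files(folder):
    """Return the sorted names of Stata .dta files directly inside folder."""
    return sorted(p.name for p in Path(folder).glob("*.dta"))


folder = Path("already_processed_data")  # assumed location; adjust as needed
if folder.is_dir():
    print(list_dta_files(folder))
else:
    print(f"Folder not found: {folder}")
```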
Methods
This dataset is based on a pre-registered, two-phase experimental online study. In the first phase of our study, we recruited a group of N=293 participants (“writers”) who were asked to write a short, eight-sentence story. Participants are randomly assigned to one of three conditions: Human only, Human with 1 GenAI idea, and Human with 5 GenAI ideas. In our Human only baseline condition, writers are assigned the task with no mention of or access to GenAI. In the two GenAI conditions, we provide writers with the option to call upon a GenAI technology (OpenAI’s GPT-4 model) to provide a three-sentence starting idea to inspire their own story writing. In one of the two GenAI conditions (Human with 5 GenAI ideas), writers can choose to receive up to five GenAI ideas, each providing a possibly different inspiration for their story. After completing their story, writers are asked to self-evaluate their story on novelty, usefulness, and several emotional characteristics.
In the second phase, the stories composed by the writers are then evaluated by a separate group of N=600 participants (“evaluators”). Evaluators read six randomly selected stories without being informed about writers being randomly assigned to access GenAI in some conditions (or not). All stories are evaluated by multiple evaluators on novelty, usefulness, and several emotional characteristics. After disclosing to evaluators whether GenAI was used during the creative process, we ask evaluators to rate the extent to which ownership and hypothetical profits should be split between the writer and the AI. Finally, we elicit evaluators’ general views on the extent to which they believe that the use of AI in producing creative output is ethical, how story ownership and hypothetical profits should be shared between AI creators and human creators, and how AI’s involvement in the creative output should be credited.
The data was collected on the online study platform Prolific. The data was then cleaned, processed and analyzed with Stata. For the Writer Study, of the 500 participants who began the study, 169 exited the study prior to giving consent, 22 were dropped for not giving consent, and 13 dropped out prior to completing the study. Three participants in the Human only condition admitted to using GenAI during their story writing exercise and—as per our pre-registration—they were therefore dropped from the analysis, resulting in a total number of writers and stories of 293. For the Evaluator Study, each evaluator was shown 6 stories (2 stories from each topic). The evaluations associated with the writers who did not complete the writer study and those in the Human only condition who acknowledged using AI to complete the story were dropped. Thus, there are a total of 3,519 evaluations of 293 stories made by 600 evaluators. Four evaluations remained for five evaluators, five evaluations remained for 71, and all six remained for 524 evaluators.
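The sample sizes reported above can be cross-checked with simple arithmetic. The short script below uses only the counts stated in this section. The variable names are illustrative.

```python
# Writer Study: participants remaining after each exclusion step.
started = 500
left_before_consent = 169
declined_consent = 22
dropped_out = 13
excluded_for_genai_use = 3
writers = (started - left_before_consent - declined_consent
           - dropped_out - excluded_for_genai_use)

# Evaluator Study: {evaluations remaining per evaluator: number of evaluators}
remaining = {4: 5, 5: 71, 6: 524}
evaluators = sum(remaining.values())
evaluations = sum(per_person * count for per_person, count in remaining.items())

# Evaluations dropped relative to six stories per evaluator.
dropped_evaluations = evaluators * 6 - evaluations

print(writers, evaluators, evaluations, dropped_evaluations)  # 293 600 3519 81
```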