
Generative AI enhances individual creativity but reduces the collective diversity of novel content

Cite this dataset

Doshi, Anil; Hauser, Oliver (2024). Generative AI enhances individual creativity but reduces the collective diversity of novel content [Dataset]. Dryad. https://doi.org/10.5061/dryad.qfttdz0pm

Abstract

Creativity is core to being human. Generative AI—made readily available by powerful large language models (LLMs)—holds promise for humans to be more creative by offering new ideas, or less creative by anchoring on generative AI ideas. We study the causal impact of generative AI ideas on the production of short stories in an online experiment where some writers obtained story ideas from an LLM. We find that access to generative AI ideas causes stories to be evaluated as more creative, better written, and more enjoyable, especially among less creative writers. However, generative AI-enabled stories are more similar to each other than stories by humans alone. These results point to an increase in individual creativity at the risk of losing collective novelty. This dynamic resembles a social dilemma: with generative AI, writers are individually better off, but collectively a narrower scope of novel content is produced. Our results have implications for researchers, policy-makers, and practitioners interested in bolstering creativity.

README: Dataset and Code for "Generative artificial intelligence enhances creativity but reduces the diversity of novel content"

by Anil R. Doshi and Oliver P. Hauser

Introduction

We recommend downloading the file "GenAI_creativity_data_and_scripts.zip", which contains all data (raw and processed) as well as the analysis code. Then follow the steps below.

We provide two methods to perform the data analysis.

  1. Compile all files. This method processes the raw csv files and then performs the analysis. It requires some knowledge of Python and an OpenAI API key.
  2. Processed file analysis. This method runs the analysis on the already processed dta files.

1. Compile all files method

If you would like to compile all files, please follow these steps to ensure your machine is set up to run all the necessary scripts.

Machine setup

  1. Ensure Python is installed (tested with Python 3)
  2. Install the following packages in Python
  • numpy
  • scipy
  • openai

It may be necessary to install these packages from within Stata. To do so, first enter the Python environment in Stata using the python command, then run pip install numpy, and repeat for the other packages.

  3. Ensure Stata is able to call Python (see the Stata help documentation via help python).
  4. Download dat.py from https://github.com/jayolson/divergent-association-task. Place the file in the scripts folder.
  5. Download words.txt from https://github.com/jayolson/divergent-association-task. Place the file in the scripts/words_glove folder.
  6. Download and extract glove.840B.300d.zip from https://nlp.stanford.edu/projects/glove/. Place the extracted file in the scripts/words_glove folder.
  7. Obtain an OpenAI API key (see https://platform.openai.com/docs/api-reference/introduction).
  8. Install the following packages in Stata:
  • coefplot
  • colorpalette
  • dstat
  • estout
  • grc1leg2
  • marginsplot
  • moremata (version 2.0.1 or newer)
  • violinplot
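
Before running the full pipeline, it can help to confirm that the Python side of the setup is in place. The following is a minimal sketch, not part of the authors' scripts. It checks that the three Python packages can be found and that the manually downloaded files sit where the steps above place them. The function names are hypothetical, and the GloVe file name glove.840B.300d.txt is an assumption about what the zip archive extracts to. Run it with the same Python interpreter that Stata calls.

```python
from importlib.util import find_spec
from pathlib import Path

# Python packages listed in the machine setup steps.
REQUIRED_PACKAGES = ["numpy", "scipy", "openai"]

# Files the setup steps ask you to place by hand, relative to the project folder.
# The GloVe file name is an assumption about what the zip archive extracts to.
REQUIRED_FILES = [
    "scripts/dat.py",
    "scripts/words_glove/words.txt",
    "scripts/words_glove/glove.840B.300d.txt",
]


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the packages that the current interpreter cannot find."""
    return [name for name in packages if find_spec(name) is None]


def missing_files(project_root, files=REQUIRED_FILES):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in files if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical path: replace "." with the folder that contains scripts/.
    project_root = "."
    print("Missing Python packages:", missing_packages() or "none")
    print("Missing files:", missing_files(project_root) or "none")
```

An empty result from both functions suggests the Python prerequisites are satisfied. The Stata packages still need to be checked from within Stata.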

Files setup

  1. In scripts/00_control_center.do, change the following:
  • Change local macro mywd to the filepath of the GenAI_creativity_scripts folder.
  • Set local macro compile_type to "all" (local compile_type = "all").
  2. In scripts/ai_human_story_similarity.py, change the following:
  • Change the openai.api_key variable by replacing 'XXX' with your API key.
  3. In compute_dats.py, change the following:
  • Change the genai_folder variable to the filepath of the GenAI_creativity_scripts folder.
  4. Run the scripts/00_control_center.do file.
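
If you prefer not to store the API key in a script file, one possible alternative is to read it from an environment variable. This is a sketch under assumptions and not how the provided scripts work: the helper load_api_key and the variable name OPENAI_API_KEY are illustrative choices, and the variable must be visible to the Python process that Stata launches.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY", placeholder="XXX"):
    """Return the API key stored in env_var, or raise if it is missing.

    env_var and placeholder are illustrative defaults, not names taken
    from the authors' scripts (apart from the 'XXX' placeholder text).
    """
    key = os.environ.get(env_var, "").strip()
    if not key or key == placeholder:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your OpenAI API key."
        )
    return key
```

With this helper, the line in scripts/ai_human_story_similarity.py that sets openai.api_key could assign load_api_key() instead of the literal 'XXX'.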

Important note: Calls to the OpenAI API occasionally fail. When that happens, the do file quits with an error, and the script must be rerun.
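
To reduce manual reruns after a failed API call, the request could be wrapped in a retry loop. The provided scripts do not do this. The sketch below shows a generic retry with exponential backoff, where call_with_retries and its parameters are hypothetical names chosen for illustration.

```python
import time


def call_with_retries(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry with exponential backoff if it raises.

    max_attempts, base_delay, and sleep are illustrative parameters,
    not settings taken from the authors' scripts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

To use it, pass a zero-argument function (for example a lambda) that performs the single API request. Catching every exception is a simplification; a stricter version would retry only on errors known to be temporary.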

2. Processed file analysis

If you would like to use the already processed dta files (found in the already_processed_data folder) and perform the analysis, please follow these steps to ensure your machine is set up to run all the necessary scripts.

Machine setup

  1. Install the following packages in Stata:
  • coefplot
  • colorpalette
  • dstat
  • estout
  • grc1leg2
  • marginsplot
  • moremata (version 2.0.1 or newer)
  • violinplot

Files setup

  1. In scripts/00_control_center.do, change the following:
  • Change local macro mywd to the filepath of the GenAI_creativity_scripts folder.
  • Set local macro compile_type to "analysis" (local compile_type = "analysis").
  2. Run the scripts/00_control_center.do file.

Methods

This dataset is based on a pre-registered, two-phase experimental online study. In the first phase of our study, we recruited a group of N=293 participants (“writers”) who were asked to write a short, eight-sentence story. Participants were randomly assigned to one of three conditions: Human only, Human with 1 GenAI idea, and Human with 5 GenAI ideas. In our Human only baseline condition, writers were assigned the task with no mention of or access to GenAI. In the two GenAI conditions, we provided writers with the option to call upon a GenAI technology (OpenAI’s GPT-4 model) to provide a three-sentence starting idea to inspire their own story writing. In one of the two GenAI conditions (Human with 5 GenAI ideas), writers could choose to receive up to five GenAI ideas, each providing a possibly different inspiration for their story. After completing their story, writers were asked to self-evaluate their story on novelty, usefulness, and several emotional characteristics.

In the second phase, the stories composed by the writers were evaluated by a separate group of N=600 participants (“evaluators”). Evaluators read six randomly selected stories without being told that writers in some conditions had access to GenAI. All stories were evaluated by multiple evaluators on novelty, usefulness, and several emotional characteristics. After disclosing to evaluators whether GenAI was used during the creative process, we asked evaluators to rate the extent to which ownership and hypothetical profits should be split between the writer and the AI. Finally, we elicited evaluators’ general views on the extent to which they believe that the use of AI in producing creative output is ethical, how story ownership and hypothetical profits should be shared between AI creators and human creators, and how AI’s involvement in the creative output should be credited.

The data was collected on the online study platform Prolific. The data was then cleaned, processed and analyzed with Stata. For the Writer Study, of the 500 participants who began the study, 169 exited the study prior to giving consent, 22 were dropped for not giving consent, and 13 dropped out prior to completing the study. Three participants in the Human only condition admitted to using GenAI during their story writing exercise and—as per our pre-registration—they were therefore dropped from the analysis, resulting in a total number of writers and stories of 293. For the Evaluator Study, each evaluator was shown 6 stories (2 stories from each topic). The evaluations associated with the writers who did not complete the writer study and those in the Human only condition who acknowledged using AI to complete the story were dropped. Thus, there are a total of 3,519 evaluations of 293 stories made by 600 evaluators. Four evaluations remained for five evaluators, five evaluations remained for 71, and all six remained for 524 evaluators.
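
The sample sizes reported above can be checked with simple arithmetic. The short sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Writer Study: participant flow as reported in the text.
started = 500
exited_before_consent = 169
declined_consent = 22
dropped_out = 13
admitted_genai_use = 3  # Human only condition, dropped per pre-registration

writers = (started - exited_before_consent - declined_consent
           - dropped_out - admitted_genai_use)

# Evaluator Study: {evaluations remaining per evaluator: number of evaluators}.
evaluators_by_remaining = {4: 5, 5: 71, 6: 524}

evaluators = sum(evaluators_by_remaining.values())
evaluations = sum(k * n for k, n in evaluators_by_remaining.items())

# Each evaluator was shown 6 stories, so the gap is the number of dropped evaluations.
dropped_evaluations = evaluators * 6 - evaluations

print(writers, evaluators, evaluations, dropped_evaluations)  # 293 600 3519 81
```

The totals match the reported 293 writers, 600 evaluators, and 3,519 evaluations. The difference from 3,600 implies that 81 evaluations were dropped.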