New research is investigating whether AI can serve as an automated assistant in creative tasks, with mixed results: AI appears to help naturally less creative people write more original short stories, but it also dampens the creativity of the group as a whole, a trade-off that may become increasingly common as AI tools influence creative endeavors.
The study, by Anil Doshi of University College London and Oliver Hauser of the University of Exeter, was published in the journal Science Advances. While it is necessarily limited by its focus on short stories, it seems to support a view many have expressed: that while AI may be useful, it ultimately offers nothing truly new to creative endeavors.
“Our work represents an early look at the very big question of how large-scale language models and generative AI will impact human activity, including creativity,” Hauser told TechCrunch in an email. “While there is huge potential (and arguably huge hype) for this technology to have a major impact on media and creativity in general, it is important that AI is rigorously evaluated in the wild, rather than being broadly deployed on the assumption that it will produce positive results.”
In the experiment, the researchers asked hundreds of participants to write very short stories (around eight sentences) on any topic, suitable for a broad audience. One group simply wrote; a second group could consult GPT-4 for a single story idea of a few sentences (and could use as much or as little of it as they wanted); and a third group could request up to five such story ideas.
Image credit: Hauser, Doshi
Once the stories were written, they were evaluated both by their own authors and by a second group who knew nothing about the generative AI aspect of the experiment. These people rated each story on its novelty, usefulness (i.e., publishability), and emotional enjoyment.
Low creativity, high benefit… high creativity, no benefit
Before writing their stories, participants also completed a word-production task that served as a proxy for creativity. Creativity is a concept that cannot be measured directly, but in this case creativity in writing can at least be approximated (no judgment here: not everyone is a born or skilled writer).
“Measuring something as rich and complex as creativity seems like a complex problem,” Hauser writes, “but there is a wealth of research on human creativity, and lively debate about the best way to measure the concept.”
They said their approach is widely used in academia and has been well documented in other studies.
The researchers found that people with low creativity scores also scored lowest on the story evaluations, which supports the validity of this approach. These participants benefited the most when they were given the opportunity to use the generated story ideas (and, notably, the majority of participants across the entire experiment chose to use them).
Stories written without AI help by people with low creativity scores were consistently rated lower than others in terms of writing quality, enjoyability, and originality. When these writers were given one AI-generated idea, they scored higher on all metrics. When given five options, they scored even higher.
For people who struggle with the creative aspects of writing (at least within this context and definition), the AI helper seems to really improve the quality of their work. This will resonate with a lot of people who aren't great at writing, for whom a language model saying, “Hey, try this,” is just the prompt they need to finish a paragraph or start a new chapter.
Image credit: Hauser, Doshi
But what about those who scored highly on the creativity metrics? Did their writing reach new heights? Sadly, no. In fact, those participants saw little to no benefit, or even a drop in their ratings (by a very slim and probably not significant margin). Creatively, it seems that the best work came when there was no AI help whatsoever.
One can imagine any number of reasons why this happens, but the numbers suggest that in this situation, AI had either no effect or a negative one on naturally creative writers.
Flattened
But that's not what worried the researchers.
In addition to participants' subjective ratings of the stories, the researchers also conducted their own analysis: They used OpenAI's embedding API to rate how similar each story was to other stories in its category (i.e., human only, one AI option, or five AI options).
The researchers found that by using generative AI, the resulting stories were closer to the average for their category; in other words, they were more similar and less diverse as a group. The overall difference was in the 9% to 10% range, so the stories weren't all clones of each other. And, who knows, the similarities could be the result of less experienced writers completing suggested stories, while more creative writers invented stories from scratch.
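The kind of similarity measurement described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the researchers' actual code: the article says they used OpenAI's embedding API, whereas the sketch below skips the API call and uses tiny made-up vectors in place of real embeddings. The function names (`cosine_similarity`, `mean_similarity_to_others`, `group_similarity`) and the choice of averaging cosine similarity against every other story in the same group are assumptions made for the example.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_similarity_to_others(embeddings):
    """For each story, average its similarity to every other story in the group."""
    scores = []
    for i, vec in enumerate(embeddings):
        others = [
            cosine_similarity(vec, other)
            for j, other in enumerate(embeddings)
            if j != i
        ]
        scores.append(mean(others))
    return scores


def group_similarity(embeddings):
    """Single number per group: higher means the stories are more alike."""
    return mean(mean_similarity_to_others(embeddings))


# Hypothetical 3-dimensional "embeddings", invented for illustration.
# Real embedding vectors have many more dimensions.
human_only = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
with_ai_ideas = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]

print("human only:   ", group_similarity(human_only))
print("with AI ideas:", group_similarity(with_ai_ideas))
```

In this toy data the second group's vectors point in nearly the same direction, so its group score comes out higher. That is the pattern the researchers report for AI-assisted stories, though the toy numbers say nothing about the size of the real effect.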
Nevertheless, the findings are sufficient to warrant a warning in the paper's conclusion, and since I could not summarize it any better, I quote it in full.
These results indicate an increase in individual creativity, but at the risk of a loss of collective novelty. An interesting question is whether, in general equilibrium, AI-enhanced and inspired stories can generate enough variation in the artifacts they produce. Specifically, if the publishing (and self-publishing) industry adopts more generative AI-inspired stories, our findings suggest that the stories produced will collectively become less unique and more similar to each other. This downward spiral shows parallels with the emerging social dilemma: if individual writers learn that their generative AI-inspired works are assessed as more creative, they will have an incentive to use generative AI more in the future, but doing so may further reduce the collective novelty of their stories. In other words, our results suggest that despite the enhancing effect generative AI has had on individual creativity, there may be caveats to be observed if generative AI is widely adopted for creative tasks.
This echoes concerns in visual arts and web content that AI can beget more AI, leading to a self-perpetuating cycle of blandness if all it is trained on is its own output. As generative AI begins to permeate all media, research like this acts as a counterweight to claims of limitless creativity and a new age of AI-generated movies and songs.
Hauser and Doshi acknowledge that their research is still in its infancy: the field is fairly new, and all studies, including theirs, have limitations.
“There are many avenues we expect future research to take. For example, implementations of generative AI in the 'wild' will be quite different from controlled environments,” Hauser wrote. “Ideally, our research will help guide both the technology and the way we interact with it to ensure continued diversity of creative ideas, whether in writing, art, or music.”