Ever wonder why conversational AIs like ChatGPT give polite refusals like "Sorry, I can't do that"? OpenAI is offering a limited look at the reasoning behind its models' rules of engagement.
Large language models (LLMs) have no naturally occurring limits on what they can or will say. That's part of why they're so versatile, but also why they hallucinate and are easily tricked.
AI models that interact with the public need guardrails around what they should and shouldn't do, but defining those limits is surprisingly difficult, let alone enforcing them.
If someone asked an AI to generate a bunch of false claims about a public figure, it should refuse, right? But what if the requester is a developer creating synthetic examples to train a disinformation detector?
What if someone asks for a laptop recommendation? The answer should be objective, right? But what if the model is being deployed by a laptop manufacturer that only wants it to recommend its own devices?
All AI makers are grappling with challenges like these, looking for efficient ways to rein in their models without making them refuse perfectly normal requests. But they rarely share exactly how they do it.
OpenAI bucks that trend a bit by publishing what it calls its "model spec," a collection of high-level rules that indirectly govern ChatGPT and other models.
There are meta-level objectives, some hard-and-fast rules, and some general behavioral guidelines, though to be clear, these are not, strictly speaking, what the model itself is primed with; OpenAI will have developed specific instructions that accomplish what these rules describe in natural language.
It's an interesting look at how a company sets its priorities and handles edge cases. And there are numerous examples of how they might play out.
For instance, OpenAI states clearly that developer intent is basically the highest law. So one version of a chatbot running GPT-4 might provide the answer to a math problem when asked. But if that chatbot has been configured by its developer to never simply give an answer outright, it will instead offer to work through the solution step by step.
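To make that concrete, here is a minimal sketch, assuming the openai Python SDK, of how a developer might express that kind of instruction through a system message in the Chat Completions API. The model name, prompt wording, and tutoring scenario are illustrative assumptions, not OpenAI's actual internal configuration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical developer instruction: shape behavior so the model tutors
# instead of handing over the final answer.
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {
            "role": "system",
            "content": (
                "You are a math tutor. Never state the final answer outright; "
                "guide the student through the solution one step at a time."
            ),
        },
        {"role": "user", "content": "What is x if 3x + 7 = 22?"},
    ],
)

print(response.choices[0].message.content)
```

The user's question is the same either way; only the developer-supplied message changes how the model responds.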
A conversational interface might also decline to talk about anything it wasn't approved for, to nip manipulation attempts in the bud. Why should a cooking assistant weigh in on America's involvement in the Vietnam War? Why should a customer service chatbot agree to help with your erotic paranormal novella in progress? Shut it down.
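As a purely illustrative sketch (not how OpenAI actually implements this), a deployment might pair the model with a simple scope check that returns a canned refusal for off-topic requests; the topic labels and refusal text below are made up for the example.

```python
# Toy scope guard for a cooking assistant: anything outside the approved
# topics gets a polite refusal before it ever reaches the model.
ALLOWED_TOPICS = {"recipes", "ingredients", "cooking techniques", "kitchen equipment"}

def scope_check(request_topic: str) -> str | None:
    """Return a canned refusal if the topic is out of scope, else None."""
    if request_topic.lower() not in ALLOWED_TOPICS:
        return ("Sorry, I can only help with cooking questions. "
                "Is there a recipe I can help you with?")
    return None  # in scope: pass the request along to the model

# A cooking assistant asked about the Vietnam War simply declines.
print(scope_check("U.S. involvement in the Vietnam War"))
```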
Things also get sticky quickly around privacy, such as requests for someone's name or phone number. As OpenAI points out, public figures like mayors and MPs should obviously have their contact details available, but what about local business owners? That's probably fine. But what about employees of a particular company, or members of a political party? Probably not.
Choosing when and where to draw these lines isn't easy. Neither is writing the instructions that make the AI adhere to the resulting policies. And those policies are bound to fail all the time, whether because people learn to work around them or because unaccounted-for edge cases turn up.
OpenAI isn't showing its whole hand here, but it's helpful for users and developers alike to see how these rules and guidelines are set and why, laid out clearly if not necessarily comprehensively.