Generative AI models aren't actually human-like. They have no intelligence or personality; they're simply statistical systems that predict the next most likely word in a sentence. But like an intern at an autocratic workplace, they follow instructions without complaint, including the initial “system prompts” that teach a model its basic behavior and what it should and shouldn't do.
All generative AI vendors, from OpenAI to Anthropic, use system prompts to prevent (or at least try to prevent) their models from misbehaving and to steer the overall tone and sentiment of their responses. A prompt might, for example, tell a model to be polite but never apologize, or to be honest about the fact that it can't know everything.
However, vendors typically keep system prompts secret, probably for competitive reasons, but perhaps also because knowing the prompt might suggest ways to work around it. The only way to expose GPT-4o's system prompt, for example, is through a prompt injection attack, and even then the model's output can't be fully trusted.
But as part of its ongoing effort to position itself as a more ethical and transparent AI vendor, Anthropic has published the system prompts for its latest models (Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku) in the Claude iOS and Android apps and on the web.
Alex Albert, Anthropic's head of developer relations, said in an X post that Anthropic plans to make these types of disclosures periodically as it updates and tweaks its system prompts.
Added a new system prompts release notes section to the documentation to document the changes we made to the default system prompts in the Claude dot ai and mobile apps. (System prompts do not impact the API.) pic.twitter.com/9mBwv2SgB1
— Alex Albert (@alexalbert__) August 26, 2024
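That parenthetical is worth unpacking: the published prompts govern the Claude apps and Claude.ai, while developers calling the API supply their own system prompt with each request. As a minimal sketch, assuming the official Anthropic Python SDK and an `ANTHROPIC_API_KEY` in the environment (the system text below is illustrative, not Anthropic's actual prompt), it looks like this:

```python
# Minimal sketch: passing a custom system prompt via the Anthropic Messages API.
# Assumes the official `anthropic` Python SDK and ANTHROPIC_API_KEY in the environment.
# The system text is illustrative only, not Anthropic's published default prompt.
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=256,
    # API callers define their own system prompt; the Claude.ai defaults don't apply here.
    system="You are a concise assistant. Never start a reply with 'Certainly' or 'Absolutely'.",
    messages=[{"role": "user", "content": "Summarize what a system prompt does in one sentence."}],
)

print(message.content[0].text)
```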
The latest prompts, dated July 12, spell out very clearly what the Claude models cannot do. For example, “Claude cannot open URLs, links, or videos.” Facial recognition is a big no-no: the Claude 3.5 Sonnet system prompt instructs the model to “always respond as if it is completely face blind” and to “avoid identifying or naming any humans in [images].”
But the prompts also describe specific personality traits and characteristics that Anthropic wants its Claude model to exemplify.
For example, the Opus prompt describes Claude as “very smart and intellectually curious” and says it “enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.” The guidelines also instruct Claude to be fair and objective about controversial topics, to offer “careful thought” and “clear information,” and to never begin responses with the words “certainly” or “absolutely.”
To me, as a human, these system prompts read a little oddly, written like the character analysis sheet an actor in a stage play might fill out. The Opus prompt ends with “Claude is now being connected with a human,” giving the impression that Claude is some kind of sentient entity on the other side of the screen whose sole purpose is to fulfill the whims of its human conversation partner.
But of course that's an illusion. If Claude's instructions teach us anything, it's that without human guidance and help, these models are frighteningly blank slates.
With this new system prompt changelog, a first for a major AI vendor, Anthropic is putting pressure on its competitors to publish similar changelogs. It remains to be seen whether this tactic is successful.