How Anthropic found a trick to get AI to give answers it isn't supposed to

If you build it, people will try to break it. Sometimes even the people who build things are the ones breaking them. Such is the case with Anthropic, whose latest research demonstrates an interesting vulnerability in current LLM technology. More or less, if you keep pressing on a question, you can break through the guardrails and end up with a large language model telling you things it was designed not to. Like how to make a bomb.

Of course, given advances in open source AI technology, you could run your own LLM locally and ask it anything you want, but for more consumer-oriented products, this is an issue worth pondering. What's interesting about AI today is how rapidly it's advancing, and how well (or not) we're doing as a species at understanding what we're building.

If you'll allow me the thought, I suspect that as LLMs and other new AI model types get smarter and bigger, we'll see more questions and problems of the kind Anthropic outlines. I may be repeating myself here. But the closer we get to more generalized AI, the more it should resemble a thinking entity rather than a computer that we can program. If so, it might become harder to pin down edge cases, perhaps to the point where the task becomes unfeasible. Anyway, let's talk about what Anthropic recently shared.
