OpenAI rival Anthropic is releasing a powerful new generative AI model called Claude 3.5 Sonnet, but it's more of an incremental step than a breakthrough.
Claude 3.5 Sonnet can analyze both text and images as well as generate text. At least on paper, it's Anthropic's best model to date: across several AI benchmarks for reading, coding, math and vision, Claude 3.5 Sonnet outperforms its predecessor, Claude 3 Sonnet, as well as Anthropic's previous flagship model, Claude 3 Opus.
Benchmarks aren't necessarily the most useful measure of AI progress, in part because many of them test esoteric edge cases that don't apply to the average person, like answering health questions. But for what it's worth, Claude 3.5 Sonnet narrowly beats leading rival models, including OpenAI's recently released GPT-4o, on several of the benchmarks Anthropic tested.
Alongside the new model, Anthropic is releasing something it calls Artifacts, a workspace where users can edit and add to content (such as code and documents) generated by Anthropic's models. Currently in preview, Artifacts will gain new features in the near future, such as the ability to collaborate with larger teams and store knowledge bases, Anthropic said.
Focus on efficiency
Claude 3.5 Sonnet performs a bit better than Claude 3 Opus, and Anthropic says the model is better at understanding nuanced and complex instructions, along with concepts like humor. (AI is notoriously unfunny, though.) But perhaps more importantly for developers building Claude-powered apps that require quick responses (e.g., customer service chatbots), 3.5 Sonnet is also faster; Anthropic says roughly twice as fast as 3 Opus.
According to Anthropic, vision (analyzing images) is one area where Claude 3.5 Sonnet shows significant improvements over 3 Opus: 3.5 Sonnet can interpret charts and graphs more accurately and transcribe text from “imperfect” images, such as photographs with distortions and visual artifacts.
Michael Gerstenhaber, product lead at Anthropic, says these improvements are the result of architecture tweaks and new training data, including AI-generated data. Which data, specifically? Gerstenhaber wouldn't say, but he hinted that much of Claude 3.5 Sonnet's strength comes from these training sets.
“What's important [to businesses] is not whether the AI is competitive on benchmarks, but whether the AI is helping them meet their business needs,” Gerstenhaber told TechCrunch. “And from that standpoint, we believe Claude 3.5 Sonnet is a step ahead of anything else we offer, and ahead of anything else in the industry.”
Keeping the training data secret may be for competitive reasons, but it may also be to protect Anthropic from legal challenges, particularly those related to fair use: Courts have yet to decide whether vendors like Anthropic and competitors like OpenAI, Google, and Amazon have the right to train on public data, including copyrighted data, without paying or crediting the creators of that data.
What we do know is that Claude 3.5 Sonnet, like Anthropic's previous models, was trained on large amounts of text and images, plus feedback from human testers that helps “tune” the model to user intent and steer it away from spitting out harmful or problematic text.
What else do we know? Claude 3.5 Sonnet's context window (the amount of text the model can analyze before generating new text) is 200,000 tokens, the same as 3 Sonnet. Tokens are bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic.” 200,000 tokens equals roughly 150,000 words.
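As a rough rule of thumb, that ratio (about 0.75 words per token) makes it easy to translate a token budget into an approximate word count. The snippet below is just back-of-the-envelope arithmetic based on the figure above, not an actual tokenizer:

```python
# Ballpark token-to-word conversion based on the figure above:
# 200,000 tokens ≈ 150,000 words, i.e. roughly 0.75 words per token.
WORDS_PER_TOKEN = 150_000 / 200_000  # ≈ 0.75

def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

print(approx_words(200_000))  # -> 150000
```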
Claude 3.5 Sonnet is available starting today, free of charge, to users of Anthropic's web client and the Claude iOS app. Subscribers to Anthropic's paid plans, Claude Pro and Claude Team, get rate limits that are 5x higher. 3.5 Sonnet is also available via Anthropic's API and on managed platforms such as Amazon Bedrock and Google Cloud's Vertex AI.
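For developers, here's a minimal sketch of what a call to the model looks like through Anthropic's Python SDK. The model identifier is the launch-era name and the prompt is invented for illustration; check Anthropic's documentation for current details:

```python
# Minimal sketch of calling Claude 3.5 Sonnet via Anthropic's Messages API.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # launch-era model ID; verify against current docs
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarize this support ticket in two sentences."}
    ],
)

print(message.content[0].text)
```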
“Claude 3.5 Sonnet delivers a major boost in intelligence without sacrificing speed and sets the foundation for future releases across the Claude model family,” said Gerstenhaber.
Claude 3.5 Sonnet also powers Artifacts, which appear in a dedicated window in the Claude web client when a user asks the model to generate content such as a code snippet, a text document, or a website design. Gerstenhaber explains: “An artifact is a model output that lets you set aside generated content and iterate on it. For example, if you want to generate code, the artifact is placed in the UI, and you can then interact with Claude and iterate on the document to improve it so that the code can be executed.”
The bigger picture
So what does Claude 3.5 Sonnet signify in the broader context of Anthropic and the AI ecosystem?
Claude 3.5 Sonnet shows that, absent major research breakthroughs, we can expect only incremental progress on the model front for now. Over the past few months, flagship releases from Google (Gemini 1.5 Pro) and OpenAI (GPT-4o) have made modest advances in benchmarks and qualitative performance. But the constraints of today's model architectures, and the massive compute required to train them, mean that a leap comparable to the one from GPT-3 to GPT-4 isn't likely for some time.
There are signs that investors are growing wary of generative AI's longer-than-expected path to ROI, as generative AI vendors turn to data curation and licensing rather than promising new scalable architectures. Anthropic is somewhat insulated from this pressure because it's in the enviable position of serving as Amazon's (and, to a lesser extent, Google's) insurance against OpenAI. But the company's revenue is projected to reach just under $1 billion by the end of 2024, a fraction of OpenAI's, and Anthropic's backers won't let the company forget that fact.
Despite a growing client base that includes well-known brands like Bridgewater, Brave, Slack, and DuckDuckGo, Anthropic still lacks a certain name recognition with enterprises. It's telling that PwC recently partnered with OpenAI, not Anthropic, to resell its generative AI products to enterprises.
So Anthropic is taking a deliberate, well-worn approach: investing development time in products like Claude 3.5 Sonnet that deliver slightly better performance at commodity prices. 3.5 Sonnet is priced the same as 3 Sonnet: $3 per million tokens fed into the model and $15 per million tokens generated by the model.
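To make those prices concrete, here's a quick back-of-the-envelope cost calculation; the token counts are hypothetical, chosen purely for illustration:

```python
# Rough per-request cost at the listed Claude 3.5 Sonnet prices:
# $3 per million input tokens, $15 per million output tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + (
        output_tokens / 1_000_000
    ) * OUTPUT_PRICE_PER_M

# e.g. a 10,000-token prompt that yields a 1,000-token reply:
print(f"${estimate_cost(10_000, 1_000):.4f}")  # -> $0.0450
```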
Gerstenhaber spoke about this in our conversation: “When building an application, the end user doesn't need to know what model is being used or how the engineers have optimized the experience for the user,” he said, “but the engineers have the tools to optimize that experience along whatever vector they need to optimize, and cost is definitely one of those vectors.”
Claude 3.5 Sonnet won't solve the hallucination problem. It will certainly get things wrong. But it might be attractive enough to entice developers and companies to switch to Anthropic's platform. That's what Anthropic cares about, after all.
Toward the same end, Anthropic has been focusing on tools like its experimental Steering AI, which lets developers “steer” the inner workings of its models; integrations that allow models to take actions within apps; and tools built on top of its models, like the aforementioned Artifacts experience. It has also hired an Instagram co-founder as its head of product and expanded geographically, recently bringing Claude to Europe and setting up offices in London and Dublin.
Ultimately, Anthropic seems to have come to the realization that as the feature gap between models narrows, building an ecosystem around models, rather than building models in isolation, will be the key to retaining customers.
Still, Gerstenhaber insisted that larger, more capable models such as Claude 3.5 Opus are on the way, along with features like web search and preference memory.
“I haven't heard of deep learning hitting a wall yet, and I'll leave it to researchers to speculate about where that wall might be, but I think it's too early to draw any conclusions, especially given the pace of innovation,” he said. “There's been very rapid development and rapid innovation, and I have no reason to believe that's going to slow down.”
We'll see.