The Rabbit r1 was the must-have gadget of early 2024, but the excitement quickly faded when the company's grand promises fell flat. CEO Jesse Lyu admitted that “our expectations were too high on day one,” but an update to the device scheduled for this month will finally make the much-vaunted Large Action Model available for free on the web.
Skeptics may (rightly) see this as too little, too late, or as moving the goalposts, but Rabbit's aim of building a platform-independent agent for web and mobile apps has fundamental value, even if it remains largely theoretical.
In an interview with TechCrunch, Lyu said the past six months have been hectic: shipping, fixing bugs, improving response times, and adding minor features. But despite 16 over-the-air updates, the r1 is essentially limited to interacting with an LLM and accessing seven specific services, including Uber and Spotify.
“It was the first version of LAM, trained on records that we collected from data workers, but it wasn't generic, it just connected to those services,” he said. Whether it should have been called a LAM is largely academic at this point. Whatever the model was, it didn't offer the capabilities that Rabbit detailed at its debut.
Generalist Web-Based Agent
But Rabbit is now ready to release the first general-purpose version of LAM, one that isn't specific to any particular app or interface. Lyu demonstrated it to me.
This version is a web-based agent that infers the steps to perform any common task, such as buying concert tickets, registering on a website, or playing an online game.
“Our goal is very clear: At the end of September, suddenly with r1 you'll be able to do a lot more. Everything that any website can do will be supported,” Lyu said.
Given a task, it first breaks the task down into steps, then analyzes what appears on the screen (buttons, fields, images, and so on) regardless of position or appearance. It then interacts with the appropriate elements based on what it has learned about how websites generally behave.
I asked it (through Lyu, who was operating it remotely) to register a new website for a film festival. Taking an action every few seconds, it searched Google for domain registrars, selected one of them (a sponsored one, I think), typed “film festival” in the domain box, and selected “filmfestival2023.com” from the list of options that appeared, for $14. To be fair, I hadn't given it any constraints, like “for 2025” or “horror festival.”
Similarly, when Lyu told the agent to search for an r1 and buy it, it immediately went to eBay, where dozens were for sale. A good result for the user, but not for the company founder, who was giving a presentation to the press. He laughed it off, added that people should only buy from the official website, and ran the prompt again. This time the agent succeeded.
Next, he had it play Dictionary.com's daily word game. It took a bit of prompt engineering (the model found a loophole that allowed it to finish quickly by pressing “end game”), but it got there.
But which browser will it use? Lyu said that it will use a fresh, clean browser in the cloud, but the company is also working on a local version, like a Chrome extension, that will let you use your existing sessions and avoid having to log in to each service.
The agent does not hold your credentials, since users are naturally (and understandably) wary of giving a company full access to them. Lyu suggested that in the future it may be possible to privately invoke a small, isolated language model that contains the credentials to perform the login. How this would work appears to be an open question, which is to be expected given how new this field is.
Still learning
In-app UI analysis example from Rabbit's website. Image courtesy of Rabbit
A few things came out of the demo. First, giving the company and its developers the benefit of the doubt that this isn't all an elaborate hoax (as some believe), this appears to be an actual working general-purpose web agent. If it is not the first of its kind, it is certainly the first that is easily accessible to consumers.
“There are companies that do verticals like Excel and legal documents, but I think this is one of the first universal agents for consumers,” Lyu says. “The idea is that anything that can be achieved through a website, you can do it. We'll have a universal agent for websites first, and then we'll have an agent for apps.”
Second, prompt engineering is still very much needed: how a request is phrased can easily mean the difference between success and failure, and that's probably not something the general consumer would be willing to accept.
Lyu cautioned that this is a “playground version” and by no means final, and that while it's a fully functional general web agent, there's still room for improvement in many areas. For example, he said, “the model is smart enough to make a plan, but not smart enough to skip steps.” It can't “learn” that users don't want to buy electronics on eBay, or that after a search it should scroll down to avoid the wall of sponsored results.
No user data will be collected to improve the model… for now. Lyu says that's because there is essentially no way to evaluate such systems, making it hard to determine quantitatively whether improvements have been made. However, a “teach mode” is also coming, where you can show the agent how to perform certain types of tasks.
Interestingly, the company is also working on a desktop agent that can interact with apps like word processors, music players, and of course browsers. This is still in its early stages, but it's working: “You don't even have to type in a destination. You just try to use the computer. If there's an interface, you can control it.”
Third, there's still no “killer app,” at least not an obvious one. Agents are great, but unfortunately they're not very useful for me as I sit in front of a browser for eight hours a day. I'm almost certain there will be some great applications, but I can't think of one that shows browser-based automata are as obviously useful as, say, a robot vacuum cleaner.
Couldn't this just be an app?
Rabbit R1 in use. Hand model: Chris Velazco of The Washington Post. Image credit: Devin Coldewey / TechCrunch
I raised a general objection to Rabbit's entire business model, which is basically “this could be an app.”
Lyu had clearly heard this criticism many times and was confident in his answer.
“If you do the math, it doesn't make sense,” he said. “Yes, it's technically feasible, but it would piss off Apple and Google from day one. They would never allow it to be better than Siri or Gemini. Apple Intelligence would never be able to control Google's products, or vice versa. Plus they'd take 30% of the revenue! If we'd built an app from scratch, it would never have taken off like it did.”
Rabbit's basic pitch is that third-party AIs and devices can access and interact with all other services from the outside, just like users can. Lyu calls it a “cross-platform, general-purpose agent system.” “It controls all the UI,” he said. “The website is a good start. Then it works its way to Windows, macOS, and phones.”
Oh, and by the way: “We never said we wouldn't make mobile phones in the future.” Does this seem at odds with the original pitch of smaller, simpler devices? Maybe it is, or maybe not.
In the meantime, Rabbit is about to start delivering on a promise it made earlier this year: the new model should be available to all r1 owners once the OTA update is released sometime this week, at which point owners will also receive instructions on how to activate it. With his usual understatement, Lyu cautioned expectant users:
“We're setting the expectations right. It's not perfect,” he said. “It's just the best that humanity has ever achieved.”
The kicker: A phone call…?