Covariant this week announced the launch of RFM-1 (Robotics Foundation Model 1). Peter Chen, co-founder and CEO of the UC Berkeley artificial intelligence spinout, told TechCrunch the platform is “basically a large language model (LLM), but for robot language.”
RFM-1 is the result of, among other things, a vast trove of data collected from deployments of Covariant's Brain AI platform. With customer consent, the startup has been building a robotics equivalent of the datasets on which LLMs are trained.
“The vision for RFM-1 is to power the billions of robots to come,” Chen said. “At Covariant, we have already deployed many robots in warehouses with great success. But that is not the limit of what we want to reach. Ultimately, we really want to bring robots into people's homes.”
The platform launches as more robotics companies discuss the future of “general purpose” systems. The sudden onslaught of humanoid robot companies like Agility, Figure, 1X, and Apptronik is playing a pivotal role in this conversation. The form factor lends itself particularly well to adaptability (much like the humans it is modeled on), but the robustness of the onboard AI/software systems is an entirely different matter.
For now, Covariant's software is primarily deployed on industrial robotic arms performing a variety of familiar warehouse tasks, such as box picking. The software is not currently deployed in humanoids, but the company promises some degree of hardware independence.
“We like a lot of the work being done on more general-purpose robot hardware,” Chen says. “The convergence of the intelligence inflection point and the hardware inflection point will unleash an even greater explosion of robotics applications, many of which are not yet ready, especially on the hardware side. It's very hard to beat staged videos. How many of you have had direct contact with a humanoid? That tells you the level of maturity.”
But Covariant doesn't shy away from drawing comparisons to humans regarding the role RFM-1 plays in the robot's decision-making process. According to its press materials, the platform “provides robots with human-like reasoning capabilities and represents the first time that Generative AI has succeeded in giving commercial robots a deeper understanding of language and the physical world.”
This is one area where claims demand caution, both in terms of comparisons to abstract, even philosophical, concepts and in terms of their actual, real-world validity over time. “Human-like reasoning ability” is a broad concept that means different things to different people. Here, it is applied to the system's ability to process real-world data and determine the optimal course of action for the task at hand.
This is a departure from traditional robotic systems, which are programmed to perform a single job over and over. These single-purpose robots have thrived in highly structured environments such as automotive assembly lines. As long as there is minimal variation in the task at hand, the robotic arm can repeat it indefinitely without issue, until it is finally retired with a gold pocket watch as thanks for its years of loyal service.
However, even the slightest deviation can cause things to break down quickly. Suppose an object is not placed precisely on the conveyor belt, or the lighting changes enough to affect the onboard camera. Differences like these can significantly impact the robot's ability to perform. Now imagine asking that robot to work with new parts or materials, or to perform a completely different task. That is harder still.
This is traditionally the point where a programmer intervenes. The robot needs to be reprogrammed, often by someone brought in from outside the factory floor. That is a major drain on resources and time. Avoiding it requires one of two things: 1. the people working the floor learn to code, or 2. a new, more natural way to interact with the robot.
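To make the contrast concrete, here is a purely illustrative Python sketch of the traditional approach; the Arm class, its method names, and the coordinates are hypothetical stand-ins for a vendor SDK, not anything from Covariant. The point is that the routine only knows coordinates, so any change on the line means a programmer editing and redeploying it.

```python
# Illustrative only: a traditional single-purpose pick routine with
# hard-coded coordinates. The Arm class is a stand-in for a vendor SDK.

class Arm:
    def move_to(self, pose):
        print(f"moving to {pose}")

    def close_gripper(self):
        print("gripper closed")

    def open_gripper(self):
        print("gripper open")

PICK_POSE = (412.0, 88.5, 30.0)    # fixed pickup point (mm), set at install time
PLACE_POSE = (150.0, 640.0, 30.0)  # fixed drop-off point (mm)

def run_cycle(arm: Arm) -> None:
    # The arm knows nothing about "the object," only coordinates.
    # If the part shifts on the belt, this routine quietly fails
    # until someone edits the poses and redeploys the program.
    arm.move_to(PICK_POSE)
    arm.close_gripper()
    arm.move_to(PLACE_POSE)
    arm.open_gripper()

if __name__ == "__main__":
    run_cycle(Arm())
```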
While it would be great to see the former happen, it seems unlikely that companies will invest the money and wait the necessary time. The latter is exactly what Covariant is trying to do with RFM-1. “ChatGPT for robots” isn't a perfect analogy, but it's a reasonable shorthand (especially given the founders' ties to OpenAI).
From the customer's perspective, the platform appears as a text field, similar to the current crop of consumer-facing generative AI. Type or speak a command, such as “pick up an apple,” and the system draws on its training data (shape, color, size, etc.) to identify the object in front of it that best matches that description.
RFM-1 then generates video results (simulations, essentially) and uses its training to determine the best course of action. This last step is similar to the way our brains weigh the potential consequences of an action before performing it.
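Covariant has not published an API for RFM-1, so the following Python sketch is only a guess at the shape of the loop described above (command, object identification, simulated outcomes, then action), with every class, function, and score invented for illustration.

```python
# Hypothetical sketch of the text-to-action loop described above.
# None of these names come from Covariant; they only mirror the
# described flow: command -> identify object -> simulate -> act.

from dataclasses import dataclass

@dataclass
class DetectedObject:
    label: str
    match_score: float  # how well the object fits the command's description

def identify_target(command: str, scene: list[DetectedObject]) -> DetectedObject:
    # Stand-in for matching a command ("pick up an apple") against
    # learned features such as shape, color, and size.
    return max(scene, key=lambda obj: obj.match_score)

def simulate_outcome(action: str, target: DetectedObject) -> float:
    # Stand-in for the generated "video results": predict how well a
    # candidate action would work before the arm ever moves.
    return {"top_grasp": 0.92, "side_grasp": 0.71}.get(action, 0.0)

def pick(command: str, scene: list[DetectedObject]) -> str:
    target = identify_target(command, scene)
    # Score each candidate action by its simulated outcome, the way
    # the article likens this step to mentally rehearsing an action.
    best = max(["top_grasp", "side_grasp"],
               key=lambda action: simulate_outcome(action, target))
    return f"{best} on {target.label}"

scene = [DetectedObject("apple", 0.97), DetectedObject("sock", 0.20)]
print(pick("pick up an apple", scene))  # -> "top_grasp on apple"
```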
During the live demo, the system responded to inputs such as “pick up the red object” and the more semantically complex “pick up what you put on your feet before putting on your shoes,” leading the robot to correctly pick up an apple and a sock, respectively.
There are many big ideas on the table when discussing this system's future. At the very least, Covariant's founders have impressive pedigrees. Chen studied AI at Berkeley under Pieter Abbeel, Covariant's co-founder and chief scientist. Abbeel also became an early employee of OpenAI in 2016, a month after Chen joined the ChatGPT maker. Covariant was founded the following year.
Chen said the company expects the new RFM-1 platform to work with “the vast majority” of the hardware on which Covariant's software is already deployed.