The competition for high-quality AI-generated video is heating up.
On Monday, Runway, a company that develops generative AI tools for film and visual content creators, announced Gen-3 Alpha, its latest AI model that generates video clips from text descriptions and still images. Runway said Gen-3 offers “significant” improvements in generation speed and fidelity over Runway's previous flagship video model, Gen-2, while giving users greater control over the structure, style, and movement of the videos they create.
Gen-3 will be available soon to Runway subscribers, including enterprise customers and companies participating in Runway's Creative Partner Program.
“Gen-3 Alpha excels at generating expressive human characters with a wide range of actions, gestures and emotions,” Runway said in a blog post. “It's designed to interpret a wide range of styles and cinematic vocabulary [and enable] imaginative transitions and precise keyframing of elements within a scene.”
Gen-3 Alpha has its limitations, perhaps most notably a maximum clip length of 10 seconds. But Runway co-founder Anastasis Germanidis promises that Gen-3 is only the first (and smallest) of several video generation models to come in a next-generation family of models trained on upgraded infrastructure.
“Models can struggle with complex character and object interactions, and generation doesn't always follow the laws of physics exactly,” Germanidis told TechCrunch in an interview this morning. “This initial rollout will support 5- and 10-second high-resolution generation, with generation times noticeably faster than Gen-2; 5-second clips will take 45 seconds to generate, and 10-second clips will take 90 seconds to generate.”
Gen-3 Alpha, like all video generative models, was trained on a huge number of video and image examples so it could “learn” the patterns in those examples and generate new clips. Where did the training data come from? Runway wouldn't say. Few generative AI vendors volunteer such information these days, in part because they see training data as a competitive advantage and so keep it, and any details about it, under wraps.
“We have an in-house research team that oversees all training, and we train our models using curated in-house datasets,” Germanidis said, without saying more.
A sample from Runway's Gen-3 model. Note that the blurriness and low resolution are due to the video-to-GIF conversion tool TechCrunch used, not Gen-3 itself. Image credit: Runway
Training data details are also a potential source of intellectual property litigation if a vendor trained on publicly available data, including copyrighted material from the web, which is another disincentive to disclose much. In several cases now making their way through the courts, plaintiffs reject vendors' fair use defenses, arguing that generative AI tools replicate artists' styles without the artists' permission and let users generate new works resembling the artists' originals for which the artists are never paid.
In the blog post announcing Gen-3 Alpha, Runway addressed the copyright issue only in passing, saying it consulted with artists in developing the model (which artists, it didn't say). That echoes what Germanidis told me during a fireside chat at TechCrunch's Disrupt conference in 2023.
“We are working closely with artists to identify the best approach to address this issue,” he said. “We are exploring different data partnerships to enable further growth and build the next generation of our models.”
Runway also said in the post that it will release Gen-3 Alpha with a new set of safeguards, including a moderation system to block attempts to generate videos from copyrighted images or content that violates Runway's terms of service. Also in the works is a provenance system, compatible with the C2PA standard backed by Microsoft, Adobe, OpenAI and others, that will identify videos as having come from Gen-3.
“Our new and improved in-house visual and text moderation systems employ automated monitoring to filter out inappropriate or harmful content,” Germanidis said. “C2PA certification verifies the origin and authenticity of media produced on all Gen-3 models. As our models' capabilities and ability to generate high-fidelity content improve, we will continue to invest heavily in tuning and safety efforts.”
Today's post also revealed that Runway is partnering with “leading entertainment and media organizations” to create custom versions of Gen-3 that allow for more “stylistically controlled” and consistent characters, targeting their “specific artistic and narrative requirements.” The company adds: “This means that generated characters, environments, and elements can maintain a consistent look and behavior across different scenes.”
A big unsolved problem with generative video models is control: getting a model to generate consistent video that aligns with its creator's artistic intent. As my colleague Devin Coldewey recently wrote, even simple choices in traditional filmmaking, like picking the color of a character's clothing, require workarounds with generative models because each shot is created independently of the others. Sometimes even the workarounds fall short, leaving editors with a huge amount of manual cleanup.
Runway has raised more than $236.5 million from investors including Google (with which it has cloud compute credits) and Nvidia, as well as venture capital firms such as Amplify Partners, Felicis and Coatue, and has aligned itself closely with the creative industry as investment in generative AI technology grows. Runway operates Runway Studios, an entertainment division that acts as a production partner for enterprise clients, and hosts the AI Film Festival, one of the first events showcasing films produced entirely (or in part) with AI.
But the competition is becoming more intense.
Last week, generative AI startup Luma unveiled Dream Machine, a video generator that's making waves for its ability to animate memes, and just a few months ago Adobe revealed it was developing its own video generation model trained on content from its Adobe Stock media library.
Then there are incumbents like OpenAI's Sora, which remains tightly restricted but which OpenAI has been seeding with marketing agencies and indie and Hollywood directors. (OpenAI CTO Mira Murati attended the 2024 Cannes Film Festival.) This year's Tribeca Festival, which is partnering with Runway to curate films made with AI tools, screened short films produced with Sora by directors who were given early access.
Google is also making its video generation model, Veo, available to select creators, including Donald Glover (a.k.a. Childish Gambino) and his creative agency Gilga, and is working to bring Veo to products like YouTube Shorts.
Regardless of how the various collaborations play out, one thing is clear: generative AI video tools threaten to upend the film and TV industry as we know it.
Filmmaker Tyler Perry recently said he put plans for an $800 million expansion of his production studio on hold after seeing Sora's capabilities, and Joe Russo, co-director of Marvel tentpole films such as “Avengers: Endgame,” predicts that AI will be able to make a fully fledged movie within a year.
A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, found that 75% of film production companies that adopted AI reduced, consolidated or eliminated jobs after incorporating the technology. The study also estimates that by 2026, more than 100,000 U.S. entertainment industry jobs will be disrupted by generative AI.
Extremely strong labor protections will be needed to prevent video generation tools from following in the footsteps of other generative AI technologies and causing a sharp decline in demand for creative jobs.