Amazon Web Services (AWS), Amazon's cloud computing division, announced a new family of multimodal generative AI models called Nova at Tuesday's re:Invent conference.
There are four text-generating models: Micro, Lite, Pro, and Premier. Amazon CEO Andy Jassy said on stage that Micro, Lite, and Pro are available to AWS customers starting Tuesday, with Premier expected in early 2025.
In addition to these, there is an image generation model, Nova Canvas, and a video generation model, Nova Reel. Both were also released on AWS this morning.
“We have continued to work on our own frontier models,” Jassy said. “And we thought that if we found value in them, maybe you would find value in them, too.”
Micro, Lite, Pro, Premier
The text-generating Nova models are optimized for 15 languages (primarily English) and vary in size and capability.
Micro can only take in and output text, but it has the lowest latency in the series, processing text and generating responses the fastest.
Lite can process images, videos, and text input fairly quickly. Pro offers a balanced combination of accuracy, speed, and cost for a variety of tasks. Premier is the most feature-rich and designed for complex workloads.
Pro and Premier, like Lite, can analyze text, images, and videos. All three are suitable for tasks such as summarizing documents, graphs, meetings, and diagrams. However, AWS positions Premier as a “teacher” model for distilling tailored custom models rather than one to use on its own.
Micro has a 128,000-token context window, which means it can process up to about 100,000 words. Lite and Pro have a context window of 300,000 tokens, which equates to approximately 225,000 words, 15,000 lines of computer code, or 30 minutes of footage.
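Those word counts follow from a common rule of thumb of roughly 0.75 English words per token; the exact ratio is an assumption and varies by tokenizer and language, but a quick sanity check reproduces the article's figures:

```python
# Rough rule of thumb (~0.75 English words per token); actual ratios
# depend on the tokenizer and the language of the text.
WORDS_PER_TOKEN = 0.75

for model, tokens in [("Micro", 128_000), ("Lite/Pro", 300_000)]:
    print(f"{model}: ~{int(tokens * WORDS_PER_TOKEN):,} words")
# Micro: ~96,000 words (which the article rounds to ~100,000)
# Lite/Pro: ~225,000 words
```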
According to AWS, in early 2025, the context window for certain Nova models will be expanded to support more than 2 million tokens.
Jassy claims the Nova models are among the fastest in their class and the cheapest to run. They are available through Amazon Bedrock, AWS's AI development platform, where customers can fine-tune them on text, images, and video, and distill them into smaller models for increased speed and efficiency.
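For illustration, here is a minimal sketch of prompting one of the text models through Bedrock's Converse API with boto3. The model ID shown for Nova Micro is an assumption; availability, region, and any inference-profile prefix should be verified in the Bedrock model catalog.

```python
# Minimal sketch: prompting Nova Micro via the Bedrock Converse API.
# Assumes boto3 credentials are configured and that the model ID below
# is enabled in your region (some regions require an inference-profile
# prefix; check the Bedrock console).
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-micro-v1:0",  # assumed ID; verify in the model catalog
    messages=[
        {"role": "user", "content": [{"text": "Summarize this meeting transcript: ..."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.5},
)

print(response["output"]["message"]["content"][0]["text"])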
“We have optimized these models to work with our own systems and APIs, making it easier to perform multiple coordinated automated steps, or agentic actions,” Jassy added. “So I think these are very compelling.”
Canvas and Reel
Canvas and Reel are AWS's most capable generative media models to date.
Canvas allows users to generate and edit images using prompts (such as removing backgrounds) and provides control over the color scheme and layout of the generated images. Reel, the more ambitious of the two models, creates videos up to 6 seconds long from prompts or optional reference images. Reel allows users to adjust camera movement to produce videos that include panning, 360-degree rotation, and zooming.
Reel is currently limited to 6-second videos (which take about 3 minutes to generate), but AWS says a version that can create 2-minute videos is “coming soon.”
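As a sketch of how such a six-second generation might be kicked off: video generation on Bedrock runs asynchronously, and the request-body fields below follow AWS's published Nova Reel examples, but the field names, model ID, and S3 bucket are assumptions to verify against the current docs.

```python
# Sketch: asynchronous Nova Reel video generation via Bedrock.
# The modelInput schema follows AWS's published Nova Reel examples;
# verify field names against the current docs. The S3 URI is a
# placeholder for a bucket you own.
import time
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

job = client.start_async_invoke(
    modelId="amazon.nova-reel-v1:0",  # assumed ID; check the model catalog
    modelInput={
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": {"text": "Slow 360-degree pan around a lighthouse at dusk"},
        "videoGenerationConfig": {"durationSeconds": 6, "fps": 24, "dimension": "1280x720"},
    },
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://your-bucket/reel-output/"}},
)

# Poll until the roughly three-minute render finishes.
while True:
    status = client.get_async_invoke(invocationArn=job["invocationArn"])["status"]
    if status != "InProgress":
        print("Job finished with status:", status)
        break
    time.sleep(15)
```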
[Sample videos generated by Nova Reel. Image credits: AWS]
[Sample images generated by Nova Canvas. According to AWS, Canvas can generate images in a variety of styles, enhance existing images, and insert objects into a scene. Image credits: AWS]
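A Canvas text-to-image call would look roughly like the sketch below. The request schema mirrors AWS's published image-generation examples (taskType / textToImageParams / imageGenerationConfig), but the exact fields and model ID are assumptions to check against the current Bedrock documentation.

```python
# Sketch: text-to-image with Nova Canvas via InvokeModel.
# The request body mirrors AWS's published image-generation examples;
# verify the schema and model ID against the current Bedrock docs.
import base64
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model(
    modelId="amazon.nova-canvas-v1:0",  # assumed ID; check the model catalog
    body=json.dumps({
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": "Watercolor illustration of a red fox in snow"},
        "imageGenerationConfig": {"numberOfImages": 1, "height": 1024, "width": 1024},
    }),
)

# The response body carries base64-encoded images.
payload = json.loads(response["body"].read())
with open("fox.png", "wb") as f:
    f.write(base64.b64decode(payload["images"][0]))
```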
Jassy emphasized that both Canvas and Reel have “built-in” controls for responsible use, such as watermarking and content moderation. “[We’re trying] to limit the generation of harmful content,” he said.
AWS describes the safeguards in a blog post, saying Nova will “extend [its] safety measures to combat the spread of misinformation, child sexual abuse content, and chemical, biological, radiological, or nuclear risks.” However, it is not clear what this means in practice or what form the measures will take.
AWS also remains vague about exactly what data it uses to train its generative models. The company previously told TechCrunch only that it is a combination of proprietary and licensed data.
Few vendors actively disclose such information. They view training data as a competitive advantage, so they keep the data and related information strictly confidential. Training data details are also a potential source of intellectual property-related litigation, preventing much from being revealed.
In lieu of transparency, AWS offers an indemnification policy that covers customers in the event one of its models regurgitates (i.e., spits out a mirror copy of) potentially copyrighted material.
So what's next for Nova? Jassy said AWS is working on a speech-to-speech model (one that takes audio in and outputs a transformed version of it) for Q1 2025, and an “any-to-any” model for around mid-2025.
Amazon says the speech-to-speech model will also interpret verbal and nonverbal cues, such as tone and rhythm, to deliver natural, “human-like” speech. The any-to-any model, meanwhile, could theoretically power applications ranging from translators to content editors to AI assistants.
Assuming, of course, that it all works without a hitch.
“You'll be able to input text, audio, images, or video and output text, audio, images, or video,” Jassy said of the any-to-any model. “This is the future of how frontier models will be built and consumed.”