Nvidia today announced Nvidia NIM at its GTC conference, a new software platform designed to streamline the deployment of custom and pre-trained AI models into production environments. NIM takes the software work Nvidia has done around model inference and optimization, combines a given model with an optimized inference engine, packages all of this into a container, and makes it accessible as a microservice.
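In practice, that means developers talk to a running NIM container over a standard HTTP API instead of wiring up an inference stack themselves. Here is a minimal sketch of what that could look like, assuming a NIM container running locally that exposes an OpenAI-style chat completions endpoint; the port, path, and model name are illustrative, not confirmed by the announcement:

```python
import requests

# Hypothetical local NIM container exposing an OpenAI-compatible
# chat completions endpoint; URL and model identifier are assumptions.
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama2-70b",  # illustrative model identifier
    "messages": [
        {"role": "user", "content": "Summarize our Q3 support tickets."}
    ],
    "max_tokens": 256,
}

resp = requests.post(NIM_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The appeal of this model is that swapping in a different NIM should only mean changing the container, not the application code calling it.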
Nvidia says it typically takes developers weeks, if not months, to ship similar containers, and that's assuming the company has any in-house AI talent at all. With NIM, Nvidia is clearly aiming to create an ecosystem of AI-ready containers that uses its own hardware as the foundational layer and these curated microservices as the core software layer for enterprises looking to accelerate their AI roadmaps.
NIM currently includes support for models from NVIDIA, AI21, Adept, Cohere, Getty Images, and Shutterstock, as well as open models from Google, Hugging Face, Meta, Microsoft, Mistral AI, and Stability AI. Nvidia is already working with Amazon, Google, and Microsoft to make these NIM microservices available on SageMaker, Kubernetes Engine, and Azure AI, respectively. They will also be integrated into frameworks such as Deepset, LangChain, and LlamaIndex.
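For the framework integrations, the idea is that a NIM endpoint slots in wherever a chat model is expected. As a sketch of the LangChain side, assuming the langchain-nvidia-ai-endpoints integration package and an NVIDIA_API_KEY environment variable; the model name below is illustrative:

```python
# pip install langchain-nvidia-ai-endpoints
# Assumes NVIDIA_API_KEY is set in the environment; the model
# identifier is illustrative, not taken from the announcement.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="mistralai/mistral-7b-instruct-v0.2")

# The model behaves like any other LangChain chat model, so it can
# be dropped into existing chains and agents unchanged.
response = llm.invoke("Draft a one-line release note for our API update.")
print(response.content)
```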
“We believe Nvidia GPUs are the best place to run inference for these models, […] and we believe NVIDIA NIM is the best software package, the best runtime, to build on top of, so developers can focus on their enterprise applications and leave the work of producing these models to Nvidia,” said Manuvir Das, Nvidia's head of enterprise computing, in a press conference ahead of today's announcement.
As for the inference engine, Nvidia uses the Triton Inference Server, TensorRT, and TensorRT-LLM. Some of the Nvidia microservices available through NIM include Riva for customizing speech and translation models, cuOpt for routing optimization, and the Earth-2 model for weather and climate simulations.
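Since Triton sits in that serving layer, a raw, non-NIM deployment would speak Triton's standard KServe v2 HTTP inference protocol directly. A rough sketch of that protocol follows, with the server address, model name, and tensor metadata purely illustrative; part of NIM's pitch is that it wraps this layer so most developers never call it by hand:

```python
import requests

# Minimal sketch of Triton's v2 HTTP inference protocol.
# Model name and tensor metadata are illustrative and would come
# from the deployed model's configuration in a real setup.
TRITON_URL = "http://localhost:8000/v2/models/example_model/infer"

payload = {
    "inputs": [
        {
            "name": "INPUT0",      # input tensor name from the model config
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0],
        }
    ]
}

resp = requests.post(TRITON_URL, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["outputs"])
```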
The company plans to add more capabilities over time. For example, it will make the Nvidia RAG LLM operator available as a NIM, which it says will make it much easier to build generative AI chatbots that can ingest custom data.
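To make that use case concrete, here is a deliberately minimal sketch of the pattern such an operator would package up: retrieve the most relevant snippets of custom data, then hand them to the model as context. The toy keyword retriever and the local NIM-style endpoint are assumptions for illustration, not Nvidia's implementation:

```python
import requests

# Toy in-memory "knowledge base" standing in for custom enterprise data.
DOCUMENTS = [
    "Refunds are processed within 5 business days of approval.",
    "Enterprise plans include 24/7 phone support.",
    "The API rate limit is 1,000 requests per minute per key.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval; real RAG would use embeddings."""
    words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:k]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question, DOCUMENTS))
    payload = {
        "model": "meta/llama2-70b",  # illustrative model identifier
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    }
    # Hypothetical local NIM endpoint, as in the earlier sketch.
    resp = requests.post("http://localhost:8000/v1/chat/completions",
                         json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(answer("How fast are refunds processed?"))
```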
It wouldn't be a developer conference without some announcements from customers and partners. Current NIM users include the likes of Box, Cloudera, Cohesity, Datastax, Dropbox, and NetApp.
“Established enterprise platforms have a treasure trove of data that can be transformed into generative AI co-pilots,” said NVIDIA Founder and CEO Jensen Huang. “These containerized AI microservices, created in collaboration with our ecosystem of partners, are the building blocks for companies in any industry to become AI companies.”