NVIDIA Unveils New NIMs for Mistral and Mixtral AI Models

Iris Coleman
Jul 16, 2024 03:33

NVIDIA introduces new NIMs for Mistral and Mixtral models, enhancing AI project deployment with optimized performance and scalability.

Large language models (LLMs) are increasingly being adopted by enterprise organizations to enhance their AI applications. According to the NVIDIA Technical Blog, the company has introduced new NVIDIA NIMs (Neural Interface Modules) for Mistral and Mixtral models to streamline AI project deployments.

New NVIDIA NIMs for LLMs

Foundation models serve as powerful starting points for various enterprise needs, but they often require customization to perform optimally in production environments. NVIDIA’s new NIMs for Mistral and Mixtral models aim to simplify this process, offering prebuilt, cloud-native microservices that integrate seamlessly into existing infrastructure. These microservices are continuously updated to ensure optimal performance and access to the latest AI inference advancements.

Mistral 7B NIM

The Mistral 7B Instruct model is designed for tasks such as text generation, language translation, and chatbots. This model fits on a single GPU and, when deployed on NVIDIA H100 data center GPUs, can achieve up to 2.3x performance improvement in tokens per second for content generation compared to non-NIM deployments.

Mixtral-8x7B and Mixtral-8x22B NIMs

The Mixtral-8x7B and Mixtral-8x22B models utilize a Mixture of Experts (MoE) architecture, offering fast and cost-effective inference solutions. These models excel in tasks like summarization, question answering, and code generation, making them ideal for applications that require real-time responses. The Mixtral-8x7B NIM can see up to 4.1x improved throughput on four H100s, while the Mixtral-8x22B NIM can achieve up to 2.9x improved throughput on eight H100s for content generation and translation use cases.

Accelerating AI Application Deployments with NVIDIA NIM

Developers can leverage NIM to accelerate the deployment of AI applications, enhance AI inference efficiency, and reduce operational costs. The containerized models offer several benefits:

Performance and Scale

NIM provides low-latency, high-throughput AI inference that can easily scale, offering up to 5x higher throughput with the Llama 3 70B NIM. This allows for precise, fine-tuned models without the need for building from scratch.

Ease of Use

With streamlined integration into existing systems and optimized performance on NVIDIA-accelerated infrastructure, developers can quickly bring AI applications to market. The APIs and tools are designed for enterprise use, maximizing AI capabilities.

Security and Manageability

NVIDIA AI Enterprise ensures robust control and security for AI applications and data. NIM supports flexible, self-hosted deployments on any infrastructure, providing enterprise-grade software, rigorous validation, and direct access to NVIDIA AI experts.

The Future of AI Inference: NVIDIA NIMs and Beyond

NVIDIA NIM represents a significant advancement in AI inference. As the need for AI-powered applications grows, deploying these applications efficiently becomes crucial. Enterprises can use NVIDIA NIM to incorporate prebuilt, cloud-native microservices into their systems, speeding up product launches and staying ahead in innovation.

The future of AI inference involves linking multiple NVIDIA NIMs to create a network of microservices that can work together and adapt to various tasks. This will transform how technology is used across industries. For more information on deploying NIM inference microservices, visit the NVIDIA Technical Blog.

Image source: Shutterstock