
JetBrains Launches Mellum2, an MoE Model for High-Throughput Code Workloads

huggingface.co · @frontier_wire · 4 hours ago · Machine Learning · 2 comments

Mellum2 activates only 2.5B parameters per token, running about 2x faster than similarly sized models on RAG and latency-sensitive focal tasks.

jetbrains · mellum2 · mixture of experts · large language models · machine learning

Mellum2 activates only 2.5B of its 12B parameters per token, which lowers the compute spent on each token in high-throughput, low-latency inference workloads.

JetBrains has released Mellum2, a 12B-parameter Mixture-of-Experts (MoE) model trained from scratch on natural language and code. Although the total parameter count is 12B, the MoE architecture activates only about 2.5B of those parameters (roughly a fifth) for each token, which makes the model a strong candidate for production environments where serving cost and latency are critical constraints.
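To make the parameter arithmetic concrete, here is a minimal sketch in Python. The two parameter counts come from the article; the `top_k_route` function is a generic illustration of top-k expert gating and is an assumption, not a description of Mellum2's actual router, expert count, or value of k.

```python
import math

TOTAL_PARAMS_B = 12.0   # total parameters in billions (from the article)
ACTIVE_PARAMS_B = 2.5   # parameters active per token in billions (from the article)


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that participate in one token's forward pass."""
    return active_b / total_b


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Generic top-k gating: keep the k highest-scoring experts and
    softmax-normalize their scores into mixing weights.
    Hypothetical illustration; Mellum2's real routing is not described in the article."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


if __name__ == "__main__":
    print(f"{active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")  # prints 20.8%
    # Four toy experts, two selected per token; the other two are skipped entirely.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

Only the selected experts run for a given token, which is why per-token compute tracks the active count rather than the total.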

Efficient Inference for Agentic Workflows

Modern AI architectures are moving away from monolithic frontier models toward multi-model systems. Production stacks increasingly require several specialized components—routers, retrievers, validators, and tool callers—to work in concert. Mellum2 targets these high-frequency, latency-sensitive operations.

Because it delivers inference more than 2x faster than similarly sized models, Mellum2 can serve as a "focal" model within larger systems. It is optimized for tasks such as prompt classification, tool selection, and intermediate control-flow steps in agentic workflows. Instead of invoking a massive reasoning model for every subtask, engineers can use Mellum2 for planning, validation, and context preparation, which reduces the total compute footprint of the system.
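The division of labor described above could be sketched as a small dispatcher. Everything here is hypothetical: the task names, the `Dispatcher` class, and the two model callables are placeholders for whatever clients a real stack uses, and the stubs do not call any actual model.

```python
from dataclasses import dataclass
from typing import Callable

# A model is represented as a plain function from prompt to response (assumption).
ModelFn = Callable[[str], str]

# Hypothetical labels for the high-frequency subtasks the article mentions.
LIGHTWEIGHT_TASKS = {"classify_prompt", "select_tool", "validate", "prepare_context"}


@dataclass
class Dispatcher:
    small_model: ModelFn  # e.g. a self-hosted compact model
    large_model: ModelFn  # e.g. a larger reasoning model

    def run(self, task: str, prompt: str) -> str:
        """Send lightweight subtasks to the small model and everything else to the large one."""
        model = self.small_model if task in LIGHTWEIGHT_TASKS else self.large_model
        return model(prompt)


if __name__ == "__main__":
    # Stub models that only tag their input, so the routing is visible.
    dispatcher = Dispatcher(
        small_model=lambda p: f"small:{p}",
        large_model=lambda p: f"large:{p}",
    )
    print(dispatcher.run("select_tool", "which tool opens a file?"))
    print(dispatcher.run("deep_reasoning", "design a migration plan"))
```

A static task list keeps the routing decision free of any model call; a real system might instead let the small model itself decide when to escalate.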

Specialized Text and Code Capabilities

Unlike multimodal models, which trade compactness for breadth, Mellum2 focuses strictly on text and code. This specialization lets it stay compact while remaining effective for software engineering tasks, including RAG pipelines, context compression, and retrieval post-processing.
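As a rough illustration of retrieval post-processing, the sketch below trims retrieved chunks to a size budget. The `select_context` function, the character budget, and the `score` callable are all assumptions made for this example; in practice the score might come from a compact model rating each chunk's relevance, which the article does not specify.

```python
from typing import Callable


def select_context(chunks: list[str],
                   score: Callable[[str], float],
                   budget_chars: int) -> list[str]:
    """Greedily keep the highest-scoring chunks that fit within a character
    budget, then return them in their original retrieval order."""
    ranked = sorted(range(len(chunks)),
                    key=lambda i: score(chunks[i]), reverse=True)
    kept: list[int] = []
    used = 0
    for i in ranked:
        # Skip a chunk that would overflow the budget, but keep trying smaller ones.
        if used + len(chunks[i]) <= budget_chars:
            kept.append(i)
            used += len(chunks[i])
    return [chunks[i] for i in sorted(kept)]


if __name__ == "__main__":
    retrieved = [
        "def parse(path): ...",
        "README: installation notes",
        "def parse_config(path): ...",
    ]
    # Toy relevance score (assumption): count occurrences of the query term.
    relevance = lambda chunk: float(chunk.count("parse"))
    print(select_context(retrieved, relevance, budget_chars=50))
```

Restoring the original order after selection keeps code snippets in the sequence the retriever produced, which can matter when chunks come from the same file.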

For teams handling proprietary codebases, the Apache 2.0 license and the model's efficient footprint make it a prime candidate for private, self-hosted deployments. This enables high-speed coding features, such as those found in IDEs, to run on internal infrastructure without the latency or privacy risks associated with external APIs.

JetBrains' release of Mellum2 signals a shift toward well-scoped, high-speed models that make complex AI stacks faster, cheaper, and easier to control.


Source: Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains
Domain: huggingface.co
