119B parameter hybrid model with reasoning, vision, and code capabilities (1M token context)
Mistral Small 4 is a powerful hybrid model capable of acting as both a general instruction model and a reasoning model. It unifies three model families into one: Instruct, Reasoning (previously called Magistral), and Devstral.
With 119 billion parameters and a sparse mixture-of-experts architecture that activates only 6.5B parameters per token, Mistral Small 4 delivers exceptional performance with remarkable efficiency. The model features multimodal capabilities, accepting both text and image input while generating text output, and supports an impressive 1 million token context window.
In a latency-optimized setup, Mistral Small 4 cuts end-to-end completion time by 40%, and in a throughput-optimized setup it handles 3x more requests per second than Mistral Small 3. The model offers flexible reasoning modes: users can toggle between a fast instant-reply mode and a deep reasoning mode that spends test-time compute to boost performance when needed.
| Attribute | Value |
|---|---|
| Provider | Mistral AI |
| Architecture | Mistral4 (MoE: 128 experts, 4 active) |
| Parameters | 119B total, 6.5B active per token |
| Cutoff date | November 1, 2024 |
| Languages | Arabic, English, French, Spanish, German, Italian, Portuguese, Dutch, Japanese, Korean, Chinese |
| Input modalities | Text, Image |
| Output modalities | Text |
| Context length | 1,048,576 tokens (1M) |
| License | Apache 2.0 |
```
docker model run mistral-small4
```
For more information, check out the Docker Model Runner docs.
Mistral Small 4 supports a per-request `reasoning_effort` parameter that lets users choose between fast responses (`reasoning_effort="none"`) and deep step-by-step reasoning (`reasoning_effort="high"`).
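As a sketch, toggling between the two modes might look like the following request body for an OpenAI-compatible chat-completions endpoint. The endpoint URL and the exact placement of `reasoning_effort` in the body are assumptions; consult the Docker Model Runner docs for the authoritative API shape.

```python
import json

# Assumed default Docker Model Runner OpenAI-compatible endpoint (verify locally).
BASE_URL = "http://localhost:12434/engines/v1"

def build_request(prompt: str, reasoning_effort: str = "none") -> dict:
    """Build a chat-completions request body toggling between fast replies
    ("none") and deep step-by-step reasoning ("high"). Passing
    reasoning_effort as a top-level field is an assumption."""
    return {
        "model": "ai/mistral-small4",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
    }

fast = build_request("Summarize this paragraph.")                 # instant-reply mode
deep = build_request("Plan a migration step by step.", "high")    # deep reasoning mode
print(json.dumps(deep, indent=2))
```

Because the parameter is per-request, an application can serve latency-sensitive traffic with `"none"` and reserve `"high"` for queries that benefit from extra test-time compute.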


Mistral Small 4 with reasoning achieves competitive scores, matching or surpassing GPT-OSS 120B across multiple benchmarks while generating significantly shorter outputs. On AA LCR, Mistral Small 4 scores 0.72 with just 1.6K characters, whereas Qwen models require 3.5-4x more output (5.8-6.1K) for comparable performance. This efficiency reduces latency and inference costs and improves the user experience.



Mistral Small 4 is designed for a wide range of applications. Note that deep reasoning mode (`reasoning_effort="high"`) produces longer outputs with explicit thinking steps, which may impact latency for applications requiring instant responses.

This model card was automatically generated using cagent-action. Want to learn more about Docker Model Runner? Check out the project repository: https://github.com/docker/model-runner.
```
docker model pull ai/mistral-small4
```

| Attribute | Value |
|---|---|
| Content type | Model |
| Digest | sha256:fcbaef279… |
| Size | 69.5 GB |
| Last updated | 10 days ago |
| Pulls (last week) | 949 |