119B parameter hybrid model with reasoning, vision, and code capabilities (1M token context)
Mistral Small 4 is a powerful hybrid model capable of acting as both a general instruction model and a reasoning model. It unifies three model families into one: Instruct, Reasoning (previously called Magistral), and Devstral.
With 119 billion parameters and a sparse mixture-of-experts architecture that activates only 6.5B parameters per token, Mistral Small 4 delivers exceptional performance with remarkable efficiency. The model features multimodal capabilities, accepting both text and image input while generating text output, and supports an impressive 1 million token context window.
In a latency-optimized setup, Mistral Small 4 cuts end-to-end completion time by 40%, and in a throughput-optimized setup it handles 3x more requests per second than Mistral Small 3. The model offers flexible reasoning modes: users can toggle between a fast instant-reply mode and a deep reasoning mode that spends test-time compute to boost performance when needed.
| Attribute | Value |
|---|---|
| Provider | Mistral AI |
| Architecture | Mistral4 (MoE: 128 experts, 4 active) |
| Parameters | 119B total, 6.5B active per token |
| Cutoff date | November 1, 2024 |
| Languages | Arabic, English, French, Spanish, German, Italian, Portuguese, Dutch, Japanese, Korean, Chinese |
| Input modalities | Text, Image |
| Output modalities | Text |
| Context length | 1,048,576 tokens (1M) |
| License | Apache 2.0 |
```
docker model run mistral-small4
```
For more information, check out the Docker Model Runner docs.
Mistral Small 4 supports a per-request `reasoning_effort` parameter that lets users choose between fast responses (`reasoning_effort="none"`) and deep step-by-step reasoning (`reasoning_effort="high"`).
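As a sketch, toggling between the two modes might look like the following request body for an OpenAI-compatible chat-completions endpoint. The endpoint URL and the exact placement of `reasoning_effort` in the body are assumptions; consult the Docker Model Runner docs for the authoritative API shape.

```python
import json

# Assumed default Docker Model Runner OpenAI-compatible endpoint (verify locally).
BASE_URL = "http://localhost:12434/engines/v1"

def build_request(prompt: str, reasoning_effort: str = "none") -> dict:
    """Build a chat-completions request body toggling between fast replies
    ("none") and deep step-by-step reasoning ("high"). Passing
    reasoning_effort as a top-level field is an assumption."""
    return {
        "model": "ai/mistral-small4",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
    }

fast = build_request("Summarize this paragraph.")                 # instant-reply mode
deep = build_request("Plan a migration step by step.", "high")    # deep reasoning mode
print(json.dumps(deep, indent=2))
```

Because the parameter is per-request, an application can serve latency-sensitive traffic with `"none"` and reserve `"high"` for queries that benefit from extra test-time compute.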


Mistral Small 4 with reasoning achieves competitive scores, matching or surpassing GPT-OSS 120B across multiple benchmarks while generating significantly shorter outputs. On AA LCR, Mistral Small 4 scores 0.72 with just 1.6K characters, whereas Qwen models require 3.5-4x more output (5.8-6.1K) for comparable performance. This efficiency reduces latency and inference costs and improves the user experience.



Mistral Small 4 is designed for a wide range of applications. Note that deep reasoning mode (`reasoning_effort="high"`) produces longer outputs with explicit thinking steps, which may impact latency for applications requiring instant responses.

This model card was automatically generated using cagent-action. Want to learn more about Docker Model Runner? Check out the project repository: https://github.com/docker/model-runner.
```
docker model pull ai/mistral-small4
```

| Attribute | Value |
|---|---|
| Content type | Model |
| Digest | sha256:fcbaef279… |
| Size | 69.5 GB |
| Last updated | 10 days ago |
| Pulls (last week) | 949 |