ai/mistral-small4

Verified Publisher · By Docker · Updated 10 days ago

119B parameter hybrid model with reasoning, vision, and code capabilities (1M token context)

ai/mistral-small4 repository overview

Mistral Small 4

Mistral Small 4 is a powerful hybrid model capable of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three model families into a single model: Instruct, Reasoning (previously called Magistral), and Devstral.

With 119 billion parameters and a sparse mixture-of-experts architecture that activates only 6.5B parameters per token, Mistral Small 4 delivers exceptional performance with remarkable efficiency. The model features multimodal capabilities, accepting both text and image input while generating text output, and supports an impressive 1 million token context window.

Compared to Mistral Small 3, Mistral Small 4 achieves a 40% reduction in end-to-end completion time in a latency-optimized setup and handles 3x more requests per second in a throughput-optimized setup. The model offers flexible reasoning modes: users can toggle between a fast instant-reply mode and a deep reasoning mode that spends additional test-time compute to boost performance when needed.


Characteristics

  • Provider: Mistral AI
  • Architecture: Mistral4 (MoE: 128 experts, 4 active)
  • Parameters: 119B total, 6.5B active per token
  • Cutoff date: November 1, 2024
  • Languages: Arabic, English, French, Spanish, German, Italian, Portuguese, Dutch, Japanese, Korean, Chinese
  • Input modalities: Text, Image
  • Output modalities: Text
  • Context length: 1,048,576 tokens (1M)
  • License: Apache 2.0

Using this model with Docker Model Runner

docker model run ai/mistral-small4

For more information, check out the Docker Model Runner docs.
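
A minimal sketch of typical CLI usage (exact behavior may vary across Docker Model Runner versions):

# Pull the model explicitly (docker model run will also pull it on demand)
docker model pull ai/mistral-small4

# Open an interactive chat session
docker model run ai/mistral-small4

# Or pass a one-shot prompt
docker model run ai/mistral-small4 "Summarize the trade-offs of a sparse mixture-of-experts architecture."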

Benchmarks

Mistral Small 4 supports a per-request reasoning_effort parameter that allows users to choose between fast responses (reasoning_effort="none") and deep step-by-step reasoning (reasoning_effort="high").
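
As an illustrative sketch of how this might look against Docker Model Runner's OpenAI-compatible chat completions endpoint (the localhost:12434 host/port and /engines/v1 path are assumptions for a default TCP-enabled setup; adjust them for your environment):

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/mistral-small4",
        "messages": [{"role": "user", "content": "How many primes are there below 100?"}],
        "reasoning_effort": "high"
      }'

Setting "reasoning_effort": "none" instead skips the explicit thinking steps and returns a fast instant reply.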

Internal Model Comparisons

(Figure: internal benchmark comparison)

Reasoning Mode Performance

(Figure: reasoning benchmark comparison)

External Model Comparisons

Mistral Small 4 with reasoning achieves competitive scores, matching or surpassing GPT-OSS 120B across multiple benchmarks while generating significantly shorter outputs. On AA LCR, Mistral Small 4 scores 0.72 with just 1.6K characters of output, whereas Qwen models require 3.5-4x more output (5.8-6.1K characters) for comparable performance. This efficiency reduces latency and inference costs and improves the user experience.

AA LCR (Artificial Analysis Long Context Reasoning)

(Figure: AA LCR benchmark results)

LiveCodeBench

(Figure: LiveCodeBench results)

AIME 2025

(Figure: AIME 2025 benchmark results)

Key Features

  • Dual Mode Operation: Toggle between fast instant reply mode and reasoning mode for optimal performance based on task complexity
  • Vision Capabilities: Analyzes images and provides insights based on visual content in addition to text
  • Multilingual Support: Supports dozens of languages including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic
  • Strong System Prompt Adherence: Excellent support for following system prompts and instructions
  • Agentic Capabilities: Best-in-class agentic capabilities with native function calling and JSON output (see the sketch after this list)
  • Large Context Window: Supports up to 1 million tokens of context
  • Open License: Apache 2.0 license for both commercial and non-commercial use
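
As an illustrative sketch of the function-calling interface (same assumed endpoint as above; get_weather is a hypothetical tool defined only for this example):

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/mistral-small4",
        "messages": [{"role": "user", "content": "What is the weather in Paris right now?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Hypothetical example tool: look up current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'

If the model chooses to call the tool, the response includes a tool_calls entry whose arguments are a JSON object (for example {"city": "Paris"}) that your application executes before returning the result in a follow-up message.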

Use Cases

Mistral Small 4 is designed for a wide range of applications:

  • General chat assistants: Conversational AI with strong instruction following
  • Document parsing and extraction: Multimodal understanding for data extraction and analysis
  • Coding and SWE automation: Development assistance and codebase exploration
  • Research tasks: Leveraging math and research capabilities for complex analysis
  • Agentic workflows: Building autonomous agents with tool calling capabilities
  • Custom applications: Well-suited for fine-tuning and specialization

Considerations

  • The model requires appropriate hardware for efficient inference due to its 119B parameter size
  • For optimal performance in production environments, using the Mistral AI API or platforms optimized for large MoE models is recommended
  • Reasoning mode (reasoning_effort="high") produces longer outputs with explicit thinking steps, which may impact latency for applications requiring instant responses
  • The model's knowledge cutoff is November 1, 2024, so it may not have information about events after that date

Generated by

This model card was automatically generated using cagent-action. Want to learn more about Docker Model Runner? Check out the project repository: https://github.com/docker/model-runner.

Tag summary

Content type: Model
Digest: sha256:fcbaef279
Size: 69.5 GB
Last updated: 10 days ago

docker model pull ai/mistral-small4
