ai/smolvlm

Verified Publisher

By Docker

Updated 7 months ago

SmolVLM: lightweight multimodal model for video, image, and text analysis, optimized for devices.

Model
3

10K+

ai/smolvlm repository overview

SmolVLM

Description

SmolVLM is a lightweight multimodal model designed to analyze video content. The model processes videos, images, and text inputs to generate text outputs - whether answering questions about media files, comparing visual content, or transcribing text from images. Despite its compact size, requiring only 1.8GB of GPU RAM for video inference, it delivers robust performance on complex multimodal tasks. This efficiency makes it particularly well-suited for on-device applications where computational resources may be limited.

Characteristics

AttributeDetails
ProviderHugging Face
ArchitectureLlama
Cutoff date-
LanguagesEnglish
Tool calling
Input modalitiesText, Image
Output modalitiesText
LicenseApache 2.0

Available model variants

Model variantParametersQuantizationContext windowVRAM¹Size
ai/smolvlm:500M

ai/smolvlm:500M-F16

ai/smolvlm:latest
500MMOSTLY_F168K tokens1.15 GiB780.71 MB
ai/smolvlm:500M-Q8_0500MMOSTLY_Q8_08K tokens0.84 GiB414.86 MB

¹: VRAM estimated based on model characteristics.

latest500M

Use this AI model with Docker Model Runner

docker model run ai/smolvlm

Tag summary

Content type

Model

Digest

sha256:0cae12900

Size

972.7 MB

Last updated

7 months ago

docker model pull ai/smolvlm:500M

This week's pulls

Pulls:

176

Last week