
Mistral AI has launched its first multimodal model, Pixtral 12B, offering powerful vision-and-text capabilities under an open license.
Mistral AI continues its rapid release cycle with Pixtral 12B, its first natively multimodal model. Built on the Mistral NeMo 12B language model, Pixtral adds a vision encoder trained from scratch, allowing the model to process images at their native resolution and varying aspect ratios alongside text input.
Pixtral 12B is released under the Apache 2.0 license, making it freely usable by developers and enterprises that want to self-host multimodal applications. It excels at tasks such as image captioning, visual question answering, and optical character recognition (OCR) on complex documents.
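For teams self-hosting the model, a typical integration point is an OpenAI-compatible chat endpoint that accepts mixed text-and-image messages. The sketch below builds such a request payload for a local image; the exact field names and the model identifier (`pixtral-12b`) are assumptions based on the common chat-completions format, so check the documentation of your serving stack before relying on them.

```python
# Sketch: assembling a multimodal chat request for a self-hosted Pixtral 12B.
# The message shape follows the widely used OpenAI-style chat format; the
# model id "pixtral-12b" and field names are assumptions -- verify them
# against your serving stack's docs.
import base64
from pathlib import Path


def build_vision_request(prompt: str, image_path: str,
                         model: str = "pixtral-12b") -> dict:
    """Return a chat-completion payload pairing a text prompt with one image."""
    # Images are commonly sent inline as a base64 data URL.
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
    }
```

The resulting dictionary can be POSTed as JSON to the server's chat-completions route; keeping payload construction in a small helper like this makes it easy to swap in a different endpoint or message schema later.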
The release of Pixtral reflects a broader shift of high-performance multimodal models away from proprietary black boxes. By providing open weights, Mistral enables auditing, fine-tuning, and integration into specialized workflows that require visual understanding.
