
Meta's latest release features multimodal capabilities and small models optimized for mobile and wearable devices.
Mark Zuckerberg has announced the release of Llama 3.2, the first major update to the Llama family to include vision-capable models. The lineup spans 11B- and 90B-parameter vision models, as well as ultra-lightweight 1B and 3B models designed specifically for mobile hardware. It marks a turning point for the open-source community, which now has access to high-performance multimodal models that can run locally.
The vision models handle image-understanding tasks such as reading charts, identifying objects, and generating descriptive captions. Meanwhile, the 1B and 3B models have been optimized with techniques like pruning and distillation so that they fit within the memory constraints of modern smartphones. Meta is partnering with hardware vendors such as Qualcomm and MediaTek to ensure the models run at peak performance on consumer devices.
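To make the chart-reading capability concrete, here is a minimal inference sketch using the Hugging Face transformers library. The checkpoint name meta-llama/Llama-3.2-11B-Vision-Instruct and the local file sales_chart.png are assumptions for illustration; the exact model IDs depend on what Meta publishes, and running this requires a transformers version that ships the Mllama architecture plus approved access to the gated repo.

```python
# Minimal sketch: asking the 11B vision model to describe a chart.
# Assumes the checkpoint is published on Hugging Face as
# "meta-llama/Llama-3.2-11B-Vision-Instruct" (gated; requires access approval)
# and a transformers version with Mllama support.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed model ID
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# "sales_chart.png" is a hypothetical local image file.
image = Image.open("sales_chart.png")

# The chat template expects an image placeholder alongside the text prompt.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What trend does this chart show?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```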
By democratizing access to vision-language models, Meta is enabling a new wave of application development. Developers can now build privacy-first applications that process images and text entirely on-device, with no need to send sensitive data to a cloud server. The release further solidifies Meta's strategy of using open source to set the industry standard for AI infrastructure.
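As a sketch of that local, privacy-first pattern, the snippet below runs the 1B instruct model through the transformers text-generation pipeline so nothing leaves the machine. The checkpoint name meta-llama/Llama-3.2-1B-Instruct is again an assumption; on actual phones, deployment would more likely go through an on-device runtime such as llama.cpp rather than Python.

```python
# Minimal sketch: local text generation with the 1B instruct model.
# "meta-llama/Llama-3.2-1B-Instruct" is an assumed checkpoint name;
# the gated repo requires access approval on Hugging Face.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # falls back to CPU when no GPU is present
)

# A private note that is processed entirely on the local device.
messages = [
    {"role": "user", "content": "Summarize: picked up prescription, "
     "dentist moved to Tuesday 3pm, call mom about the flight."}
]
result = pipe(messages, max_new_tokens=64)

# Chat-style pipelines return the full conversation; the last turn is the reply.
print(result[0]["generated_text"][-1]["content"])
```

At roughly a billion parameters in bfloat16, a model of this size fits comfortably in a few gigabytes of memory, which is what makes the fully local workflow plausible on recent consumer hardware.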
