Llama is a collection of open models developed by Meta. You can fine-tune and deploy these models on Vertex AI. Llama offers pre-trained and instruction-tuned
generative text and multimodal models. This document describes the Llama models that are available on Vertex AI. For a comparison of the available model families, see the table at the end of this document.

The Llama 4 family of models is a collection of multimodal models that use the
Mixture-of-Experts (MoE) architecture. The MoE architecture allows models with large parameter counts to activate only a subset of parameters for any given input, which results in more efficient inference. Additionally, Llama
4 uses early fusion, which integrates text and vision information from the
initial processing stages. This method helps Llama 4 models better
understand complex relationships between text and images.
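
To make the MoE idea concrete, the following toy sketch shows top-1 routing with a shared expert in plain NumPy. It is an illustration only, not Llama 4's implementation: the router and expert matrices are random placeholders, the layer sizes are tiny, and a real MoE layer sits inside a transformer block. The point is that each token touches only two expert weight matrices out of many, which is how a model can have a very large total parameter count while keeping the active (per-token) parameter count small.

```python
# Toy sketch of Mixture-of-Experts (MoE) routing. All weights are random
# placeholders; this is not Llama 4 code.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 5  # tiny sizes for illustration

# One weight matrix per routed expert, plus a shared expert and a router.
routed_experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
shared_expert = rng.standard_normal((d_model, d_model))
router = rng.standard_normal((d_model, n_experts))

tokens = rng.standard_normal((n_tokens, d_model))

outputs = []
for x in tokens:
    scores = x @ router                     # router logits for this token
    top = int(np.argmax(scores))            # pick the single best routed expert
    gate = np.exp(scores[top]) / np.exp(scores).sum()  # softmax weight of the winner
    # Only the shared expert and one routed expert run for this token, so most
    # of the model's parameters stay idle on any given input.
    outputs.append(x @ shared_expert + gate * (x @ routed_experts[top]))

print(np.stack(outputs).shape)  # (5, 8): same shape as the input, computed sparsely
```

This shared-expert-plus-one-routed-expert pattern is the one described later in this section for Llama 4 Maverick, where only about 17 billion of the 400 billion total parameters are active for any single token.
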
Model Garden on Vertex AI offers two Llama 4 models: Llama 4
Scout and Llama 4 Maverick. For more information, see the Llama
4 model card in
Model Garden or view the Introducing Llama 4 on Vertex AI
blog post.

Llama 4 Maverick is the largest and most capable Llama 4 model. It performs well on coding, reasoning, and image benchmarks. It
features 17 billion active parameters out of 400 billion total parameters with
128 experts. Llama 4 Maverick uses alternating dense and MoE layers, where each
token activates a shared expert plus one of the 128 routed experts. You can use
the model as a pretrained (PT) model or instruction-tuned (IT) model with FP8
support. The model is pretrained on 200 languages and is optimized for high-quality
chat interactions through a refined post-training pipeline. Llama 4 Maverick is a multimodal model with a 1 million token context length. It is suited for use cases that require advanced intelligence and image understanding, such as advanced image analysis, visual question answering, and creative text generation.

Llama 4 Scout delivers strong performance for its size class. With a 10 million token context window, it performs well on several benchmarks compared to previous Llama generations and to other open and proprietary models. It features 17 billion
active parameters out of the 109 billion total parameters with 16 experts and is
available as a pretrained (PT) model or instruction-tuned (IT) model. Llama 4 Scout is suited for tasks that require reasoning over large amounts of information, such as large documents or codebases.

Llama 3.3 is a 70B parameter, text-only, instruction-tuned model. For text-only applications, it offers enhanced performance compared to Llama 3.1 70B and Llama 3.2 90B. For some applications, Llama 3.3 70B
approaches the performance of Llama 3.1 405B. For more information, see the Llama
3.3 model card in
Model Garden.

Llama 3.2 models help you build and deploy generative AI applications
that use Llama's capabilities for features like image reasoning. Llama 3.2 is also designed for
on-device applications. Key features of Llama 3.2 include the following:

- The 1B and 3B models are lightweight text-only models that support on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
- The 11B and 90B models are small and medium-sized multimodal models with image reasoning capabilities. For example, they can analyze visual data from charts to provide more accurate responses and extract details from images to generate text descriptions.

For more information, see the Llama
3.2 model card in
Model Garden.

When you use the 11B and 90B models, there are no restrictions for
text-only prompts. However, if you include an image in your prompt, the image
must be at the beginning of your prompt, and you can include only one image. You
cannot, for example, include some text and then an image.
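
As a concrete illustration of this rule, the following sketch sends a single-image prompt to a Llama 3.2 vision model, with the image part placed before the text part in the message content. It assumes the OpenAI-compatible chat completions interface that Vertex AI exposes for Llama models; the endpoint URL, model ID, and image URI are placeholders, so check the model card for the values that apply to your project and deployment.

```python
# Sketch of a multimodal request that follows the image rules for the 11B and
# 90B models: exactly one image, and the image appears before the text.
# Assumes the OpenAI-compatible chat completions interface for Llama on
# Vertex AI; the URL, model ID, and image URI below are placeholders.
import openai
from google.auth import default
from google.auth.transport.requests import Request

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

client = openai.OpenAI(
    base_url=(
        "https://us-central1-aiplatform.googleapis.com/v1beta1/"
        "projects/PROJECT_ID/locations/us-central1/endpoints/openapi"
    ),
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="meta/llama-3.2-90b-vision-instruct-maas",  # illustrative model ID
    messages=[{
        "role": "user",
        "content": [
            # The single image comes first ...
            {"type": "image_url", "image_url": {"url": "gs://YOUR_BUCKET/chart.png"}},
            # ... and the text follows it. Adding a second image, or putting
            # text before the image, is not supported.
            {"type": "text", "text": "Summarize the trend shown in this chart."},
        ],
    }],
)
print(response.choices[0].message.content)
```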

The Llama 3.1 family of models is a collection of multilingual, pre-trained, and instruction-tuned generative text models available in 8B, 70B, and 405B sizes. The Llama 3.1 instruction-tuned models are optimized for multilingual dialogue use cases and perform well on common industry benchmarks compared to many available open-source and proprietary chat models. For more information, see the Llama
3.1 model card in
Model Garden.

The Llama 3 instruction-tuned models are a collection of LLMs optimized for
dialogue use cases. Llama 3 models perform well on common industry benchmarks compared to many available open-source
chat models. For more information, see the Llama
3 model card in
Model Garden.

The Llama 2 LLMs are a collection of pre-trained and fine-tuned generative text
models, ranging in size from 7B to 70B parameters. For more information, see the Llama
2 model card in
Model Garden.

The Code Llama models from Meta are designed for code synthesis,
understanding, and instruction. For more information, see the Code
Llama model card in
Model Garden.

Llama Guard 3 builds on the capabilities of Llama Guard 2, adding
three new categories: Defamation, Elections, and Code Interpreter Abuse.
Additionally, this model is multilingual and has a prompt format that is
consistent with Llama 3 or later instruct models. For more information, see the Llama
Guard model card in
Model Garden.
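
As a sketch of how a safety model such as Llama Guard 3 fits into an application, the snippet below gates user messages before they reach a chat model. The `classify_with_llama_guard` helper is hypothetical: how you call the model depends on how you deploy it (for example, to a Vertex AI endpoint from Model Garden), and the exact prompt template and category codes are documented in the model card. The gate relies only on the model's verdict, which begins with `safe` or `unsafe` and, when unsafe, lists the violated category codes.

```python
# Sketch of a moderation gate built around Llama Guard 3. The classification
# helper is hypothetical and stands in for a call to your deployed endpoint.

def classify_with_llama_guard(message: str) -> str:
    """Hypothetical helper: wrap `message` in the Llama Guard 3 prompt template
    from the model card, send it to your deployed endpoint, and return the raw
    completion (for example "safe", or "unsafe" plus a line of category codes)."""
    raise NotImplementedError("Wire this up to your deployed Llama Guard endpoint.")

def parse_verdict(raw: str) -> tuple[bool, list[str]]:
    """Splits a Llama Guard completion into (is_safe, violated_categories)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    is_safe = bool(lines) and lines[0].lower() == "safe"
    categories = [] if is_safe or len(lines) < 2 else lines[1].split(",")
    return is_safe, categories

def moderated_reply(user_message: str, generate_reply) -> str:
    """Forwards the message to the chat model only if Llama Guard says it is safe."""
    is_safe, categories = parse_verdict(classify_with_llama_guard(user_message))
    if not is_safe:
        return f"Request blocked by the safety layer (categories: {categories})."
    return generate_reply(user_message)
```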

For more information about Model Garden, see Explore AI models in Model Garden.

To help you choose a Llama model for your use case, the following table compares the available model families.

| Model Family | Description | Primary Use Case |
| --- | --- | --- |
| Llama 4 | Multimodal models (text, image) with a Mixture-of-Experts (MoE) architecture. Includes Scout (long context) and Maverick (highest capability). | Advanced image analysis, visual Q&A, creative text generation, and reasoning over large documents or codebases. |
| Llama 3.3 | A 70B parameter, text-only, instruction-tuned model with enhanced performance for text applications. | High-performance text-only tasks where it can approach the performance of much larger models. |
| Llama 3.2 | Efficient multimodal models (text, image) designed for a range of applications, including on-device use cases. | Image reasoning, chart analysis, on-device summarization, and multilingual knowledge retrieval. |
| Llama 3.1 | Multilingual text-only models (8B, 70B, 405B) optimized for dialogue. | Multilingual dialogue and chat applications. |
| Llama 3 | Instruction-tuned text-only models optimized for dialogue. | General dialogue and chat applications. |
| Llama 2 | A collection of pre-trained and fine-tuned generative text models (7B to 70B). | General-purpose generative text tasks. |
| Code Llama | Text-to-code models based on Llama 2. | Code generation, completion, and debugging. |
| Llama Guard 3 | A safety model for classifying content against a risk taxonomy. Multilingual and enhanced over previous versions. | Content moderation and implementing safety layers for generative AI applications. |