[[["わかりやすい","easyToUnderstand","thumb-up"],["問題の解決に役立った","solvedMyProblem","thumb-up"],["その他","otherUp","thumb-up"]],[["わかりにくい","hardToUnderstand","thumb-down"],["情報またはサンプルコードが不正確","incorrectInformationOrSampleCode","thumb-down"],["必要な情報 / サンプルがない","missingTheInformationSamplesINeed","thumb-down"],["翻訳に関する問題","translationIssue","thumb-down"],["その他","otherDown","thumb-down"]],["最終更新日 2025-09-04 UTC。"],[],[],null,["# Self-deployed Llama models\n\nLlama is a collection of open models developed by Meta that you can fine-tune\nand deploy on Vertex AI. Llama offers pre-trained and instruction-tuned\ngenerative text and multimodal models.\n\nLlama 4\n-------\n\nThe Llama 4 family of models is a collection of multimodal models that use the\nMixture-of-Experts (MoE) architecture. By using the MoE architecture, models\nwith very large parameter counts can activate a subset of those parameters for\nany given input, which leads to more efficient inferences. Additionally, Llama\n4 uses early fusion, which integrates text and vision information from the\ninitial processing stages. This method enables Llama 4 models to more\neffectively grasp complex, nuanced relationships between text and images.\nModel Garden on Vertex AI offers two Llama 4 models: Llama 4\nScout and Llama 4 Maverick.\n\nFor more information, see the [Llama\n4](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama4) model card in\nModel Garden or view the [Introducing Llama 4 on Vertex AI\nblog post](https://www.googlecloudcommunity.com/gc/Community-Blogs/Introducing-Llama-4-on-Vertex-AI/ba-p/892578).\n\n### Llama 4 Maverick\n\nLlama 4 Maverick is the largest and most capable Llama 4 model, offering\nindustry-leading capabilities on coding, reasoning, and image benchmarks. It\nfeatures 17 billion active parameters out of 400 billion total parameters with\n128 experts. Llama 4 Maverick uses alternating dense and MoE layers, where each\ntoken activates a shared expert plus one of the 128 routed experts. You can use\nthe model as a pretrained (PT) model or instruction-tuned (IT) model with FP8\nsupport. The model is pretrained on 200 languages and optimized for high-quality\nchat interactions through a refined post-training pipeline.\n\nLlama 4 Maverick is multimodal and has a 1M context length. It is suited for\nadvanced image captioning, analysis, precise image understanding, visual\nQ\\&A, creative text generation, general-purpose AI assistants, and sophisticated\nchatbots requiring top-tier intelligence and image understanding.\n\n### Llama 4 Scout\n\nLlama 4 Scout delivers state-of-the-art results for its size class with a large\n10 million token context window, outperforming previous Llama generations and\nother open and proprietary models on several benchmarks. It features 17 billion\nactive parameters out of the 109 billion total parameters with 16 experts and is\navailable as a pretrained (PT) or instruction-tuned (IT) model. Llama 4 Scout is\nsuited for retrieval tasks within long contexts and tasks that demand reasoning\nover large amounts of information, such as summarizing multiple large documents,\nanalyzing extensive user interaction logs for personalization and reasoning\nacross large codebases.\n\nLlama 3.3\n---------\n\nLlama 3.3 is a text-only 70B instruction-tuned model that provides enhanced\nperformance relative to Llama 3.1 70B and to Llama 3.2 90B when used for\ntext-only applications. 
Llama 3.3
---------

Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced
performance relative to Llama 3.1 70B and to Llama 3.2 90B when used for
text-only applications. Moreover, for some applications, Llama 3.3 70B
approaches the performance of Llama 3.1 405B.

For more information, see the [Llama
3.3](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama3-3) model card in
Model Garden.

Llama 3.2
---------

Llama 3.2 enables developers to build and deploy the latest generative AI
models and applications that use Llama's capabilities, such as image
reasoning, to power new experiences. Llama 3.2 is also designed to be more
accessible for on-device applications. The following list highlights Llama 3.2
features:

- Offers a more private and personalized AI experience, with on-device processing for smaller models.
- Offers models that are designed to be more efficient, with reduced latency and improved performance, making them suitable for a wide range of applications.
- Built on top of the Llama Stack, which makes building and deploying applications easier. Llama Stack is a standardized interface for building canonical toolchain components and agentic applications.
- Supports vision tasks, with a new model architecture that integrates image encoder representations into the language model.

The 1B and 3B models are lightweight text-only models that support on-device
use cases such as multilingual local knowledge retrieval, summarization, and
rewriting.

The Llama 3.2 11B and 90B models are small and medium-sized multimodal models
with image reasoning. For example, they can analyze visual data from charts to
provide more accurate responses and extract details from images to generate
text descriptions.

For more information, see the [Llama
3.2](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama3-2) model card in
Model Garden.

### Considerations

When you use the 11B and 90B models, there are no restrictions on text-only
prompts. However, if you include an image in your prompt, the image must be at
the beginning of your prompt, and you can include only one image. You cannot,
for example, include some text and then an image.
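To illustrate the ordering rule, the following is a minimal sketch of a chat
message in the OpenAI-style multimodal format that OpenAI-compatible serving
containers such as vLLM commonly accept. The `image_url` and `text` part names
and the example URL are assumptions about that client format rather than a
Vertex AI-specific schema; the point is that the single image part must come
before the text part.

```python
import json

# Valid for the 11B and 90B models: exactly one image part, placed first,
# followed by the text part.
valid_message = {
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        {"type": "text", "text": "Summarize the trend shown in this chart."},
    ],
}

# Invalid: text before the image. Prompts with more than one image are
# also rejected.
invalid_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Here is a chart:"},
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
    ],
}

print(json.dumps(valid_message, indent=2))
```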
Llama 3.1
---------

The Llama 3.1 collection of multilingual large language models (LLMs) is a set
of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B
sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models
(8B, 70B, 405B) are optimized for multilingual dialogue use cases and
outperform many of the available open source and closed chat models on common
industry benchmarks.

For more information, see the [Llama
3.1](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama3_1) model card in
Model Garden.

Llama 3
-------

The Llama 3 instruction-tuned models are a collection of LLMs optimized for
dialogue use cases. Llama 3 models outperform many of the available open
source chat models on common industry benchmarks.

For more information, see the [Llama
3](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama3) model card in
Model Garden.

Llama 2
-------

Llama 2 is a collection of pre-trained and fine-tuned generative text models
that range in size from 7B to 70B parameters.

For more information, see the [Llama
2](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama2) model card in
Model Garden.

Code Llama
----------

Meta's Code Llama models are designed for code synthesis, code understanding,
and code-related instruction following.

For more information, see the [Code
Llama](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/codellama-7b-hf) model card in
Model Garden.

Llama Guard 3
-------------

Llama Guard 3 builds on the capabilities of Llama Guard 2, adding three new
categories: Defamation, Elections, and Code Interpreter Abuse. Additionally,
this model is multilingual and has a prompt format that is consistent with
Llama 3 or later instruct models.

For more information, see the [Llama
Guard](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-guard) model card in
Model Garden.

Resources
---------

For more information about Model Garden, see
[Explore AI models in Model Garden](/vertex-ai/generative-ai/docs/model-garden/explore-models).