Llama models on Google Cloud’s Vertex AI 

Deploy Llama models (from technology company Meta) on Vertex AI to build production-ready AI agents and applications. With a range of model sizes and capabilities, you can choose the right Llama model for your specific use case, from lightweight, efficient models to multimodal versions. Access these models as serverless APIs and leverage their native multimodal and multilingual abilities for highly efficient text and visual intelligence.

Vertex AI and Llama text logo with cartoon llama between them
Llama models overview

Llama's open source large language models (LLMs) provide developers with the transparency and flexibility needed for innovation, enabling easy deployment, cost-efficiency, and scalable performance. When you build with Llama on Vertex AI, you combine the advantages of cutting-edge open models with the enterprise-grade security, scalability, and managed tools of Google Cloud's comprehensive AI platform.

Openly accessible LLMs, built to scale

Llama 4 Maverick, with 17 billion active parameters, is a 128-expert trained multimodal model, offering an optimal balance of intelligence, cost, and speed. Llama 4 Maverick offers image and text understanding, enabling the creation of sophisticated AI applications that bridge language barriers. Llama 4 models offer coding, reasoning, and image capabilities, and feature mixture-of-experts (MoE) architecture of neural networks.

Top use cases include language translation, multi-document summarization, and content creation. It can also provide personal assistance, support education and learning, aid in research, and reason over vast codebases.


Openly accessible LLMs, built to scale

Llama 4 Maverick, with 17 billion active parameters, is a 128-expert trained multimodal model, offering an optimal balance of intelligence, cost, and speed. Llama 4 Maverick offers image and text understanding, enabling the creation of sophisticated AI applications that bridge language barriers. Llama 4 models offer coding, reasoning, and image capabilities, and feature mixture-of-experts (MoE) architecture of neural networks.

Top use cases include language translation, multi-document summarization, and content creation. It can also provide personal assistance, support education and learning, aid in research, and reason over vast codebases.


Llama 4 Scout: a class-leading native multimodal model

Llama 4 Scout is a powerful multimodal AI model, with strong performance on highly complex tasks. It can navigate open-ended prompts and sight unseen scenarios with fluency.

Top use cases include multimodal assistant apps such as building chatbots with text and images, debugging code generation tasks, long context applications, multi-agent workflows, and data driven-decision making.


Llama 3.3 70B: open source AI model

Llama 3.3 70B is a text-only model deployable through the Vertex AI platform that’s highly optimized for performance and efficiency in handling a wide array of language-based tasks, giving developers a lightweight application that reduces costs.

Top use cases include deployment in customer service, code generation and debugging, and training data.


Llama 3.2 90B: a lightweight, multimodal model

Llama 3.2 90B, a multimodal, efficient, and flexible model that can understand high resolution images.

Top use cases include visual search functionality allowing users to find products using images such as e-commerce, medical scans, data analysis for complex documents, content generation, and can be context aware.


Benefits and capabilities of Llama models on Vertex AI

Accelerate AI development

Vertex AI provides an integrated environment to evaluate, deploy, and manage Llama-enabled applications quickly and at scale.

Optimize performance and cost

Simplify how you deploy and scale Llama models with a fully managed infrastructure designed for AI workloads, and the option to select from flexible pricing models like dedicated endpoints, or pay-as-you-go pricing.

Build sophisticated AI agents

Develop agents with Vertex AI’s tools and the advanced capabilities of Llama models.

Built-in security, compliance, and data governance

Leverage Google Cloud's built-in security, privacy, data governance, and compliance capabilities tailored to adhere to enterprise-level standards.


Maximize the power of your data

Integrate your enterprise data with Llama's advanced capabilities, leveraging tools like BigQuery to extract valuable insights and drive informed decision-making.

Enhanced capabilities

Llama models demonstrate advanced abilities in complex reasoning, vision analysis, code generation, and multilingual processing. These models can follow intricate instructions and generate nuanced, comprehensive outputs.

Build with Llama on Vertex AI


Google Cloud