Llama models on Google Cloud’s Vertex AI
Deploy Llama models (from technology company Meta) on Vertex AI to build production-ready AI agents and applications. With a range of model sizes and capabilities, you can choose the right Llama model for your specific use case, from lightweight, efficient models to multimodal versions. Access these models as serverless APIs and leverage their native multimodal and multilingual abilities for highly efficient text and visual intelligence.
Llama's open source large language models (LLMs) provide developers with the transparency and flexibility needed for innovation, enabling easy deployment, cost-efficiency, and scalable performance. When you build with Llama on Vertex AI, you combine the advantages of cutting-edge open models with the enterprise-grade security, scalability, and managed tools of Google Cloud's comprehensive AI platform.
Openly accessible LLMs, built to scale
Llama 4 Maverick, with 17 billion active parameters, is a 128-expert trained multimodal model, offering an optimal balance of intelligence, cost, and speed. Llama 4 Maverick offers image and text understanding, enabling the creation of sophisticated AI applications that bridge language barriers. Llama 4 models offer coding, reasoning, and image capabilities, and feature mixture-of-experts (MoE) architecture of neural networks.
Top use cases include language translation, multi-document summarization, and content creation. It can also provide personal assistance, support education and learning, aid in research, and reason over vast codebases.
Llama 4 Scout: a class-leading native multimodal model
Llama 4 Scout is a powerful multimodal AI model, with strong performance on highly complex tasks. It can navigate open-ended prompts and sight unseen scenarios with fluency.
Top use cases include multimodal assistant apps such as building chatbots with text and images, debugging code generation tasks, long context applications, multi-agent workflows, and data driven-decision making.
Llama 3.3 70B: open source AI model
Llama 3.3 70B is a text-only model deployable through the Vertex AI platform that’s highly optimized for performance and efficiency in handling a wide array of language-based tasks, giving developers a lightweight application that reduces costs.
Top use cases include deployment in customer service, code generation and debugging, and training data.
Llama 3.2 90B: a lightweight, multimodal model
Llama 3.2 90B, a multimodal, efficient, and flexible model that can understand high resolution images.
Top use cases include visual search functionality allowing users to find products using images such as e-commerce, medical scans, data analysis for complex documents, content generation, and can be context aware.
Openly accessible LLMs, built to scale
Llama 4 Maverick, with 17 billion active parameters, is a 128-expert trained multimodal model, offering an optimal balance of intelligence, cost, and speed. Llama 4 Maverick offers image and text understanding, enabling the creation of sophisticated AI applications that bridge language barriers. Llama 4 models offer coding, reasoning, and image capabilities, and feature mixture-of-experts (MoE) architecture of neural networks.
Top use cases include language translation, multi-document summarization, and content creation. It can also provide personal assistance, support education and learning, aid in research, and reason over vast codebases.
Llama 4 Scout: a class-leading native multimodal model
Llama 4 Scout is a powerful multimodal AI model, with strong performance on highly complex tasks. It can navigate open-ended prompts and sight unseen scenarios with fluency.
Top use cases include multimodal assistant apps such as building chatbots with text and images, debugging code generation tasks, long context applications, multi-agent workflows, and data driven-decision making.
Llama 3.3 70B: open source AI model
Llama 3.3 70B is a text-only model deployable through the Vertex AI platform that’s highly optimized for performance and efficiency in handling a wide array of language-based tasks, giving developers a lightweight application that reduces costs.
Top use cases include deployment in customer service, code generation and debugging, and training data.
Llama 3.2 90B: a lightweight, multimodal model
Llama 3.2 90B, a multimodal, efficient, and flexible model that can understand high resolution images.
Top use cases include visual search functionality allowing users to find products using images such as e-commerce, medical scans, data analysis for complex documents, content generation, and can be context aware.
Vertex AI provides an integrated environment to evaluate, deploy, and manage Llama-enabled applications quickly and at scale.
Simplify how you deploy and scale Llama models with a fully managed infrastructure designed for AI workloads, and the option to select from flexible pricing models like dedicated endpoints, or pay-as-you-go pricing.
Develop agents with Vertex AI’s tools and the advanced capabilities of Llama models.
Leverage Google Cloud's built-in security, privacy, data governance, and compliance capabilities tailored to adhere to enterprise-level standards.
Integrate your enterprise data with Llama's advanced capabilities, leveraging tools like BigQuery to extract valuable insights and drive informed decision-making.
Llama models demonstrate advanced abilities in complex reasoning, vision analysis, code generation, and multilingual processing. These models can follow intricate instructions and generate nuanced, comprehensive outputs.