Overview of Generative AI on Vertex AI

Generative AI on Vertex AI (also known as genAI or gen AI) gives you access to Gemini models and other large generative AI models so you can evaluate, tune, and deploy them in your AI-powered applications. This page provides an overview of the generative AI workflow on Vertex AI and the available APIs and models, including the Vertex AI Gemini API, and points you to resources for getting started.

Generative AI workflow in Vertex AI

The following diagram shows a high-level overview of the generative AI workflow.

Generative AI workflow diagram



The generative AI workflow typically starts with prompting. A prompt is a request sent to a generative AI model to elicit a response. Depending on the model, a prompt can contain text, images, videos, audio, documents, and other modalities, or a combination of modalities (multimodal).

Creating a prompt to get the desired response from the model is a practice called prompt design. While prompt design is a process of trial and error, there are prompt design principles and strategies that you can use to nudge the model to behave in the desired way. Vertex AI Studio offers a prompt management tool to help you manage your prompts.
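As a concrete sketch of what a prompt request looks like, the following builds the body of a `generateContent` call to the Gemini publisher endpoint over REST. The project ID and region are placeholders, and sending the request requires authentication, which is omitted here.

```python
import json

# Placeholders: substitute your own project and preferred region.
PROJECT_ID = "your-project-id"
REGION = "us-central1"
MODEL_ID = "gemini-1.5-flash"

# The publisher endpoint the request is POSTed to.
url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{REGION}/publishers/google/models/{MODEL_ID}:generateContent"
)

# A prompt is a list of content turns, each made of parts;
# here a single user turn with one text part.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Explain prompt design in one sentence."}]}
    ]
}

print(url)
print(json.dumps(request_body, indent=2))
```

For a multimodal prompt, additional parts (for example, image or video references) are added alongside the text part in the same turn.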

Foundation models


Prompts are sent to a generative AI model for response generation. Vertex AI has a variety of generative AI foundation models that are accessible through a managed API, including the following:

  • Gemini API: Advanced reasoning, multiturn chat, code generation, and multimodal prompts.
  • Imagen API: Image generation, image editing, and visual captioning.
  • MedLM: Medical question answering and summarization. (Private GA)

The models differ in size, modality, and cost. You can explore Google models, as well as open models and models from Google partners, in Model Garden.

Model customization


You can customize the default behavior of Google's foundation models so that they consistently generate the desired results without using complex prompts. This customization process is called model tuning. Model tuning helps you reduce the cost and latency of your requests by allowing you to simplify your prompts.

Vertex AI also offers model evaluation tools to help you evaluate the performance of your tuned model. After your tuned model is production-ready, you can deploy it to an endpoint and monitor performance like in standard MLOps workflows.
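To illustrate what a tuning input can look like, the following sketches one training example in the JSONL format used for Gemini supervised tuning datasets (one JSON object per line, pairing a user prompt with the desired model response). The exact schema can vary by model version, and the example text is invented.

```python
import json

# One supervised tuning example: a user turn and the desired model turn.
# Field names follow the Gemini content format; text is illustrative.
example = {
    "contents": [
        {"role": "user", "parts": [{"text": "Summarize: The quarterly report shows revenue up 4% ..."}]},
        {"role": "model", "parts": [{"text": "Revenue grew 4% while costs held steady."}]},
    ]
}

# Each line of the JSONL dataset file is one serialized example.
jsonl_line = json.dumps(example)
print(jsonl_line)
```

A tuning job consumes a file of many such lines; simpler prompts at inference time then rely on the behavior learned from these examples.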

Request augmentation


Vertex AI offers multiple request augmentation methods that give the model access to external APIs and real-time information.

  • Grounding: Connects model responses to a source of truth, such as your own data or web search, helping to reduce hallucinations.
  • RAG: Connects models to external knowledge sources, such as documents and databases, to generate more accurate and informative responses.
  • Function calling: Lets the model interact with external APIs to get real-time information and perform real-world tasks.
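As a sketch of function calling, the following builds a request body that declares a hypothetical `get_weather` function the model may ask the application to call. The function name and schema are invented for illustration; the `parameters` field uses an OpenAPI-style schema.

```python
import json

# Hypothetical function declaration: the model can request a call to
# get_weather and incorporate the result into its response.
get_weather = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

# Tools are passed alongside the prompt in the generateContent request.
request_body = {
    "contents": [{"role": "user", "parts": [{"text": "What's the weather in Paris?"}]}],
    "tools": [{"functionDeclarations": [get_weather]}],
}
print(json.dumps(request_body, indent=2))
```

If the model decides the function is needed, its response contains a function call with arguments; the application executes the call and sends the result back in a follow-up turn.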

Citation check


After the response is generated, Vertex AI checks whether citations need to be included with the response. If a significant amount of the text in the response comes from a particular source, that source is added to the citation metadata in the response.
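A sketch of reading that citation metadata from a response follows. The response object below is mocked for illustration; in a real response, citation metadata appears only when generated text closely matches a source.

```python
# Mocked generateContent response containing citation metadata.
response = {
    "candidates": [
        {
            "content": {"role": "model", "parts": [{"text": "..."}]},
            "citationMetadata": {
                "citations": [
                    {"startIndex": 0, "endIndex": 120, "uri": "https://example.com/source"}
                ]
            },
        }
    ]
}

# Collect the cited source URIs; citationMetadata may be absent,
# so default to an empty structure.
uris = []
for candidate in response["candidates"]:
    for citation in candidate.get("citationMetadata", {}).get("citations", []):
        uris.append(citation["uri"])
print(uris)
```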

Responsible AI and safety


The last layer of checks that the prompt and response go through before being returned is the safety filters. Vertex AI evaluates both the prompt and the response against a set of safety categories and scores how strongly each one applies. If the threshold is exceeded for one or more categories, the response is blocked and Vertex AI returns a fallback response.
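You can tighten or loosen these filters per request. The following sketch attaches safety settings to a request body; the category and threshold names follow the Vertex AI safety filter enums, though the specific combination here is only an example.

```python
# Per-request safety settings: each entry pairs a harm category
# with a blocking threshold.
safety_settings = [
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
]

request_body = {
    "contents": [{"role": "user", "parts": [{"text": "Tell me a story."}]}],
    "safetySettings": safety_settings,
}
print(request_body["safetySettings"])
```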



If the prompt and response pass the safety filter checks, the response is returned. Typically, the response is returned all at once. However, with Vertex AI you can also receive the response progressively as the model generates it by enabling streaming.
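Assuming the REST publisher endpoint shape, streaming uses a different method name on the same endpoint, as this small sketch shows (project and region are placeholders):

```python
# Placeholders: substitute your own project and preferred region.
PROJECT_ID = "your-project-id"
REGION = "us-central1"
MODEL_ID = "gemini-1.5-flash"

base = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{REGION}/publishers/google/models/{MODEL_ID}"
)
# Unary call: the full response is returned at once.
unary_url = f"{base}:generateContent"
# Streaming call: partial responses arrive as they are generated.
streaming_url = f"{base}:streamGenerateContent"
print(streaming_url)
```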

Generative AI APIs and models

The generative AI models available in Vertex AI, also called foundation models, are categorized by the type of content that they're designed to generate. This content includes text, chat, image, code, video, multimodal data, and embeddings. Each model is exposed through a publisher endpoint that's specific to your Google Cloud project, so there's no need to deploy the foundation model unless you need to tune it for a specific use case.

Gemini API offerings

The Vertex AI Gemini API contains the publisher endpoints for the Gemini models developed by Google DeepMind. You can try the Vertex AI Gemini API in this quickstart.

  • Gemini 1.5 Flash is a multimodal model that you can use to create text generation and chat applications. You can include text, images, audio, video, and PDF files in your prompt requests, and it has the same context window as Gemini 1.5 Pro for processing large amounts of multimodal data. Gemini 1.5 Flash is smaller and faster than Gemini 1.5 Pro, which makes it a good option for chat assistants and on-demand content generation applications.
  • Gemini 1.5 Pro supports multimodal prompts. You can include text, images, audio, video, and PDF files in your prompt requests and get text or code responses. Gemini 1.5 Pro can process larger collections of images, larger text documents, and longer videos than Gemini 1.0 Pro Vision.
  • Gemini 1.0 Pro is designed to handle natural language tasks, multiturn text and code chat, and code generation.
  • Gemini 1.0 Pro Vision supports multimodal prompts. You can include text, images, video, and PDFs in your prompt requests and get text or code responses.

The following table shows some differences between the Gemini models that can help you choose which is best for you:

Gemini 1.5 Flash
  • Modalities: text, code, images, audio, video, and PDF. Up to 3,000 images; audio up to 8.4 hours; video up to 1 hour without audio, or up to 50 minutes with audio.
  • Context window: 1M tokens in, 8,192 tokens out.

Gemini 1.5 Pro
  • Modalities: text, code, images, audio, video, and PDF. Up to 3,000 images; audio up to 8.4 hours; video up to 1 hour without audio, or up to 50 minutes with audio.
  • Context window: 1M tokens in, 8,192 tokens out.

Gemini 1.0 Pro / Gemini 1.0 Pro Vision
  • Modalities: text, code, and PDF (PDF with Gemini 1.0 Pro Vision). Up to 16 images; video up to 2 minutes.
  • Context window: 8,192 tokens in, 2,048 tokens out.

Other Generative AI offerings

  • Text embedding generates vector embeddings for input text. You can use embeddings for tasks like semantic search, recommendation, classification, and outlier detection.

  • Multimodal embedding generates vector embeddings based on image and text inputs. These embeddings can be used for subsequent tasks like image classification or content recommendations.

  • Imagen, our text-to-image foundation model, lets you generate and customize studio-grade images at scale.

  • Partner models are a curated list of generative AI models developed by Google's partner companies. These generative AI models are offered as managed APIs. For example, Anthropic provides its Claude models as a service on Vertex AI.

  • Open models, like Llama, are available for you to deploy on Vertex AI or other platforms.

  • MedLM is a family of foundation models fine-tuned for the healthcare industry.
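As a sketch of calling an embedding model, the following builds a request body for a text embedding model. Following the publisher-endpoint pattern described above, embedding models are called via the `:predict` method, with each instance carrying the text to embed; the model ID shown is one example of a text embedding model.

```python
import json

# Placeholders: substitute your own project and preferred region.
PROJECT_ID = "your-project-id"
REGION = "us-central1"
MODEL_ID = "text-embedding-004"

# Embedding models use the :predict method on the publisher endpoint.
url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{REGION}/publishers/google/models/{MODEL_ID}:predict"
)

# Each instance is one piece of text to embed; the response contains
# one embedding vector per instance.
request_body = {"instances": [{"content": "What is semantic search?"}]}
print(url)
print(json.dumps(request_body))
```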

Certifications and security controls

Vertex AI supports CMEK, VPC Service Controls, Data Residency, and Access Transparency. There are some limitations for Generative AI features. For more information, see Generative AI security controls.

Vertex AI Studio console experiences

When you use Vertex AI Studio with the free trial or without signing in to Google Cloud, some features are unavailable. To try Vertex AI Studio, accept the Vertex AI Studio Terms of Service in the Google Cloud console.

| Feature                   | Without a Google Cloud account | With a Google Cloud free trial account | With an existing Google Cloud account |
|---------------------------|--------------------------------|----------------------------------------|---------------------------------------|
| Sign in required          | No                             | Yes                                    | Yes                                   |
| Queries per minute (QPM)  | 2 QPM                          | N/A                                    | N/A                                   |
| Credits offered           | $0                             | Up to $300 for 90 days                 | $0                                    |
| Prompt gallery            | No                             | Yes                                    | Yes                                   |
| Prompt designer           | Yes                            | Yes                                    | Yes                                   |
| Save prompts              | No                             | Yes                                    | Yes                                   |
| Prompt history            | No                             | Yes                                    | Yes                                   |
| Advanced parameters       | No                             | No                                     | Yes                                   |
| Tuning                    | No                             | No                                     | Yes                                   |
| API usage                 | No                             | Yes                                    | Yes                                   |
| Billing required          | No                             | No                                     | Yes                                   |
| How to get started        | Go to Vertex AI Studio         | Sign up for a free trial               | Try Vertex AI Studio in your console  |

More ways to get started