Overview of Generative AI on Vertex AI

Generative AI on Vertex AI (also known as genAI or gen AI) gives you access to Google's large generative AI models so you can test, tune, and deploy them for use in your AI-powered applications. This page gives you an overview of the generative AI workflow on Vertex AI, describes the available features and models, and directs you to resources for getting started.

Generative AI workflow

The following diagram shows a high-level overview of the generative AI workflow.

Generative AI workflow diagram



The generative AI workflow typically starts with prompting. A prompt is a natural language request sent to a language model to elicit a response. Writing a prompt that gets the desired response from the model is a practice called prompt design. While prompt design is a process of trial and error, there are prompt design principles and strategies that you can use to nudge the model to behave in the desired way.
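
As an illustration of one common prompt design strategy, the following minimal sketch assembles a few-shot prompt from an instruction, a handful of worked examples, and a new input. The function name `build_prompt` and the "Text:" / "Sentiment:" layout are assumptions made for this example, not a format that Vertex AI requires.

```python
def build_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # End with an unanswered example so the model completes the label.
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


prompt = build_prompt(
    "Classify the sentiment of each text as positive or negative.",
    [
        ("I love this phone.", "positive"),
        ("The battery died in an hour.", "negative"),
    ],
    "The screen is gorgeous.",
)
print(prompt)
```

Iterating on the instruction wording or swapping the examples, then comparing the model's responses, is the trial-and-error loop that prompt design refers to.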

Foundation models

Prompts are sent to a model for response generation. Generative AI on Vertex AI has a variety of generative AI foundation models that are accessible through an API, including the following:

  • Gemini API: Advanced reasoning, multiturn chat, code generation, and multimodal prompts.
  • PaLM API: Natural language tasks, text embeddings, and multiturn chat.
  • Codey APIs: Code generation, code completion, and code chat.
  • Imagen API: Image generation, image editing, and visual captioning.
  • MedLM: Medical question answering and summarization. (Private GA)

The models differ in size, modality, and cost. You can explore Google's proprietary models and OSS models in Model Garden.

Model customization

You can customize the default behavior of Google's foundation models so that they consistently generate the desired results without using complex prompts. This customization process is called model tuning. Model tuning helps you reduce the cost and latency of your requests by allowing you to simplify your prompts.

Generative AI on Vertex AI also offers model evaluation tools to help you evaluate the performance of your tuned model. After your tuned model is production-ready, you can deploy it to an endpoint and monitor its performance, as in standard MLOps workflows.

Generative AI on Vertex AI Grounding service


If you need model responses to be grounded on a source of truth, such as your own data corpus, you can use grounding in Generative AI on Vertex AI. Grounding helps reduce model hallucinations, especially on unknown topics, and also gives the model access to new information.
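
The idea behind grounding can be sketched as retrieving relevant passages from your own corpus and supplying them to the model together with the question. The example below is a toy illustration only: the word-overlap retriever and the names `retrieve` and `grounded_prompt` are assumptions for this sketch and do not describe how the Vertex AI grounding service is implemented.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, top_k=1):
    """Rank passages by the number of words they share with the query (toy retriever)."""
    query_words = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(query_words & tokenize(p)), reverse=True)
    return ranked[:top_k]


def grounded_prompt(query, corpus, top_k=1):
    """Build a prompt that asks the model to answer from the retrieved context only."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus, top_k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


corpus = [
    "The return window for online orders is 30 days.",
    "Support is available on weekdays from 9am to 5pm.",
]
prompt = grounded_prompt("How many days is the return window?", corpus)
print(prompt)
```

Because the answer is supplied in the context, the model does not need to rely on what it memorized during training, which is why grounding helps with unknown topics and new information.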

Citation check

After the response is generated, Generative AI on Vertex AI checks whether citations need to be included with the response. If a significant amount of the text in the response comes from a particular source, that source is added to the citation metadata in the response.
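
The source does not say how "a significant amount" is measured. As a rough sketch under that caveat, the example below treats a source as citable when its longest run of words shared with the response covers at least half of the response. The 0.5 threshold, the function name `citation_metadata`, and the shape of the returned dictionary are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def citation_metadata(response, sources, min_fraction=0.5):
    """List sources whose longest shared word run covers at least min_fraction of the response."""
    resp_words = response.lower().split()
    citations = []
    for uri, text in sources.items():
        src_words = text.lower().split()
        matcher = SequenceMatcher(None, resp_words, src_words, autojunk=False)
        match = matcher.find_longest_match(0, len(resp_words), 0, len(src_words))
        fraction = match.size / max(len(resp_words), 1)
        if fraction >= min_fraction:
            citations.append({"uri": uri, "fraction": round(fraction, 2)})
    return {"citations": citations}


# Hypothetical sources keyed by URI.
sources = {
    "https://example.com/a": "the quick brown fox jumps over the lazy dog near the river",
    "https://example.com/b": "an unrelated sentence about cooking pasta",
}
response = "As the saying goes, the quick brown fox jumps over the lazy dog"
metadata = citation_metadata(response, sources)
print(metadata)
```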

Responsible AI and safety

The last layer of checks that the prompt and response go through before the response is returned is the safety filters. Generative AI on Vertex AI checks how strongly both the prompt and the response belong to each safety category. If the threshold is exceeded for one or more categories, the response is blocked and Generative AI on Vertex AI returns a fallback response.
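
The threshold logic described above can be sketched as follows. The category names, score values, thresholds, and fallback text are hypothetical; the actual categories and scoring are defined by the service.

```python
FALLBACK_RESPONSE = "I'm not able to help with that."  # hypothetical fallback text


def blocked_categories(scores, thresholds):
    """Return the categories whose score exceeds the configured threshold."""
    # Categories without a configured threshold default to 1.0, so they never block.
    return {c for c, s in scores.items() if s > thresholds.get(c, 1.0)}


def filter_response(response, prompt_scores, response_scores, thresholds):
    """Check both the prompt and the response; block if either exceeds a threshold."""
    blocked = blocked_categories(prompt_scores, thresholds) | blocked_categories(
        response_scores, thresholds
    )
    if blocked:
        return {"text": FALLBACK_RESPONSE, "blocked": True, "categories": sorted(blocked)}
    return {"text": response, "blocked": False, "categories": []}


thresholds = {"harassment": 0.5, "dangerous_content": 0.5}  # hypothetical categories
ok = filter_response(
    "Here is a recipe.",
    {"harassment": 0.1},
    {"harassment": 0.2, "dangerous_content": 0.1},
    thresholds,
)
blocked = filter_response(
    "Unsafe text.",
    {"harassment": 0.1},
    {"dangerous_content": 0.9},
    thresholds,
)
print(ok["text"])
print(blocked["text"], blocked["categories"])
```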



If the prompt and response pass the safety filter checks, the response is returned. Typically, the response is returned all at once. However, you can also receive the response progressively, as it is generated, by enabling streaming.
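
The difference between the two delivery modes can be sketched with a plain Python generator. This is a conceptual illustration that uses a hard-coded token list in place of a model; the function names are assumptions and do not correspond to SDK calls.

```python
def generate(tokens):
    """Non-streaming: return the whole response at once."""
    return "".join(tokens)


def generate_stream(tokens, chunk_size=2):
    """Streaming: yield partial text as each group of tokens becomes available."""
    for i in range(0, len(tokens), chunk_size):
        yield "".join(tokens[i:i + chunk_size])


tokens = ["Vertex", " AI", " streams", " partial", " responses", "."]

full = generate(tokens)
chunks = list(generate_stream(tokens))

for chunk in chunks:
    print(repr(chunk))  # a caller could display each chunk as soon as it arrives
```

Concatenating the streamed chunks gives the same text as the single response; streaming changes only when the caller can start using the output.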

Generative AI APIs and models

The generative AI models available in Generative AI on Vertex AI, also called foundation models, are categorized by the type of content that they're designed to generate. This content includes text, chat, image, code, video, multimodal data, and embeddings. Each model is exposed through a publisher endpoint that's specific to your Google Cloud project, so there's no need to deploy the foundation model unless you need to tune it for a specific use case.
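
To make the idea of a project-specific publisher endpoint concrete, the sketch below builds a request URL for a Google-published model. The URL pattern reflects the Vertex AI REST convention as understood at the time of writing and should be verified against the current API reference; the project ID, location, and helper name `publisher_model_url` are placeholders chosen for this example.

```python
def publisher_model_url(project, location, model, method="predict"):
    """Build the REST URL for a Google-published model in a given project and location."""
    host = f"https://{location}-aiplatform.googleapis.com"
    path = (
        f"/v1/projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}"
    )
    return f"{host}{path}:{method}"


# Placeholder project and location; text-bison is the text model named on this page.
url = publisher_model_url("my-project", "us-central1", "text-bison")
print(url)
```

The project and location appear in the path, which is what makes the endpoint specific to your project even though the model itself is hosted by the publisher.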

Gemini API offerings

The Vertex AI Gemini API contains the publisher endpoints for the Gemini models developed by Google DeepMind.

  • Gemini 1.0 Pro is designed to handle natural language tasks, multiturn text and code chat, and code generation.
  • Gemini 1.0 Pro Vision supports multimodal prompts. You can include text, images, and video in your prompt requests and get text or code responses.

PaLM API offerings

The Vertex AI PaLM API contains the publisher endpoints for Google's Pathways Language Model 2 (PaLM 2), which are large language models (LLMs) that generate text and code in response to natural language prompts.

  • PaLM API for text is fine-tuned for language tasks such as classification, summarization, and entity extraction.
  • PaLM API for chat is fine-tuned for multiturn chat, where the model keeps track of previous messages in the chat and uses them as context for generating new responses.
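
The way a multiturn chat carries earlier messages forward as context can be sketched as follows. `ChatSession` and `echo_model` are hypothetical stand-ins written for this example; they are not the SDK's chat classes, and the stand-in model only reports what it was given.

```python
class ChatSession:
    """Keep the running message history and pass it to the model on every turn."""

    def __init__(self, model_fn, context=""):
        self.model_fn = model_fn
        self.context = context
        self.history = []

    def send(self, message):
        self.history.append({"author": "user", "content": message})
        # The model sees all previous messages, not just the latest one.
        reply = self.model_fn(self.context, list(self.history))
        self.history.append({"author": "model", "content": reply})
        return reply


def echo_model(context, history):
    """Stand-in for a chat model: report how much history it received."""
    user_turns = [m["content"] for m in history if m["author"] == "user"]
    return f"You have sent {len(user_turns)} message(s); the first was: {user_turns[0]!r}"


chat = ChatSession(echo_model, context="You are a helpful assistant.")
first_reply = chat.send("Hi")
second_reply = chat.send("What did I say first?")
print(second_reply)
```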

Other Generative AI offerings

  • The Codey APIs include three models that generate code, suggest code completions, and let developers chat to get help with code-related questions. For more information, see Code models overview.

  • The Text Embedding API generates vector embeddings for input text. You can use embeddings for tasks like semantic search, recommendation, classification, and outlier detection.

  • The multimodal embeddings model generates embedding vectors based on image and text inputs. You can use these embeddings for downstream tasks such as image classification or content recommendation. For more information, see the multimodal embeddings page.

  • Imagen, our text-to-image foundation model, lets organizations generate and customize studio-grade images at scale for any business need. For more information, see the Imagen on Vertex AI overview.

  • MedLM is a family of foundation models fine-tuned for the healthcare industry. For more information, see the MedLM models overview.
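
The semantic search use case mentioned for text embeddings in the list above can be sketched with cosine similarity. The three-dimensional vectors below are made up for illustration; real embeddings come from the embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=1):
    """Return the top_k documents whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Made-up embeddings keyed by document title.
index = {
    "How to reset a password": [0.9, 0.1, 0.0],
    "Quarterly sales report": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # made-up embedding of "I forgot my login"

results = search(query_vec, index)
print(results)
```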

Vertex AI Studio

Vertex AI Studio is a Google Cloud console tool for rapidly prototyping and testing generative AI models. You can test sample prompts, design your own prompts, and customize foundation models to handle tasks that meet your application's needs. This page introduces the different tasks that you can perform in Vertex AI Studio, including the following:

  • Test models using prompt samples.
  • Design and save your own prompts.
  • Tune a foundation model.
  • Convert between speech and text.

Test models using prompt samples

Prompt Gallery, in the Language section of Vertex AI Studio, contains a variety of sample prompts that are predesigned to help demonstrate model capabilities. The sample prompts are categorized by the task type, such as summarization, classification, and extraction. Each prompt is preconfigured with a specified model and parameter values so you can just open the sample prompt and click Submit to get the model to generate a response.


Design and save your own prompts

Prompt design is the process of manually creating prompts that elicit the desired response from a language model. By carefully crafting prompts, you can nudge the model to generate a desired result. Prompt design can be an efficient way to experiment with adapting a language model for a specific use case.

You can create and save your own prompts in Vertex AI Studio. When creating a new prompt, you enter the prompt text, specify the model to use, configure parameter values, and test the prompt by generating a response. You can iterate on the prompt and its configurations until you get the desired results. When you are done designing the prompt, you can save it in Vertex AI Studio.

Response citations

If you are using a text model in Vertex AI Studio like text-bison, you receive text responses based on your input. Our features are intended to produce original content and not replicate existing content at length. If Vertex AI Studio quotes at length from a web page, it cites that page in the output.


You can change the quality of responses by adjusting the temperature (output randomness) and experimenting with other response parameters in Vertex AI Studio.
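
To show why temperature controls output randomness, the sketch below applies the standard temperature-scaled softmax to a small set of made-up token scores. This is the textbook formulation; treating it as how the models on Vertex AI apply temperature internally is an assumption.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpening or flattening them by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens

low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 2.0)

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At a low temperature almost all of the probability goes to the top-scoring token, so sampling is close to deterministic. At a higher temperature the distribution flattens and less likely tokens are chosen more often.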

Citations are available in both Vertex AI Studio and the API. To learn more about Responsible AI and citations, see Citation metadata.

Explore generative AI models in Model Garden

Model Garden is a platform that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets. To explore the generative AI models and APIs that are available on Generative AI on Vertex AI, go to Model Garden in the Google Cloud console.

Go to Model Garden

To learn more about Model Garden, including available models and capabilities, see Explore AI models in Model Garden.

Tune a foundation model

While prompt design is great for quick experimentation, tuning the model itself can achieve higher quality when training data is available. Tuning a model lets you customize the model's response based on examples of the task that you want the model to perform.
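
Tuning examples are typically supplied as input and output pairs. The sketch below serializes a few such pairs as JSON Lines. The field names `input_text` and `output_text` are an assumption based on the supervised tuning format for text models; confirm the required schema in Tune foundation models before preparing a real dataset.

```python
import json

REQUIRED_FIELDS = ("input_text", "output_text")  # assumed field names


def to_jsonl(records):
    """Serialize tuning examples as JSON Lines, one example per line."""
    lines = []
    for i, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            raise ValueError(f"record {i} is missing {missing}")
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Made-up examples of the task the tuned model should perform.
examples = [
    {"input_text": "Classify: The app crashes on launch.", "output_text": "bug"},
    {"input_text": "Classify: Please add dark mode.", "output_text": "feature request"},
]

jsonl = to_jsonl(examples)
print(jsonl, end="")
```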

To learn how to tune a foundation model, see Tune foundation models.

Convert between speech and text

In the speech tool of Vertex AI Studio, you can take a snippet of text and convert it into a speech audio file that you can play back and download. You can select from several voices and adjust the speaking rate.

Conversely, if you have an audio file of speech, you can also upload it to Vertex AI Studio and get it transcribed into text.

To learn more, see the following pages:

Try Vertex AI Studio

Vertex AI Studio is in the Generative AI on Vertex AI page of the Google Cloud console.

Go to Vertex AI Studio

Certifications and security controls

Generative AI on Vertex AI supports CMEK, VPC Service Controls, Data Residency, and Access Transparency. There are some limitations for Generative AI features. For more information, see Generative AI security controls.

Get started