Overview of Generative AI on Vertex AI

Generative AI on Vertex AI (also known as genai) gives you access to Google's large generative AI models so you can test, tune, and deploy them for use in your AI-powered applications. This page gives you an overview of the generative AI workflow on Vertex AI, the features and models available, and directs you to resources for getting started.

Generative AI workflow

The following diagram shows a high-level overview of the generative AI workflow.

Generative AI workflow diagram



The generative AI workflow typically starts with prompting. A prompt is a natural language request sent to a language model to elicit a response. Writing a prompt to get the desired response from the model is a practice called prompt design. While prompt design is a process of trial and error, there are prompt design principles and strategies that you can use to nudge the model to behave in the desired way.
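
As a rough illustration of prompt design, the sketch below assembles a few-shot prompt from an instruction, a handful of worked examples, and the new input. The function name `build_few_shot_prompt`, the `Text:` and `Sentiment:` labels, and the sample reviews are all invented for this illustration; they are not part of any Vertex AI API.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, labeled examples, and a new input into one prompt string."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Sentiment: {label}", ""]
    # Leave the final label empty so the model completes it.
    lines += [f"Text: {query}", "Sentiment:"]
    return "\n".join(lines)


# Hypothetical examples for a sentiment classification task.
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as positive or negative.",
    examples,
    "Setup took five minutes and everything worked.",
)
print(prompt)
```

Changing the instruction wording or the number of examples and comparing the responses is one simple way to iterate on a prompt.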

Foundation models

Prompts are sent to a model for response generation. Vertex AI has a variety of generative AI foundation models that you can select from, including the following:

  • PaLM 2 for Text (text-bison)
  • PaLM 2 for Chat (chat-bison)
  • Codey for Code Generation (code-bison)
  • Codey for Code Chat (codechat-bison)
  • Codey for Code Completion (code-gecko)
  • Embeddings for Text (textembedding-gecko)
  • Embeddings for Multimodal (multimodalembedding)
  • Imagen for Image Generation (imagegeneration)

The models differ in size, modality, and cost. You can explore Google's proprietary models and OSS models in Model Garden.

Model customization

You can customize the default behavior of Google's foundation models so that they consistently generate the desired results without using complex prompts. This customization process is called model tuning. Model tuning helps you reduce the cost and latency of your requests by allowing you to simplify your prompts.

Vertex AI also offers model evaluation tools to help you evaluate the performance of your tuned model. After your tuned model is production-ready, you can deploy it to an endpoint and monitor its performance as in standard MLOps workflows.

Vertex AI Grounding service


If you need model responses to be grounded in a source of truth, such as your own data corpus, you can use grounding in Vertex AI. Grounding helps reduce model hallucinations, especially on unknown topics, and also gives the model access to new information.

Citation check

After the response is generated, Vertex AI checks whether citations need to be included with the response. If a significant amount of the text in the response comes from a particular source, that source is added to the citation metadata in the response.

Responsible AI and safety

The last layer of checks that the prompt and response go through before the response is returned is the safety filters. Vertex AI checks how strongly the prompt and the response each belong to a safety category. If the threshold is exceeded for one or more categories, the response is blocked and Vertex AI returns a fallback response.
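
The threshold logic described above can be sketched as follows. This is only an illustration of the blocking rule, not Vertex AI's implementation: the category names, the 0 to 1 score scale, the threshold values, and the fallback text are all assumptions.

```python
# Hypothetical fallback text; the real fallback response is defined by the service.
FALLBACK_RESPONSE = "The response was blocked by a safety filter."


def apply_safety_filters(response_text, scores, thresholds):
    """Return (text, blocked_categories); block when any score exceeds its threshold."""
    blocked = sorted(
        category
        for category, score in scores.items()
        if score > thresholds[category]
    )
    if blocked:
        return FALLBACK_RESPONSE, blocked
    return response_text, []


# Hypothetical categories, thresholds, and scores.
thresholds = {"toxicity": 0.8, "violence": 0.7}
safe = apply_safety_filters(
    "Here is a summary.", {"toxicity": 0.1, "violence": 0.2}, thresholds
)
unsafe = apply_safety_filters(
    "Some flagged text.", {"toxicity": 0.9, "violence": 0.2}, thresholds
)
print(safe)
print(unsafe)
```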



If the prompt and response pass the safety filter checks, the response is returned. Typically, the response is returned all at once. However, you can also receive the response progressively as it is generated by enabling streaming.
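
To show the difference between receiving a response all at once and receiving it progressively, the sketch below uses a plain Python generator as a stand-in for a streamed reply. The `stream_response` function and its fixed chunk size are hypothetical; a real streaming client would yield chunks as the model produces them.

```python
def stream_response(full_text, chunk_size=12):
    """Yield the response in small pieces, standing in for a streamed model reply."""
    for start in range(0, len(full_text), chunk_size):
        yield full_text[start:start + chunk_size]


full_text = "Streaming lets a client show text before generation finishes."
received = []
for chunk in stream_response(full_text):
    received.append(chunk)  # a real client would render each chunk here
    print(chunk, end="", flush=True)
print()
```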

Generative AI models

The generative AI models available in Vertex AI, also called foundation models, are categorized by the type of content that they're designed to generate. This content includes text and chat, image, code, video, and embeddings. Each model is exposed through a publisher endpoint that's specific to your Google Cloud project, so there's no need to deploy a foundation model unless you need to tune it for a specific use case.
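
As a sketch of what a project-specific publisher endpoint looks like, the helper below builds the URL and JSON body for a text prediction request without sending it. The URL pattern and the `instances`, `prompt`, and `parameters` field names reflect the Vertex AI REST predict convention as understood here and should be checked against the API reference; `build_predict_request` and the project ID `my-project` are placeholders invented for this example.

```python
import json


def build_predict_request(project_id, model_id, prompt,
                          location="us-central1", **parameters):
    """Return the URL and JSON body for a publisher-model predict call (nothing is sent)."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model_id}:predict"
    )
    body = {"instances": [{"prompt": prompt}], "parameters": parameters}
    return url, body


url, body = build_predict_request(
    "my-project",  # placeholder project ID
    "text-bison",
    "Summarize the benefits of unit testing in one sentence.",
    temperature=0.2,
    maxOutputTokens=256,
)
print(url)
print(json.dumps(body, indent=2))
```

Sending the request would additionally require an OAuth 2.0 access token in the `Authorization` header, which is omitted here.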

PaLM 2 is the model underlying the PaLM API. PaLM 2 is a state-of-the-art language model with improved multilingual, reasoning, and coding capabilities. To learn more about PaLM 2, see Introducing PaLM 2.

PaLM API offerings

The Vertex AI PaLM API contains the publisher endpoints for Google's Pathways Language Model 2 (PaLM 2) models, which are large language models (LLMs) that generate text and code in response to natural language prompts.

  • PaLM API for text is fine-tuned for language tasks such as classification, summarization, and entity extraction.
  • PaLM API for chat is fine-tuned for multi-turn chat, where the model keeps track of previous messages in the chat and uses them as context for generating new responses.
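
The multi-turn behavior described for chat can be pictured as a session that resends the whole conversation on every turn. The sketch below is a minimal illustration under that assumption; the `ChatSession` class is hypothetical, and the `context`, `messages`, `author`, and `content` field names are assumptions modeled on a typical chat request shape rather than a confirmed schema.

```python
class ChatSession:
    """Keeps every message so each new request carries the whole conversation."""

    def __init__(self, context=""):
        self.context = context
        self.messages = []

    def build_request(self, user_text):
        """Record the user's message and return an instance containing the full history."""
        self.messages.append({"author": "user", "content": user_text})
        return {"context": self.context, "messages": list(self.messages)}

    def record_reply(self, reply_text):
        """Store the model's reply so later turns can refer back to it."""
        self.messages.append({"author": "bot", "content": reply_text})


chat = ChatSession(context="You are a helpful travel assistant.")
first = chat.build_request("Suggest a city for a weekend trip.")
chat.record_reply("Lisbon is a good choice.")  # stand-in for a model response
second = chat.build_request("What should I eat there?")
print(len(first["messages"]), len(second["messages"]))
```

Because the second request includes the earlier exchange, the model has what it needs to resolve "there" in the follow-up question.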

Other Generative AI offerings

  • The Codey APIs generate code. The Codey APIs include three models that generate code, suggest code for code completion, and let developers chat to get help with code-related questions. For more information, see Code models overview.

  • The Text Embedding API generates vector embeddings for input text. You can use embeddings for tasks like semantic search, recommendation, classification, and outlier detection.

  • The Multimodal Embeddings API generates embedding vectors based on image and text inputs. You can later use these embeddings for downstream tasks like image classification or content recommendations. For more information, see the multimodal embeddings page.

  • Imagen, our text-to-image foundation model, lets organizations generate and customize studio-grade images at scale for any business need. For more information, see the Imagen on Vertex AI overview.
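
To illustrate the semantic search use case mentioned for text embeddings, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors are toy values chosen by hand, not output from any embedding model, and the helper names are invented for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, document_vectors):
    """Return document names sorted from most to least similar to the query."""
    return sorted(
        document_vectors,
        key=lambda name: cosine_similarity(query_vector, document_vectors[name]),
        reverse=True,
    )


# Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
document_vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "careers page": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of a question about refunds
ranking = rank_documents(query_vector, document_vectors)
print(ranking)
```

In practice, the document vectors would be computed once with an embedding model and stored, and only the query would be embedded at search time.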

Generative AI Studio

Generative AI Studio is a Google Cloud console tool for rapidly prototyping and testing generative AI models. You can test sample prompts, design your own prompts, and customize foundation models to handle tasks that meet your application's needs. This page introduces the different tasks that you can perform in Generative AI Studio, including the following:

  • Test models using prompt samples.
  • Design and save your own prompts.
  • Tune a foundation model.
  • Convert between speech and text.

Test models using prompt samples

Prompt Gallery, in the Language section of Generative AI Studio, contains a variety of sample prompts that are predesigned to help demonstrate model capabilities. The sample prompts are categorized by the task type, such as summarization, classification, and extraction. Each prompt is preconfigured with a specified model and parameter values so you can just open the sample prompt and click Submit to get the model to generate a response.


Design and save your own prompts

Prompt design is the process of manually creating prompts that elicit the desired response from a language model. By carefully crafting prompts, you can nudge the model to generate a desired result. Prompt design can be an efficient way to experiment with adapting a language model for a specific use case.

You can create and save your own prompts in Generative AI Studio. When creating a new prompt, you enter the prompt text, specify the model to use, configure parameter values, and test the prompt by generating a response. You can iterate on the prompt and its configurations until you get the desired results. When you are done designing the prompt, you can save it in Generative AI Studio.

Response citations

If you are using a text model like text-bison in Generative AI Studio, you receive text responses based on your input. Our features are intended to produce original content and not replicate existing content at length. If Generative AI Studio quotes at length from a web page, it cites that page in the output.


You can change the quality of responses by adjusting the temperature (output randomness) and experimenting with other response parameters in Generative AI Studio.
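
To give some intuition for why temperature controls randomness, the sketch below applies the standard softmax-with-temperature formula to a few made-up token scores. This is the general technique used in language model sampling, not a description of how Vertex AI implements it; the logits are hypothetical, and the function assumes a temperature greater than zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; requires temperature > 0."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.5)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the lower temperature almost all of the probability goes to the top-scoring token, so output is more predictable; at the higher temperature the probabilities are closer together, so sampling varies more.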

Citations are available in both Generative AI Studio and the API. To learn more about Responsible AI and citations, see Citation metadata.

Explore generative AI models in Model Garden

Model Garden is a platform that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets. To explore the generative AI models and APIs that are available on Vertex AI, go to Model Garden in the Google Cloud console.

Go to Model Garden

To learn more about Model Garden, including available models and capabilities, see Explore AI models in Model Garden.

Tune a foundation model

While prompt design is great for quick experimentation, you can achieve higher-quality results by tuning the model itself if training data is available. Tuning a model lets you customize the model's response based on examples of the task that you want the model to perform.
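
Tuning examples are typically supplied as input and output pairs. The sketch below serializes a couple of hypothetical pairs as JSON Lines; the `input_text` and `output_text` field names are an assumption about the expected dataset format and should be confirmed in Tune foundation models before use.

```python
import json


def to_jsonl(pairs):
    """Serialize (input, output) pairs as one JSON object per line."""
    return "\n".join(
        json.dumps({"input_text": source, "output_text": target})
        for source, target in pairs
    )


# Hypothetical training examples for a sentiment classification task.
pairs = [
    ("Classify: The battery lasts all day.", "positive"),
    ("Classify: The screen cracked within a week.", "negative"),
]
jsonl = to_jsonl(pairs)
print(jsonl)
```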

To learn how to tune a foundation model, see Tune foundation models.

Convert between speech and text

In the speech tool of Generative AI Studio, you can take a snippet of text and convert it into a speech audio file that you can play back and download. You can select from several voices and adjust the speaking rate.

Conversely, if you have an audio file of speech, you can upload it to Generative AI Studio and have it transcribed into text.

To learn more, see the following pages:

Try Generative AI Studio

Generative AI Studio is on the Vertex AI page of the Google Cloud console.

Go to Generative AI Studio

Certifications and security controls

Vertex AI supports CMEK, VPC Service Controls, Data Residency, and Access Transparency. There are some limitations for Generative AI features. For more information, see Generative AI security controls.

Get started