Generative AI on Vertex AI (also known as genAI or gen AI) gives you access to Gemini models and other large generative AI models so you can evaluate, tune, and deploy them for use in your AI-powered applications. This page gives you an overview of the generative AI workflow on Vertex AI, the available APIs and models, including Vertex AI API for Gemini, and directs you to resources for getting started.
Generative AI workflow in Vertex AI
The following diagram shows a high level overview of the generative AI workflow.
Prompt
The generative AI workflow typically starts with prompting. A prompt is a request sent to a generative AI model to elicit a response back. Depending on the model, a prompt can contain text, images, videos, audio, documents, and other modalities or even multiple modalities (multimodal). Creating a prompt to get the desired response from the model is a practice called prompt design. While prompt design is a process of trial and error, there are prompt design principles and strategies that you can use to nudge the model to behave in the desired way. Vertex AI Studio offers a prompt management tool to help you manage your prompts. |
Foundation models
Prompts are sent to a generative AI model for response generation. Vertex AI has a variety of generative AI foundation models that are accessible through a managed API, including the following:
The models differ in size, modality, and cost. You can explore Google models, as well as open models and models from Google partners, in Model Garden. |
Model customization
You can customize the default behavior of Google's foundation models so that they consistently generate the desired results without using complex prompts. This customization process is called model tuning. Model tuning helps you reduce the cost and latency of your requests by allowing you to simplify your prompts. Vertex AI also offers model evaluation tools to help you evaluate the performance of your tuned model. After your tuned model is production-ready, you can deploy it to an endpoint and monitor performance like in standard MLOps workflows. |
Request augmentation
Vertex AI offers multiple request augmentation methods that give the model access to external APIs and real-time information.
|
Citation check
After the response is generated, Vertex AI checks whether citations need to be included with the response. If a significant amount of the text in the response comes from a particular source, that source is added to the citation metadata in the response. |
Responsible AI and safety
The last layer of checks that the prompt and response go through before being returned is the safety filters. Vertex AI checks both the prompt and response for how much the prompt or response belongs to a safety category. If the threshold is exceed for one or more categories, the response is blocked and Vertex AI returns a fallback response. |
Response
If the prompt and response passes the safety filter checks, the response is returned. Typically, the response is returned all at once. However, with Vertex AI you can also receive responses progressively as it generates by enabling streaming. |
Generative AI APIs and models
The generative AI models available in Vertex AI, also called foundation models, are categorized by the type of content that it's designed to generate. This content includes text, chat, image, code, video, multimodal data, and embeddings. Each model is exposed through a publisher endpoint that's specific to your Google Cloud project so there's no need to deploy the foundation model unless you need to tune it for a specific use case.
Gemini API offerings
The Vertex AI Gemini API contains the publisher endpoints for the Gemini models developed by Google DeepMind. You can try the Vertex AI API for Gemini in this quickstart.
- Gemini 1.5 Flash is a multimodal model you can use to create text generation and chat applications. You can include text, images, audio, video, and PDF files in your prompt requests and it has the same context window as Gemini 1.5 Pro to process large amounts of multimodal data. Gemini 1.5 Flash is smaller and faster than Gemini 1.5 Pro which makes it a good option to create chat assistants and on-demand content generation applications.
- Gemini 1.5 Pro supports multimodal prompts. You can include text, images, audio, video, and PDF files in your prompt requests and get text or code responses. Gemini 1.5 Pro can process larger collections of images, larger text documents, and longer videos than Gemini 1.0 Pro Vision.
- Gemini 1.0 Pro is designed to handle natural language tasks, multiturn text and code chat, and code generation.
- Gemini 1.0 Pro Vision supports multimodal prompts. You can include text, images, video, and PDFs in your prompt requests and get text or code responses.
The following table shows some differences between the Gemini models that can help you choose which is best for you:
Gemini model | Modalities | Context window |
---|---|---|
Gemini 1.5 Flash |
|
|
Gemini 1.5 Pro |
|
|
Gemini 1.0 Pro |
|
|
Gemini 1.0 Pro Vision |
|
|
Other Generative AI offerings
Text embedding generates vector embeddings for input text. You can use embeddings for tasks like semantic search, recommendation, classification, and outlier detection.
Multimodal embedding generates vector embeddings based on image and text inputs. These embeddings can later be used for other subsequent tasks like image classification or content recommendations.
Imagen, our text-to-image foundation model, lets you generate and customize studio-grade images at scale.
Partner models are a curated list of generative AI models developed by Google's partner companies. These generative AI models are offered as managed APIs. For example, Anthropic provides its Claude models as a service on Vertex AI.
Open models, like Llama, are available for you to deploy on Vertex AI or other platforms.
MedLM is a family of foundation models fine-tuned for the healthcare industry.
Certifications and security controls
Vertex AI supports CMEK, VPC Service Controls, Data Residency, and Access Transparency. There are some limitations for Generative AI features. For more information, see Generative AI security controls.
Vertex AI Studio console experiences
When using Vertex AI Studio with the free trial or without signing in to Google Cloud, some features are not available. To try Vertex AI Studio, accept the Vertex AI Studio Terms of Service window in the Google Cloud console.
Use without a Google Cloud account | Use with a Google Cloud free trial account | Use with an existing Google Cloud account | |
---|---|---|---|
Sign in required | No | Yes | Yes |
Queries per minute (QPM) | 2 QPM | N/A | N/A |
Credits offered | $0 | Up to $300 for 90 days | $0 |
Prompt gallery | No | Yes | Yes |
Prompt designer | Yes | Yes | Yes |
Save prompts | No | Yes | Yes |
Prompt history | No | Yes | Yes |
Advanced parameters | No | No | Yes |
Tuning | No | No | Yes |
API usage | No | Yes | Yes |
Billing required | No | No | Yes |
How to get started | Go to Vertex AI Studio | Sign up for a free trial | Try Vertex AI Studio in your console |
More ways to get started
- Try a quickstart tutorial using Vertex AI Studio or the Vertex AI API.
- Explore pretrained models in Model Garden.
- Explore the Vertex AI Gemini API SDK reference for Python, Node.js, Java, Go, or C#.
- Learn how to tune a foundation model.
- Learn about responsible AI best practices and Vertex AI's safety filters.
- Learn about quotas and limits.
- Learn about pricing.
- Learn about calling Gemini by using the OpenAI library.