This document shows how to use the OpenAI-compatible Chat Completions API to interact with Vertex AI models. It covers the following topics:
- Supported models: Learn which Gemini and self-deployed Model Garden models are compatible with the API.
- Supported parameters: Review the list of standard OpenAI parameters that you can use.
- Multimodal input parameters: See how to use multimodal inputs like audio and images.
- Gemini-specific parameters: Discover how to use Gemini-specific features through the `extra_body` and `extra_part` fields.
The Chat Completions API is an OpenAI-compatible endpoint that lets you use OpenAI Python and REST libraries to interact with Gemini on Vertex AI. If you already use the OpenAI libraries, this API offers a way to switch between OpenAI models and Vertex AI hosted models to compare output, cost, and scalability with minimal changes to your existing code. If you don't use the OpenAI libraries, we recommend using the Google Gen AI SDK.
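As a minimal sketch, the following example points the OpenAI Python SDK at Vertex AI. The project ID, region, and model ID are placeholders, and the base URL pattern and token handling shown here are assumptions to confirm against the authentication documentation linked in the What's next section.

```python
# Sketch: calling Gemini on Vertex AI through the OpenAI Python SDK.
# Assumes Application Default Credentials are configured; the project,
# location, and model values below are placeholders.
import openai
from google.auth import default
from google.auth.transport.requests import Request

PROJECT_ID = "your-project-id"  # placeholder
LOCATION = "us-central1"        # placeholder

# Exchange Application Default Credentials for a short-lived access token.
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

client = openai.OpenAI(
    # Assumed base URL pattern for the OpenAI-compatible Vertex AI endpoint.
    base_url=(
        f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/"
        f"{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi"
    ),
    api_key=credentials.token,  # the access token is passed as the API key
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",  # placeholder model ID
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)
```

Access tokens are short-lived, so long-running applications need to refresh them periodically.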
Supported models
The Chat Completions API supports both Gemini models and select self-deployed models from Model Garden.
Gemini models
The Chat Completions API supports the following Gemini models:
Self-deployed models from Model Garden
The Hugging Face Text Generation Inference (HF TGI) and Vertex AI Model Garden prebuilt vLLM containers support the Chat Completions API. However, not every model deployed to these containers supports the Chat Completions API. The following table includes the most popular supported models by container:
| HF TGI | vLLM |
|---|---|
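If you call a self-deployed model, the request format is the same, but the client points at your deployment's dedicated endpoint rather than the shared Gemini endpoint. The following sketch continues from the previous example; the endpoint ID and model value are placeholders, and the base URL pattern is an assumption to confirm for your deployment.

```python
# Sketch: calling a self-deployed Model Garden model (for example, one served
# by an HF TGI or vLLM container) with the same OpenAI SDK. Continues from the
# previous sketch; ENDPOINT_ID and MODEL_ID are placeholders, and the base URL
# pattern is an assumption to confirm for your deployment.
ENDPOINT_ID = "1234567890"        # placeholder: your Vertex AI endpoint ID
MODEL_ID = "your-deployed-model"  # placeholder: depends on the serving container

client = openai.OpenAI(
    base_url=(
        f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/"
        f"{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}"
    ),
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=MODEL_ID,
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
)
print(response.choices[0].message.content)
```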
Supported parameters
For Google models, the Chat Completions API supports the following OpenAI parameters. For a description of each parameter, see OpenAI's documentation on Creating chat completions. Parameter support for third-party models varies by model. To see which parameters are supported, consult the model's documentation.
| Parameter | Notes |
|---|---|
| `messages` | |
| `model` | |
| `max_completion_tokens` | Alias for `max_tokens`. |
| `max_tokens` | |
| `n` | |
| `frequency_penalty` | |
| `presence_penalty` | |
| `reasoning_effort` | Configures how much time and how many tokens are used on a response. Only one of `reasoning_effort` or `extra_body.google.thinking_config` may be specified. |
| `response_format` | |
| `seed` | Corresponds to `GenerationConfig.seed`. |
| `stop` | |
| `stream` | |
| `temperature` | |
| `top_p` | |
| `tools` | |
| `tool_choice` | |
| `web_search_options` | Corresponds to the `GoogleSearch` tool. No sub-options are supported. |
| `function_call` | This field is deprecated, but supported for backwards compatibility. |
| `functions` | This field is deprecated, but supported for backwards compatibility. |
If you pass any unsupported parameter, it is ignored.
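As a brief illustration, the following sketch passes several of the supported parameters, continuing from the client created earlier. The model ID and parameter values are placeholders.

```python
# Sketch: passing standard OpenAI parameters, continuing from the client
# created earlier. The model ID and parameter values are placeholders.
stream = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
    temperature=0.7,
    top_p=0.95,
    max_tokens=256,
    seed=42,
    stream=True,  # stream the response chunk by chunk
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```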
Multimodal input parameters
The Chat Completions API supports select multimodal inputs.
| Parameter | Notes |
|---|---|
| `input_audio` | |
| `image_url` | |
In general, the `data` parameter can be a URI or a combination of MIME type and base64-encoded bytes in the form `"data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>"`. For a full list of MIME types, see `GenerateContent`. For more information on OpenAI's base64 encoding, see their documentation. For usage, see our multimodal input examples.
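As a brief illustration, the following sketch sends a local image using the base64 data-URI form described above. It continues from the earlier client setup; the file path, MIME type, and model ID are placeholders.

```python
# Sketch: sending an image as a base64 data URI in the
# "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>" form described above.
# The file path, MIME type, and model ID are placeholders.
import base64

with open("image.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

In OpenAI's format, audio uses an `input_audio` part with base64-encoded `data` and a `format` field instead of `image_url`.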
Gemini-specific parameters
To use features that are supported by Gemini but not by OpenAI models, pass them as parameters within an `extra_content` or `extra_body` field. If you pass these features outside of these fields, they are ignored.
extra_body features
To use Gemini-specific `extra_body` features, include them in a `google` field.
```
{
  ...,
  "extra_body": {
    "google": {
      ...,
      // Add extra_body features here.
    }
  }
}
```
| Field | Notes |
|---|---|
| `safety_settings` | This corresponds to Gemini's `SafetySetting`. |
| `cached_content` | This corresponds to Gemini's `GenerateContentRequest.cached_content`. |
| `thinking_config` | This corresponds to Gemini's `GenerationConfig.ThinkingConfig`. |
| `thought_tag_marker` | Used to separate a model's thoughts from its responses for models with Thinking available. If not specified, no tags are returned around the model's thoughts. If present, subsequent queries strip the thought tags and mark the thoughts appropriately for context. This helps preserve the appropriate context for subsequent queries. |
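With the OpenAI Python SDK, you can pass these fields through the SDK's `extra_body` argument. The following sketch continues from the earlier client setup; the safety-setting and thinking-config values are illustrative placeholders to verify against the Gemini reference.

```python
# Sketch: passing Gemini-specific fields through the OpenAI SDK's extra_body
# argument. The safety-setting and thinking-config values are illustrative
# placeholders; check the Gemini reference for the supported values.
response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Plan a three-day trip to Kyoto."}],
    extra_body={
        "google": {
            "safety_settings": [
                {
                    "category": "HARM_CATEGORY_HATE_SPEECH",
                    "threshold": "BLOCK_LOW_AND_ABOVE",
                }
            ],
            "thinking_config": {"include_thoughts": True},
        }
    },
)
print(response.choices[0].message.content)
```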
extra_part features
The `extra_part` field lets you specify additional settings for each `Part`. To use Gemini-specific `extra_part` features, include them in a `google` field.
```
{
  ...,
  "extra_part": {
    "google": {
      ...,
      // Add extra_part features here.
    }
  }
}
```
| Field | Notes |
|---|---|
| `extra_content` | A field for adding Gemini-specific content that shouldn't be ignored. |
| `thought` | Explicitly marks whether a field is a thought (and takes precedence over `thought_tag_marker`). Use this to specify whether a tool call is part of a thought. |
What's next
- Learn more about authentication and credentialing with the OpenAI-compatible syntax.
- See examples of calling the Chat Completions API with the OpenAI-compatible syntax.
- See examples of calling the Inference API with the OpenAI-compatible syntax.
- See examples of calling the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.