Using OpenAI libraries with Vertex AI
The Chat Completions API works as an OpenAI-compatible endpoint, designed to
make it easier to interface with Gemini on Vertex AI by
using the OpenAI libraries for Python and REST. If you're already using the
OpenAI libraries, you can use this API as a low-cost way to switch between
calling OpenAI models and Vertex AI hosted models to compare
output, cost, and scalability, without changing your existing code.
If you aren't already using the OpenAI libraries, we recommend that you
use the Google Gen AI SDK.
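For example, the following is a minimal sketch of pointing the OpenAI Python library at the Vertex AI Chat Completions endpoint. The project ID, location, and model name are placeholders for your own values, and the short-lived access token comes from Application Default Credentials.

# A minimal sketch: route OpenAI Python library calls to Vertex AI.
# PROJECT_ID, LOCATION, and the model name are placeholders.
import google.auth
import google.auth.transport.requests
import openai

PROJECT_ID = "your-project-id"  # placeholder
LOCATION = "us-central1"        # placeholder

# Application Default Credentials provide a short-lived access token.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

client = openai.OpenAI(
    base_url=(
        f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
        f"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi"
    ),
    api_key=credentials.token,  # the token expires; refresh it in long-lived processes
)

response = client.chat.completions.create(
    model="google/gemini-2.5-flash",  # placeholder model name
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)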
Supported models
The Chat Completions API supports both Gemini models and select
self-deployed models from Model Garden.
Gemini models
The following models provide support for the Chat Completions API:
Self-deployed models from Model Garden
The Hugging Face Text Generation Inference (HF TGI) and Vertex AI Model Garden
prebuilt vLLM containers support the Chat Completions API. However, not every
model deployed to these containers supports the Chat Completions API. The
following table includes the most popular supported models by container:
Supported parameters
For Google models, the Chat Completions API supports the following OpenAI
parameters. For a description of each parameter, see OpenAI's documentation on
Creating chat completions.
Parameter support for third-party models varies by model. To see which parameters
are supported, consult the model's documentation. For request sketches that
exercise several of these parameters, see the examples after the table.
messages
- System message
- User message: The text and image_url types are supported. The image_url type supports images stored as a Cloud Storage URI or as a base64 encoding in the form "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>". To learn how to create a Cloud Storage bucket and upload a file to it, see Discover object storage. The detail option is not supported.
- Assistant message
- Tool message
- Function message: This field is deprecated, but supported for backward compatibility.

model

max_completion_tokens: Alias for max_tokens.

max_tokens

n

frequency_penalty

presence_penalty

reasoning_effort: Configures how much time and how many tokens are used on a response.
- low: 1024 tokens
- medium: 8192 tokens
- high: 24576 tokens
Thoughts are not included in the response. Only one of reasoning_effort or extra_body.google.thinking_config may be specified.

response_format
- json_object: Interpreted as passing "application/json" to the Gemini API.
- json_schema: Fully recursive schemas are not supported. additional_properties is supported.
- text: Interpreted as passing "text/plain" to the Gemini API.
- Any other MIME type is passed as is to the model, such as passing "application/json" directly.

seed: Corresponds to GenerationConfig.seed.

stop

stream

temperature

top_p

tools
- type
- function
- name
- description
- parameters: Specify parameters by using the OpenAPI specification. This differs from the OpenAI parameters field, which is described as a JSON Schema object. To learn about keyword differences between OpenAPI and JSON Schema, see the OpenAPI guide.

tool_choice
- none
- auto
- required: Corresponds to the mode ANY in the FunctionCallingConfig.
- validated: Corresponds to the mode VALIDATED in the FunctionCallingConfig. This is Google-specific.

web_search_options: Corresponds to the GoogleSearch tool. No sub-options are supported.

function_call: This field is deprecated, but supported for backward compatibility.

functions: This field is deprecated, but supported for backward compatibility.
If you pass any unsupported parameter, it is ignored.
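For example, here is a minimal sketch of a request that exercises several of the supported parameters, assuming client is the OpenAI client configured for Vertex AI as shown earlier; the model name is a placeholder.

# A minimal sketch using several supported parameters. `client` is the
# OpenAI client configured for Vertex AI earlier; the model name is a
# placeholder.
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",  # placeholder model name
    messages=[
        {"role": "system", "content": "Answer in one short sentence."},
        {"role": "user", "content": "Return two primary colors as JSON."},
    ],
    temperature=0.2,
    max_tokens=256,
    reasoning_effort="low",  # 1024-token budget; can't be combined with thinking_config
    response_format={"type": "json_object"},  # passed as "application/json" to Gemini
)
print(response.choices[0].message.content)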
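A second sketch, under the same assumptions, shows tool calling with tools and tool_choice. The get_weather function is hypothetical and exists only for illustration.

# A hypothetical tool declaration; the parameters follow the OpenAPI-style
# schema described in the table above.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name."},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="google/gemini-2.5-flash",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)
# If the model chose to call the tool, the call details appear here.
print(response.choices[0].message.tool_calls)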
The Chat Completions API supports select multimodal inputs.
input_audio
- data: Any URI or valid blob format. We support all blob types, including image, audio, and video. Anything supported by GenerateContent is supported (HTTP, Cloud Storage, and so on).
- format: OpenAI supports both wav (audio/wav) and mp3 (audio/mp3). With Gemini, all valid MIME types are supported.

image_url
- data: Like input_audio, any URI or valid blob format is supported. Note that image_url as a URL defaults to the image/* MIME type, and image_url as blob data can be used as any multimodal input.
- detail: Similar to media resolution, this determines the maximum tokens per image for the request. Note that while OpenAI's field is per-image, Gemini enforces the same detail across the request, and passing multiple detail values in one request throws an error.
In general, the data parameter can be a URI or a combination of MIME type and
base64-encoded bytes in the form "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>".
For a full list of MIME types, see GenerateContent. For more information on
OpenAI's base64 encoding, see their documentation. For usage, see our multimodal
input examples and the sketch below.
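For instance, here is a minimal sketch of an image request, assuming client is configured as shown earlier; the Cloud Storage URI and model name are placeholders.

# A minimal sketch of multimodal input. The Cloud Storage URI and model
# name are placeholders; `client` is configured as shown earlier.
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",  # placeholder model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "gs://my-bucket/my-image.png"},  # placeholder URI
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)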
Gemini-specific parameters
There are several features supported by Gemini that are not available in OpenAI
models. These features can still be passed in as parameters, but they must be
contained within an extra_content or extra_body field, or they are ignored.
extra_body features
Include a google field to contain any Gemini-specific extra_body features:
{
...,
"extra_body": {
"google": {
...,
// Add extra_body features here.
}
}
}
safety_settings: This corresponds to Gemini's SafetySetting.

cached_content: This corresponds to Gemini's GenerateContentRequest.cached_content.

thinking_config: This corresponds to Gemini's GenerationConfig.ThinkingConfig. For a usage sketch, see the example after this list.

thought_tag_marker: Used to separate a model's thoughts from its responses for models with Thinking available. If not specified, no tags are returned around the model's thoughts. If present, subsequent queries strip the thought tags and mark the thoughts appropriately for context. This helps preserve the appropriate context for subsequent queries.
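For example, here is a minimal sketch that passes thinking_config through the OpenAI Python library, whose extra_body keyword argument merges extra fields into the JSON request body. The model name and budget value are placeholders, and the thinking_config field names are an assumption based on Gemini's GenerationConfig.ThinkingConfig.

# A minimal sketch passing Gemini-specific fields through the OpenAI
# library's extra_body keyword, which merges them into the request body.
# The model name and thinking budget are placeholders.
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",  # placeholder model name
    messages=[{"role": "user", "content": "Solve: 17 * 23"}],
    extra_body={
        "extra_body": {
            "google": {
                "thinking_config": {
                    "include_thoughts": True,
                    "thinking_budget": 1024,  # illustrative budget, in tokens
                }
            }
        }
    },
)
print(response.choices[0].message.content)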
extra_part lets you specify additional settings at a per-Part level.
Include a google field to contain any Gemini-specific extra_part features:
{
...,
"extra_part": {
"google": {
...,
// Add extra_part features here.
}
}
}
extra_content: A field for adding Gemini-specific content that shouldn't be ignored.

thought: Explicitly marks whether a field is a thought (and takes precedence over thought_tag_marker). Use this to specify whether or not a tool call is part of a thought. For a sketch, see the example after this list.
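As a sketch only, the following shows one way a message part might be marked as a thought using extra_content, based on the skeleton above. The placement of extra_content inside a content part and the field values are assumptions for illustration, not a definitive request shape.

# A sketch only: marking a message part as a thought via extra_content.
# The placement of extra_content inside a content part is an assumption
# based on the skeleton above; the model name is a placeholder.
messages = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "First, factor the number into primes...",
                "extra_content": {"google": {"thought": True}},  # mark this part as a thought
            }
        ],
    },
    {"role": "user", "content": "Continue from there."},
]

response = client.chat.completions.create(
    model="google/gemini-2.5-flash",  # placeholder model name
    messages=messages,
)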
What's next