The Chat Completions API lets you send requests to Vertex AI models by using the OpenAI libraries for Python and REST. If you're already using the OpenAI libraries, you can use this API to switch between calling OpenAI models and Vertex AI hosted models to compare output, cost, and scalability, without changing your existing code. If you aren't already using the OpenAI libraries, we recommend that you call the Gemini API directly.
Supported models
The Chat Completions API supports both Gemini models and select self-deployed models from Model Garden.
Gemini models
The following table shows the Gemini models that are supported:
Model | Version |
---|---|
Gemini 2.0 Flash | google/gemini-2.0-flash-exp |
Gemini 1.5 Flash | google/gemini-1.5-flash |
Gemini 1.5 Pro | google/gemini-1.5-pro |
Gemini 1.0 Pro Vision | google/gemini-1.0-pro-vision, google/gemini-1.0-pro-vision-001 |
Gemini 1.0 Pro | google/gemini-1.0-pro-002, google/gemini-1.0-pro-001, google/gemini-1.0-pro |
Self-deployed models from Model Garden
The Hugging Face Text Generation Inference (HF TGI) and Vertex AI Model Garden prebuilt vLLM containers support the Chat Completions API. However, not every model deployed to these containers supports the Chat Completions API. The following table includes the most popular supported models by container:
HF TGI | vLLM |
---|---|
Authenticate
To use the OpenAI Python libraries, install the OpenAI SDK:
pip install openai
To authenticate with the Chat Completions API, you can either modify your client setup or change your environment configuration to use Google authentication and a Vertex AI endpoint. Choose whichever method is easier, and follow the setup steps that match the kind of model you want to call: Gemini models or self-deployed Model Garden models.
Certain models in Model Garden and supported Hugging Face models need to be deployed to a Vertex AI endpoint before they can serve requests. When calling these self-deployed models from the Chat Completions API, you need to specify the endpoint ID. To list your existing Vertex AI endpoints, use the gcloud ai endpoints list command.
Client setup
To programmatically get Google credentials in Python, you can use the google-auth Python SDK:
pip install google-auth
pip install requests
Change the OpenAI SDK to point to the Vertex AI chat completions endpoint:
import google.auth
import google.auth.transport.requests
import openai
# Programmatically get an access token
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
# Note: the credential lives for 1 hour by default (https://cloud.google.com/docs/authentication/token-types#at-lifetime); after expiration, it must be refreshed.
# Pass the Vertex endpoint and authentication to the OpenAI SDK
PROJECT_ID = 'PROJECT_ID'
LOCATION = 'LOCATION'
##############################
# Choose one of the following:
##############################
# If you are calling a Gemini model, set the MODEL_ID variable and set
# your client's base URL to use openapi.
MODEL_ID = 'MODEL_ID'
client = openai.OpenAI(
    base_url = f'https://{LOCATION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi',
    api_key = creds.token)
# If you are calling a self-deployed model from Model Garden, set the
# ENDPOINT variable and set your client's base URL to use your endpoint.
ENDPOINT = 'ENDPOINT_ID'
client = openai.OpenAI(
    base_url = f'https://{LOCATION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT}',
    api_key = creds.token)
By default, access tokens last for 1 hour. You can extend the life of your access token or periodically refresh your token and update the api_key attribute of your client (client.api_key).
Environment variables
Install the Google Cloud CLI. The OpenAI library can read the OPENAI_API_KEY and OPENAI_BASE_URL environment variables to change the authentication and endpoint in its default client.
Set the following variables:
$ export PROJECT_ID=PROJECT_ID
$ export LOCATION=LOCATION
$ export OPENAI_API_KEY="$(gcloud auth application-default print-access-token)"
To call a Gemini model, set the MODEL_ID variable and use the openapi endpoint:
$ export MODEL_ID=MODEL_ID
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi"
To call a self-deployed model from Model Garden, set the ENDPOINT variable and use that in your URL instead:
$ export ENDPOINT=ENDPOINT_ID
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT}"
Next, initialize the client:
client = openai.OpenAI()
The Gemini Chat Completions API uses OAuth to authenticate with a short-lived access token. By default, access tokens last for 1 hour. You can extend the life of your access token or periodically refresh your token and update the OPENAI_API_KEY environment variable.
Call Gemini with the Chat Completions API
The following sample shows you how to send non-streaming requests:
curl
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
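The Python sample for this step did not survive on this page, so the following is a minimal sketch of the same non-streaming request made with the OpenAI SDK. It assumes Application Default Credentials are configured and that `openai` and `google-auth` are installed. The helper names `vertex_base_url`, `make_client`, and `ask` are hypothetical, and requesting the `cloud-platform` scope is an assumption that the curl sample above does not state.

```python
from typing import Any


def vertex_base_url(project_id: str, location: str, endpoint: str = "openapi") -> str:
    """Build the Chat Completions base URL; 'openapi' targets Gemini models."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/endpoints/{endpoint}"
    )


def make_client(project_id: str, location: str) -> Any:
    """Create an OpenAI client that authenticates with a Google access token."""
    # Imported lazily so the other helpers can be used without the SDKs installed.
    import google.auth
    import google.auth.transport.requests
    import openai

    # Assumption: the cloud-platform scope is sufficient for this API.
    creds, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    creds.refresh(google.auth.transport.requests.Request())
    return openai.OpenAI(
        base_url=vertex_base_url(project_id, location),
        api_key=creds.token,
    )


def ask(client: Any, model_id: str, prompt: str) -> str:
    """Send one non-streaming request and return the text of the first choice."""
    response = client.chat.completions.create(
        model=f"google/{model_id}",  # Gemini model names carry the google/ prefix
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

To try it, replace the placeholders and call `ask(make_client("PROJECT_ID", "LOCATION"), "MODEL_ID", "Write a story about a magic backpack.")`.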
The following sample shows you how to send streaming requests to a Gemini model by using the Chat Completions API:
curl
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
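As a rough Python counterpart to the streaming curl request, the sketch below iterates over the chunks that the OpenAI SDK returns when `stream=True` is set. It assumes `client` is an `openai.OpenAI` instance configured as described in the Client setup section. The function name `stream_text` is hypothetical.

```python
from typing import Any, Iterator


def stream_text(client: Any, model_id: str, prompt: str) -> Iterator[str]:
    """Yield text fragments from a streaming Chat Completions request."""
    stream = client.chat.completions.create(
        model=f"google/{model_id}",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Skip chunks that carry no choices or no text, such as a final
        # chunk that only reports a finish reason.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

A caller might print fragments as they arrive: `for text in stream_text(client, "MODEL_ID", "Write a story about a magic backpack."): print(text, end="")`.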
Call a self-deployed model with the Chat Completions API
The following sample shows you how to send non-streaming requests:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
The following sample shows you how to send streaming requests to a self-deployed model by using the Chat Completions API:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Supported parameters
For Google models, the Chat Completions API supports the following OpenAI parameters. For a description of each parameter, see OpenAI's documentation on Creating chat completions. Parameter support for third-party models varies by model. To see which parameters are supported, consult the model's documentation.
Parameter | Notes |
---|---|
messages | |
model | |
max_tokens | |
n | |
frequency_penalty | |
presence_penalty | |
response_format | |
stop | |
stream | |
temperature | |
top_p | |
tools | |
tool_choice | |
function_call | This field is deprecated, but supported for backwards compatibility. |
functions | This field is deprecated, but supported for backwards compatibility. |
If you pass any unsupported parameter, it is ignored.
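Because unsupported parameters are dropped without an error, it can help to check request arguments locally before sending them. The following sketch splits a set of keyword arguments into those listed in the table above and those that would be ignored. The helper name `split_parameters` is hypothetical and not part of any SDK, and the set applies to Google models only, since support for third-party models varies.

```python
from typing import Any, Dict, List, Tuple

# Parameter names copied from the table of supported parameters for Google models.
SUPPORTED_PARAMETERS = frozenset({
    "messages", "model", "max_tokens", "n", "frequency_penalty",
    "presence_penalty", "response_format", "stop", "stream",
    "temperature", "top_p", "tools", "tool_choice",
    "function_call", "functions",  # deprecated, kept for backwards compatibility
})


def split_parameters(params: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (supported arguments, sorted names of arguments that would be ignored)."""
    supported = {k: v for k, v in params.items() if k in SUPPORTED_PARAMETERS}
    ignored = sorted(k for k in params if k not in SUPPORTED_PARAMETERS)
    return supported, ignored
```

For example, `split_parameters({"temperature": 0.2, "logit_bias": {}})` reports `logit_bias` as ignored, so the caller can log a warning instead of assuming the setting took effect.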
Refresh your credentials
The following example shows how to refresh your credentials automatically as needed:
Python
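The sample for this section is missing from the page, so the following is one possible sketch rather than the official example. It wraps an OpenAI client and, whenever an attribute such as `chat` is accessed, checks whether the Google credentials are still valid, refreshing them and updating the client's `api_key` if not. The names `RefreshingClient` and `make_refreshing_client` are hypothetical, and the `cloud-platform` scope is an assumption.

```python
from typing import Any, Callable


class RefreshingClient:
    """Wrap an OpenAI client and refresh its access token when it is no longer valid."""

    def __init__(self, client: Any, creds: Any, make_request: Callable[[], Any]) -> None:
        self._client = client
        self._creds = creds
        self._make_request = make_request

    def __getattr__(self, name: str) -> Any:
        # Python calls __getattr__ only when normal lookup fails, so this runs
        # for forwarded attributes such as `chat`, not for `_client` or `_creds`.
        if not self._creds.valid:
            self._creds.refresh(self._make_request())
            if not self._creds.valid:
                raise RuntimeError("Unable to refresh credentials")
            self._client.api_key = self._creds.token
        return getattr(self._client, name)


def make_refreshing_client(base_url: str) -> RefreshingClient:
    """Build a wrapper around openai.OpenAI using Application Default Credentials."""
    # Imported lazily so RefreshingClient can be used without the SDKs installed.
    import google.auth
    import google.auth.transport.requests
    import openai

    creds, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    # The placeholder key is replaced on first use, because freshly loaded
    # credentials hold no token yet and therefore report valid == False.
    client = openai.OpenAI(base_url=base_url, api_key="PLACEHOLDER")
    return RefreshingClient(client, creds, google.auth.transport.requests.Request)
```

With this wrapper, calls such as `make_refreshing_client(base_url).chat.completions.create(...)` go through the validity check first, so a long-running process does not need to track the one-hour token lifetime itself.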
What's next
- See examples of calling the Inference API with the OpenAI-compatible syntax.
- See examples of calling the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.