The Gemini Chat Completions API lets you send requests to the Vertex AI Gemini API by using the OpenAI libraries for Python and REST. If you are already using the OpenAI libraries, you can use this API to switch between calling OpenAI models and Gemini models to compare output, cost, and scalability, without changing your existing code. If you are not already using the OpenAI libraries, we recommend that you call the Gemini API directly.
Supported models
Model | Version |
---|---|
Gemini 1.5 Flash | google/gemini-1.5-flash-001 |
Gemini 1.5 Pro | google/gemini-1.5-pro-001 |
Gemini 1.0 Pro Vision | google/gemini-1.0-pro-vision, google/gemini-1.0-pro-vision-001 |
Gemini 1.0 Pro | google/gemini-1.0-pro-002, google/gemini-1.0-pro-001, google/gemini-1.0-pro |
Authenticate
To use the OpenAI Python libraries, install the OpenAI SDK:
pip install openai
To authenticate with the Gemini Chat Completions API, you can either modify your client setup or change your environment configuration to use Google authentication and a Vertex AI endpoint. Choose one of the following, whichever is easier:
Client setup
To programmatically get Google credentials in Python, you can use the google-auth Python SDK:
pip install google-auth
pip install requests
Change the OpenAI SDK to point to the Vertex AI chat completions endpoint:
import google.auth
import google.auth.transport.requests
import openai

# Programmatically get an access token
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
# Note: the credential lives for 1 hour by default (https://cloud.google.com/docs/authentication/token-types#at-lifetime); after expiration, it must be refreshed.

# Pass the Vertex endpoint and authentication to the OpenAI SDK
PROJECT = 'PROJECT_ID'
LOCATION = 'LOCATION'
MODEL_ID = 'MODEL_ID'
client = openai.OpenAI(
    base_url = f'https://{LOCATION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT}/locations/{LOCATION}/endpoints/openapi',
    api_key = creds.token)
By default, access tokens last for 1 hour. You can extend the life of your access token, or periodically refresh your token and update the api_key that the client uses.
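The periodic refresh described above can be sketched as a small cache that fetches a new token shortly before the old one expires. This is only an illustration: `TokenCache`, `fetch_token`, and the five-minute margin are hypothetical names and values, not part of the OpenAI or Google libraries. In practice, `fetch_token` could wrap the `creds.refresh(auth_req)` call from the sample above and return `creds.token`.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches an access token and re-fetches it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        lifetime_s: float = 3600.0,  # default access token lifetime (1 hour)
        margin_s: float = 300.0,     # assumed safety margin; tune as needed
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._lifetime_s = lifetime_s
        self._margin_s = margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Returns the cached token, fetching a new one if it is missing or stale."""
        age = self._clock() - self._fetched_at
        if self._token is None or age >= self._lifetime_s - self._margin_s:
            self._token = self._fetch_token()
            self._fetched_at = self._clock()
        return self._token
```

Before each batch of requests, you might call `get()` on the cache and create the client with the returned value as its `api_key`.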
Environment variables
Install the Google Cloud CLI. The OpenAI library can read the OPENAI_API_KEY and OPENAI_BASE_URL environment variables to change the authentication and endpoint in its default client.
Set the following variables:
$ export PROJECT=PROJECT_ID
$ export LOCATION=LOCATION
$ export MODEL_ID=MODEL_ID
$ export OPENAI_API_KEY="$(gcloud auth application-default print-access-token)"
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/endpoints/openapi"
Next, initialize the client:
client = openai.OpenAI()
The Gemini Chat Completions API uses OAuth to authenticate with a short-lived access token. By default, access tokens last for 1 hour. You can extend the life of your access token or periodically refresh your token and update the OPENAI_API_KEY environment variable.
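Because a missing or misspelled variable only surfaces as a failed request, it can help to check the configuration before creating the client. The following sketch rebuilds the endpoint URL from the same pattern as the export commands above and compares it with the environment. `vertex_base_url` and `check_openai_env` are hypothetical helper names used for illustration, not part of the OpenAI or Google libraries.

```python
import os
from typing import List, Mapping


def vertex_base_url(project: str, location: str) -> str:
    """Builds the chat completions base URL from the pattern shown above."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project}/locations/{location}/endpoints/openapi"
    )


def check_openai_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Returns a list of problems; an empty list means the setup looks plausible."""
    problems = []
    for name in ("PROJECT", "LOCATION", "OPENAI_API_KEY", "OPENAI_BASE_URL"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    if not problems:
        expected = vertex_base_url(env["PROJECT"], env["LOCATION"])
        if env["OPENAI_BASE_URL"] != expected:
            problems.append(f"OPENAI_BASE_URL should be {expected}")
    return problems
```

Calling `check_openai_env()` with no arguments inspects the current process environment. Note that it cannot tell whether the token in OPENAI_API_KEY has already expired.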
Call the Gemini Chat Completions API
The following sample shows you how to send non-streaming requests:
curl
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
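import openai

# Reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment
client = openai.OpenAI()
MODEL_ID = "MODEL_ID"  # Replace with a supported model ID

model_response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
)
print(model_response)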
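Printing the whole response shows every field. If you only need the generated text, it sits at `choices[0].message.content` in the OpenAI chat completion shape. The following is a minimal sketch: `first_message_text` is a hypothetical helper, and the stand-in object only mimics the response layout so that the example runs without calling the API.

```python
from types import SimpleNamespace


def first_message_text(response) -> str:
    """Returns the text of the first choice, or "" if there is none."""
    if not response.choices:
        return ""
    return response.choices[0].message.content or ""


# Stand-in with the same attribute layout as a chat completion response.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Once upon a time..."))]
)
print(first_message_text(fake_response))
```

With a real client, you would pass the object returned by `client.chat.completions.create(...)` instead of `fake_response`.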
The following sample shows you how to send streaming requests:
curl
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
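import openai

# Reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment
client = openai.OpenAI()
MODEL_ID = "MODEL_ID"  # Replace with a supported model ID

model_response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    stream=True,
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
)

# With stream=True, the call returns an iterator of chunks
for chunk in model_response:
    print(chunk)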
Supported parameters
The Gemini Chat Completions API supports the following OpenAI parameters. For a description of each parameter, see OpenAI's documentation on Creating chat completions.
Parameter | Notes |
---|---|
messages | |
model | |
max_tokens | |
n | |
response_format | |
stop | |
stream | |
temperature | |
top_p | |
tools | |
tool_choice | |
function_call | This field is deprecated, but supported for backwards compatibility. |
functions | This field is deprecated, but supported for backwards compatibility. |
If you pass any unsupported parameter, it is ignored.
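Because unsupported parameters are ignored rather than rejected, a request can quietly behave differently from what the code suggests. One way to surface this is to compare the request arguments against the supported list before sending. In this sketch, `SUPPORTED_PARAMS` is copied from the list above, and `split_params` is a hypothetical helper, not part of any library.

```python
from typing import Any, Dict, List, Tuple

# Parameter names taken from the supported list above.
SUPPORTED_PARAMS = frozenset({
    "messages", "model", "max_tokens", "n", "response_format", "stop",
    "stream", "temperature", "top_p", "tools", "tool_choice",
    "function_call", "functions",
})


def split_params(params: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Separates supported arguments from the names that would be ignored."""
    supported = {k: v for k, v in params.items() if k in SUPPORTED_PARAMS}
    ignored = sorted(k for k in params if k not in SUPPORTED_PARAMS)
    return supported, ignored


request = {
    "model": "google/gemini-1.5-flash-001",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "presence_penalty": 0.5,  # an OpenAI parameter that is not in the list above
}
supported, ignored = split_params(request)
if ignored:
    print("These parameters would be ignored:", ", ".join(ignored))
```

The `supported` dictionary can then be unpacked into `client.chat.completions.create(**supported)`.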
What's next
- See examples of calling the Inference API with the OpenAI-compatible syntax.
- See examples of calling the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.