Call Gemini by using the OpenAI library

The Gemini Chat Completions API lets you send requests to the Vertex AI Gemini API by using the OpenAI libraries for Python and REST. If you are already using the OpenAI libraries, you can use this API to switch between calling OpenAI models and Gemini models to compare output, cost, and scalability, without changing your existing code. If you are not already using the OpenAI libraries, we recommend that you call the Gemini API directly.

Supported models

Model                   Version
Gemini 1.5 Flash        google/gemini-1.5-flash-001
Gemini 1.5 Pro          google/gemini-1.5-pro-001
Gemini 1.0 Pro Vision   google/gemini-1.0-pro-vision
                        google/gemini-1.0-pro-vision-001
Gemini 1.0 Pro          google/gemini-1.0-pro-002
                        google/gemini-1.0-pro-001
                        google/gemini-1.0-pro

Authenticate

To use the OpenAI Python libraries, install the OpenAI SDK:

pip install openai

To authenticate with the Gemini Chat Completions API, you can either modify your client setup or change your environment configuration to use Google authentication and a Vertex AI endpoint. Choose whichever of the following methods is easier:

Client setup

To programmatically get Google credentials in Python, you can use the google-auth Python SDK:

pip install google-auth requests

Change the OpenAI SDK to point to the Vertex AI chat completions endpoint:

import google.auth
import google.auth.transport.requests
import openai

# Programmatically get an access token
creds, project = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
# Note: the credential lives for 1 hour by default (https://cloud.google.com/docs/authentication/token-types#at-lifetime); after expiration, it must be refreshed.

# Pass the Vertex AI endpoint and authentication to the OpenAI SDK
PROJECT_ID = 'PROJECT_ID'
LOCATION = 'LOCATION'
MODEL_ID = 'MODEL_ID'
client = openai.OpenAI(
    base_url=f'https://{LOCATION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi',
    api_key=creds.token,
)

By default, access tokens last for 1 hour. You can extend the life of your access token, or you can periodically refresh your token and update the client's api_key.
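For example, a long-running process can rebuild the client with a fresh token whenever the credentials are no longer valid. The following is a minimal sketch; the refreshed_client helper is illustrative, not part of either SDK:

import google.auth
import google.auth.transport.requests
import openai

def refreshed_client(creds, base_url):
    # Refresh the short-lived access token if it is no longer valid,
    # then hand back a client that carries the current token.
    if not creds.valid:
        creds.refresh(google.auth.transport.requests.Request())
    return openai.OpenAI(base_url=base_url, api_key=creds.token)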

Environment variables

Install the Google Cloud CLI. The OpenAI library can read the OPENAI_API_KEY and OPENAI_BASE_URL environment variables to change the authentication and endpoint in its default client. Set the following variables:

$ export PROJECT_ID=PROJECT_ID
$ export LOCATION=LOCATION
$ export MODEL_ID=MODEL_ID
$ export OPENAI_API_KEY="$(gcloud auth application-default print-access-token)"
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi"

Next, initialize the client:

client = openai.OpenAI()

The Gemini Chat Completions API uses OAuth to authenticate with a short-lived access token. By default, access tokens last for 1 hour. You can extend the life of your access token or periodically refresh your token and update the OPENAI_API_KEY environment variable.

Call the Gemini Chat Completions API

The following sample shows you how to send non-streaming requests:

curl

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

import os

import openai

client = openai.OpenAI()

# MODEL_ID was exported earlier, for example "gemini-1.5-flash-001"
MODEL_ID = os.environ["MODEL_ID"]
model_response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
)

print(model_response)
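The response object follows the OpenAI chat completions schema, so you can read the generated text from the same fields you would use with an OpenAI model:

print(model_response.choices[0].message.content)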

The following sample shows you how to send streaming requests:

curl

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

import os

import openai

client = openai.OpenAI()

MODEL_ID = os.environ["MODEL_ID"]
model_response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    stream=True,
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
)

# With stream=True, the call returns an iterator of response chunks
for chunk in model_response:
    print(chunk)
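Instead of printing whole chunk objects, you can print only the newly generated text; each chunk carries an incremental delta in the standard OpenAI streaming schema:

for chunk in model_response:
    # choices can be empty on the final chunk of a stream
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)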

Supported parameters

The Gemini Chat Completions API supports the following OpenAI parameters. For a description of each parameter, see OpenAI's documentation on Creating chat completions.

messages
  • System message
  • User message: The text and image_url types are supported. The image_url type only supports images stored in Cloud Storage. To learn how to create a Cloud Storage bucket and upload a file to it, see Discover object storage. The detail option is not supported. (See the image message sketch after this list.)
  • Assistant message
  • Tool message
  • Function message: This field is deprecated, but supported for backwards compatibility.
model
max_tokens
n
response_format
  • json_object: Interpreted as passing "application/json" to the Gemini API (see the JSON output sketch after this list).
  • text: Interpreted as passing "text/plain" to the Gemini API.
  • Any other MIME type is passed as is to the model; for example, you can pass "application/json" directly instead of "json_object".
stop
stream
temperature
top_p
tools (see the function calling sketch after this list)
  • type
  • function
    • name
    • description
    • parameters
tool_choice
  • none
  • auto
  • required: Corresponds to the mode ANY in the FunctionCallingConfig.
function_call This field is deprecated, but supported for backwards compatibility.
functions This field is deprecated, but supported for backwards compatibility.

If you pass any unsupported parameter, it is ignored.
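
The following is a minimal sketch of a user message that references an image stored in Cloud Storage. The bucket and object names are placeholders, and client and MODEL_ID are assumed to be set up as shown earlier:

# Sketch: multimodal user message with a Cloud Storage image.
# gs://YOUR_BUCKET/image.jpg is a placeholder; the detail option is
# not supported, so it is omitted.
response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "gs://YOUR_BUCKET/image.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)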
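Similarly, a sketch of requesting JSON output through response_format, which is interpreted as "application/json" by the Gemini API:

# Sketch: ask the model for JSON output
response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content": "List three colors as a JSON array."}],
)
print(response.choices[0].message.content)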
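And a sketch of function calling with tools and tool_choice. The get_weather declaration is illustrative, not a real function:

# Sketch: declare a function tool in the OpenAI schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # "required" would map to mode ANY in FunctionCallingConfig
)
print(response.choices[0].message.tool_calls)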

What's next