Call Gemini by using the OpenAI library

The Gemini Chat Completions API lets you send requests to the Vertex AI Gemini API by using the OpenAI libraries for Python and REST. If you are already using the OpenAI libraries, you can use this API to switch between calling OpenAI models and Gemini models to compare output, cost, and scalability, without changing your existing code. If you are not already using the OpenAI libraries, we recommend that you call the Gemini API directly.

Supported models

Model                   Version
Gemini 1.5 Flash        google/gemini-1.5-flash-001
Gemini 1.5 Pro          google/gemini-1.5-pro-001
Gemini 1.0 Pro Vision   google/gemini-1.0-pro-vision
                        google/gemini-1.0-pro-vision-001
Gemini 1.0 Pro          google/gemini-1.0-pro-002
                        google/gemini-1.0-pro-001
                        google/gemini-1.0-pro

Authenticate

To use the OpenAI Python libraries, install the OpenAI SDK:

pip install openai

To authenticate with the Gemini Chat Completions API, you can either modify your client setup or change your environment configuration to use Google authentication and a Vertex AI endpoint. Choose whichever of the following methods is easier:

Client setup

To programmatically get Google credentials in Python, you can use the google-auth Python SDK:

pip install google-auth requests

Change the OpenAI SDK to point to the Vertex AI chat completions endpoint:

import google.auth
import google.auth.transport.requests
import openai

# Programmatically get an access token
creds, project = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
# Note: the credential lives for 1 hour by default (https://cloud.google.com/docs/authentication/token-types#at-lifetime); after expiration, it must be refreshed.

# Pass the Vertex AI endpoint and authentication to the OpenAI SDK
PROJECT_ID = 'PROJECT_ID'
LOCATION = 'LOCATION'
MODEL_ID = 'MODEL_ID'
client = openai.OpenAI(
    base_url=f'https://{LOCATION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi',
    api_key=creds.token,
)

By default, access tokens last for 1 hour. You can extend the life of your access token, or you can periodically refresh your token and update the client's api_key.
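For example, a long-running process can rebuild the client with a fresh token whenever the credentials are no longer valid. The following is a minimal sketch; the refreshed_client helper is illustrative, not part of either SDK:

import google.auth
import google.auth.transport.requests
import openai

def refreshed_client(creds, base_url):
    # Refresh the short-lived access token if it is no longer valid,
    # then hand back a client that carries the current token.
    if not creds.valid:
        creds.refresh(google.auth.transport.requests.Request())
    return openai.OpenAI(base_url=base_url, api_key=creds.token)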

Environment variables

Install the Google Cloud CLI. The OpenAI library can read the OPENAI_API_KEY and OPENAI_BASE_URL environment variables to change the authentication and endpoint in its default client. Set the following variables:

$ export PROJECT_ID=PROJECT_ID
$ export LOCATION=LOCATION
$ export MODEL_ID=MODEL_ID
$ export OPENAI_API_KEY="$(gcloud auth application-default print-access-token)"
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi"

Next, initialize the client:

client = openai.OpenAI()

The Gemini Chat Completions API uses OAuth to authenticate with a short-lived access token. By default, access tokens last for 1 hour. You can extend the life of your access token or periodically refresh your token and update the OPENAI_API_KEY environment variable.

Call the Gemini Chat Completions API

The following sample shows you how to send non-streaming requests:

curl

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

import os

import openai

client = openai.OpenAI()

# MODEL_ID was exported earlier, for example "gemini-1.5-flash-001"
MODEL_ID = os.environ["MODEL_ID"]
model_response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
)

print(model_response)
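The response object follows the OpenAI chat completions schema, so you can read the generated text from the same fields you would use with an OpenAI model:

print(model_response.choices[0].message.content)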

The following sample shows you how to send streaming requests:

curl

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

import os

import openai

client = openai.OpenAI()

MODEL_ID = os.environ["MODEL_ID"]
model_response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    stream=True,
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
)

# With stream=True, the call returns an iterator of response chunks
for chunk in model_response:
    print(chunk)
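Instead of printing whole chunk objects, you can print only the newly generated text; each chunk carries an incremental delta in the standard OpenAI streaming schema:

for chunk in model_response:
    # choices can be empty on the final chunk of a stream
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)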

Supported parameters

The Gemini Chat Completions API supports the following OpenAI parameters. For a description of each parameter, see OpenAI's documentation on Creating chat completions.

messages
  • System message
  • User message: The text and image_url types are supported. The image_url type only supports images stored in Cloud Storage. To learn how to create a Cloud Storage bucket and upload a file to it, see Discover object storage. The detail option is not supported. (See the image message sketch after this list.)
  • Assistant message
  • Tool message
  • Function message: This field is deprecated, but supported for backwards compatibility.
model
max_tokens
n
response_format
  • json_object: Interpreted as passing "application/json" to the Gemini API (see the JSON output sketch after this list).
  • text: Interpreted as passing "text/plain" to the Gemini API.
  • Any other MIME type is passed as is to the model; for example, you can pass "application/json" directly instead of "json_object".
stop
stream
temperature
top_p
tools (see the function calling sketch after this list)
  • type
  • function
    • name
    • description
    • parameters
tool_choice
  • none
  • auto
  • required: Corresponds to the mode ANY in the FunctionCallingConfig.
function_call This field is deprecated, but supported for backwards compatibility.
functions This field is deprecated, but supported for backwards compatibility.

If you pass any unsupported parameter, it is ignored.
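
The following is a minimal sketch of a user message that references an image stored in Cloud Storage. The bucket and object names are placeholders, and client and MODEL_ID are assumed to be set up as shown earlier:

# Sketch: multimodal user message with a Cloud Storage image.
# gs://YOUR_BUCKET/image.jpg is a placeholder; the detail option is
# not supported, so it is omitted.
response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "gs://YOUR_BUCKET/image.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)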
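Similarly, a sketch of requesting JSON output through response_format, which is interpreted as "application/json" by the Gemini API:

# Sketch: ask the model for JSON output
response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content": "List three colors as a JSON array."}],
)
print(response.choices[0].message.content)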
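And a sketch of function calling with tools and tool_choice. The get_weather declaration is illustrative, not a real function:

# Sketch: declare a function tool in the OpenAI schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # "required" would map to mode ANY in FunctionCallingConfig
)
print(response.choices[0].message.tool_calls)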

What's next