This guide provides examples of how to use the OpenAI-compatible Chat Completions API with Gemini models. It shows how to call both managed Gemini models and self-deployed models, using non-streaming and streaming requests, and covers the following topics:

extra_body examples: Explains how to pass additional Google-specific parameters in your requests.
extra_content examples: Demonstrates how to add extra content to messages or tool calls.
curl requests: Offers direct curl examples for advanced use cases like multimodal input.
Before trying the Python samples in this guide, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation. To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
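The Python samples on this page use the OpenAI Python library together with google-auth to obtain an access token. As a minimal, assumed setup (the package list is not taken from this page), you can install them with pip:

pip install --upgrade openai google-auth requests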
You can call the Chat Completions API in two ways:

Method: Call a managed Gemini model
Description: Send requests to a Google-managed endpoint for a specific Gemini model.
Use case: Best for general use cases, quick setup, and accessing the latest Google models without managing infrastructure.

Method: Call a self-deployed model
Description: Send requests to an endpoint that you create by deploying a model on Vertex AI.
Use case: Ideal when you need a dedicated endpoint for a fine-tuned model or require specific configurations not available on the default endpoint.
Call Gemini with the Chat Completions API

You can send requests as either non-streaming or streaming.

Request type: Non-streaming
Description: The full response is generated and then sent back in a single chunk.
Pros: Simpler to implement; the complete response is available at once.
Cons: Higher perceived latency because the user waits for the entire response to be generated.

Request type: Streaming
Description: The response is sent back in small chunks as it's being generated. To enable streaming, set "stream": true in the request body.
Pros: Lower perceived latency; provides a more interactive experience as the response appears incrementally.
Cons: Requires more complex client-side logic to handle the incoming stream of data.
Send a non-streaming request

The following sample shows how to send a non-streaming request.

REST
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
-d '{
"model": "google/${MODEL_ID}",
"messages": [{
"role": "user",
"content": "Write a story about a magic backpack."
}]
}'
Python
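The following is a minimal sketch of the same request using the OpenAI Python library; the openai and google-auth packages and the PROJECT_ID, LOCATION, and MODEL_ID placeholder values are assumptions, not taken from this page.

import openai
from google.auth import default
from google.auth.transport import requests as google_requests

# Placeholder values; replace with your own project, region, and model.
PROJECT_ID = "your-project-id"
LOCATION = "us-central1"
MODEL_ID = "gemini-2.0-flash-001"

# Obtain an access token from Application Default Credentials.
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google_requests.Request())

# Point the OpenAI client at the Vertex AI OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url=f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
)
print(response.choices[0].message.content)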
Send a streaming request

The following sample shows how to send a streaming request by setting "stream": true.

REST
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
-d '{
"model": "google/${MODEL_ID}",
"stream": true,
"messages": [{
"role": "user",
"content": "Write a story about a magic backpack."
}]
}'
Python
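The following is a minimal streaming sketch under the same assumptions (openai and google-auth packages, placeholder project and model values); iterating over the response yields chunks as they are generated.

import openai
from google.auth import default
from google.auth.transport import requests as google_requests

# Placeholder values; replace with your own project, region, and model.
PROJECT_ID = "your-project-id"
LOCATION = "us-central1"
MODEL_ID = "gemini-2.0-flash-001"

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google_requests.Request())
client = openai.OpenAI(
    base_url=f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi",
    api_key=credentials.token,
)

# stream=True returns an iterator of chunks instead of a single response.
stream = client.chat.completions.create(
    model=f"google/{MODEL_ID}",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)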
Send a prompt and an image to the Gemini API in Vertex AI

The following sample shows how to send a multimodal request that includes text and an image.

Python
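The following is a minimal sketch of a text-and-image request under the same assumptions; the image reference is passed here in the OpenAI-style image_url object form, and the Cloud Storage URI matches the sample image used in the curl example later on this page.

import openai
from google.auth import default
from google.auth.transport import requests as google_requests

# Placeholder values; replace with your own project and region.
PROJECT_ID = "your-project-id"
LOCATION = "us-central1"

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google_requests.Request())
client = openai.OpenAI(
    base_url=f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image"},
                # Cloud Storage URI of the sample image used in the curl example.
                {
                    "type": "image_url",
                    "image_url": {"url": "gs://cloud-samples-data/generative-ai/image/scones.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)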
Call a self-deployed model with the Chat Completions API
Send a non-streaming request

The following sample shows how to send a non-streaming request to a self-deployed model.

REST
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
-d '{
"messages": [{
"role": "user",
"content": "Write a story about a magic backpack."
}]
}'
Python
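The following is a minimal sketch for a self-deployed endpoint, assuming the openai and google-auth packages and a placeholder ENDPOINT_ID; because the dedicated endpoint already identifies the deployed model, the sketch sends an empty model value, mirroring the curl request above, which omits the model field.

import openai
from google.auth import default
from google.auth.transport import requests as google_requests

# Placeholder values; ENDPOINT_ID is the ID of your self-deployed endpoint.
PROJECT_ID = "your-project-id"
ENDPOINT_ID = "your-endpoint-id"

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google_requests.Request())

# Point the client directly at the dedicated endpoint, mirroring the curl URL above.
client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/global/endpoints/{ENDPOINT_ID}",
    api_key=credentials.token,
)

# The curl request above sends no model name; an empty value satisfies the SDK.
response = client.chat.completions.create(
    model="",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
)
print(response.choices[0].message.content)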
Send a streaming request

The following sample shows how to send a streaming request to a self-deployed model.

REST
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
-d '{
"stream": true,
"messages": [{
"role": "user",
"content": "Write a story about a magic backpack."
}]
}'
Python
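The following is a minimal streaming sketch for a self-deployed endpoint under the same assumptions.

import openai
from google.auth import default
from google.auth.transport import requests as google_requests

# Placeholder values; ENDPOINT_ID is the ID of your self-deployed endpoint.
PROJECT_ID = "your-project-id"
ENDPOINT_ID = "your-endpoint-id"

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google_requests.Request())
client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/global/endpoints/{ENDPOINT_ID}",
    api_key=credentials.token,
)

stream = client.chat.completions.create(
    model="",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)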
extra_body examples

You can use the extra_body field to pass Google-specific parameters in your request.

REST API: To pass parameters using the REST API, add them within a google object.

{
...,
"extra_body": {
"google": {
...,
"thought_tag_marker": "..."
}
}
}
Python SDK: To pass parameters using the Python SDK, provide them in a dictionary to the extra_body argument.

client.chat.completions.create(
...,
extra_body = {
'extra_body': { 'google': { ... } }
},
)
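For example, the following sketch (assuming a client configured for Vertex AI as in the earlier samples) passes the thinking_config fields shown in the curl example later on this page through the SDK's extra_body argument.

# Assumes `client` is an openai.OpenAI client configured for Vertex AI as in the
# earlier samples; the model name and thinking budget are illustrative values.
response = client.chat.completions.create(
    model="google/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "Explain why the sky is blue."}],
    extra_body={
        "extra_body": {
            "google": {
                "thinking_config": {
                    "include_thoughts": True,
                    "thinking_budget": 10000,
                },
            }
        }
    },
)
print(response.choices[0].message.content)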
extra_content examples

You can use the extra_content field with the REST API to add extra information to messages or tool calls.

With a string content field:

{
"messages": [
{ "role": "...", "content": "...", "extra_content": { "google": { ... } } }
]
}
Per message in a multipart content field:

{
"messages": [
{
"role": "...",
"content": [
{ "type": "...", ..., "extra_content": { "google": { ... } } }
]
}
}
Per tool call:

{
"messages": [
{
"role": "...",
"tool_calls": [
{
...,
"extra_content": { "google": { ... } }
}
]
}
]
}
Sample curl requests

You can use these curl requests to interact with the API directly, without using an SDK.

Use thinking_config with extra_body
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions \
-d '{
  "model": "google/gemini-2.5-flash-preview-04-17",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Are there any prime numbers of the form n*ceil(log(n))"
        }
      ]
    }
  ],
  "extra_body": {
    "google": {
      "thinking_config": {
        "include_thoughts": true,
        "thinking_budget": 10000
      },
      "thought_tag_marker": "think"
    }
  },
  "stream": true
}'
Multimodal requests

The Chat Completions API supports a variety of multimodal input, including audio and video.

Pass image data with image_url
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
-d '{
  "model": "google/gemini-2.0-flash-001",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image" },
        { "type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg" }
      ]
    }
  ]
}'
Pass audio data with input_audio
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
-d '{
  "model": "google/gemini-2.0-flash-001",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this: " },
        {
          "type": "input_audio",
          "input_audio": {
            "format": "audio/mp3",
            "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
          }
        }
      ]
    }
  ]
}'
Structured output

You can use the response_format parameter to request structured JSON output from the model.

Example with the Python SDK
from pydantic import BaseModel
from openai import OpenAI
# Configure the client for the Vertex AI OpenAI-compatible endpoint and access
# token, as shown in the earlier Python samples, rather than the default OpenAI API.
client = OpenAI()
class CalendarEvent(BaseModel):
name: str
date: str
participants: list[str]
completion = client.beta.chat.completions.parse(
model="google/gemini-2.5-flash-preview-04-17",
messages=[
{"role": "system", "content": "Extract the event information."},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
],
response_format=CalendarEvent,
)
print(completion.choices[0].message.parsed)
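The parse helper returns the response as an instance of the Pydantic model in message.parsed; the raw JSON string remains available on message.content if you need it.

print(completion.choices[0].message.content)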