Before trying these samples, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Call Gemini with the Chat Completions API

The following sample shows you how to send non-streaming requests:

REST
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
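A minimal sketch of the same non-streaming request using the OpenAI Python SDK pointed at the Vertex AI OpenAI-compatible endpoint. It assumes the openai and google-auth packages are installed; PROJECT_ID and the us-central1 location are placeholders for your own values:

from google.auth import default
from google.auth.transport.requests import Request
from openai import OpenAI

project_id = "PROJECT_ID"  # placeholder: your Google Cloud project
location = "us-central1"   # placeholder: a supported region

# Obtain a short-lived access token from Application Default Credentials.
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

# Point the OpenAI client at the Vertex AI OpenAI-compatible endpoint.
client = OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1beta1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
)
print(response.choices[0].message.content)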
The following sample shows you how to send streaming requests to a Gemini model by using the Chat Completions API:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
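A minimal sketch of the same request in streaming form, under the same assumptions as the earlier sketch; setting stream=True makes the SDK yield incremental chunks:

from google.auth import default
from google.auth.transport.requests import Request
from openai import OpenAI

project_id = "PROJECT_ID"  # placeholder
location = "us-central1"   # placeholder

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

client = OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1beta1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

# stream=True returns an iterator of incremental chunks.
stream = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")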
Send a prompt and an image to the Gemini API in Vertex AI
Python
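A minimal sketch of a combined text-and-image request with the OpenAI Python SDK, under the same assumptions as the earlier sketches. The Cloud Storage URI is passed directly as the image_url value, mirroring the image_url curl request later on this page:

from google.auth import default
from google.auth.transport.requests import Request
from openai import OpenAI

project_id = "PROJECT_ID"  # placeholder
location = "us-central1"   # placeholder

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

client = OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1beta1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

# The content list mixes a text part and an image part; the Cloud Storage
# URI is passed as a plain string, as in the image_url curl request below.
response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg"},
        ],
    }],
)
print(response.choices[0].message.content)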
Call a self-deployed model with the Chat Completions API
The following sample shows you how to send non-streaming requests:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
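A minimal sketch of the same request with the OpenAI Python SDK, assuming the openai and google-auth packages; PROJECT_ID and ENDPOINT_ID are placeholders. The curl request above omits the model field, so the sketch passes an empty string only to satisfy the SDK signature (an assumption, not confirmed behavior):

from google.auth import default
from google.auth.transport.requests import Request
from openai import OpenAI

project_id = "PROJECT_ID"    # placeholder
endpoint_id = "ENDPOINT_ID"  # placeholder: your dedicated endpoint

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

# The base URL targets the dedicated endpoint rather than endpoints/openapi.
client = OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1beta1/projects/{project_id}/locations/global/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

# The curl request above omits "model"; an empty string is passed here only
# to satisfy the SDK signature (assumption: the endpoint selects the model).
response = client.chat.completions.create(
    model="",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
)
print(response.choices[0].message.content)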
The following sample shows you how to send streaming requests to a self-deployed model by using the Chat Completions API:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
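The streaming variant of the self-deployed sketch, under the same assumptions (placeholders and the empty model string included):

from google.auth import default
from google.auth.transport.requests import Request
from openai import OpenAI

project_id = "PROJECT_ID"    # placeholder
endpoint_id = "ENDPOINT_ID"  # placeholder

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

client = OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1beta1/projects/{project_id}/locations/global/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

# Empty model string as above (assumption); stream=True yields chunks.
stream = client.chat.completions.create(
    model="",
    messages=[{"role": "user", "content": "Write a story about a magic backpack."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")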
extra_body examples

You can use either the SDK or the REST API to pass in extra_body.

Add thought_tag_marker
{
  ...,
  "extra_body": {
    "google": {
      ...,
      "thought_tag_marker": "..."
    }
  }
}
Add extra_body using the SDK

client.chat.completions.create(
    ...,
    extra_body = {
        'extra_body': { 'google': { ... } }
    },
)
extra_content examples

You can populate this field by using the REST API directly.

extra_content with string content
{
  "messages": [
    { "role": "...", "content": "...", "extra_content": { "google": { ... } } }
  ]
}
Per-message extra_content
{
  "messages": [
    {
      "role": "...",
      "content": [
        { "type": "...", ..., "extra_content": { "google": { ... } } }
      ]
    }
  ]
}
Per-tool call extra_content
{
  "messages": [
    {
      "role": "...",
      "tool_calls": [
        {
          ...,
          "extra_content": { "google": { ... } }
        }
      ]
    }
  ]
}
Sample curl requests

You can use these curl requests directly, rather than going through the SDK.

Use thinking_config with extra_body
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.5-flash-preview-04-17",
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Are there any primes of the form n*ceil(log(n))?"
      }]
    }],
    "extra_body": {
      "google": {
        "thinking_config": {
          "include_thoughts": true,
          "thinking_budget": 10000
        },
        "thought_tag_marker": "think"
      }
    },
    "stream": true
  }'
Multimodal requests

The Chat Completions API supports a variety of multimodal input, including both audio and video.

Use image_url to pass in image data

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image" },
        { "type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg" }
      ]
    }]
  }'
Use input_audio to pass in audio data

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this: " },
        {
          "type": "input_audio",
          "input_audio": {
            "format": "audio/mp3",
            "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
          }
        }
      ]
    }]
  }'
Structured output

You can use the response_format parameter to get structured output.

Example using SDK
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="google/gemini-2.5-flash-preview-04-17",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

print(completion.choices[0].message.parsed)
What's next
Examples