Use generateContent or streamGenerateContent to generate content with Gemini.
The Gemini model family includes models that work with multimodal prompt requests. The term multimodal indicates that you can use more than one modality, or type of input, in a prompt. Models that aren't multimodal accept prompts only with text. Modalities can include text, audio, video, and more.
Create a Google Cloud account to get started
To start using the Vertex AI API for Gemini, create a Google Cloud account.
After creating your account, use this document to review the Gemini model request body, model parameters, response body, and some sample requests.
When you're ready, see the Vertex AI API for Gemini quickstart to learn how to send a request to the Vertex AI Gemini API using a programming language SDK or the REST API.
Supported models
Model | Version |
---|---|
Gemini 1.5 Flash | gemini-1.5-flash-001 |
Gemini 1.5 Pro | gemini-1.5-pro-001 |
Gemini 1.0 Pro Vision | gemini-1.0-pro-vision-001 |
Gemini 1.0 Pro | gemini-1.0-pro, gemini-1.0-pro-001, gemini-1.0-pro-002 |
Example syntax
Syntax to generate a model response.
Non-streaming
curl
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
-d '{
  "contents": [{ ... }],
  "generationConfig": { ... },
  "safetySettings": { ... }
  ...
}'
Python
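from vertexai.generative_models import GenerationConfig, GenerativeModel

gemini_model = GenerativeModel(MODEL_ID)
generation_config = GenerationConfig(...)
# Pass the configuration by keyword so it is not mistaken for another argument.
model_response = gemini_model.generate_content([...], generation_config=generation_config, safety_settings={...})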
Streaming
curl
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
-d '{
  "contents": [{ ... }],
  "generationConfig": { ... },
  "safetySettings": { ... }
  ...
}'
Python
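from vertexai.generative_models import GenerationConfig, GenerativeModel

gemini_model = GenerativeModel(MODEL_ID)
generation_config = GenerationConfig(...)
# stream=True returns the response in chunks instead of a single object.
model_response = gemini_model.generate_content([...], generation_config=generation_config, safety_settings={...}, stream=True)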
Parameter list
See examples for implementation details.
Request body
{
  "contents": [
    {
      "role": string,
      "parts": [
        {
          // Union field data can be only one of the following:
          "text": string,
          "inlineData": {
            "mimeType": string,
            "data": string
          },
          "fileData": {
            "mimeType": string,
            "fileUri": string
          },
          // End of list of possible types for union field data.
          "videoMetadata": {
            "startOffset": { "seconds": integer, "nanos": integer },
            "endOffset": { "seconds": integer, "nanos": integer }
          }
        }
      ]
    }
  ],
  "systemInstruction": {
    "role": string,
    "parts": [
      { "text": string }
    ]
  },
  "tools": [
    {
      "functionDeclarations": [
        {
          "name": string,
          "description": string,
          "parameters": { object (OpenAPI Object Schema) }
        }
      ]
    }
  ],
  "safetySettings": [
    {
      "category": enum (HarmCategory),
      "threshold": enum (HarmBlockThreshold)
    }
  ],
  "generationConfig": {
    "temperature": number,
    "topP": number,
    "topK": number,
    "candidateCount": integer,
    "maxOutputTokens": integer,
    "presencePenalty": float,
    "frequencyPenalty": float,
    "stopSequences": [ string ],
    "responseMimeType": string,
    "responseSchema": schema,
    "seed": integer
  }
}
The request body contains data with the following parameters:
Parameter | Description |
---|---|
contents | Required: The content of the current conversation with the model. For single-turn queries, this is a single instance. For multi-turn queries, this is a repeated field that contains conversation history and the latest request. |
systemInstruction | Optional: Instructions for the model to steer it toward better performance. For example, "Answer as concisely as possible" or "Don't use technical terms in your response". |
tools | Optional: A piece of code that enables the system to interact with external systems to perform an action, or set of actions, outside of knowledge and scope of the model. See Function calling. |
toolConfig | Optional: See Function calling. |
safetySettings | Optional: Per-request settings for blocking unsafe content. |
generationConfig | Optional: Generation configuration settings. |
cachedContent | Optional: Cached content. You can use cached content in requests that contain repeated content. |
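As a rough illustration of how the top-level fields fit together, the following sketch assembles a request body as a plain Python dictionary. The helper name build_request_body and its arguments are hypothetical and not part of any SDK; only the JSON field names come from the request body shown above.

```python
import json


def build_request_body(text, system_instruction=None, temperature=None,
                       max_output_tokens=None):
    """Assemble a generateContent request body as a plain dict.

    Hypothetical helper for illustration; only the JSON keys are taken
    from the request body schema in this document.
    """
    body = {"contents": [{"role": "user", "parts": [{"text": text}]}]}
    if system_instruction is not None:
        body["systemInstruction"] = {"parts": [{"text": system_instruction}]}

    # Only include generationConfig when at least one setting is provided.
    generation_config = {}
    if temperature is not None:
        generation_config["temperature"] = temperature
    if max_output_tokens is not None:
        generation_config["maxOutputTokens"] = max_output_tokens
    if generation_config:
        body["generationConfig"] = generation_config
    return body


body = build_request_body(
    "Summarize this paragraph.",
    system_instruction="Answer as concisely as possible",
    temperature=0.2,
)
print(json.dumps(body, indent=2))
```

Optional fields are omitted rather than sent as null, which keeps the serialized request close to the minimal examples later in this document.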
contents
The base structured data type containing multi-part content of a message.
This class consists of two main properties: role and parts. The role property denotes the individual producing the content, while the parts property contains multiple elements, each representing a segment of data within a message.
Parameter | Description |
---|---|
role | Optional: The identity of the entity that creates the message. For non-multi-turn conversations, this field can be left blank or unset. |
parts | A list of ordered parts that make up a single message. Different parts may have different IANA MIME types. For limits on the inputs, such as the maximum number of tokens or the number of images, see the model specifications on the Google models page. To compute the number of tokens in your request, see Get token count. |
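For a multi-turn conversation, contents carries the earlier turns followed by the latest request. The sketch below is one way to build that list; the make_content helper is hypothetical, and the "model" role value is an assumption based on general knowledge of the API (the examples in this document only show "user").

```python
# Assumption: "user" and "model" are the accepted role values.
ALLOWED_ROLES = ("user", "model")


def make_content(role, *texts):
    """Return one contents entry with a text part per string (hypothetical helper)."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return {"role": role, "parts": [{"text": text} for text in texts]}


# Conversation history first, then the latest request.
contents = [
    make_content("user", "What is a multimodal prompt?"),
    make_content("model", "A prompt that combines more than one type of input."),
    make_content("user", "Give me an example."),
]
print(len(contents), contents[-1]["parts"][0]["text"])
```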
parts
A data type containing media that is part of a multi-part Content message.
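Parameter | Description |
---|---|
text | Optional: A text prompt or code snippet. |
inlineData | Optional: Inline data in raw bytes. |
fileData | Optional: Data stored in a file. |
functionCall | Optional: It contains a string representing the functionDeclaration.name field and a structured JSON object containing the parameters and their values. See Function calling. |
functionResponse | Optional: The result output of a functionCall. It contains a string representing the functionDeclaration.name field and a structured JSON object with the output from the function. See Function calling. |
videoMetadata | Optional: For video input, the start and end offset of the video in Duration format. For example, to specify a 10 second clip starting at 1:00, set "startOffset": { "seconds": 60 } and "endOffset": { "seconds": 70 }. The metadata should only be specified while the video data is presented in inlineData or fileData. |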
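The clip arithmetic for videoMetadata can be sketched as follows: a 10 second clip starting at 1:00 begins at 60 seconds and ends at 70 seconds. The video_clip_part helper and the gs://my-bucket/video.mp4 URI are hypothetical; only the field names come from the request body.

```python
def video_clip_part(file_uri, mime_type, start_seconds, clip_seconds):
    """Build a part that references a video file and selects a clip of it.

    Hypothetical helper; whole seconds only, so nanos is always 0.
    """
    end_seconds = start_seconds + clip_seconds
    return {
        "fileData": {"mimeType": mime_type, "fileUri": file_uri},
        "videoMetadata": {
            "startOffset": {"seconds": start_seconds, "nanos": 0},
            "endOffset": {"seconds": end_seconds, "nanos": 0},
        },
    }


# A 10 second clip starting at 1:00 (60 seconds into the video).
part = video_clip_part("gs://my-bucket/video.mp4", "video/mp4", 60, 10)
print(part["videoMetadata"])
```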
blob
Content blob. If possible send as text rather than raw bytes.
Parameter | Description |
---|---|
mimeType | The media type of the file specified in the data or fileUri fields. For Gemini 1.5 Pro and Gemini 1.5 Flash, the maximum length of an audio file is 8.4 hours and the maximum length of a video file (without audio) is one hour. For more information, see Gemini 1.5 Pro media requirements. Text files must be UTF-8 encoded. The contents of the text file count toward the token limit. There is no limit on image resolution. |
data | The base64 encoding of the image, PDF, or video to include inline in the prompt. When including media inline, you must also specify the media type (mimeType) of the data. Size limit: 20MB |
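A minimal sketch of preparing inline media, assuming the 20MB limit applies to the raw bytes before encoding and that 1MB means 2**20 bytes (the source states neither). The inline_part helper is hypothetical.

```python
import base64

# Assumption: limit measured on raw bytes, with 1MB = 2**20 bytes.
MAX_INLINE_BYTES = 20 * 1024 * 1024


def inline_part(raw, mime_type):
    """Base64-encode raw bytes into an inlineData part (hypothetical helper)."""
    if len(raw) > MAX_INLINE_BYTES:
        raise ValueError("inline data exceeds the 20MB size limit")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}


part = inline_part(b"hello", "text/plain")
print(part)
```

For larger files, the fileData form with a Cloud Storage URI avoids the inline size limit.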
CachedContent
Used to update when a context cache expires. You must specify ttl or expireTime when you update CachedContent, but you can't specify both. For more information, see Use context caching.
Parameter | Description |
---|---|
ttl | Used to specify the number of seconds and nanoseconds after a context cache is created or updated that the context cache lives before it expires. |
expireTime | A timestamp that specifies when a context cache expires. |
TTL
The time to live, or duration, after a context cache is created or updated before it expires.
Parameter | Description |
---|---|
seconds | The seconds component of the duration before a context cache expires after it's created. The default value is 3,600 seconds. |
nanos | Optional: The nanoseconds component of the duration before a context cache expires after it's created. |
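The rule that a CachedContent update takes either ttl or expireTime, but not both, can be checked before sending a request. This is a sketch under assumptions: the cache_expiration helper is hypothetical, and the RFC 3339 string used for expireTime is an assumed timestamp format that the source does not specify.

```python
from datetime import datetime, timezone


def cache_expiration(ttl_seconds=None, expire_time=None):
    """Return the expiration fields for a CachedContent update.

    Hypothetical helper: exactly one of ttl_seconds or expire_time
    (a timezone-aware datetime) must be given.
    """
    if (ttl_seconds is None) == (expire_time is None):
        raise ValueError("specify exactly one of ttl or expireTime")
    if ttl_seconds is not None:
        return {"ttl": {"seconds": int(ttl_seconds), "nanos": 0}}
    # Assumption: the API accepts an RFC 3339 UTC timestamp string.
    stamp = expire_time.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    return {"expireTime": stamp}


print(cache_expiration(ttl_seconds=3600))
print(cache_expiration(expire_time=datetime(2025, 1, 1, tzinfo=timezone.utc)))
```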
FileData
URI based data.
Parameter | Description |
---|---|
mimeType | IANA MIME type of the data. |
fileUri | The Cloud Storage URI of the file to include in the prompt. The bucket object must either be publicly readable or reside in the same Google Cloud project that's sending the request. You must also specify the media type (mimeType) of the file. |
functionCall
A predicted functionCall returned from the model that contains a string representing the functionDeclaration.name and a structured JSON object containing the parameters and their values.
Parameter | Description |
---|---|
name | The name of the function to call. |
args | The function parameters and values in JSON object format. See Function calling for parameter details. |
functionResponse
The resulting output from a FunctionCall that contains a string representing the FunctionDeclaration.name. Also contains a structured JSON object with the output from the function, which is used as context for the model. This should contain the result of a FunctionCall made based on model prediction.
Parameter | Description |
---|---|
name | The name of the function to call. |
response | The function response in JSON object format. |
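The round trip between functionCall and functionResponse can be sketched as a small dispatcher: look up the predicted function by name, call it with the predicted parameters, and wrap the result under the same name. The get_weather function, the registry, and the args and response field names are assumptions for illustration.

```python
def get_weather(location):
    """Hypothetical local function that the model may ask to call."""
    return {"location": location, "forecast": "sunny"}


# Maps functionDeclaration names to local callables (illustrative registry).
FUNCTIONS = {"get_weather": get_weather}


def respond_to_function_call(part):
    """Execute the predicted call and wrap its output as a functionResponse part."""
    call = part["functionCall"]
    function = FUNCTIONS[call["name"]]
    result = function(**call.get("args", {}))
    return {"functionResponse": {"name": call["name"], "response": result}}


predicted = {"functionCall": {"name": "get_weather", "args": {"location": "Boston"}}}
reply = respond_to_function_call(predicted)
print(reply)
```

The resulting part would be sent back in a later contents entry so that the model can use the function output as context.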
videoMetadata
Metadata describing the input video content.
Parameter | Description |
---|---|
startOffset | Optional: The start offset of the video. |
endOffset | Optional: The end offset of the video. |
safetySetting
Safety settings.
Parameter | Description |
---|---|
category | Optional: The safety category to configure a threshold for. Acceptable values are listed under harmCategory. |
threshold | Optional: The threshold for blocking responses that could belong to the specified safety category based on probability. |
method | Optional: Specify if the threshold is used for probability or severity score. If not specified, the threshold is used for probability score. |
harmCategory
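Harm categories that block content.
Value | Description |
---|---|
HARM_CATEGORY_UNSPECIFIED | The harm category is unspecified. |
HARM_CATEGORY_HATE_SPEECH | The harm category is hate speech. |
HARM_CATEGORY_DANGEROUS_CONTENT | The harm category is dangerous content. |
HARM_CATEGORY_HARASSMENT | The harm category is harassment. |
HARM_CATEGORY_SEXUALLY_EXPLICIT | The harm category is sexually explicit content. |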
harmBlockThreshold
Probability threshold levels used to block a response.
Value | Description |
---|---|
HARM_BLOCK_THRESHOLD_UNSPECIFIED | Unspecified harm block threshold. |
BLOCK_LOW_AND_ABOVE | Block low threshold and higher (that is, block more). |
BLOCK_MEDIUM_AND_ABOVE | Block medium threshold and higher. |
BLOCK_ONLY_HIGH | Block only high threshold (that is, block less). |
BLOCK_NONE | Block none. |
harmBlockMethod
A probability threshold that blocks a response based on a combination of probability and severity.
Value | Description |
---|---|
HARM_BLOCK_METHOD_UNSPECIFIED | The harm block method is unspecified. |
SEVERITY | The harm block method uses both probability and severity scores. |
PROBABILITY | The harm block method uses the probability score. |
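A safetySettings list pairs each harm category with a blocking threshold. The sketch below validates the pairs before building the list; the safety_setting helper is hypothetical, and the enum value names are assumptions drawn from general knowledge of the API rather than from this page.

```python
# Assumed enum value names (not spelled out in the source text).
HARM_CATEGORIES = {
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
}
BLOCK_THRESHOLDS = {
    "BLOCK_LOW_AND_ABOVE",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_NONE",
}


def safety_setting(category, threshold):
    """Return one safetySettings entry after checking both values (hypothetical helper)."""
    if category not in HARM_CATEGORIES:
        raise ValueError(f"unknown harm category: {category}")
    if threshold not in BLOCK_THRESHOLDS:
        raise ValueError(f"unknown block threshold: {threshold}")
    return {"category": category, "threshold": threshold}


safety_settings = [
    safety_setting("HARM_CATEGORY_HARASSMENT", "BLOCK_MEDIUM_AND_ABOVE"),
    safety_setting("HARM_CATEGORY_HATE_SPEECH", "BLOCK_ONLY_HIGH"),
]
print(safety_settings)
```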
generationConfig
Configuration settings used when generating the response.
Parameter | Description |
---|---|
temperature | Optional: The temperature is used for sampling during response generation, which occurs when topP and topK are applied. If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature. |
topP | Optional: If specified, nucleus sampling is used. Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model selects either A or B as the next token by using temperature and excludes C as a candidate. Specify a lower value for less random responses and a higher value for more random responses. |
topK | Optional: Top-K changes how the model selects tokens for output. A top-K of 1 means that the next selected token is the most probable among all tokens in the model's vocabulary, while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature. For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses. |
candidateCount | Optional: The number of response variations to return. For each request, you're charged for the output tokens of all candidates, but are only charged once for the input tokens. Specifying multiple candidates is a Preview feature. |
maxOutputTokens | Optional: Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses. |
stopSequences | Optional: Specifies a list of strings that tells the model to stop generating text if one of the strings is encountered in the response. If a string appears multiple times in the response, then the response truncates where it's first encountered. The strings are case-sensitive. Maximum 5 items in the list. |
presencePenalty | Optional: Positive values penalize tokens that already appear in the generated text, increasing the probability of generating more diverse content. |
frequencyPenalty | Optional: Positive values penalize tokens that repeatedly appear in the generated text, decreasing the probability of repeating content. |
responseMimeType | Optional: The output response MIME type of the generated candidate text. Specify the appropriate response type to avoid unintended behaviors. For example, if you require a JSON-formatted response, specify application/json. This is a preview feature. |
responseSchema | Optional: The schema that generated candidate text must follow. For more information, see Control generated output. You must specify the responseMimeType field to use this parameter. This is a preview feature. |
seed | Optional: When seed is fixed to a specific value, the model makes a best effort to provide the same response for repeated requests. Deterministic output isn't guaranteed. Also, changing the model or parameter settings, such as the temperature, can cause variations in the response even when you use the same seed value. By default, a random seed value is used. This is a preview feature. |
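The interaction of top-K, top-P, and stop sequences described above can be illustrated on a toy distribution. This is a simplified sketch and not the model's actual sampler: candidate_tokens only reports which tokens remain eligible, and truncate_at_stop assumes the stop string itself is excluded from the output. Both function names are hypothetical.

```python
def candidate_tokens(probabilities, top_k=None, top_p=None):
    """Return the tokens still eligible after top-K and then top-P filtering."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if top_p is None:
        return [token for token, _ in ranked]
    kept, cumulative = [], 0.0
    for token, probability in ranked:
        kept.append(token)
        cumulative += probability
        # Small tolerance guards against floating-point rounding in the sum.
        if cumulative >= top_p - 1e-9:
            break
    return kept


def truncate_at_stop(text, stop_sequences):
    """Cut text at the first (case-sensitive) occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop) if stop else -1
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


tokens = {"A": 0.3, "B": 0.2, "C": 0.1}
print(candidate_tokens(tokens, top_p=0.5))  # A and B remain, C is excluded
print(candidate_tokens(tokens, top_k=1))    # only the most probable token
print(truncate_at_stop("Hello STOP world STOP", ["STOP"]))
```

With the probabilities from the top-P example, A and B together reach 0.5, so C is never considered; the final choice between A and B would then depend on temperature sampling.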
Response body
{
  "candidates": [
    {
      "content": {
        "parts": [
          { "text": string }
        ]
      },
      "finishReason": enum (FinishReason),
      "safetyRatings": [
        {
          "category": enum (HarmCategory),
          "probability": enum (HarmProbability),
          "blocked": boolean
        }
      ],
      "citationMetadata": {
        "citations": [
          {
            "startIndex": integer,
            "endIndex": integer,
            "uri": string,
            "title": string,
            "license": string,
            "publicationDate": {
              "year": integer,
              "month": integer,
              "day": integer
            }
          }
        ]
      },
      "avgLogprobs": double
    }
  ],
  "usageMetadata": {
    "promptTokenCount": integer,
    "candidatesTokenCount": integer,
    "totalTokenCount": integer
  }
}
Response element | Description |
---|---|
text | The generated text. |
finishReason | The reason why the model stopped generating tokens. If empty, the model has not stopped generating the tokens. Because the response uses the prompt for context, it's not possible to change the behavior of how the model stops generating tokens. |
category | The safety category to configure a threshold for. Acceptable values are listed under harmCategory. |
probability | The harm probability levels in the content. |
blocked | A boolean flag associated with a safety attribute that indicates if the model's input or output was blocked. |
startIndex | An integer that specifies where a citation starts in the content. |
endIndex | An integer that specifies where a citation ends in the content. |
uri | The URL of a citation source. Examples of a URL source might be a news website or a GitHub repository. |
title | The title of a citation source. Examples of source titles might be that of a news article or a book. |
license | The license associated with a citation. |
publicationDate | The date a citation was published. Its valid formats are YYYY, YYYY-MM, and YYYY-MM-DD. |
avgLogprobs | Average log probability of the candidate. |
promptTokenCount | Number of tokens in the request. |
candidatesTokenCount | Number of tokens in the response(s). |
totalTokenCount | Number of tokens in the request and response(s). |
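Reading a response usually means pulling the text out of the first candidate and checking why generation stopped. The sketch below walks a response dictionary with the shape shown above; the summarize_response helper and the sample values (including the "STOP" finish reason and the "NEGLIGIBLE" probability) are illustrative assumptions.

```python
def summarize_response(response):
    """Extract text, finish reason, blocked categories, and token usage.

    Hypothetical helper that only reads the first candidate.
    """
    candidate = response["candidates"][0]
    parts = candidate.get("content", {}).get("parts", [])
    text = "".join(part.get("text", "") for part in parts)
    blocked = [
        rating["category"]
        for rating in candidate.get("safetyRatings", [])
        if rating.get("blocked")
    ]
    usage = response.get("usageMetadata", {})
    return {
        "text": text,
        "finishReason": candidate.get("finishReason"),
        "blockedCategories": blocked,
        "totalTokenCount": usage.get("totalTokenCount"),
    }


# Illustrative response; the values are made up for this example.
sample = {
    "candidates": [{
        "content": {"parts": [{"text": "Once upon "}, {"text": "a time."}]},
        "finishReason": "STOP",
        "safetyRatings": [{
            "category": "HARM_CATEGORY_HARASSMENT",
            "probability": "NEGLIGIBLE",
            "blocked": False,
        }],
    }],
    "usageMetadata": {"promptTokenCount": 8, "candidatesTokenCount": 5, "totalTokenCount": 13},
}
summary = summarize_response(sample)
print(summary)
```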
Examples
Non-streaming text response
Generate a non-streaming model response from a text input.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- MODEL_ID: The model ID of the model that you want to use (for example, gemini-1.5-flash-001). See the list of supported models.
- TEXT: The text instructions to include in the prompt.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent
Request JSON body:
{ "contents": [{ "role": "user", "parts": [{ "text": "TEXT" }] }] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent" | Select-Object -Expand Content
Python
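The REST call above can also be prepared with the Python standard library. This is a minimal sketch rather than the SDK sample: build_generate_request is a hypothetical helper, "my-project" and "ACCESS_TOKEN" are placeholders, and the request is constructed but not sent.

```python
import json
import urllib.request


def build_generate_request(project_id, location, model_id, text, access_token,
                           method="generateContent"):
    """Build (but do not send) a POST request for the generateContent endpoint."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model_id}:{method}"
    )
    body = {"contents": [{"role": "user", "parts": [{"text": text}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


request = build_generate_request(
    "my-project", "us-central1", "gemini-1.5-flash-001", "Hello", "ACCESS_TOKEN"
)
print(request.full_url)
# To send it with a real token:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The access token would typically come from gcloud auth print-access-token, as in the curl command.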
REST (OpenAI)
You can call the Inference API by using the OpenAI library. For more information, see Call Vertex AI models by using the OpenAI library.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- MODEL_ID: The name of the model to use.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions
Request JSON body:
{ "model": "google/MODEL_ID", "messages": [{ "role": "user", "content": "Write a story about a magic backpack." }] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions" | Select-Object -Expand Content
Non-streaming multi-modal response
Generate a non-streaming model response from a multi-modal input, such as text and an image.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- MODEL_ID: The model ID of the model that you want to use (for example, gemini-1.5-flash-001). See the list of supported models.
- TEXT: The text instructions to include in the prompt.
- FILE_URI: The Cloud Storage URI to the file storing the data.
- MIME_TYPE: The IANA MIME type of the data.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent
Request JSON body:
{ "contents": [{ "role": "user", "parts": [ { "text": "TEXT" }, { "fileData": { "fileUri": "FILE_URI", "mimeType": "MIME_TYPE" } } ] }] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent" | Select-Object -Expand Content
Python
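The multimodal request body can be generated instead of written by hand. The sketch below builds the same JSON shape from a text prompt and a list of Cloud Storage files; multimodal_body is a hypothetical helper, and the gs:// prefix check is an assumption about what counts as a Cloud Storage URI.

```python
import json


def multimodal_body(text, files):
    """Build a request body with one text part and one fileData part per file.

    files is a list of (file_uri, mime_type) pairs. Hypothetical helper.
    """
    parts = [{"text": text}]
    for file_uri, mime_type in files:
        # Assumption: Cloud Storage URIs always start with gs://.
        if not file_uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {file_uri}")
        parts.append({"fileData": {"fileUri": file_uri, "mimeType": mime_type}})
    return {"contents": [{"role": "user", "parts": parts}]}


body = multimodal_body(
    "Describe the following image:",
    [("gs://generativeai-downloads/images/character.jpg", "image/jpeg")],
)
print(json.dumps(body, indent=2))
```

Passing more than one pair produces several fileData parts in a single message.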
REST (OpenAI)
You can call the Inference API by using the OpenAI library. For more information, see Call Vertex AI models by using the OpenAI library.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- MODEL_ID: The name of the model to use.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions
Request JSON body:
{ "model": "google/MODEL_ID", "messages": [{ "role": "user", "content": [ { "type": "text", "text": "Describe the following image:" }, { "type": "image_url", "image_url": { "url": "gs://generativeai-downloads/images/character.jpg" } } ] }] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions" | Select-Object -Expand Content
Streaming text response
Generate a streaming model response from a text input.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- MODEL_ID: The model ID of the model that you want to use (for example, gemini-1.5-flash-001). See the list of supported models.
- TEXT: The text instructions to include in the prompt.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent
Request JSON body:
{ "contents": [{ "role": "user", "parts": [{ "text": "TEXT" }] }] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent" | Select-Object -Expand Content
Python
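A streamed result arrives as several partial responses rather than one. Assuming each chunk has the same shape as the response body described earlier (an assumption, since this page does not show the streamed format), the text can be reassembled as sketched below. The join_stream_text helper and the sample chunks are hypothetical.

```python
def join_stream_text(chunks):
    """Concatenate the text of the first candidate across streamed chunks."""
    pieces = []
    for chunk in chunks:
        candidates = chunk.get("candidates", [])
        if not candidates:
            continue  # Skip chunks that carry no candidate, such as usage-only chunks.
        for part in candidates[0].get("content", {}).get("parts", []):
            pieces.append(part.get("text", ""))
    return "".join(pieces)


# Illustrative chunks; real chunks would come from streamGenerateContent.
stream = [
    {"candidates": [{"content": {"parts": [{"text": "The magic "}]}}]},
    {"candidates": [{"content": {"parts": [{"text": "backpack glowed."}]}, "finishReason": "STOP"}]},
    {"usageMetadata": {"totalTokenCount": 12}},
]
story = join_stream_text(stream)
print(story)
```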
REST (OpenAI)
You can call the Inference API by using the OpenAI library. For more information, see Call Vertex AI models by using the OpenAI library.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- MODEL_ID: The name of the model to use.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions
Request JSON body:
{ "model": "google/MODEL_ID", "stream": true, "messages": [{ "role": "user", "content": "Write a story about a magic backpack." }] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions" | Select-Object -Expand Content
Streaming multi-modal response
Generate a streaming model response from a multi-modal input, such as text and an image.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- MODEL_ID: The model ID of the model that you want to use (for example, gemini-1.5-flash-001). See the list of supported models.
- TEXT: The text instructions to include in the prompt.
- FILE_URI1: The Cloud Storage URI to the file storing the data.
- MIME_TYPE1: The IANA MIME type of the data.
- FILE_URI2: The Cloud Storage URI to the file storing the data.
- MIME_TYPE2: The IANA MIME type of the data.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent
Request JSON body:
{ "contents": [{ "role": "user", "parts": [ { "text": "TEXT" }, { "fileData": { "fileUri": "FILE_URI1", "mimeType": "MIME_TYPE1" } }, { "fileData": { "fileUri": "FILE_URI2", "mimeType": "MIME_TYPE2" } } ] }] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent" | Select-Object -Expand Content
REST (OpenAI)
You can call the Inference API by using the OpenAI library. For more information, see Call Vertex AI models by using the OpenAI library.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- MODEL_ID: The name of the model to use.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions
Request JSON body:
{ "model": "google/MODEL_ID", "stream": true, "messages": [{ "role": "user", "content": [ { "type": "text", "text": "Describe the following image:" }, { "type": "image_url", "image_url": { "url": "gs://generativeai-downloads/images/character.jpg" } } ] }] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi/chat/completions" | Select-Object -Expand Content
Model versions
To use the auto-updated version, specify the model name without the trailing version number, for example gemini-1.5-flash instead of gemini-1.5-flash-001.
For more information, see Gemini model versions and lifecycle.
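Deriving the auto-updated name from a pinned version can be sketched with a regular expression. This assumes the version suffix is always a hyphen followed by three digits, as in the supported models table; the auto_updated_name helper is hypothetical.

```python
import re


def auto_updated_name(model_id):
    """Strip a trailing three-digit version such as -001 from a model ID."""
    return re.sub(r"-\d{3}$", "", model_id)


print(auto_updated_name("gemini-1.5-flash-001"))
print(auto_updated_name("gemini-1.0-pro"))
```

Names that carry no version suffix are returned unchanged.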
What's next
- Learn more about the Gemini API.
- Learn more about Function calling.
- Learn more about Grounding responses for Gemini models.