Inference API

Use the Inference API to generate Gemini prompts.

The Gemini model family includes models that work with multimodal prompt requests. The term multimodal indicates that you can use more than one modality, or type of input, in a prompt. Models that aren't multimodal accept prompts only with text. Modalities can include text, audio, video, and more.

For more overview information see:

Supported Models:

Model Version
Gemini 1.5 Flash (Preview) gemini-1.5-flash-preview-0514
Gemini 1.5 Pro (Preview) gemini-1.5-pro-preview-0514
Gemini 1.0 Pro Vision gemini-1.0-pro-001
gemini-1.0-pro-vision-001
Gemini 1.0 Pro gemini-1.0-pro
gemini-1.0-pro-001
gemini-1.0-pro-002

Limitations:

If you provide a lot of images, then latency might be high.

Example syntax

Syntax to generate a model response.

Non-Streaming

curl

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \

https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
-d '{
  "contents": [{
    ...
  }],
  "generation_config": {
    ...
  },
  "safety_settings": {
    ...
  }
  ...
}'

Python

gemini_model = GenerativeModel(MODEL_ID)
generation_config = GenerationConfig(...)

model_response = gemini_model.generate_content([...], generation_config, safety_settings={...})

Streaming

curl

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": [{
      ...
    }],
    "generation_config": {
      ...
    },
    "safety_settings": {
      ...
    }
    ...
  }'

Python

gemini_model = GenerativeModel(MODEL_ID)
model_response = gemini_model.generate_content([...], generation_config, safety_settings={...}, stream=True)

Parameter list

See examples for implementation details.

Request body

The request body contains data with the following parameters:

Parameters

contents

Required: Content

The content of the current conversation with the model.

For single-turn queries, this is a single instance. For multi-turn queries, this is a repeated field that contains conversation history and the latest request.

system_instruction

Optional: Content

The user provided system instructions for the model.

Note: Only text should be used in parts and content in each part should be in a separate paragraph.

tools

Optional. See Function Calling API.

tool_config

Optional. See Function Calling API.

safety_settings

Optional: SafetySetting

Per request settings for blocking unsafe content.

Enforced on GenerateContentResponse.candidates.

generation_config

Optional: GenerationConfig

Generation configuration settings.

Content

The base structured data type containing multi-part content of a message.

This class consists of two main properties: role and parts. The role property denotes the individual producing the content, while the parts property contains multiple elements, each representing a segment of data within a message.

Parameters

role

Optional: string

The identity of the entity that creates the message. The following values are supported:

  • user: This indicates that the message is sent by a real person, typically a user-generated message.
  • model: This indicates that the message is generated by the model.

The model value is used to insert messages from the model into the conversation during multi-turn conversations.

For non-multi-turn conversations, this field can be left blank or unset.

parts

Part

A list of ordered parts that make up a single message. Different parts may have different IANA MIME types.

Part

A data type containing media that is part of a multi-part Content message.

Parameters

text

Optional: string

A text prompt or code snippet.

inline_data

Optional: Blob

Inline data in raw bytes.

file_data

Optional: FileData

Data stored in a file.

function_call

Optional: FunctionCall.

It contains a string representing the FunctionDeclaration.name field and a structured JSON object containing any parameters for the function call predicted by the model.

See Function Calling API.

function_response

Optional: FunctionResponse.

The result output of a FunctionCall that contains a string representing the FunctionDeclaration.name field and a structured JSON object containing any output from the function call. It is used as context to the model.

See Function Calling API.

video_metadata

Optional: VideoMetadata

Video metadata. The metadata should only be specified while the video data is presented in inline_data or file_data.

Blob

Content blob. If possible send as text rather than raw bytes.

Parameters

mime_type

string

IANA MIME type of the data.

data

bytes

Raw bytes.

FileData

URI based data.

Parameters

mime_type

string

IANA MIME type of the data.

file_uri

string

The Cloud Storage URI to the file storing the data

FunctionCall

A predicted FunctionCall returned from the model that contains a string representing the FunctionDeclaration.name and a structured JSON object containing the parameters and their values.

Parameters

name

string

The name of the function to call.

args

Struct

The function parameters and values in JSON object format.

See Function Calling API for parameter details.

FunctionResponse

The resulting output from a FunctionCall that contains a string representing the FunctionDeclaration.name. Also contains a structured JSON object with the output from the function (and uses it as context for the model). This should contain the result of a FunctionCall made based on model prediction.

Parameters

name

string

The name of the function to call.

response

Struct

The function response in JSON object format.

VideoMetadata

Metadata describing the input video content.

Parameters

start_offset

Optional: google.protobuf.Duration

The start offset of the video.

end_offset

Optional: google.protobuf.Duration

The end offset of the video.

SafetySetting

Safety settings.

Parameters

category

Optional: HarmCategory

The category of harm.

threshold

Optional: HarmBlockThreshold

The harm block threshold.

max_influential_terms

Optional: int

The max number of influential terms that contribute the most to the safety scores, which might cause potential blocking.

method

Optional: HarmBlockMethod

Specify if the threshold is used for probability or severity score. If not specified, the threshold is used for probability score.

HarmCategory

Hrm categories that block content.

Parameters

HARM_CATEGORY_UNSPECIFIED

The harm category is unspecified.

HARM_CATEGORY_HATE_SPEECH

The harm category is hate speech.

HARM_CATEGORY_DANGEROUS_CONTENT

The harm category is dangerous content.

HARM_CATEGORY_HARASSMENT

The harm category is harassment.

HARM_CATEGORY_SEXUALLY_EXPLICIT

The harm category is sexually explicit content.

HarmBlockThreshold

Probability thresholds levels used to block a response.

Parameters

HARM_BLOCK_THRESHOLD_UNSPECIFIED

Unspecified harm block threshold.

BLOCK_LOW_AND_ABOVE

Block low threshold and higher (i.e. block more).

BLOCK_MEDIUM_AND_ABOVE

Block medium threshold and higher.

BLOCK_ONLY_HIGH

Block only high threshold (i.e. block less).

BLOCK_NONE

Block none.

HarmBlockMethod

A probability threshold that blocks a response based on a combination of probability and severity.

Parameters

HARM_BLOCK_METHOD_UNSPECIFIED

The harm block method is unspecified.

SEVERITY

The harm block method uses both probability and severity scores.

PROBABILITY

The harm block method uses the probability score.

GenerationConfig

Configuration settings used when generating the prompt.

Parameters

temperature

Optional: float

Controls the randomness of predictions.

top_p

Optional: float

If specified, nucleus sampling is used.

top_k

Optional: If specified, top-k sampling is used.

candidate_count

Optional: int

Number of candidates to generate.

max_output_tokens

Optional: int

The maximum number of output tokens to generate per message.

stop_sequences

Optional: List[string]

Stop sequences.

presence_penalty

Optional: float

Positive penalties.

frequency_penalty

Optional: float

Frequency penalties.

response_mime_type

Optional: string (enum)

Output response mimetype of the generated candidate text.

Supported mimetype:

  • text/plain: (default) Text output.
  • application/json: JSON response in the candidates.
  • The model needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.

This is a preview feature.

Examples

Non-streaming text response

Generate a non-streaming model response from a text input.

REST

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process the request.
  • TEXT: The text instructions to include in the prompt.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent

Request JSON body:

{
  "contents": [{
    "role": "user",
    "parts": [{
      "text": "TEXT"
    }]
  }]
}'

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent" | Select-Object -Expand Content

Python

import vertexai

from vertexai.generative_models import GenerativeModel

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel(MODEL_ID)
response = model.generate_content("Write a story about a magic backpack.")

print(response)

NodeJS

const {VertexAI} = require('@google-cloud/vertexai');

/**
 * TODO(developer): Update these variables before running the sample.
 */
async function generateContent(
  projectId = 'PROJECT_ID',
  location = 'us-central1',
  model = 'gemini-1.0-pro-002'
) {
  // Initialize Vertex with your Cloud project and location
  const vertexAI = new VertexAI({project: projectId, location: location});

  // Instantiate the model
  const generativeModel = vertexAI.getGenerativeModel({
    model: model,
  });

  const request = {
    contents: [
      {
        role: 'user',
        parts: [
          {
            text: 'Write a story about a magic backpack.',
          },
        ],
      },
    ],
  };

  console.log(JSON.stringify(request));

  const result = await generativeModel.generateContent(request);

  console.log(result.response.candidates[0].content.parts[0].text);
}

Non-streaming multi-modal response

Generate a non-streaming model response from a multi-modal input, such as text and an image.

REST

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process the request.
  • TEXT: The text instructions to include in the prompt.
  • FILE_URI: The Cloud Storage URI to the file storing the data.
  • MIME_TYPE: The TIANA MIME type of the data.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent

Request JSON body:

{
"contents": [{
  "role": "user",
  "parts": [
    {
      "text": "TEXT"
    },
    {
      "file_data": {"file_uri": "FILE_URI", "MIME_TYPE"}
    },
    {
      "file_data": {"file_uri": "FILE_URI", "MIME_TYPE"}
    }
  ]
}]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent" | Select-Object -Expand Content

Python

import vertexai

from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel(MODEL_ID)
response = model.generate_content(
    [
        Part.from_uri(
            "gs://cloud-samples-data/generative-ai/video/animals.mp4", "video/mp4"
        ),
        Part.from_uri(
            "gs://cloud-samples-data/generative-ai/image/character.jpg",
            "image/jpeg",
        ),
        "Are these video and image correlated?",
    ]
)

print(response)

NodeJS

const {VertexAI} = require('@google-cloud/vertexai');

/**
 * TODO(developer): Update these variables before running the sample.
 */
async function generateContent(
  projectId = 'PROJECT_ID',
  location = 'us-central1',
  model = 'gemini-1.5-pro-preview-0409'
) {
  // Initialize Vertex AI
  const vertexAI = new VertexAI({project: projectId, location: location});
  const generativeModel = vertexAI.getGenerativeModel({model: model});

  const request = {
    contents: [
      {
        role: 'user',
        parts: [
          {
            file_data: {
              file_uri: 'gs://cloud-samples-data/video/animals.mp4',
              mime_type: 'video/mp4',
            },
          },
          {
            file_data: {
              file_uri:
                'gs://cloud-samples-data/generative-ai/image/character.jpg',
              mime_type: 'image/jpeg',
            },
          },
          {text: 'Are this video and image correlated?'},
        ],
      },
    ],
  };

  const result = await generativeModel.generateContent(request);

  console.log(result.response.candidates[0].content.parts[0].text);
}

Streaming text response

Generate a streaming model response from a text input.

REST

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process the request.
  • TEXT: The text instructions to include in the prompt.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent

Request JSON body:

{
  "contents": [{
    "role": "user",
    "parts": [{
      "text": "TEXT"
    }]
  }]
}'

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent" | Select-Object -Expand Content

Python

import vertexai

from vertexai.generative_models import GenerativeModel

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel(MODEL_ID)
responses = model.generate_content(
    "Write a story about a magic backpack.", stream=True
)

for response in responses:
    print(response)

NodeJS

const {VertexAI} = require('@google-cloud/vertexai');

/**
 * TODO(developer): Update these variables before running the sample.
 */
async function generateContent(
  projectId = 'PROJECT_ID',
  location = 'us-central1',
  model = 'gemini-1.0-pro-002'
) {
  // Initialize Vertex with your Cloud project and location
  const vertexAI = new VertexAI({project: projectId, location: location});

  // Instantiate the model
  const generativeModel = vertexAI.getGenerativeModel({
    model: model,
  });

  const request = {
    contents: [
      {
        role: 'user',
        parts: [
          {
            text: 'Write a story about a magic backpack.',
          },
        ],
      },
    ],
  };

  console.log(JSON.stringify(request));

  const result = await generativeModel.generateContentStream(request);
  for await (const item of result.stream) {
    console.log(item.candidates[0].content.parts[0].text);
  }
}

Streaming multi-modal response

Generate a streaming model response from a multi-modal input, such as text and an image.

REST

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process the request.
  • TEXT: The text instructions to include in the prompt.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent

Request JSON body:

{
  "contents": [{
    "role": "user",
    "parts": [{
      "text": "TEXT"
    }]
  }]
}'

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent" | Select-Object -Expand Content

Python

import vertexai

from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel(MODEL_ID)
responses = model.generate_content(
    [
        Part.from_uri(
            "gs://cloud-samples-data/generative-ai/video/animals.mp4", "video/mp4"
        ),
        Part.from_uri(
            "gs://cloud-samples-data/generative-ai/image/character.jpg",
            "image/jpeg",
        ),
        "Are these video and image correlated?",
    ],
    stream=True,
)

for response in responses:
    print(response)

NodeJS

const {VertexAI} = require('@google-cloud/vertexai');

/**
 * TODO(developer): Update these variables before running the sample.
 */
async function generateContent(
  projectId = 'PROJECT_ID',
  location = 'us-central1',
  model = 'gemini-1.5-pro-preview-0409'
) {
  // Initialize Vertex AI
  const vertexAI = new VertexAI({project: projectId, location: location});
  const generativeModel = vertexAI.getGenerativeModel({model: model});

  const request = {
    contents: [
      {
        role: 'user',
        parts: [
          {
            file_data: {
              file_uri: 'gs://cloud-samples-data/video/animals.mp4',
              mime_type: 'video/mp4',
            },
          },
          {
            file_data: {
              file_uri:
                'gs://cloud-samples-data/generative-ai/image/character.jpg',
              mime_type: 'image/jpeg',
            },
          },
          {text: 'Are this video and image correlated?'},
        ],
      },
    ],
  };

  const result = await generativeModel.generateContentStream(request);

  for await (const item of result.stream) {
    console.log(item.candidates[0].content.parts[0].text);
  }
}

What's next