Send chat prompt requests (Gemini)

This page shows you how to send chat prompts to the Gemini 1.0 Pro (gemini-1.0-pro) model by using the Google Cloud console, REST API, and supported SDKs. Gemini 1.0 Pro supports prompts with text-only input, including natural language tasks, multi-turn text and code chat, and code generation. It can output text and code.

The Gemini 1.0 Pro foundation model is a large language model that excels at understanding and generating language. You can interact with Gemini Pro using a single-turn prompt and response or chat with it in a multi-turn, continuous conversation, even for code understanding and generation.

For a list of languages supported by Gemini 1.0 Pro, see model information Language support.


To explore this model in the console, select the gemini-1.0-pro model card in the Model Garden.

Go to the Model Garden


If you're looking for a way to use Gemini directly from your mobile and web apps, check out the Google AI SDKs for Android, Swift, and web.

Send chat prompts

For testing and iterating on chat prompts, we recommend using the Google Cloud console. To send prompts programmatically to the model, you can use the REST API, Vertex AI SDK for Python, or one of the other supported libraries and SDKs shown in the following tabs.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

Streaming and non-streaming responses

You can choose whether the model generates a streaming response or a non-streaming response. Streaming involves receiving responses to prompts as they are generated. That is, as soon as the model generates output tokens, the output tokens are sent. A non-streaming response to prompts is sent only after all of the output tokens are generated.

For a streaming response, use the stream parameter in generate_content.

  response = model.generate_content(contents=[...], stream = True)
  

For a non-streaming response, remove the parameter, or set the parameter to False.

Sample code

import vertexai

from vertexai.generative_models import GenerativeModel, ChatSession

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
vertexai.init(project=project_id, location=location)
model = GenerativeModel(model_name="gemini-1.0-pro-002")
chat = model.start_chat()

def get_chat_response(chat: ChatSession, prompt: str) -> str:
    text_response = []
    responses = chat.send_message(prompt, stream=True)
    for chunk in responses:
        text_response.append(chunk.text)
    return "".join(text_response)

prompt = "Hello."
print(get_chat_response(chat, prompt))

prompt = "What are all the colors in a rainbow?"
print(get_chat_response(chat, prompt))

prompt = "Why does it appear when it rains?"
print(get_chat_response(chat, prompt))

C#

Before trying this sample, follow the C# setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI C# API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


using Google.Cloud.AIPlatform.V1;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class MultiTurnChatSample
{
    public async Task<string> GenerateContent(
        string projectId = "your-project-id",
        string location = "us-central1",
        string publisher = "google",
        string model = "gemini-1.0-pro"
    )
    {
        // Create a chat session to keep track of the context
        ChatSession chatSession = new ChatSession($"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}", location);

        string prompt = "Hello.";
        Console.WriteLine($"\nUser: {prompt}");

        string response = await chatSession.SendMessageAsync(prompt);
        Console.WriteLine($"Response: {response}");

        prompt = "What are all the colors in a rainbow?";
        Console.WriteLine($"\nUser: {prompt}");

        response = await chatSession.SendMessageAsync(prompt);
        Console.WriteLine($"Response: {response}");

        prompt = "Why does it appear when it rains?";
        Console.WriteLine($"\nUser: {prompt}");

        response = await chatSession.SendMessageAsync(prompt);
        Console.WriteLine($"Response: {response}");

        return response;
    }

    private class ChatSession
    {
        private readonly string _modelPath;
        private readonly PredictionServiceClient _predictionServiceClient;

        private readonly List<Content> _contents;

        public ChatSession(string modelPath, string location)
        {
            _modelPath = modelPath;

            // Create a prediction service client.
            _predictionServiceClient = new PredictionServiceClientBuilder
            {
                Endpoint = $"{location}-aiplatform.googleapis.com"
            }.Build();

            // Initialize contents to send over in every request.
            _contents = new List<Content>();
        }


        public async Task<string> SendMessageAsync(string prompt)
        {
            // Initialize the content with the prompt.
            var content = new Content
            {
                Role = "USER"
            };
            content.Parts.AddRange(new List<Part>()
            {
                new() {
                    Text = prompt
                }
            });
            _contents.Add(content);


            // Create a request to generate content.
            var generateContentRequest = new GenerateContentRequest
            {
                Model = _modelPath,
                GenerationConfig = new GenerationConfig
                {
                    Temperature = 0.9f,
                    TopP = 1,
                    TopK = 32,
                    CandidateCount = 1,
                    MaxOutputTokens = 2048
                }
            };
            generateContentRequest.Contents.AddRange(_contents);

            // Make a non-streaming request, get a response.
            GenerateContentResponse response = await _predictionServiceClient.GenerateContentAsync(generateContentRequest);

            // Save the content from the response.
            _contents.Add(response.Candidates[0].Content);

            // Return the text
            return response.Candidates[0].Content.Parts[0].Text;
        }
    }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Generative AI quickstart using the Node.js SDK. For more information, see the Node.js SDK for Gemini reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Streaming and non-streaming responses

You can choose whether the model generates a streaming response or a non-streaming response. Streaming involves receiving responses to prompts as they are generated. That is, as soon as the model generates output tokens, the output tokens are sent. A non-streaming response to prompts is sent only after all of the output tokens are generated.

For a streaming response, use the generateContentStream method.

  const streamingResp = await generativeModel.generateContentStream(request);
  

For a non-streaming response, use the generateContent method.

  const streamingResp = await generativeModel.generateContent(request);
  

Sample code

const {VertexAI} = require('@google-cloud/vertexai');

/**
 * TODO(developer): Update these variables before running the sample.
 */
async function createStreamChat(
  projectId = 'PROJECT_ID',
  location = 'us-central1',
  model = 'gemini-1.0-pro'
) {
  // Initialize Vertex with your Cloud project and location
  const vertexAI = new VertexAI({project: projectId, location: location});

  // Instantiate the model
  const generativeModel = vertexAI.getGenerativeModel({
    model: model,
  });

  const chat = generativeModel.startChat({});
  const chatInput1 = 'How can I learn more about that?';

  console.log(`User: ${chatInput1}`);

  const result1 = await chat.sendMessageStream(chatInput1);
  for await (const item of result1.stream) {
    console.log(item.candidates[0].content.parts[0].text);
  }
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart. For more information, see the Vertex AI Java SDK for Gemini reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Streaming and non-streaming responses

You can choose whether the model generates a streaming response or a non-streaming response. Streaming involves receiving responses to prompts as they are generated. That is, as soon as the model generates output tokens, the output tokens are sent. A non-streaming response to prompts is sent only after all of the output tokens are generated.

For a streaming response, use the generateContentStream method.

  public ResponseStream generateContentStream(Content content)
  

For a non-streaming response, use the generateContent method.

  public GenerateContentResponse generateContent(Content content)
  

Sample code

import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.ChatSession;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import com.google.cloud.vertexai.generativeai.ResponseHandler;
import java.io.IOException;

public class ChatDiscussion {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-google-cloud-project-id";
    String location = "us-central1";
    String modelName = "gemini-1.0-pro";

    chatDiscussion(projectId, location, modelName);
  }

  // Ask interrelated questions in a row using a ChatSession object.
  public static void chatDiscussion(String projectId, String location, String modelName)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests.
    try (VertexAI vertexAI = new VertexAI(projectId, location)) {
      GenerateContentResponse response;

      GenerativeModel model = new GenerativeModel(modelName, vertexAI);
      // Create a chat session to be used for interactive conversation.
      ChatSession chatSession = new ChatSession(model);

      response = chatSession.sendMessage("Hello.");
      System.out.println(ResponseHandler.getText(response));

      response = chatSession.sendMessage("What are all the colors in a rainbow?");
      System.out.println(ResponseHandler.getText(response));

      response = chatSession.sendMessage("Why does it appear when it rains?");
      System.out.println(ResponseHandler.getText(response));
      System.out.println("Chat Ended.");
    }
  }
}

Go

Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart. For more information, see the Vertex AI Go SDK for Gemini reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Streaming and non-streaming responses

You can choose whether the model generates a streamed response or a non-streamed response. Streaming involves receiving responses to prompts as they are generated. That is, as soon as the model generates output tokens, the output tokens are sent. A non-streaming response to prompts is sent only after all of the output tokens are generated.

For a streaming response, use the GenerateContentStream method.

  iter := model.GenerateContentStream(ctx, genai.Text("Tell me a story about a lumberjack and his giant ox. Keep it very short."))
  

For a non-streaming response, use the GenerateContent method.

  resp, err := model.GenerateContent(ctx, genai.Text("What is the average size of a swallow?"))
  

Sample code

import (
	"context"
	"encoding/json"
	"fmt"

	"cloud.google.com/go/vertexai/genai"
)

var projectId = "PROJECT_ID"
var region = "us-central1"
var modelName = "gemini-1.0-pro-vision"

func makeChatRequests(projectId string, region string, modelName string) error {
	ctx := context.Background()
	client, err := genai.NewClient(ctx, projectId, region)
	if err != nil {
		return fmt.Errorf("error creating client: %v", err)
	}
	defer client.Close()

	gemini := client.GenerativeModel(modelName)
	chat := gemini.StartChat()

	r, err := chat.SendMessage(
		ctx,
		genai.Text("Hello"))
	if err != nil {
		return err
	}
	rb, _ := json.MarshalIndent(r, "", "  ")
	fmt.Println(string(rb))

	r, err = chat.SendMessage(
		ctx,
		genai.Text("What are all the colors in a rainbow?"))
	if err != nil {
		return err
	}
	rb, _ = json.MarshalIndent(r, "", "  ")
	fmt.Println(string(rb))

	r, err = chat.SendMessage(
		ctx,
		genai.Text("Why does it appear when it rains?"))
	if err != nil {
		return err
	}
	rb, _ = json.MarshalIndent(r, "", "  ")
	fmt.Println(string(rb))

	return nil
}

REST

You can use REST to send a chat prompt by using the Vertex AI API to send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • GENERATE_RESPONSE_METHOD: The type of response that you want the model to generate. Choose a method that generates how you want the model's response to be returned:
    • streamGenerateContent: The response is streamed as it's being generated to reduce the perception of latency to a human audience.
    • generateContent: The response is returned after it's fully generated.
  • LOCATION: The region to process the request. Available options include the following:

    Click to expand available regions

    • us-central1
    • us-west4
    • northamerica-northeast1
    • us-east4
    • us-west1
    • asia-northeast3
    • asia-southeast1
    • asia-northeast1
  • PROJECT_ID: Your project ID.
  • MODEL_ID: The model ID of the multimodal model that you want to use. The options are:
    • gemini-1.0-pro-002
    • gemini-1.0-pro-vision-001
    • gemini-1.5-pro-preview-0409
  • ROLE: The role in a conversation associated with the content. Specifying a role is required even in singleturn use cases. Acceptable values include the following:
    • USER: Specifies content that's sent by you.
    • MODEL: Specifies the model's response.
  • TEXT: The text instructions to include in the prompt.
  • SAFETY_CATEGORY: The safety category to configure a threshold for. Acceptable values include the following:

    Click to expand safety categories

    • HARM_CATEGORY_SEXUALLY_EXPLICIT
    • HARM_CATEGORY_HATE_SPEECH
    • HARM_CATEGORY_HARASSMENT
    • HARM_CATEGORY_DANGEROUS_CONTENT
  • THRESHOLD: The threshold for blocking responses that could belong to the specified safety category based on probability. Acceptable values include the following:

    Click to expand blocking thresholds

    • BLOCK_NONE
    • BLOCK_ONLY_HIGH
    • BLOCK_MEDIUM_AND_ABOVE (default)
    • BLOCK_LOW_AND_ABOVE
    BLOCK_LOW_AND_ABOVE blocks the most while BLOCK_ONLY_HIGH blocks the least.
  • SYSTEM_INSTRUCTION: (Optional) Available for gemini-1.0-pro-002 and gemini-1.5-pro-preview-0409. Instructions for the model to steer it toward better performance. For example, "Answer as concisely as possible" or "Print the results in JSON format".
  • TEMPERATURE: The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.

    If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.

  • TOP_P: Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model will select either A or B as the next token by using temperature and excludes C as a candidate.

    Specify a lower value for less random responses and a higher value for more random responses.

  • TOP_K: Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.

    For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.

    Specify a lower value for less random responses and a higher value for more random responses.

  • MAX_OUTPUT_TOKENS: Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.

    Specify a lower value for shorter responses and a higher value for potentially longer responses.

  • STOP_SEQUENCES: Specifies a list of strings that tells the model to stop generating text if one of the strings is encountered in the response. If a string appears multiple times in the response, then the response truncates where it's first encountered. The strings are case-sensitive.

    For example, if the following is the returned response when stopSequences isn't specified:

    public static string reverse(string myString)

    Then the returned response with stopSequences set to ["Str", "reverse"] is:

    public static string

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD

Request JSON body:

{
  "contents": {
    "role": "ROLE",
    "parts": { "text": "TEXT" }
  },
  "system_instruction":
  {
    "parts": [
      {
        "text": "SYSTEM_INSTRUCTION"
      }
    ]
  },
  "safety_settings": {
    "category": "SAFETY_CATEGORY",
    "threshold": "THRESHOLD"
  },
  "generation_config": {
    "temperature": TEMPERATURE,
    "topP": TOP_P,
    "topK": TOP_K,
    "candidateCount": 1,
    "maxOutputTokens": MAX_OUTPUT_TOKENS,
    "stopSequences": STOP_SEQUENCES,
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Example curl command

LOCATION="us-central1"
MODEL_ID="gemini-1.0-pro"
PROJECT_ID="test-project"
GENERATE_RESPONSE_METHOD="generateContent"

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:${GENERATE_RESPONSE_METHOD} -d \
$'{
  "contents": [
    {
    "role": "user",
    "parts": { "text": "Hello!" }
    },
    {
    "role": "model",
    "parts": { "text": "Argh! What brings ye to my ship?" }
    },
    {
    "role": "user",
    "parts": { "text": "Wow! You are a real-life pirate!" }
    }
  ],
  "safety_settings": {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_LOW_AND_ABOVE"
  },
  "generation_config": {
    "temperature": 0.9,
    "topP": 1,
    "candidateCount": 1,
    "maxOutputTokens": 2048
  }
}'

Console

To use the Vertex AI Studio to send a chat prompt in the Google Cloud console, do the following:

  1. In the Vertex AI section of the Google Cloud console, go to the Language section of the Vertex AI Studio.

    Go to Vertex AI Studio

  2. Click Text chat.
  3. Configure the model and parameters:

    • Region: Select the region that you want to use.
    • Model: Select Gemini Pro.
    • Temperature: Use the slider or textbox to enter a value for temperature.

      The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.

      If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.

    • Token limit: Use the slider or textbox to enter a value for the max output limit.

      Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.

      Specify a lower value for shorter responses and a higher value for potentially longer responses.

    • Add stop sequence: Enter a stop sequence, which is a series of characters (including spaces) that stops response generation if the model encounters it. The sequence is not included as part of the response. You can add up to five stop sequences.
  4. Optional: To configure advanced parameters, click Advanced and configure as follows:

    Click to expand advanced configurations

    • Top-K: Use the slider or textbox to enter a value for top-K.

      Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.

      For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.

      Specify a lower value for less random responses and a higher value for more random responses.

    • Top-P: Use the slider or textbox to enter a value for top-P. Tokens are selected from most probable to the least until the sum of their probabilities equals the value of top-P. For the least variable results, set top-P to 0.
  5. The Google Cloud console only supports streaming, which involves receiving responses to prompts as they are generated. You are ready to enter a message in the message box to start a conversation with the model.

    The model uses the previous messages as context for new responses.

  6. Optional: To save your prompt to My prompts, click Save.
  7. Optional: To get the Python code or a curl command for your prompt, click Get code.
  8. Optional: To clear all previous messages, click Clear conversation

Use system instructions

System instructions enable users to steer the behavior of the model based on their specific needs and use cases. When you set a system instruction, you give the model additional context to understand the task, provide more customized responses, and adhere to specific guidelines over the full user interaction with the model. For developers, product-level behavior can be specified in system instructions, separate from prompts provided by end users.

You can use system instructions in many ways, including:

  • Defining a persona or role (for a chatbot, for example)
  • Defining output format (Markdown, YAML, etc.)
  • Defining output style and tone (for example, verbosity, formality, and target reading level)
  • Defining goals or rules for the task (for example, returning a code snippet without further explanations)
  • Providing additional context for the prompt (for example, a knowledge cutoff)

When a system instruction is set, it applies to the entire request. It works across multiple user and model turns when included in the prompt.

System instructions code samples

The following is an example of specifying simple system instructions in using the Vertex AI Python SDK.

from vertexai.generative_models import GenerativeModel
model = GenerativeModel(
    "gemini-1.0-pro-002",
    system_instruction=[
        "Don't use technical terms in your response",
    ],
)
print(model.generate_content("Explain gravity"))

The following is an example of including a simple system instruction in a curl command.

LOCATION="us-central1"
MODEL_ID="gemini-1.0-pro-002"
PROJECT_ID="test-project"
GENERATE_RESPONSE_METHOD="generateContent"

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models:generateContent" -d \
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "randomly select 10 words from a history book"
        }
      ]
    }
  ],
  "system_instruction":
    {
      "parts": [
        {
          "text": "please print the results in json format."
        }
      ]
    },
  "generation_config": {
    "maxOutputTokens": 2048,
    "temperature": 0.4,
    "topP": 1,
    "topK": 32
  }
}

System instructions examples

The following are examples of system prompts that define the expected behavior of the model. The first is a system prompt for front-end code generation, the second is an example of a market sentiment analysis use case, and the third is a consumer chatbot.

Code generation

  • System: You are a coding expert that specializes in rendering code for front-end interfaces. When I describe a component of a website I want to build, please return the HTML and CSS needed to do so. Do not give an explanation for this code. Also offer some UI design suggestions.
  • User: Create a box in the middle of the page that contains a rotating selection of images each with a caption. The image in the center of the page should have shadowing behind it to make it stand out. It should also link to another page of the site. Leave the URL blank so that I can fill it in.

Market sentiment analysis

  • System: You are a stock market analyst who analyzes market sentiment given a news snippet. Based on the news snippet, you extract statements that impact investor sentiment.

    Respond in JSON format and for each statement:

    • Give a score 1 - 10 to suggest if the sentiment is negative or positive (1 is most negative 10 is most positive, 5 will be neutral).
    • Reiterate the statement.
    • Give a one sentence explanation.
  • User: Mobileye reported a build-up of excess inventory by top-tier customers following supply-chain constraints in recent years. Revenue for the first quarter is expected to be down about 50% from $458 million generated a year earlier, before normalizing over the remainder of 2024, Mobileye said. Mobileye forecast revenue for full-year 2024 at between $1.83 billion and $1.96 billion, down from the about $2.08 billion it now expects for 2023.

Music chatbot

  • System: You will respond as a music historian, demonstrating comprehensive knowledge across diverse musical genres and providing relevant examples. Your tone will be upbeat and enthusiastic, spreading the joy of music. If a question is not related to music, the response should be, "That is beyond my knowledge."
  • User: If a person was born in the sixties, what was the most popular music genre being played? List five songs by bullet point.

What's next