This page shows you how to send chat prompts to the
Gemini 1.0 Pro (gemini-1.0-pro) model by using the
Google Cloud console, REST API, and supported SDKs. Gemini 1.0 Pro
supports prompts with text-only input, including natural language tasks,
multi-turn text and code chat, and code generation. It can output text and code.
The Gemini 1.0 Pro foundation model is a large language model that excels at understanding and generating language. You can interact with Gemini Pro using a single-turn prompt and response or chat with it in a multi-turn, continuous conversation, even for code understanding and generation.
For a list of languages supported by Gemini 1.0 Pro, see the Language support section of the model information page.
To explore this model in the console, select the gemini-1.0-pro model
card in the Model Garden.
If you're looking for a way to use Gemini directly from your mobile and web apps, check out the Google AI SDKs for Android, Swift, and web.
Use cases
Gemini 1.0 Pro supports text and code generation from a text prompt, including but not limited to the following use cases (a minimal single-turn example follows the list):
- Summarization: Create a shorter version of a document that incorporates pertinent information from the original text. For example, you might want to summarize a chapter from a textbook. Or, you could create a succinct product description from a long paragraph that describes the product in detail.
- Question answering: Provide answers to questions in text. For example, you might automate the creation of a Frequently Asked Questions (FAQ) document from knowledge base content.
- Classification: Assign a label to provided text. For example, a label might be applied to text that describes how grammatically correct it is.
- Sentiment analysis: This is a form of classification that identifies the sentiment of text. The sentiment is turned into a label that's applied to the text. For example, the sentiment of text might be polarities like positive or negative, or sentiments like anger or happiness.
- Entity extraction: Extract a piece of information from text. For example, you can extract the name of a movie from the text of an article.
- Content creation: Generate texts by specifying a set of requirements and background. For example, you might want to draft an email under a given context using a certain tone.
- Code generation: Generate code based on a description. For example, you can ask the model to write a function that checks whether a year is a leap year.
- Multi-turn chat: Prompts that include previous messages as context for generating new responses.
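For example, a single-turn prompt such as the summarization use case can be sent with the Vertex AI SDK for Python. The following is only a minimal sketch, not an official sample; the project ID, location, and prompt text are placeholders that you replace with your own values.

import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder project and location; replace with your own values.
vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.0-pro")

# Single-turn summarization prompt: the model returns a text response.
response = model.generate_content(
    "Summarize the following paragraph in one sentence: ..."
)
print(response.text)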
To learn more about how to design prompts for various uses, see the prompt design documentation.
Send chat prompts
For testing and iterating on chat prompts, we recommend using the Google Cloud console. To send prompts programmatically to the model, you can use the REST API, Vertex AI SDK for Python, or one of the other supported libraries and SDKs shown in the following tabs.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.
Streaming and non-streaming responses
You can choose whether the model generates a streaming response or a non-streaming response. Streaming involves receiving responses to prompts as they are generated. That is, as soon as the model generates output tokens, the output tokens are sent. A non-streaming response to prompts is sent only after all of the output tokens are generated.
For a streaming response, use the stream parameter in
generate_content.
response = model.generate_content(contents=[...], stream=True)
For a non-streaming response, remove the parameter, or set the parameter to
False.
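As a brief sketch, assuming a GenerativeModel instance named model, each kind of response is consumed as follows:

# Streaming: iterate over the chunks as they arrive.
responses = model.generate_content("Tell me a short story.", stream=True)
for chunk in responses:
    print(chunk.text, end="")

# Non-streaming: the full response is available after generation completes.
response = model.generate_content("Tell me a short story.")
print(response.text)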
Sample code
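As a rough sketch (not an official sample), a multi-turn chat with the Vertex AI SDK for Python might look like the following; the project ID and location are placeholders.

import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder project and location; replace with your own values.
vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.0-pro")

# start_chat() keeps the conversation history so each new message is
# answered in the context of the previous turns.
chat = model.start_chat()

print(chat.send_message("Hello! Pretend you are a pirate.").text)
print(chat.send_message("What brings you to these waters?").text)

# Chat messages can also be streamed.
for chunk in chat.send_message("Tell me a short sea shanty.", stream=True):
    print(chunk.text, end="")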
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Streaming and non-streaming responses
You can choose whether the model generates a streaming response or a non-streaming response. Streaming involves receiving responses to prompts as they are generated. That is, as soon as the model generates output tokens, the output tokens are sent. A non-streaming response to prompts is sent only after all of the output tokens are generated.
For a streaming response, use the generateContentStream method.
const streamingResp = await generativeModel.generateContentStream(request);
For a non-streaming response, use the generateContent method.
const response = await generativeModel.generateContent(request);
Sample code
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Streaming and non-streaming responses
You can choose whether the model generates a streaming response or a non-streaming response. Streaming involves receiving responses to prompts as they are generated. That is, as soon as the model generates output tokens, the output tokens are sent. A non-streaming response to prompts is sent only after all of the output tokens are generated.
For a streaming response, use the generateContentStream method.
public ResponseStream<GenerateContentResponse> generateContentStream(Content content)
For a non-streaming response, use the generateContent method.
public GenerateContentResponse generateContent(Content content)
Sample code
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Streaming and non-streaming responses
You can choose whether the model generates a streamed response or a non-streamed response. Streaming involves receiving responses to prompts as they are generated. That is, as soon as the model generates output tokens, the output tokens are sent. A non-streaming response to prompts is sent only after all of the output tokens are generated.
For a streaming response, use the GenerateContentStream method.
iter := model.GenerateContentStream(ctx, genai.Text("Tell me a story about a lumberjack and his giant ox. Keep it very short."))
For a non-streaming response, use the GenerateContent method.
resp, err := model.GenerateContent(ctx, genai.Text("What is the average size of a swallow?"))
Sample code
REST
You can use REST to send a chat prompt by using the Vertex AI API to send a POST request to the publisher model endpoint.
Before using any of the request data, make the following replacements:
- GENERATE_RESPONSE_METHOD: The type of response that you want the model to generate.
Choose a method for how you want the model's response to be returned:
  - streamGenerateContent: The response is streamed as it's being generated to reduce the perception of latency to a human audience.
  - generateContent: The response is returned after it's fully generated.
- LOCATION: The region to process the request. Available
options include the following:
  - us-central1
  - us-west4
  - northamerica-northeast1
  - us-east4
  - us-west1
  - asia-northeast3
  - asia-southeast1
  - asia-northeast1
- PROJECT_ID: Your project ID.
- MODEL_ID: The model ID of the multimodal model
that you want to use. The options are:
gemini-1.0-pro
- ROLE:
The role in a conversation associated with the content. Specifying a role is required even in
single-turn use cases.
Acceptable values include the following:
  - USER: Specifies content that's sent by you.
  - MODEL: Specifies the model's response.
- TEXT: The text instructions to include in the prompt.
- SAFETY_CATEGORY:
The safety category to configure a threshold for. Acceptable values include the following:
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_DANGEROUS_CONTENT
- THRESHOLD:
The threshold for blocking responses that could belong to the specified safety category based on
probability. Acceptable values include the following:
  - BLOCK_NONE
  - BLOCK_ONLY_HIGH
  - BLOCK_MEDIUM_AND_ABOVE (default)
  - BLOCK_LOW_AND_ABOVE

  BLOCK_LOW_AND_ABOVE blocks the most while BLOCK_ONLY_HIGH blocks the least.
- TEMPERATURE:
The temperature is used for sampling during response generation, which occurs when
topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.
If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.
- TOP_P:
Top-P changes how the model selects tokens for output. Tokens are selected
from the most (see top-K) to least probable until the sum of their probabilities
equals the top-P value. For example, if tokens A, B, and C have a probability of
0.3, 0.2, and 0.1 and the top-P value is
0.5, then the model selects either A or B as the next token by using temperature and excludes C as a candidate.
Specify a lower value for less random responses and a higher value for more random responses.
- TOP_K:
Top-K changes how the model selects tokens for output. A top-K of
1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.
For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.
Specify a lower value for less random responses and a higher value for more random responses.
- MAX_OUTPUT_TOKENS:
Maximum number of tokens that can be generated in the response. A token is
approximately four characters. 100 tokens correspond to roughly 60-80 words.
Specify a lower value for shorter responses and a higher value for potentially longer responses.
- STOP_SEQUENCES:
Specifies a list of strings that tells the model to stop generating text if one
of the strings is encountered in the response. If a string appears multiple
times in the response, then the response truncates where it's first encountered.
The strings are case-sensitive.
For example, if the following is the returned response when
stopSequences isn't specified:

public static string reverse(string myString)

Then the returned response with stopSequences set to ["Str", "reverse"] is:

public static string
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD
Request JSON body:
{
"contents": {
"role": "ROLE",
"parts": { "text": "TEXT" }
},
"safety_settings": {
"category": "SAFETY_CATEGORY",
"threshold": "THRESHOLD"
},
"generation_config": {
"temperature": TEMPERATURE,
"topP": TOP_P,
"topK": TOP_K,
"candidateCount": 1,
"maxOutputTokens": MAX_OUTPUT_TOKENS,
"stopSequences": STOP_SEQUENCES,
}
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json,
and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD"
PowerShell
Save the request body in a file named request.json,
and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD" | Select-Object -Expand Content
You should receive a JSON response containing the generated content.
Example curl command
LOCATION="us-central1"
MODEL_ID="gemini-1.0-pro"
PROJECT_ID="test-project"
GENERATE_RESPONSE_METHOD="generateContent"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:${GENERATE_RESPONSE_METHOD} -d \
$'{
"contents": [
{
"role": "user",
"parts": { "text": "Hello!" }
},
{
"role": "model",
"parts": { "text": "Argh! What brings ye to my ship?" }
},
{
"role": "user",
"parts": { "text": "Wow! You are a real-life pirate!" }
}
],
"safety_settings": {
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_LOW_AND_ABOVE"
},
"generation_config": {
"temperature": 0.9,
"topP": 1,
"candidateCount": 1,
"maxOutputTokens": 2048
}
}'
Console
To use the Vertex AI Studio to send a chat prompt in the Google Cloud console, do the following:
- In the Generative AI section of the Google Cloud console, go to the Language section of the Vertex AI Studio.
- Click Text chat.
Configure the model and parameters:
- Region: Select the region that you want to use.
- Model: Select Gemini Pro.
- Temperature: Use the slider or textbox to enter a value for temperature.
  The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.
  If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.
- Token limit: Use the slider or textbox to enter a value for the max output limit.
  Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.
  Specify a lower value for shorter responses and a higher value for potentially longer responses.
- Add stop sequence: Enter a stop sequence, which is a series of characters (including spaces) that stops response generation if the model encounters it. The sequence is not included as part of the response. You can add up to five stop sequences.
- Optional: To configure advanced parameters, click Advanced and
configure as follows:
- Top-K: Use the slider or textbox to enter a value for top-K.
  Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.
  For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.
  Specify a lower value for less random responses and a higher value for more random responses.
- Top-P: Use the slider or textbox to enter a value for top-P.
  Tokens are selected from most probable to the least until the sum of their probabilities equals the value of top-P. For the least variable results, set top-P to 0.
The Google Cloud console only supports streaming, which involves receiving responses to prompts as they are generated.
- Enter a message in the message box to start a conversation with the model.
The model uses the previous messages as context for new responses.
- Optional: To save your prompt to My prompts, click Save.
- Optional: To get the Python code or a curl command for your prompt, click Get code.
- Optional: To clear all previous messages, click Clear conversation.
What's next
- Learn how to send multimodal prompt requests.
- Learn about responsible AI best practices and Vertex AI's safety filters.