This document describes how to use system instructions. To learn about what system instructions are and best practices for using system instructions, see Introduction to system instructions instead.
System instructions are a set of instructions that the model processes before it processes prompts. We recommend that you use system instructions to tell the model how you want it to behave and respond to prompts. For example, you can include things like the role or persona, contextual information, and formatting instructions:
You are a friendly and helpful assistant.
Ensure your answers are complete, unless the user requests a more concise approach.
When generating code, offer explanations for code segments as necessary and maintain good coding practices.
When presented with inquiries seeking information, provide answers that reflect a deep understanding of the field, guaranteeing their correctness.
For any non-english queries, respond in the same language as the prompt unless otherwise specified by the user.
For prompts involving reasoning, provide a clear explanation of each step in the reasoning process before presenting the final answer.
When a system instruction is set, it applies to the entire request. It works across multiple user and model turns when included in the prompt. Though system instructions are separate from the contents of the prompt, they are still part of your overall prompts and therefore are subject to standard data use policies.
Supported models
The following models support system instructions:
- All Gemini 1.5 Pro model versions
- All Gemini 1.5 Flash model versions
- Gemini 1.0 Pro version
gemini-1.0-pro-002
If you're using a different model, see Assign a role instead.
Use cases
You can use system instructions in many ways, including:
- Defining a persona or role (for a chatbot, for example)
- Defining output format (Markdown, YAML, etc.)
- Defining output style and tone (for example, verbosity, formality, and target reading level)
- Defining goals or rules for the task (for example, returning a code snippet without further explanations)
- Providing additional context for the prompt (for example, a knowledge cutoff)
Specifying which language the model should respond in (sometimes models can respond in your local language, even if the prompt is written in another language). When you use a non-English language for your prompts, we recommend you add the following to your system instructions:
All questions should be answered comprehensively with details, unless the user requests a concise response specifically. Respond in the same language as the query.
Code samples
The code samples on the following tabs demonstrate how to use system instructions in your generative AI application.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.
Streaming and non-streaming responses
You can choose whether the model generates streaming responses or non-streaming responses. For streaming responses, you receive each response as soon as its output token is generated. For non-streaming responses, you receive all responses after all of the output tokens are generated.
For a streaming response, use the stream
parameter in
generate_content
.
response = model.generate_content(contents=[...], stream = True)
For a non-streaming response, remove the parameter, or set the parameter to
False
.
Sample code
Node.js
Before trying this sample, follow the Node.js setup instructions in the Generative AI quickstart using the Node.js SDK. For more information, see the Node.js SDK for Gemini reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Streaming and non-streaming responses
You can choose whether the model generates streaming responses or non-streaming responses. For streaming responses, you receive each response as soon as its output token is generated. For non-streaming responses, you receive all responses after all of the output tokens are generated.
For a streaming response, use the
generateContentStream
method.
const streamingResp = await generativeModel.generateContentStream(request);
For a non-streaming response, use the
generateContent
method.
const streamingResp = await generativeModel.generateContent(request);
Sample code
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart. For more information, see the Vertex AI Java SDK for Gemini reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Streaming and non-streaming responses
You can choose whether the model generates streaming responses or non-streaming responses. For streaming responses, you receive each response as soon as its output token is generated. For non-streaming responses, you receive all responses after all of the output tokens are generated.
For a streaming response, use the
generateContentStream
method.
public ResponseStream<GenerateContentResponse> generateContentStream(Content content)
For a non-streaming response, use the
generateContent
method.
public GenerateContentResponse generateContent(Content content)
Sample code
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart. For more information, see the Vertex AI Go SDK for Gemini reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Streaming and non-streaming responses
You can choose whether the model generates streaming responses or non-streaming responses. For streaming responses, you receive each response as soon as its output token is generated. For non-streaming responses, you receive all responses after all of the output tokens are generated.
For a streaming response, use the
GenerateContentStream
method.
iter := model.GenerateContentStream(ctx, genai.Text("Tell me a story about a lumberjack and his giant ox. Keep it very short."))
For a non-streaming response, use the GenerateContent
method.
resp, err := model.GenerateContent(ctx, genai.Text("What is the average size of a swallow?"))
Sample code
C#
Before trying this sample, follow the C# setup instructions in the Vertex AI quickstart. For more information, see the Vertex AI C# reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Streaming and non-streaming responses
You can choose whether the model generates streaming responses or non-streaming responses. For streaming responses, you receive each response as soon as its output token is generated. For non-streaming responses, you receive all responses after all of the output tokens are generated.
For a streaming response, use the
StreamGenerateContent
method.
public virtual PredictionServiceClient.StreamGenerateContentStream StreamGenerateContent(GenerateContentRequest request)
For a non-streaming response, use the
GenerateContentAsync
method.
public virtual Task<GenerateContentResponse> GenerateContentAsync(GenerateContentRequest request)
For more information on how the server can stream responses, see Streaming RPCs.
Sample code
REST
After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.
Before using any of the request data, make the following replacements:
GENERATE_RESPONSE_METHOD
: The type of response that you want the model to generate. Choose a method that generates how you want the model's response to be returned:streamGenerateContent
: The response is streamed as it's being generated to reduce the perception of latency to a human audience.generateContent
: The response is returned after it's fully generated.
LOCATION
: The region to process the request. Available options include the following:Click to expand a partial list of available regions
us-central1
us-west4
northamerica-northeast1
us-east4
us-west1
asia-northeast3
asia-southeast1
asia-northeast1
PROJECT_ID
: Your project ID.MODEL_ID
: The model ID of the multimodal model that you want to use. Some options include the following:gemini-1.0-pro-002
gemini-1.0-pro-vision-001
gemini-1.5-pro-002
gemini-1.5-flash
ROLE
: The role in a conversation associated with the content. Specifying a role is required even in singleturn use cases. Acceptable values include the following:USER
: Specifies content that's sent by you.MODEL
: Specifies the model's response.
The text instructions to include in the prompt. For example,TEXT
User input: I like bagels
.SAFETY_CATEGORY
: The safety category to configure a threshold for. Acceptable values include the following:Click to expand safety categories
HARM_CATEGORY_SEXUALLY_EXPLICIT
HARM_CATEGORY_HATE_SPEECH
HARM_CATEGORY_HARASSMENT
HARM_CATEGORY_DANGEROUS_CONTENT
THRESHOLD
: The threshold for blocking responses that could belong to the specified safety category based on probability. Acceptable values include the following:Click to expand blocking thresholds
BLOCK_NONE
BLOCK_ONLY_HIGH
BLOCK_MEDIUM_AND_ABOVE
(default)BLOCK_LOW_AND_ABOVE
BLOCK_LOW_AND_ABOVE
blocks the most whileBLOCK_ONLY_HIGH
blocks the least. (Optional) Not available for all models. Instructions for the model to steer it toward better performance. JSON does not support line breaks. Replace all line breaks in this field withSYSTEM_INSTRUCTION
\n
. For example,You are a helpful language translator.\nYour mission is to translate text in English to French.
TEMPERATURE
: The temperature is used for sampling during response generation, which occurs whentopP
andtopK
are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of0
means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.
TOP_P
: Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is0.5
, then the model will select either A or B as the next token by using temperature and excludes C as a candidate.Specify a lower value for less random responses and a higher value for more random responses.
TOP_K
: Top-K changes how the model selects tokens for output. A top-K of1
means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of3
means that the next token is selected from among the three most probable tokens by using temperature.For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.
Specify a lower value for less random responses and a higher value for more random responses.
MAX_OUTPUT_TOKENS
: Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.Specify a lower value for shorter responses and a higher value for potentially longer responses.
STOP_SEQUENCES
: Specifies a list of strings that tells the model to stop generating text if one of the strings is encountered in the response. If a string appears multiple times in the response, then the response truncates where it's first encountered. The strings are case-sensitive. For example, if the following is the returned response whenstopSequences
isn't specified:public static string reverse(string myString)
Then the returned response withstopSequences
set to["Str", "reverse"]
is:public static string
Specify an empty array ([]
) to disable stop sequences.
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
.
Run the following command in the terminal to create or overwrite
this file in the current directory:
cat > request.json << 'EOF' { "contents": { "role": "ROLE", "parts": { "text": "TEXT" } }, "system_instruction": { "parts": [ { "text": "SYSTEM_INSTRUCTION" } ] }, "safety_settings": { "category": "SAFETY_CATEGORY", "threshold": "THRESHOLD" }, "generation_config": { "temperature": TEMPERATURE, "topP": TOP_P, "topK": TOP_K, "candidateCount": 1, "maxOutputTokens": MAX_OUTPUT_TOKENS, "stopSequences": STOP_SEQUENCES } } EOF
Then execute the following command to send your REST request:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD"
PowerShell
Save the request body in a file named request.json
.
Run the following command in the terminal to create or overwrite
this file in the current directory:
@' { "contents": { "role": "ROLE", "parts": { "text": "TEXT" } }, "system_instruction": { "parts": [ { "text": "SYSTEM_INSTRUCTION" } ] }, "safety_settings": { "category": "SAFETY_CATEGORY", "threshold": "THRESHOLD" }, "generation_config": { "temperature": TEMPERATURE, "topP": TOP_P, "topK": TOP_K, "candidateCount": 1, "maxOutputTokens": MAX_OUTPUT_TOKENS, "stopSequences": STOP_SEQUENCES } } '@ | Out-File -FilePath request.json -Encoding utf8
Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Note the following in the URL for this sample:- Use the
generateContent
method to request that the response is returned after it's fully generated. To reduce the perception of latency to a human audience, stream the response as it's being generated by using thestreamGenerateContent
method. - The multimodal model ID is located at the end of the URL before the method
(for example,
gemini-1.5-flash
orgemini-1.0-pro-vision
). This sample may support other models as well.
Prompt examples
Here's a basic example of setting the system instruction using the Python SDK for the Gemini API:
model=genai.GenerativeModel(
model_name="gemini-1.5-pro-002",
system_instruction="You are a cat. Your name is Neko.")
The following are examples of system prompts that define the expected behavior of the model.
Code generation
Code generation |
---|
You are a coding expert that specializes in rendering code for front-end interfaces. When I describe a component of a website I want to build, please return the HTML and CSS needed to do so. Do not give an explanation for this code. Also offer some UI design suggestions. Create a box in the middle of the page that contains a rotating selection of images each with a caption. The image in the center of the page should have shadowing behind it to make it stand out. It should also link to another page of the site. Leave the URL blank so that I can fill it in. |
Formatted data generation
Formatted data generation |
---|
You are an assistant for home cooks. You receive a list of ingredients and respond with a list of recipes that use those ingredients. Recipes which need no extra ingredients should always be listed before those that do. Your response must be a JSON object containing 3 recipes. A recipe object has the following schema: * name: The name of the recipe * usedIngredients: Ingredients in the recipe that were provided in the list * otherIngredients: Ingredients in the recipe that were not provided in the list (omitted if there are no other ingredients) * description: A brief description of the recipe, written positively as if to sell it * 1 lb bag frozen broccoli * 1 pint heavy cream * 1 lb pack cheese ends and pieces |
Music chatbot
Music chatbot |
---|
You will respond as a music historian, demonstrating comprehensive knowledge across diverse musical genres and providing relevant examples. Your tone will be upbeat and enthusiastic, spreading the joy of music. If a question is not related to music, the response should be, "That is beyond my knowledge." If a person was born in the sixties, what was the most popular music genre being played when they were born? List five songs by bullet point. |
Financial analysis
Financial analysis |
---|
As a financial analysis expert, your role is to interpret complex financial data, offer personalized advice, and evaluate investments using statistical methods to gain insights across different financial areas. Accuracy is the top priority. All information, especially numbers and calculations, must be correct and reliable. Always double-check for errors before giving a response. The way you respond should change based on what the user needs. For tasks with calculations or data analysis, focus on being precise and following instructions rather than giving long explanations. If you're unsure, ask the user for more information to ensure your response meets their needs. For tasks that are not about numbers: * Use clear and simple language to avoid confusion and don't use jargon. * Make sure you address all parts of the user's request and provide complete information. * Think about the user's background knowledge and provide additional context or explanation when needed. Formatting and Language: * Follow any specific instructions the user gives about formatting or language. * Use proper formatting like JSON or tables to make complex data or results easier to understand. Please summarize the key insights of given numerical tables. CONSOLIDATED STATEMENTS OF INCOME (In millions, except per share amounts) |Year Ended December 31 | 2020 | 2021 | 2022 | |--- | --- | --- | --- | |Revenues | $ 182,527| $ 257,637| $ 282,836| |Costs and expenses:| |Cost of revenues | 84,732 | 110,939 | 126,203| |Research and development | 27,573 | 31,562 | 39,500| |Sales and marketing | 17,946 | 22,912 | 26,567| |General and administrative | 11,052 | 13,510 | 15,724| |Total costs and expenses | 141,303| 178,923| 207,994| |Income from operations | 41,224 | 78,714 | 74,842| |Other income (expense), net | 6,858 | 12,020 | (3,514)| |Income before income taxes | 48,082 | 90,734 | 71,328| |Provision for income taxes | 7,813 | 14,701 | 11,356| |Net income | $40,269| $76,033 | $59,972| |Basic net income per share of Class A, Class B, and Class C stock | $2.96| $5.69| $4.59| |Diluted net income per share of Class A, Class B, and Class C stock| $2.93| $5.61| $4.56| Please list important, but no more than five, highlights from 2020 to 2022 in the given table. Please write in a professional and business-neutral tone. The summary should only be based on the information presented in the table. |
Market sentiment analysis
Market sentiment analysis |
---|
You are a stock market analyst who analyzes market sentiment given a news snippet. Based on the news snippet, you extract statements that impact investor sentiment. Respond in JSON format and for each statement: * Give a score 1 - 10 to suggest if the sentiment is negative or positive (1 is most negative 10 is most positive, 5 will be neutral). * Reiterate the statement. * Give a one sentence explanation. Mobileye reported a build-up of excess inventory by top-tier customers following supply-chain constraints in recent years. Revenue for the first quarter is expected to be down about 50% from $458 million generated a year earlier, before normalizing over the remainder of 2024, Mobileye said. Mobileye forecast revenue for full-year 2024 at between $1.83 billion and $1.96 billion, down from the about $2.08 billion it now expects for 2023. |
What's next
- Explore more examples of prompts in the Prompt gallery.