You can guarantee that a model's generated output always adheres to a specific schema so that you receive consistently formatted responses. For example, you might have an established data schema that you use for other tasks. If you have the model follow the same schema, you can directly extract data from the model's output without any post-processing.
To specify the structure of a model's output, define a response schema, which works like a blueprint for model responses. When you submit a prompt and include the response schema, the model's response always follows your defined schema.
You can control generated output when using the following models:
- Gemini 1.5 Pro
- Gemini 1.5 Flash
For function calling with controlled generation (also known as forced function calling), see Introduction to function calling.
Example use cases
One use case for applying a response schema is to ensure a model's response produces valid JSON and conforms to your schema. Generative model outputs can have some degree of variability, so including a response schema ensures that you always receive valid JSON. Consequently, your downstream tasks can reliably expect valid JSON input from generated responses.
Another example is to constrain how a model can respond. For example, you can
have a model annotate text with user-defined labels, not with labels that the
model produces. This constraint is useful when you expect a specific set of
labels such as positive
or negative
and don't want to receive a mixture of
other labels that the model might generate like good
, positive
, negative
,
or bad
.
Considerations
The following considerations discuss potential limitations if you plan on using a response schema:
- You must use the API to define and use a response schema. There's no console support.
- The size of your response schema counts towards the input token limit.
- Only certain output formats are supported, such as
application/json
ortext/x.enum
. For more information, see theresponseMimeType
parameter in the Gemini API reference. - Controlled generation supports a subset of the Vertex AI schema reference. For more information, see Supported schema fields.
A complex schema can result in an
InvalidArgument: 400
error. Complexity might come from long property names, long array length limits, enums with many values, objects with lots of optional properties, or a combination of these factors.If you get this error with a valid schema, make one or more of the following changes to resolve the error:
- Shorten property names or enum names.
- Flatten nested arrays.
- Reduce the number of properties with constraints, such as numbers with minimum and maximum limits.
- Reduce the number of properties with complex constraints, such as
properties with complex formats like
date-time
. - Reduce the number of optional properties.
- Reduce the number of valid values for enums.
Supported schema fields
Controlled generation supports the following fields from the Vertex AI schema. If you use an unsupported field, Vertex AI can still handle your request but ignores the field.
anyOf
enum
format
items
maximum
maxItems
minimum
minItems
nullable
properties
propertyOrdering
*required
* propertyOrdering
is specifically for controlled generation and
not part of the Vertex AI schema. This field defines the order in
which properties are generated. The listed properties must be unique and must be
valid keys in the properties
dictionary.
For the format
field, Vertex AI supports the following values: date
,
date-time
, duration
, and time
. The description and format of each value is
described in the OpenAPI Initiative Registry
Before you begin
Define a response schema to specify the structure of a model's output, the field names, and the expected data type for each field. Use only the supported fields as listed in the Considerations section. All other fields are ignored.
Include your response schema as part of the responseSchema
field only. Don't
duplicate the schema in your input prompt. If you do, the generated output might
be lower in quality.
For sample schemas, see the Example schemas and model responses section.
Model behavior and response schema
When a model generates a response, it uses the field name and context from your prompt. As such, we recommend that you use a clear structure and unambiguous field names so that your intent is clear.
By default, fields are optional, meaning the model can populate the fields or skip them. You can set fields as required to force the model to provide a value. If there's insufficient context in the associated input prompt, the model generates responses mainly based on the data it was trained on.
If you aren't seeing the results you expect, add more context to your input prompts or revise your response schema. For example, review the model's response without controlled generation to see how the model responds. You can then update your response schema that better fits the model's output.
Send a prompt with a response schema
By default, all fields are optional, meaning a model might generate a response to a field. To force the model to always generate a response to a field, set the field as required.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
REST
Before using any of the request data, make the following replacements:
- GENERATE_RESPONSE_METHOD: The type of response that you want the model to generate.
Choose a method that generates how you want the model's response to be returned:
streamGenerateContent
: The response is streamed as it's being generated to reduce the perception of latency to a human audience.generateContent
: The response is returned after it's fully generated.
- LOCATION: The region to process the request.
- PROJECT_ID: Your project ID.
- MODEL_ID: The model ID of the multimodal model
that you want to use. The options are:
gemini-1.5-flash
gemini-1.5-pro
- ROLE:
The role in a conversation associated with the content. Specifying a role is required even in
singleturn use cases.
Acceptable values include the following:
USER
: Specifies content that's sent by you.
- TEXT: The text instructions to include in the prompt.
- RESPONSE_MIME_TYPE: The format type of the
generated candidate text. For a list of supported values, see the
responseMimeType
parameter in the Gemini API. - RESPONSE_SCHEMA: Schema for the model to follow when generating responses. For more information, see the Schema reference.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD
Request JSON body:
{ "contents": { "role": "ROLE", "parts": { "text": "TEXT" } }, "generation_config": { "responseMimeType": "RESPONSE_MIME_TYPE", "responseSchema": RESPONSE_SCHEMA, } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
,
and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD"
PowerShell
Save the request body in a file named request.json
,
and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATE_RESPONSE_METHOD" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Example curl command
LOCATION="us-central1"
MODEL_ID="gemini-1.5-pro"
PROJECT_ID="test-project"
GENERATE_RESPONSE_METHOD="generateContent"
cat << EOF > request.json
{
"contents": {
"role": "user",
"parts": {
"text": "List a few popular cookie recipes."
}
},
"generation_config": {
"maxOutputTokens": 2048,
"responseMimeType": "application/json",
"responseSchema": {
"type": "array",
"items": {
"type": "object",
"properties": {
"recipe_name": {
"type": "string",
},
},
"required": ["recipe_name"],
},
}
}
}
EOF
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:${GENERATE_RESPONSE_METHOD} -d \
-d `@request.json`
Example schemas for JSON output
The following sections demonstrate a variety of sample prompts and response schemas. A sample model response is also included after each code sample.
- Summarize review ratings in a nested list
- Forecast the weather for each day of the week in an array
- Classify a product with a well-defined enum
- Identify objects in images
Summarize review ratings
The following example outputs an array of objects, where each object has two properties: the rating and the name of an ice cream flavor.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Example model response
candidates { content { role: "model" parts { text: "[\n [\n {\n \"rating\": 4\n },\n {\n \"flavor\": \"Strawberry Cheesecake\"\n },\n {\n \"rating\": 1\n },\n {\n \"flavor\": \"Mango Tango\"\n }\n ]\n] " } } finish_reason: STOP safety_ratings { category: HARM_CATEGORY_HATE_SPEECH probability: NEGLIGIBLE probability_score: 0.1139734759926796 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.10070161521434784 } safety_ratings { category: HARM_CATEGORY_DANGEROUS_CONTENT probability: NEGLIGIBLE probability_score: 0.13695430755615234 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.12241825461387634 } safety_ratings { category: HARM_CATEGORY_HARASSMENT probability: NEGLIGIBLE probability_score: 0.11676400154829025 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.05310790613293648 } safety_ratings { category: HARM_CATEGORY_SEXUALLY_EXPLICIT probability: NEGLIGIBLE probability_score: 0.10521054267883301 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.08299414813518524 } } usage_metadata { prompt_token_count: 61 candidates_token_count: 66 total_token_count: 127 }
Forecast the weather for each day of the week
The following example outputs a forecast
object for each day
of the week that includes an array of properties such as the expected
temperate and humidity level for the day. Some properties are set to nullable
so the model can return a null value when it doesn't have enough context to
generate a meaningful response. This strategy helps reduce hallucinations.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Example model response
candidates { content { role: "model" parts { text: "{\"forecast\": [{\"Day\": \"Sunday\", \"Forecast\": \"sunny\", \"Humidity\": \"50%\", \"Temperature\": 77, \"Wind Speed\": 10}, {\"Day\": \"Monday\", \"Forecast\": \"partly cloudy\", \"Humidity\": null, \"Temperature\": 72, \"Wind Speed\": 15}, {\"Day\": \"Tuesday\", \"Forecast\": \"rain showers\", \"Humidity\": \"70%\", \"Temperature\": 64, \"Wind Speed\": null}, {\"Day\": \"Wednesday\", \"Forecast\": \"thunderstorms\", \"Humidity\": null, \"Temperature\": 68, \"Wind Speed\": null}, {\"Day\": \"Thursday\", \"Forecast\": \"cloudy\", \"Humidity\": \"60%\", \"Temperature\": 66, \"Wind Speed\": null}, {\"Day\": \"Friday\", \"Forecast\": \"partly cloudy\", \"Humidity\": null, \"Temperature\": 73, \"Wind Speed\": 12}, {\"Day\": \"Saturday\", \"Forecast\": \"sunny\", \"Humidity\": \"40%\", \"Temperature\": 80, \"Wind Speed\": 8}]}" } } finish_reason: STOP safety_ratings { category: HARM_CATEGORY_HATE_SPEECH probability: NEGLIGIBLE probability_score: 0.1037486344575882 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.09670579433441162 } safety_ratings { category: HARM_CATEGORY_DANGEROUS_CONTENT probability: NEGLIGIBLE probability_score: 0.18126320838928223 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.10052486509084702 } safety_ratings { category: HARM_CATEGORY_HARASSMENT probability: NEGLIGIBLE probability_score: 0.15960998833179474 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.09518112242221832 } safety_ratings { category: HARM_CATEGORY_SEXUALLY_EXPLICIT probability: NEGLIGIBLE probability_score: 0.1388116478919983 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.10539454221725464 } } usage_metadata { prompt_token_count: 280 candidates_token_count: 249 total_token_count: 529 }
Classify a product
The following example includes enums where the model must classify an object's type and condition from a list of given values.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Example model response
candidates { content { role: "model" parts { text: " [{\n \"item_category\": \"winter apparel\",\n \"subcategory\": \"coat\",\n \"to_discard\": 1\n }] " } } finish_reason: STOP safety_ratings { category: HARM_CATEGORY_HATE_SPEECH probability: NEGLIGIBLE probability_score: 0.08945459872484207 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.13753245770931244 } safety_ratings { category: HARM_CATEGORY_DANGEROUS_CONTENT probability: NEGLIGIBLE probability_score: 0.19208428263664246 severity: HARM_SEVERITY_LOW severity_score: 0.23810701072216034 } safety_ratings { category: HARM_CATEGORY_HARASSMENT probability: NEGLIGIBLE probability_score: 0.07585817575454712 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.04336579889059067 } safety_ratings { category: HARM_CATEGORY_SEXUALLY_EXPLICIT probability: NEGLIGIBLE probability_score: 0.12667709589004517 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.07396338135004044 } } usage_metadata { prompt_token_count: 38 candidates_token_count: 33 total_token_count: 71 }
Identify objects in images
The following example identifies objects from two images that are stored on Cloud Storage.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Example model response
candidates { content { role: "model" parts { text: "[\n [\n {\n \"object\": \"globe model\"\n },\n {\n \"object\": \"tablet computer\"\n },\n {\n \"object\": \"shopping cart\"\n },\n {\n \"object\": \"Eiffel Tower model\"\n },\n {\n \"object\": \"airplane model\"\n },\n {\n \"object\": \"coffee cup\"\n },\n {\n \"object\": \"computer keyboard\"\n },\n {\n \"object\": \"computer mouse\"\n },\n {\n \"object\": \"passport\"\n },\n {\n \"object\": \"sunglasses\"\n },\n {\n \"object\": \"US Dollar bills\"\n },\n {\n \"object\": \"notepad\"\n },\n {\n \"object\": \"pen\"\n }\n ],\n [\n {\n \"object\": \"watering can\"\n },\n {\n \"object\": \"oregano\"\n },\n {\n \"object\": \"flower pot\"\n },\n {\n \"object\": \"flower pot\"\n },\n {\n \"object\": \"gardening gloves\"\n },\n {\n \"object\": \"hand rake\"\n },\n {\n \"object\": \"hand trowel\"\n },\n {\n \"object\": \"grass\"\n }\n ]\n] " } } finish_reason: STOP safety_ratings { category: HARM_CATEGORY_HATE_SPEECH probability: NEGLIGIBLE probability_score: 0.1872812658548355 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.16357900202274323 } safety_ratings { category: HARM_CATEGORY_DANGEROUS_CONTENT probability: LOW probability_score: 0.37920594215393066 severity: HARM_SEVERITY_LOW severity_score: 0.29320207238197327 } safety_ratings { category: HARM_CATEGORY_HARASSMENT probability: NEGLIGIBLE probability_score: 0.14175598323345184 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.12074951827526093 } safety_ratings { category: HARM_CATEGORY_SEXUALLY_EXPLICIT probability: NEGLIGIBLE probability_score: 0.12241825461387634 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.0955180674791336 } } usage_metadata { prompt_token_count: 525 candidates_token_count: 333 total_token_count: 858 }
Example schema for enum output
The following example identifies the genre of a movie based on its description. The output is one plain-text enum value that the model selects from a list values that are defined in the response schema.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Example model response
candidates { content { role: "model" parts { text: "documentary" } } finish_reason: STOP safety_ratings { category: HARM_CATEGORY_HATE_SPEECH probability: NEGLIGIBLE probability_score: 0.051025390625 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.08056640625 } safety_ratings { category: HARM_CATEGORY_DANGEROUS_CONTENT probability: NEGLIGIBLE probability_score: 0.1416015625 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.068359375 } safety_ratings { category: HARM_CATEGORY_HARASSMENT probability: NEGLIGIBLE probability_score: 0.11572265625 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.0439453125 } safety_ratings { category: HARM_CATEGORY_SEXUALLY_EXPLICIT probability: NEGLIGIBLE probability_score: 0.099609375 severity: HARM_SEVERITY_NEGLIGIBLE severity_score: 0.146484375 } avg_logprobs: -8.783838711678982e-05 } usage_metadata { prompt_token_count: 33 candidates_token_count: 2 total_token_count: 35 }