Google's generative AI models, like Gemini 1.5 Flash and Gemini 1.5 Pro, are designed to prioritize safety. However, they can still generate harmful responses, especially when they're explicitly prompted. To further enhance safety and minimize misuse, you can configure safety filters to block potentially harmful responses.
This page describes each of the safety filter types and outlines key safety concepts. For configurable filters, it shows you how to configure the blocking thresholds of each harm category to control how often prompts and responses are blocked.
Safety filters act as a barrier, preventing harmful output, but they don't directly influence the model's behavior. To learn more about model steerability, see System instructions for safety.
Unsafe prompts
The Vertex AI Gemini API provides one of the following enum
codes to explain why a prompt was rejected:
Enum | Filter type | Description |
---|---|---|
PROHIBITED_CONTENT | Non-configurable safety filter | The prompt was blocked because it was flagged for containing prohibited content, usually child sexual abuse material (CSAM). |
BLOCKED_REASON_UNSPECIFIED | N/A | The reason for blocking the prompt is unspecified. |
OTHER | N/A | This enum refers to all other reasons for blocking a prompt. Note that the Vertex AI Gemini API doesn't support all languages. For a list of supported languages, see Gemini language support. |
To learn more, see BlockedReason.
The following is an example of Vertex AI Gemini API output when a prompt is blocked for containing PROHIBITED_CONTENT:
{ "promptFeedback": { "blockReason": "PROHIBITED_CONTENT" }, "usageMetadata": { "promptTokenCount": 7, "totalTokenCount": 7 } }
Unsafe responses
The following filters can detect and block potentially unsafe responses:
- Non-configurable safety filters, which block child sexual abuse material (CSAM) and personally identifiable information (PII).
- Configurable safety filters, which block unsafe content based on a list of harm categories and their user-configured blocking thresholds. You can configure blocking thresholds for each of these harms based on what is appropriate for your use case and business. To learn more, see Configurable safety filters.
- Citation filters, which provide citations for source material. To learn more, see Citation filter.
An LLM generates responses in units of text called tokens. A model stops
generating tokens because it reaches a natural stopping point or
because one of the filters blocks the response. The Vertex AI Gemini API
provides one of the following enum
codes to explain why token generation stopped:
Enum | Filter type | Description |
---|---|---|
STOP | N/A | This enum indicates that the model reached a natural stopping point or the provided stop sequence. |
MAX_TOKENS | N/A | The token generation was stopped because the model reached the maximum number of tokens that was specified in the request. |
SAFETY | Configurable safety filter | The token generation was stopped because the response was flagged for safety reasons. |
RECITATION | Citation filter | The token generation stopped because of potential recitation. |
SPII | Non-configurable safety filter | The token generation was stopped because the response was flagged for Sensitive Personally Identifiable Information (SPII) content. |
PROHIBITED_CONTENT | Non-configurable safety filter | The token generation was stopped because the response was flagged for containing prohibited content, usually CSAM. |
FINISH_REASON_UNSPECIFIED | N/A | The finish reason is unspecified. |
OTHER | N/A | This enum refers to all other reasons that stop token generation. Note that token generation is not supported for all languages. For a list of supported languages, see Gemini language support. |
To learn more, see FinishReason.
If a filter blocks the response, the response's Candidate.content field is empty. The block doesn't provide any feedback to the model.
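As a rough sketch of how you might handle this with the Vertex AI SDK for Python (the model name and prompt are placeholders, vertexai.init() is assumed to have been called, and it assumes your SDK version exposes the FinishReason enum), check the finish reason before reading the text:

from vertexai import generative_models

model = generative_models.GenerativeModel("gemini-1.5-flash-002")  # placeholder model
response = model.generate_content("YOUR_PROMPT")  # placeholder prompt

candidate = response.candidates[0]
if candidate.finish_reason == generative_models.FinishReason.STOP:
    # The model reached a natural stopping point; the content is populated.
    print(response.text)
else:
    # For example SAFETY, RECITATION, or PROHIBITED_CONTENT.
    print(f"Token generation stopped: {candidate.finish_reason.name}")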
Configurable safety filters
Safety filters assess content against a list of harms. For each harm category, the safety filters assign one safety score based on the probability of the content being unsafe and another safety score based on the severity of harmful content.
The configurable safety filters don't have versioning independent of model versions. Google won't update the configurable safety filter for a previously released version of a model. However, it may update the configurable safety filter for a future version of a model.
Harm categories
Safety filters assess content based on the following harm categories:
Harm Category | Definition |
---|---|
Hate Speech | Negative or harmful comments targeting identity and/or protected attributes. |
Harassment | Threatening, intimidating, bullying, or abusive comments targeting another individual. |
Sexually Explicit | Contains references to sexual acts or other lewd content. |
Dangerous Content | Promotes or enables access to harmful goods, services, and activities. |
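In the API, these categories correspond to HarmCategory enum values; the same names appear in the REST examples later on this page. A quick sketch with the Vertex AI SDK for Python that prints them:

from vertexai import generative_models

# API enum names for the four configurable harm categories.
configurable_categories = [
    generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT,
    generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
]
for category in configurable_categories:
    print(category.name)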
Comparison of probability scores and severity scores
The probability safety score reflects the likelihood that a model response is associated with the respective harm. It has an associated confidence score between 0.0 and 1.0, rounded to one decimal place. The confidence score is discretized into four safety-confidence levels: NEGLIGIBLE, LOW, MEDIUM, and HIGH.

The severity safety score reflects the magnitude of how harmful a model response might be. It has an associated severity score ranging from 0.0 to 1.0, rounded to one decimal place. The severity score is discretized into four levels: NEGLIGIBLE, LOW, MEDIUM, and HIGH.
Content can have a low probability score and a high severity score, or a high probability score and a low severity score.
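To inspect both scores yourself with the Vertex AI SDK for Python, you can read the safety ratings attached to a candidate. This is a sketch only; the model name and prompt are placeholders, and it assumes vertexai.init() has already been called.

from vertexai import generative_models

model = generative_models.GenerativeModel("gemini-1.5-flash-002")  # placeholder model
response = model.generate_content("YOUR_PROMPT")  # placeholder prompt

# Each safety rating carries a probability score and a severity score.
for rating in response.candidates[0].safety_ratings:
    print(
        f"{rating.category.name}: "
        f"probability={rating.probability.name} ({rating.probability_score:.2f}), "
        f"severity={rating.severity.name} ({rating.severity_score:.2f})"
    )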
How to configure safety filters
You can use the Vertex AI Gemini API or the Google Cloud console to configure the safety filter.
Vertex AI Gemini API
The Vertex AI Gemini API provides two "harm block" methods:
- SEVERITY: This method uses both probability and severity scores.
- PROBABILITY: This method uses the probability score only.
The default method is SEVERITY. For models older than gemini-1.5-flash and gemini-1.5-pro, the default method is PROBABILITY. To learn more, see the HarmBlockMethod API reference.
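For example, recent versions of the Vertex AI SDK for Python accept an optional method argument on SafetySetting to choose between the two methods per category. Treat the exact argument name and enum path below as assumptions and confirm against your SDK version; this is a sketch only.

from vertexai import generative_models

# Assumes your SDK version supports the `method` argument on SafetySetting.
harassment_setting = generative_models.SafetySetting(
    category=generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold=generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    method=generative_models.SafetySetting.HarmBlockMethod.PROBABILITY,
)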
The Vertex AI Gemini API provides the following "harm block" thresholds:
- BLOCK_LOW_AND_ABOVE: Block when the probability score or the severity score is LOW, MEDIUM, or HIGH.
- BLOCK_MEDIUM_AND_ABOVE: Block when the probability score or the severity score is MEDIUM or HIGH. For gemini-1.5-flash-001 and gemini-1.5-pro-001, BLOCK_MEDIUM_AND_ABOVE is the default value.
- BLOCK_ONLY_HIGH: Block when the probability score or the severity score is HIGH.
- HARM_BLOCK_THRESHOLD_UNSPECIFIED: Block using the default threshold.
- OFF: No automated response blocking and no safety metadata is returned. For gemini-1.5-flash-002 and gemini-1.5-pro-002, OFF is the default value.
- BLOCK_NONE: The BLOCK_NONE safety setting removes automated response blocking. Instead, you can configure your own safety guidelines with the returned scores. This is a restricted field that isn't available to all users in GA model versions.
For example, the following Python code demonstrates how you can set the harm block threshold to BLOCK_ONLY_HIGH for the dangerous content category:
from vertexai import generative_models

# Block dangerous content only when the probability or severity score is HIGH.
safety_setting = generative_models.SafetySetting(
    category=generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    threshold=generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
)
This setting blocks content for which the dangerous content probability score or severity score is HIGH. To learn more, see the HarmBlockThreshold API reference.
For end-to-end examples in Python, Node.js, Java, Go, C# and REST, see Examples of safety filter configuration.
Google Cloud console
The Google Cloud console lets you configure a threshold for each safety attribute. The safety filter uses only the probability scores. There is no option to use the severity scores.
The Google Cloud console provides the following threshold values:
- Off (default): No automated response blocking.
- Block few: Block when the probability score is HIGH.
- Block some: Block when the probability score is MEDIUM or HIGH.
- Block most: Block when the probability score is LOW, MEDIUM, or HIGH.

For example, if you set the block setting to Block few for the Dangerous Content category, everything that has a high probability of being dangerous content is blocked. Anything with a lower probability is allowed.
To set the thresholds, complete the following steps:

1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.
2. Under Create a new prompt, click any of the buttons to open the prompt design page.
3. Click Safety settings. The Safety settings dialog window opens.
4. For each harm category, configure the desired threshold value.
5. Click Save.
Example output when a response is blocked by the configurable safety filter
The following is an example of Vertex AI Gemini API output when a response is blocked by the configurable safety filter for containing dangerous content:
{ "candidates": [{ "finishReason": "SAFETY", "safetyRatings": [{ "category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE", "probabilityScore": 0.11027937, "severity": "HARM_SEVERITY_LOW", "severityScore": 0.28487435 }, { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "HIGH", "blocked": true, "probabilityScore": 0.95422274, "severity": "HARM_SEVERITY_MEDIUM", "severityScore": 0.43398145 }, { "category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE", "probabilityScore": 0.11085559, "severity": "HARM_SEVERITY_NEGLIGIBLE", "severityScore": 0.19027223 }, { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE", "probabilityScore": 0.22901751, "severity": "HARM_SEVERITY_NEGLIGIBLE", "severityScore": 0.09089675 }] }], "usageMetadata": { "promptTokenCount": 38, "totalTokenCount": 38 } }
Examples of safety filter configuration
The following examples demonstrate how you can configure the safety filter using the Vertex AI Gemini API:
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
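Here's a minimal end-to-end sketch with the Vertex AI SDK for Python. The project ID, location, model name, and prompt are placeholders, and the thresholds shown are only examples; adapt them to your use case.

import vertexai
from vertexai import generative_models

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

model = generative_models.GenerativeModel("gemini-1.5-flash-002")

# One SafetySetting per harm category that you want to override.
safety_settings = [
    generative_models.SafetySetting(
        category=generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    ),
    generative_models.SafetySetting(
        category=generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
]

response = model.generate_content(
    "Write a short poem about the ocean.",  # example prompt
    safety_settings=safety_settings,
)

print(response.candidates[0].finish_reason)
print(response.candidates[0].safety_ratings)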
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
C#
Before trying this sample, follow the C# setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI C# API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
REST
After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.
Before using any of the request data, make the following replacements:
- LOCATION: The region to process the request. Available options include the following regions (partial list):
  - us-central1
  - us-west4
  - northamerica-northeast1
  - us-east4
  - us-west1
  - asia-northeast3
  - asia-southeast1
  - asia-northeast1
- PROJECT_ID: Your project ID.
- MODEL_ID: The model ID of the multimodal model that you want to use. The options are:
  - gemini-1.0-pro
  - gemini-1.0-pro-vision
- ROLE: The role in a conversation associated with the content. Specifying a role is required even in single-turn use cases. Acceptable values include the following:
  - USER: Specifies content that's sent by you.
  - MODEL: Specifies the model's response.
- TEXT: The text instructions to include in the prompt.
- SAFETY_CATEGORY: The safety category to configure a threshold for. Acceptable values include the following:
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_DANGEROUS_CONTENT
- THRESHOLD: The threshold for blocking responses that could belong to the specified safety category based on probability. Acceptable values include the following:
  - BLOCK_NONE
  - BLOCK_ONLY_HIGH
  - BLOCK_MEDIUM_AND_ABOVE (default)
  - BLOCK_LOW_AND_ABOVE
  BLOCK_LOW_AND_ABOVE blocks the most while BLOCK_ONLY_HIGH blocks the least.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent
Request JSON body:
{ "contents": { "role": "ROLE", "parts": { "text": "TEXT" } }, "safetySettings": { "category": "SAFETY_CATEGORY", "threshold": "THRESHOLD" }, }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Example curl command
LOCATION="us-central1"
MODEL_ID="gemini-1.0-pro"
PROJECT_ID="test-project"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent -d \
$'{
"contents": {
"role": "user",
"parts": { "text": "Hello!" }
},
"safety_settings": [
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "OFF"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_LOW_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_MEDIUM_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_ONLY_HIGH"
}
]
}'
Citation filter
The generative code features of Vertex AI are intended to produce original content. By design, Gemini limits the likelihood that existing content is replicated at length. If a Gemini feature does make an extensive quotation from a web page, Gemini cites that page.
Sometimes the same content can be found on multiple web pages. Gemini attempts to point you to a popular source. In the case of citations to code repositories, the citation might also reference an applicable open source license. Complying with any license requirements is your own responsibility.
To learn about the metadata of the citation filter, see the Citation API reference.
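As a rough illustration with the Vertex AI SDK for Python (attribute names follow the Citation API reference; treat this as a sketch rather than a complete recipe), you can read any citations attached to a candidate like this:

from vertexai import generative_models

def print_citations(response: generative_models.GenerationResponse) -> None:
    # Each citation can include a source URI, title, and license when available.
    for citation in response.candidates[0].citation_metadata.citations:
        print(citation.uri, citation.title, citation.license)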
Civic integrity filter
The civic integrity filter detects and blocks prompts that mention or relate to political elections and candidates. This filter is disabled by default. To turn it on, set the blocking threshold for CIVIC_INTEGRITY to any of the following values; it doesn't make a difference which value you specify:

- BLOCK_LOW_AND_ABOVE
- BLOCK_MEDIUM_AND_ABOVE
- BLOCK_ONLY_HIGH
The following Python code shows you how to turn on the civic integrity filter:
from vertexai import generative_models

# Any of the thresholds listed above turns the civic integrity filter on.
safety_setting = generative_models.SafetySetting(
    category=generative_models.HarmCategory.HARM_CATEGORY_CIVIC_INTEGRITY,
    threshold=generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
)
For more details about the civic integrity filter, contact your Google Cloud representative.
Best practices
While safety filters help prevent unsafe content, they might occasionally block safe content or miss unsafe content. Advanced models like Gemini 1.5 Flash and Gemini 1.5 Pro are designed to generate safe responses even without filters. Test different filter settings to find the right balance between safety and allowing appropriate content.
What's next
- Learn about system instructions for safety.
- Learn about abuse monitoring.
- Learn more about responsible AI.
- Learn about data governance.