This page describes the safety and content filters available in the Gemini API in Vertex AI and shows you how to configure them. This page covers the following topics:

- Unsafe prompts: How the API reports prompts that it blocks.
- Unsafe responses: The filters that can block model responses and the finish reasons they return.
- Configurable content filters: Harm categories, probability and severity scores, and how to set blocking thresholds.
- Citation filter: How Gemini cites sources when it quotes extensively.
- Civic integrity filter: An optional filter for election-related prompts.
- Best practices: Guidance for choosing filter settings.

Google's generative AI models, like Gemini 2.5 Flash, are designed with safety as a priority. However, they can still generate harmful responses, especially when explicitly prompted. To enhance safety and minimize misuse, you can configure content filters to block potentially harmful responses by adjusting blocking thresholds. Safety filters act as a barrier to prevent harmful output, but they don't directly influence the model's behavior. To learn more about guiding the model's output, see System instructions for safety.
Unsafe prompts
The Gemini API in Vertex AI provides one of the following enum codes to explain why a prompt was rejected:
| Enum | Filter type | Description |
| --- | --- | --- |
| PROHIBITED_CONTENT | Non-configurable safety filter | The prompt was blocked because it was flagged for containing prohibited content, usually CSAM. |
| BLOCKED_REASON_UNSPECIFIED | N/A | The reason for blocking the prompt is unspecified. |
| OTHER | N/A | This enum refers to all other reasons for blocking a prompt. For a list of supported languages, see Gemini language support. |

To learn more, see BlockedReason.
The following is an example of Gemini API in Vertex AI output when a prompt is blocked for containing PROHIBITED_CONTENT:

{
  "promptFeedback": {
    "blockReason": "PROHIBITED_CONTENT"
  },
  "usageMetadata": {
    "promptTokenCount": 7,
    "totalTokenCount": 7
  }
}
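If you call the model programmatically, you can check for a blocked prompt before you read the response text. The following is a minimal sketch using the google-genai Python SDK; it assumes your environment is already configured for Vertex AI (see Examples of content filter configuration later on this page), and the prompt text is a placeholder.

from google import genai

# Assumes GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, and
# GOOGLE_GENAI_USE_VERTEXAI are set, as shown later on this page.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Tell me about local hiking trails.",  # Placeholder prompt.
)

# When a prompt is blocked, prompt_feedback.block_reason carries the enum
# value (for example, PROHIBITED_CONTENT) and there is no candidate text.
if response.prompt_feedback and response.prompt_feedback.block_reason:
    print("Prompt blocked:", response.prompt_feedback.block_reason)
else:
    print(response.text)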
Unsafe responses

The following filters can detect and block potentially unsafe responses:

| Filter Type | Description | Configurability |
| --- | --- | --- |
| Non-configurable safety filters | Blocks highly sensitive content like CSAM and PII. | Always on; cannot be changed. |
| Configurable content filters | Blocks content based on harm categories (for example, Hate Speech, Dangerous Content). | You can set blocking thresholds. |
| Citation filters | Adds citations when the model quotes extensively from a source. | Always on; cannot be changed. |
A large language model (LLM) generates responses in units of text called tokens. A model stops generating tokens because it reaches a natural stopping point or because one of the filters blocks the response. The Gemini API in Vertex AI provides one of the following enum codes to explain why token generation stopped:
| Enum | Filter type | Description |
| --- | --- | --- |
| STOP | N/A | This enum indicates that the model reached a natural stopping point or the provided stop sequence. |
| MAX_TOKENS | N/A | The token generation was stopped because the model reached the maximum number of tokens that was specified in the request. |
| SAFETY | Configurable content filter | The token generation was stopped because the response was flagged for harmful content. |
| RECITATION | Citation filter | The token generation stopped because of potential recitation. |
| SPII | Non-configurable safety filter | The token generation was stopped because the response was flagged for Sensitive Personally Identifiable Information (SPII) content. |
| PROHIBITED_CONTENT | Non-configurable safety filter | The token generation was stopped because the response was flagged for containing prohibited content, usually CSAM. |
| FINISH_REASON_UNSPECIFIED | N/A | The finish reason is unspecified. |
| OTHER | N/A | This enum refers to all other reasons that stop token generation. For a list of supported languages, see Gemini language support. |

To learn more, see FinishReason.
If a filter blocks a response, the Candidate.content field in the response is empty. This action doesn't provide feedback to the model.
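In code, this means you should check the finish reason before relying on candidate content. The following is a minimal sketch using the google-genai Python SDK, under the same environment assumptions as the previous example; the prompt is a placeholder.

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Write a short product description.",  # Placeholder prompt.
)

candidate = response.candidates[0]

# finish_reason mirrors the FinishReason enum values in the table above
# (STOP, MAX_TOKENS, SAFETY, RECITATION, and so on).
if candidate.finish_reason == types.FinishReason.STOP:
    print(response.text)
else:
    # The response was blocked or truncated; candidate.content may be empty.
    print("Token generation stopped:", candidate.finish_reason)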
Configurable content filters

Content filters assess content against a list of harms. For each harm category, the content filters assign one score based on the probability of the content being harmful and another score based on the severity of harmful content.

Configurable content filters are tied to model versions and don't have independent versioning. Google doesn't update the content filter for a previously released model version, but might update it for future model versions.

Harm categories

Content filters assess content based on the following harm categories:
| Harm Category | Definition |
| --- | --- |
| Hate Speech | Negative or harmful comments targeting identity and/or protected attributes. |
| Harassment | Threatening, intimidating, bullying, or abusive comments targeting another individual. |
| Sexually Explicit | Contains references to sexual acts or other lewd content. |
| Dangerous Content | Promotes or enables access to harmful goods, services, and activities. |
Comparison of probability scores and severity scores

Content filters use the following two types of scores:

- Probability score: Based on the probability of the content being harmful. The score ranges from 0.0 to 1.0 and is discretized into four levels: NEGLIGIBLE, LOW, MEDIUM, and HIGH.
- Severity score: Based on the severity of harmful content. The score ranges from 0.0 to 1.0 and is discretized into four levels: NEGLIGIBLE, LOW, MEDIUM, and HIGH.

Content can have a low probability score and a high severity score, or vice versa.
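Both scores are returned on each safety rating in the response, so you can inspect them directly. The following is a minimal sketch using the google-genai Python SDK with a placeholder prompt; note that, depending on your thresholds (for example, OFF), safety metadata might not be returned at all.

from google import genai

client = genai.Client()  # Assumes Vertex AI environment variables are set.

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Write a friendly greeting.",  # Placeholder prompt.
)

# Each safety rating carries a discretized level and a raw score for both
# probability and severity. The list can be empty if no metadata is returned.
for rating in response.candidates[0].safety_ratings or []:
    print(rating.category)
    print("  probability:", rating.probability, rating.probability_score)
    print("  severity:   ", rating.severity, rating.severity_score)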
How to configure content filters

You can use the Gemini API in Vertex AI or the Google Cloud console to configure content filters.

| Method | Scoring Used | Key Feature |
| --- | --- | --- |
| Gemini API in Vertex AI | Probability and/or severity scores. | Offers fine-grained control with two blocking methods (`SEVERITY`, `PROBABILITY`) and multiple threshold levels. |
| Google Cloud console | Probability scores only. | Provides a simpler, UI-based approach with four predefined threshold levels. |
Gemini API in Vertex AI

The Gemini API in Vertex AI provides two "harm block" methods: SEVERITY and PROBABILITY. The default method is SEVERITY. For models older than gemini-1.5-flash and gemini-1.5-pro, the default method is PROBABILITY. To learn more, see the HarmBlockMethod API reference.
The Gemini API in Vertex AI provides the following "harm block" thresholds:

- BLOCK_LOW_AND_ABOVE: Block when the probability score or the severity score is LOW, MEDIUM, or HIGH.
- BLOCK_MEDIUM_AND_ABOVE: Block when the probability score or the severity score is MEDIUM or HIGH.
- BLOCK_ONLY_HIGH: Block when the probability score or the severity score is HIGH.
- HARM_BLOCK_THRESHOLD_UNSPECIFIED: Block using the default threshold.
- OFF: No automated response blocking and no metadata is returned. For gemini-2.5-flash and subsequent models, OFF is the default value.
- BLOCK_NONE: The BLOCK_NONE setting removes automated response blocking. Instead, you can configure your own content guidelines with the returned scores. This is a restricted field that isn't available to all users in GA model versions.

For example, the following Python code demonstrates how you can set the harm block threshold to BLOCK_ONLY_HIGH for the dangerous content category:

generative_models.SafetySetting(
    category=generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    threshold=generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
),
This setting blocks content that is classified as dangerous content with a HIGH probability or severity score. To learn more, see the HarmBlockThreshold API reference.
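The SafetySetting shown above is a fragment; in practice you pass a list of settings with the request. The following is a minimal sketch using the Vertex AI SDK for Python (the vertexai.generative_models module that the fragment appears to come from); the project ID, location, model name, and prompt are placeholders.

import vertexai
from vertexai import generative_models

# Placeholder project and location; replace with your own values.
vertexai.init(project="your-project-id", location="us-central1")

model = generative_models.GenerativeModel("gemini-2.5-flash")

response = model.generate_content(
    "Describe how to stay safe while hiking.",  # Placeholder prompt.
    safety_settings=[
        generative_models.SafetySetting(
            category=generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
        ),
    ],
)
print(response.text)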
For end-to-end examples in Python, Node.js, Java, Go, C#, and REST, see Examples of content filter configuration.

Google Cloud console

In the Google Cloud console, you can configure a threshold for each content attribute. The content filter uses only the probability scores. There is no option to use the severity scores.

The Google Cloud console provides the following threshold values:

- Block few: Block when the probability of the content being harmful is HIGH.
- Block some: Block when the probability of the content being harmful is MEDIUM or HIGH.
- Block most: Block when the probability of the content being harmful is LOW, MEDIUM, or HIGH.

For example, if you set the block setting to Block few for the Dangerous Content category, content that has a high probability of being dangerous is blocked. Anything with a lower probability is allowed. The default threshold is Block some.

To set the thresholds, follow these steps:

1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.
2. Under Create a new prompt, click any of the buttons to open the prompt design page.
3. Click Safety settings. The Safety settings dialog opens.
4. For each harm category, configure the desired threshold value.
5. Click Save.
Example output when a response is blocked by the configurable content filter

The following is an example of Gemini API in Vertex AI output when a response is blocked by the configurable content filter for containing dangerous content:
{
  "candidates": [{
    "finishReason": "SAFETY",
    "safetyRatings": [{
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "probability": "NEGLIGIBLE",
      "probabilityScore": 0.11027937,
      "severity": "HARM_SEVERITY_LOW",
      "severityScore": 0.28487435
    }, {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "probability": "HIGH",
      "blocked": true,
      "probabilityScore": 0.95422274,
      "severity": "HARM_SEVERITY_MEDIUM",
      "severityScore": 0.43398145
    }, {
      "category": "HARM_CATEGORY_HARASSMENT",
      "probability": "NEGLIGIBLE",
      "probabilityScore": 0.11085559,
      "severity": "HARM_SEVERITY_NEGLIGIBLE",
      "severityScore": 0.19027223
    }, {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "probability": "NEGLIGIBLE",
      "probabilityScore": 0.22901751,
      "severity": "HARM_SEVERITY_NEGLIGIBLE",
      "severityScore": 0.09089675
    }]
  }],
  "usageMetadata": {
    "promptTokenCount": 38,
    "totalTokenCount": 38
  }
}
Examples of content filter configuration

The following examples demonstrate how you can configure the content filter using the Gemini API in Vertex AI:
Python
Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
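The following is a minimal sketch of how safety settings can be passed with the google-genai SDK installed above; the prompt text is a placeholder, and the categories and thresholds are values listed earlier on this page.

from google import genai
from google.genai import types

client = genai.Client()  # Reads the environment variables set above.

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Write a list of five hiking safety tips.",  # Placeholder prompt.
    config=types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_HARASSMENT,
                threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
            ),
        ],
    ),
)
print(response.text)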
REST

After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

- LOCATION: The region to process the request. Available regions include the following (partial list): us-central1, us-west4, northamerica-northeast1, us-east4, us-west1, asia-northeast3, asia-southeast1, and asia-northeast1.
- PROJECT_ID: Your project ID.
- MODEL_ID: The ID of the model that you want to use, for example gemini-2.5-flash.
- ROLE: The role in the conversation associated with the content. USER specifies content that's sent by you; MODEL specifies the model's response.
- TEXT: The text instructions to include in the prompt.
- SAFETY_CATEGORY: The safety category to configure a threshold for. Acceptable values include HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_HATE_SPEECH, HARM_CATEGORY_HARASSMENT, and HARM_CATEGORY_DANGEROUS_CONTENT.
- THRESHOLD: The threshold for blocking responses in the specified safety category. Acceptable values are BLOCK_NONE, BLOCK_ONLY_HIGH, BLOCK_MEDIUM_AND_ABOVE (default), and BLOCK_LOW_AND_ABOVE. BLOCK_LOW_AND_ABOVE blocks the most while BLOCK_ONLY_HIGH blocks the least.
HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent
Request JSON body:

{
  "contents": {
    "role": "ROLE",
    "parts": { "text": "TEXT" }
  },
  "safetySettings": {
    "category": "SAFETY_CATEGORY",
    "threshold": "THRESHOLD"
  }
}
To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent"PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent" | Select-Object -Expand ContentExample curl command
LOCATION="us-central1"
MODEL_ID="gemini-2.5-flash"
PROJECT_ID="test-project"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent -d \
$'{
  "contents": {
    "role": "user",
    "parts": { "text": "Hello!" }
  },
  "safety_settings": [
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "OFF"
    },
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_ONLY_HIGH"
    }
  ]
}'
Citation filter
The generative code features of Vertex AI are designed to produce original content. By design, Gemini limits the likelihood that existing content is replicated at length. If a Gemini feature makes an extensive quotation from a web page, Gemini cites that page.
Sometimes the same content can be found on multiple web pages. Gemini points you to a popular source. In the case of citations to code repositories, the citation might also reference an applicable open source license. You are responsible for complying with any license requirements.
To learn about the metadata of the citation filter, see the Citation API reference.
Civic integrity filter
The civic integrity filter detects and blocks prompts that mention or relate to political elections and candidates. This filter is disabled by default. To turn it on, set the blocking threshold for CIVIC_INTEGRITY
to any of the following values. It doesn't make a difference which value you specify.
- BLOCK_LOW_AND_ABOVE
- BLOCK_MEDIUM_AND_ABOVE
- BLOCK_ONLY_HIGH
The following Python code shows you how to turn on the civic integrity filter:
generative_models.SafetySetting(
    category=generative_models.HarmCategory.CIVIC_INTEGRITY,
    threshold=generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
),
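If you use the google-genai SDK instead, the same setting can be expressed through GenerateContentConfig. The sketch below assumes the category is exposed as HARM_CATEGORY_CIVIC_INTEGRITY in that SDK and uses a placeholder prompt.

from google import genai
from google.genai import types

client = genai.Client()  # Assumes Vertex AI environment variables are set.

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize this week's local news.",  # Placeholder prompt.
    config=types.GenerateContentConfig(
        safety_settings=[
            # Any of the three thresholds listed above turns the filter on.
            types.SafetySetting(
                category="HARM_CATEGORY_CIVIC_INTEGRITY",  # Assumed enum value.
                threshold="BLOCK_LOW_AND_ABOVE",
            ),
        ],
    ),
)
print(response.text)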
For more details about the civic integrity filter, contact your Google Cloud representative.
Best practices
While content filters help prevent unsafe content, they might occasionally block benign content or miss harmful content. Advanced models like Gemini 2.5 Flash are designed to generate safe responses even without filters. Test different filter settings to find the right balance between safety and allowing appropriate content.
What's next
- Learn about system instructions for safety.
- Learn about abuse monitoring.
- Learn more about responsible AI.
- Learn how to process blocked responses.