Safety attribute confidence scoring
Content processed through the Vertex AI Gemini API is assessed against a list of safety attributes, which include "harmful categories" and topics that can be considered sensitive. These safety attributes are listed in the following table:
Safety attribute scoring
Safety Attribute | Definition |
---|---|
Hate Speech | Negative or harmful comments targeting identity and/or protected attributes. |
Harassment | Malicious, intimidating, bullying, or abusive comments targeting another individual. |
Sexually Explicit | Contains references to sexual acts or other lewd content. |
Dangerous Content | Promotes or enables access to harmful goods, services, and activities. |
Safety attribute probabilities
Each safety attribute has an associated confidence score between 0.0 and 1.0, rounded to one decimal place. The confidence score reflects the likelihood of the input or response belonging to a given category.
The confidence score is returned with one of the following safety-confidence levels:
Probability | Description |
---|---|
NEGLIGIBLE | Content has a negligible probability of being unsafe. |
LOW | Content has a low probability of being unsafe. |
MEDIUM | Content has a medium probability of being unsafe. |
HIGH | Content has a high probability of being unsafe. |
Safety attribute severity
Each of the four safety attributes is assigned a safety rating (severity level) and a severity score ranging from 0.0 to 1.0, rounded to one decimal place. The ratings and scores in the following table reflect the predicted severity of the content belonging to a given category:
Severity | Description |
---|---|
NEGLIGIBLE | Content severity is predicted as negligible with respect to Google's safety policy. |
LOW | Content severity is predicted as low with respect to Google's safety policy. |
MEDIUM | Content severity is predicted as medium with respect to Google's safety policy. |
HIGH | Content severity is predicted as high with respect to Google's safety policy. |
Probability scores compared to severity scores
There are two types of safety scores:
- Safety scores based on probability of being unsafe
- Safety scores based on severity of harmful content
The probability safety attribute reflects the likelihood that an input or model response is associated with the respective safety attribute. The severity safety attribute reflects the magnitude of how harmful an input or model response might be.
Content can have a low probability score and a high severity score, or a high probability score and a low severity score. For example, consider the following two sentences:
- The robot punched me.
- The robot slashed me up.
The first sentence might result in a higher probability of being unsafe, while the second sentence might have a higher severity in terms of violence. Because of this, it's important to carefully test and consider the appropriate level of blocking required to support your key use cases while minimizing harm to end users.
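To make these scores actionable, the following minimal sketch (assuming the Vertex AI SDK for Python; the project ID, region, and prompt are placeholders) reads both the probability score and the severity score from a response:

import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder project and region values; replace with your own.
vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.0-pro")

response = model.generate_content("The robot punched me.")

# Each candidate carries one rating per safety attribute, with both a
# probability score (likelihood of being unsafe) and a severity score
# (magnitude of harm).
for rating in response.candidates[0].safety_ratings:
    print(rating.category, rating.probability_score, rating.severity_score)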
Safety settings
Safety settings are part of the request you send to the API service, and they can be adjusted for each request you make to the API. The following table describes the block settings you can adjust for each category. For example, if you set the block setting to Block few for the Dangerous Content category, everything that has a high probability of being dangerous content is blocked, but anything with a lower probability is allowed. If not set, the default block setting is Block some.
Threshold (Studio) | Threshold (API) | Description |
---|---|---|
Block none | BLOCK_NONE (Restricted) | Always show regardless of probability of unsafe content. |
Block few | BLOCK_ONLY_HIGH | Block when high probability of unsafe content. |
Block some (Default) | BLOCK_MEDIUM_AND_ABOVE (Default) | Block when medium or high probability of unsafe content. |
Block most | BLOCK_LOW_AND_ABOVE | Block when low, medium, or high probability of unsafe content. |
| HARM_BLOCK_THRESHOLD_UNSPECIFIED | Threshold is unspecified; block using the default threshold. |
You can change these settings for each request that you make to the service. See the HarmBlockThreshold API reference for details.
How to remove automated response blocking for select safety attributes
The BLOCK_NONE safety setting removes automated response blocking (for the safety attributes described in Safety settings) and lets you configure your own safety guidelines with the returned scores. To access the BLOCK_NONE setting, you can either:
- Apply for the allowlist through the Gemini safety filter allowlist form, or
- Switch your account type to monthly invoiced billing by using the Google Cloud invoiced billing reference.
Key differences between Gemini and other model families
While the same safety classifiers are applied to Gemini and PaLM, the number of safety attributes returned in the API might vary across different model families. The blocking logic (that is, the confidence threshold) is based on rigorous evaluation of each model. Therefore, a safety setting applied to one model might not perfectly match the behavior of the same safety setting applied to a different model. If this is a concern, we recommend that you configure your own blocking logic with the raw severity scores and raw confidence scores, applying the same scoring thresholds across models.
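For example, a minimal sketch of such custom blocking logic might look like the following. The cutoff values and the should_block helper are illustrative assumptions, not part of the API:

# Illustrative cutoffs; tune these against your own key use cases.
SEVERITY_CUTOFF = 0.5
PROBABILITY_CUTOFF = 0.5

def should_block(candidate) -> bool:
    """Return True if any safety rating exceeds either raw-score cutoff."""
    return any(
        rating.severity_score >= SEVERITY_CUTOFF
        or rating.probability_score >= PROBABILITY_CUTOFF
        for rating in candidate.safety_ratings
    )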
Configure thresholds
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
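As a minimal sketch, assuming the vertexai.generative_models module and placeholder project values, per-request thresholds can be configured like this:

import vertexai
from vertexai.generative_models import (
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
)

# Placeholder project and region values; replace with your own.
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.0-pro")

# Map each safety category to a blocking threshold for this request.
safety_settings = {
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
}

response = model.generate_content(
    "Hello!",
    safety_settings=safety_settings,
)
print(response.text)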
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
C#
Before trying this sample, follow the C# setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI C# API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
REST
Before using any of the request data, make the following replacements:
- LOCATION: The region to process the request. Available options include the following:
  - us-central1
  - us-west4
  - northamerica-northeast1
  - us-east4
  - us-west1
  - asia-northeast3
  - asia-southeast1
  - asia-northeast1
- PROJECT_ID: Your project ID.
- MODEL_ID: The model ID of the multimodal model that you want to use. The options are:
  - gemini-1.0-pro
  - gemini-1.0-pro-vision
- ROLE: The role in a conversation associated with the content. Specifying a role is required even in single-turn use cases. Acceptable values include the following:
  - USER: Specifies content that's sent by you.
  - MODEL: Specifies the model's response.
- TEXT: The text instructions to include in the prompt.
- SAFETY_CATEGORY: The safety category to configure a threshold for. Acceptable values include the following:
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_DANGEROUS_CONTENT
- THRESHOLD: The threshold for blocking responses that could belong to the specified safety category based on probability. Acceptable values include the following:
  - BLOCK_NONE
  - BLOCK_ONLY_HIGH
  - BLOCK_MEDIUM_AND_ABOVE (default)
  - BLOCK_LOW_AND_ABOVE

  BLOCK_LOW_AND_ABOVE blocks the most, while BLOCK_ONLY_HIGH blocks the least.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent
Request JSON body:
{ "contents": { "role": "ROLE", "parts": { "text": "TEXT" } }, "safety_settings": { "category": "SAFETY_CATEGORY", "threshold": "THRESHOLD" }, }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent" | Select-Object -Expand Content
You should receive a streaming JSON response that contains the generated content and its safety ratings.
Example curl command
LOCATION="us-central1"
MODEL_ID="gemini-1.0-pro"
PROJECT_ID="test-project"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent -d \
$'{
"contents": {
"role": "user",
"parts": { "text": "Hello!" }
},
"safety_settings": [
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_NONE"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_LOW_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_MEDIUM_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_ONLY_HIGH"
}
]
}'
Console
1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.
2. Under Create a new prompt, click any of the buttons to open the prompt design page.
3. Click Safety settings. The Safety settings dialog window opens.
4. For each safety attribute, configure the desired threshold value.
5. Click Save.
What's next
- Learn more about responsible AI.
- Learn about data governance.