Configure safety attributes

Large language models (LLMs) might generate output that you don't expect, including text that's offensive, insensitive, or factually incorrect. To maintain safety and prevent misuse, Gemini uses safety filters to block prompts and responses that it determines to be potentially harmful.

This page describes each of the filter types and outlines key safety concepts. For configurable filters, it shows you how to configure the blocking thresholds of each safety attribute to control how often prompts and responses are blocked.

Safety filters act as a barrier, preventing harmful output, but they don't directly influence the model's behavior. To learn more about model steerability, see System instructions.

Unsafe prompts

The Vertex AI Gemini API provides one of the following enum codes to explain why a prompt was rejected:

| Enum | Filter type | Description |
| --- | --- | --- |
| PROHIBITED_CONTENT | Non-configurable safety filter | The prompt was blocked because it was flagged for containing prohibited content, usually CSAM. |
| BLOCKED_REASON_UNSPECIFIED | N/A | The reason for blocking the prompt is unspecified. |
| OTHER | N/A | All other reasons for blocking a prompt. Note that the Vertex AI Gemini API doesn't support all languages. For a list of supported languages, see Gemini language support. |

To learn more, see BlockedReason.

The following is an example of Vertex AI Gemini API output when a prompt is blocked because it contains prohibited content (PROHIBITED_CONTENT):

{
  "promptFeedback": {
    "blockReason": "PROHIBITED_CONTENT"
  },
  "usageMetadata": {
    "promptTokenCount": 7,
    "totalTokenCount": 7
  }
}
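
If you parse this output in your own code, you can check the blockReason field before doing anything else with the response. The following is a minimal Python sketch (not part of the official samples) that inspects the JSON output above after it has been parsed into a dict:

def prompt_block_reason(response: dict):
    """Return the blockReason enum value if the prompt was blocked, otherwise None."""
    return response.get("promptFeedback", {}).get("blockReason")

# The dict below mirrors the example output shown above.
response = {
    "promptFeedback": {"blockReason": "PROHIBITED_CONTENT"},
    "usageMetadata": {"promptTokenCount": 7, "totalTokenCount": 7},
}

reason = prompt_block_reason(response)
if reason:
    print(f"Prompt was blocked: {reason}")  # Prompt was blocked: PROHIBITED_CONTENT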

Unsafe responses

To determine which responses are potentially unsafe, the Vertex AI Gemini API uses the following filters:

  • Non-configurable safety filters, which block child sexual abuse material (CSAM) and personally identifiable information (PII).
  • Configurable safety filters, which block unsafe content based on a list of safety attributes and their user-configured blocking thresholds. You can configure blocking thresholds for each of these attributes based on what is appropriate for your use case and business. To learn more, see Configurable safety filters.
  • Citation filters, which prevent misuse and ensure proper citation of copyrighted data. To learn more, see Citation filter.

An LLM generates responses in units of text called tokens. A model stops generating tokens because it reaches a natural stopping point or because one of the filters blocks the response. The Vertex AI Gemini API provides one of the following enum codes to explain why token generation stopped:

| Enum | Filter type | Description |
| --- | --- | --- |
| STOP | N/A | The model reached a natural stopping point or the provided stop sequence. |
| MAX_TOKENS | N/A | Token generation was stopped because the model reached the maximum number of tokens specified in the request. |
| SAFETY | Configurable safety filter | Token generation was stopped because the response was flagged for safety reasons. |
| RECITATION | Citation filter | Token generation was stopped because the response was flagged for unauthorized citations. |
| SPII | Non-configurable safety filter | Token generation was stopped because the response was flagged for Sensitive Personally Identifiable Information (SPII) content. |
| PROHIBITED_CONTENT | Non-configurable safety filter | Token generation was stopped because the response was flagged for containing prohibited content, usually CSAM. |
| FINISH_REASON_UNSPECIFIED | N/A | The finish reason is unspecified. |
| OTHER | N/A | All other reasons that stop token generation. Note that token generation isn't supported for all languages. For a list of supported languages, see Gemini language support. |

To learn more, see FinishReason.

If a filter blocks the response, the response's Candidate.content field is empty. The filter doesn't provide any feedback to the model.
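
For example, the following Python sketch checks finish_reason before reading the candidate's text, because a blocked candidate has no content to read. It assumes the Vertex AI SDK for Python shown later on this page; the project ID, model name, and prompt are placeholders:

import vertexai
from vertexai import generative_models

# TODO(developer): Replace with your project ID.
vertexai.init(project="PROJECT_ID", location="us-central1")

model = generative_models.GenerativeModel("gemini-1.5-flash-001")
response = model.generate_content("Describe your safety filters.")

candidate = response.candidates[0]
if candidate.finish_reason.name in ("SAFETY", "RECITATION", "SPII", "PROHIBITED_CONTENT"):
    # A filter blocked the response, so the candidate has no content to read.
    print(f"Response blocked, finish reason: {candidate.finish_reason.name}")
else:
    print(candidate.text)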

Configurable safety filters

Gemini assesses content against a list of safety attributes, which include harmful categories and topics that can be considered sensitive. For each attribute, the Vertex AI Gemini API assigns one safety score based on the probability of the content being unsafe and another safety score based on the severity of harmful content.

The configurable safety filter doesn't have versioning independent of model versions. Google won't update the configurable safety filter for a previously released version of a model. However, it may update the configurable safety filter for a future version of a model.

Safety attributes

Gemini assesses content based on the following safety attributes:

| Safety Attribute | Definition |
| --- | --- |
| Hate Speech | Negative or harmful comments targeting identity and/or protected attributes. |
| Harassment | Malicious, intimidating, bullying, or abusive comments targeting another individual. |
| Sexually Explicit | Contains references to sexual acts or other lewd content. |
| Dangerous Content | Promotes or enables access to harmful goods, services, and activities. |

Comparison of probability scores and severity scores

The probability score reflects the likelihood that a model response is associated with the respective safety attribute. It has an associated confidence score between 0.0 and 1.0, rounded to one decimal place. The confidence score is discretized into four safety-confidence levels: NEGLIGIBLE, LOW, MEDIUM, and HIGH.

The severity score reflects the magnitude of how harmful a model response might be. It ranges from 0.0 to 1.0, rounded to one decimal place, and is discretized into the same four levels: NEGLIGIBLE, LOW, MEDIUM, and HIGH.

Content can have a low probability score and a high severity score, or a high probability score and a low severity score. For example, consider the following two sentences:

  1. The robot punched me.
  2. The robot slashed me up.

The first sentence might result in a higher probability of being unsafe, while the second sentence might have a higher severity in terms of violence. Because of this, it's important to carefully test and consider the appropriate level of blocking required to support your key use cases while also minimizing harm to end users.
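
To compare the two scores in practice, you can inspect the safety ratings returned with each candidate. The following is a minimal sketch, assuming a response object from the Vertex AI SDK for Python (see the examples later on this page) and that the SDK surfaces the raw rating fields shown in the example output below:

# Print the discretized level and the numeric score for probability and severity.
for rating in response.candidates[0].safety_ratings:
    print(
        f"{rating.category.name}: "
        f"probability={rating.probability.name} ({rating.probability_score:.2f}), "
        f"severity={rating.severity.name} ({rating.severity_score:.2f})"
    )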

How to configure the safety filter

You can use the Vertex AI Gemini API or the Google Cloud console to configure the safety filter.

Vertex AI Gemini API

The Vertex AI Gemini API provides two "harm block" methods:

  • SEVERITY: This method uses both probability and severity scores.
  • PROBABILITY: This method uses the probability score only.

The default method is PROBABILITY. To learn more, see HarmBlockMethod API reference.

The Vertex AI Gemini API provides the following "harm block" thresholds:

  • BLOCK_LOW_AND_ABOVE: Block when the probability score or the severity score is LOW, MEDIUM or HIGH.
  • BLOCK_MEDIUM_AND_ABOVE: Block when the probability score or the severity score is MEDIUM or HIGH.
  • BLOCK_ONLY_HIGH: Block when the probability score or the severity score is HIGH.
  • HARM_BLOCK_THRESHOLD_UNSPECIFIED: Block using the default threshold.
  • BLOCK_NONE (Restricted): The BLOCK_NONE safety setting removes automated response blocking and lets you configure your own safety guidelines with the returned scores. This is a restricted field that isn't available to all users in GA model versions. BLOCK_NONE is not supported for audio and video input when you're using Gemini 1.5 Flash or Gemini 1.5 Pro.

To access the BLOCK_NONE setting, you can:

  1. Apply for the allowlist through the Gemini safety filter allowlist form, or
  2. Switch your account type to monthly invoiced billing by using the Google Cloud invoiced billing reference.

The default threshold is BLOCK_MEDIUM_AND_ABOVE. To learn more, see HarmBlockThreshold API reference.

For example, the following Python code demonstrates how you can set the harm block method to SEVERITY and the harm block threshold to BLOCK_LOW_AND_ABOVE for the dangerous content category:

generative_models.SafetySetting(
    category=generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    method=generative_models.HarmBlockMethod.SEVERITY,
    threshold=generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
)

This setting blocks most content that is classified as dangerous content.

For end-to-end examples in Python, Node.js, Java, Go, C# and REST, see Examples of safety filter configuration.

Google Cloud console

The Google Cloud console lets you configure a threshold for each safety attribute. The safety filter uses only the probability scores. There is no option to use the severity scores.

The Google Cloud console provides the following threshold values:

  • Block few: Block when the probability score is HIGH.
  • Block some: Block when the probability score is MEDIUM or HIGH.
  • Block most: Block when the probability score is LOW, MEDIUM or HIGH.

For example, if you set the block setting to Block few for the Dangerous Content category, everything that has a high probability of being dangerous content is blocked. Anything with a lower probability is allowed. The default threshold is Block some.
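
If you move between the console and the API, the console labels correspond to the API thresholds as in the following mapping. This is an illustrative Python sketch for reference, not an official API:

# Console label -> HarmBlockThreshold value (probability-based)
CONSOLE_TO_API_THRESHOLD = {
    "Block few": "BLOCK_ONLY_HIGH",          # block only when probability is HIGH
    "Block some": "BLOCK_MEDIUM_AND_ABOVE",  # block MEDIUM or HIGH (default)
    "Block most": "BLOCK_LOW_AND_ABOVE",     # block LOW, MEDIUM, or HIGH
}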

To set the thresholds, follow these steps:

  1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. Under Create a new prompt, click any of the buttons to open the prompt design page.

  3. Click Safety settings.

    The Safety settings dialog window opens.

  4. For each safety attribute, configure the desired threshold value.

  5. Click Save.

Example output when a response is blocked by the configurable safety filter

The following is an example of Vertex AI Gemini API output when a response is blocked by the configurable safety filter for containing dangerous content:

{
  "candidates": [{
    "finishReason": "SAFETY",
    "safetyRatings": [{
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "probability": "NEGLIGIBLE",
      "probabilityScore": 0.11027937,
      "severity": "HARM_SEVERITY_LOW",
      "severityScore": 0.28487435
    }, {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "probability": "HIGH",
      "blocked": true,
      "probabilityScore": 0.95422274,
      "severity": "HARM_SEVERITY_MEDIUM",
      "severityScore": 0.43398145
    }, {
      "category": "HARM_CATEGORY_HARASSMENT",
      "probability": "NEGLIGIBLE",
      "probabilityScore": 0.11085559,
      "severity": "HARM_SEVERITY_NEGLIGIBLE",
      "severityScore": 0.19027223
    }, {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "probability": "NEGLIGIBLE",
      "probabilityScore": 0.22901751,
      "severity": "HARM_SEVERITY_NEGLIGIBLE",
      "severityScore": 0.09089675
    }]
  }],
  "usageMetadata": {
    "promptTokenCount": 38,
    "totalTokenCount": 38
  }
}
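
To find out which category triggered the block, look for the rating that has "blocked": true. The following is a minimal Python sketch (not part of the official samples) that does this for a response dict such as the one above, for example after parsing it with json.loads:

def blocked_categories(response: dict) -> list[str]:
    """Return the safety categories marked "blocked": true in a response dict."""
    categories = []
    for candidate in response.get("candidates", []):
        if candidate.get("finishReason") == "SAFETY":
            for rating in candidate.get("safetyRatings", []):
                if rating.get("blocked"):
                    categories.append(rating["category"])
    return categories

# With the example output above, this returns ['HARM_CATEGORY_DANGEROUS_CONTENT'].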

Examples of safety filter configuration

The following examples demonstrate how you can configure the safety filter using the Vertex AI Gemini API:

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

import vertexai

from vertexai import generative_models

# TODO(developer): Update and un-comment below line
# project_id = "PROJECT_ID"

vertexai.init(project=project_id, location="us-central1")

model = generative_models.GenerativeModel(model_name="gemini-1.0-pro-vision-001")

# Generation config
generation_config = generative_models.GenerationConfig(
    max_output_tokens=2048, temperature=0.4, top_p=1, top_k=32
)

# Safety config
safety_config = [
    generative_models.SafetySetting(
        category=generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
    generative_models.SafetySetting(
        category=generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
]

image_file = generative_models.Part.from_uri(
    "gs://cloud-samples-data/generative-ai/image/scones.jpg", "image/jpeg"
)

# Generate content
responses = model.generate_content(
    [image_file, "What is in this image?"],
    generation_config=generation_config,
    safety_settings=safety_config,
    stream=True,
)

text_responses = []
for response in responses:
    print(response.text)
    text_responses.append(response.text)

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

const {
  VertexAI,
  HarmCategory,
  HarmBlockThreshold,
} = require('@google-cloud/vertexai');

/**
 * TODO(developer): Update these variables before running the sample.
 */
async function setSafetySettings(
  projectId = 'PROJECT_ID',
  location = 'us-central1',
  model = 'gemini-1.5-flash-001'
) {
  // Initialize Vertex with your Cloud project and location
  const vertexAI = new VertexAI({project: projectId, location: location});

  // Instantiate the model
  const generativeModel = vertexAI.getGenerativeModel({
    model: model,
    // The following parameters are optional
    // They can also be passed to individual content generation requests
    safety_settings: [
      {
        category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
      },
    ],
    generation_config: {
      max_output_tokens: 256,
      temperature: 0.4,
      top_p: 1,
      top_k: 16,
    },
  });

  const request = {
    contents: [{role: 'user', parts: [{text: 'Tell me something dangerous.'}]}],
  };

  console.log('Prompt:');
  console.log(request.contents[0].parts[0].text);
  console.log('Streaming Response Text:');

  // Create the response stream
  const responseStream = await generativeModel.generateContentStream(request);

  // Log the text response as it streams
  for await (const item of responseStream.stream) {
    if (item.candidates[0].finishReason === 'SAFETY') {
      console.log('This response stream terminated due to safety concerns.');
      break;
    } else {
      process.stdout.write(item.candidates[0].content.parts[0].text);
    }
  }
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.Candidate;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.api.GenerationConfig;
import com.google.cloud.vertexai.api.HarmCategory;
import com.google.cloud.vertexai.api.SafetySetting;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import java.util.Arrays;
import java.util.List;

public class WithSafetySettings {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-google-cloud-project-id";
    String location = "us-central1";
    String modelName = "gemini-1.0-pro-vision-001";
    String textPrompt = "your-text-here";

    String output = safetyCheck(projectId, location, modelName, textPrompt);
    System.out.println(output);
  }

  // Use safety settings to avoid harmful questions and content generation.
  public static String safetyCheck(String projectId, String location, String modelName,
      String textPrompt) throws Exception {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests.
    try (VertexAI vertexAI = new VertexAI(projectId, location)) {
      StringBuilder output = new StringBuilder();

      GenerationConfig generationConfig =
          GenerationConfig.newBuilder()
              .setMaxOutputTokens(2048)
              .setTemperature(0.4F)
              .setTopK(32)
              .setTopP(1)
              .build();

      List<SafetySetting> safetySettings = Arrays.asList(
          SafetySetting.newBuilder()
              .setCategory(HarmCategory.HARM_CATEGORY_HATE_SPEECH)
              .setThreshold(SafetySetting.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE)
              .build(),
          SafetySetting.newBuilder()
              .setCategory(HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT)
              .setThreshold(SafetySetting.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE)
              .build()
      );

      GenerativeModel model = new GenerativeModel(modelName, vertexAI)
          .withGenerationConfig(generationConfig)
          .withSafetySettings(safetySettings);

      GenerateContentResponse response = model.generateContent(textPrompt);
      output.append(response).append("\n");

      // Verifies if the above content has been blocked for safety reasons.
      boolean blockedForSafetyReason = response.getCandidatesList()
          .stream()
          .anyMatch(candidate -> candidate.getFinishReason() == Candidate.FinishReason.SAFETY);
      output.append("Blocked for safety reasons?: ").append(blockedForSafetyReason);

      return output.toString();
    }
  }
}

Go

Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"
	"mime"
	"path/filepath"

	"cloud.google.com/go/vertexai/genai"
)

// generateMultimodalContent generates a response into w, based upon the prompt
// and image provided.
func generateMultimodalContent(w io.Writer, prompt, image, projectID, location, modelName string) error {
	// prompt := "describe this image."
	// location := "us-central1"
	// model := "gemini-1.5-flash-001"
	// image := "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
	ctx := context.Background()

	client, err := genai.NewClient(ctx, projectID, location)
	if err != nil {
		return fmt.Errorf("unable to create client: %v", err)
	}
	defer client.Close()

	model := client.GenerativeModel(modelName)
	model.SetTemperature(0.4)
	// configure the safety settings thresholds
	model.SafetySettings = []*genai.SafetySetting{
		{
			Category:  genai.HarmCategoryHarassment,
			Threshold: genai.HarmBlockLowAndAbove,
		},
		{
			Category:  genai.HarmCategoryDangerousContent,
			Threshold: genai.HarmBlockLowAndAbove,
		},
	}

	// Given an image file URL, prepare image file as genai.Part
	img := genai.FileData{
		MIMEType: mime.TypeByExtension(filepath.Ext(image)),
		FileURI:  image,
	}

	res, err := model.GenerateContent(ctx, img, genai.Text(prompt))
	if err != nil {
		return fmt.Errorf("unable to generate contents: %w", err)
	}

	fmt.Fprintf(w, "generated response: %s\n", res.Candidates[0].Content.Parts[0])
	return nil
}

C#

Before trying this sample, follow the C# setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI C# API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


using Google.Api.Gax.Grpc;
using Google.Cloud.AIPlatform.V1;
using System.Text;
using System.Threading.Tasks;
using static Google.Cloud.AIPlatform.V1.SafetySetting.Types;

public class WithSafetySettings
{
    public async Task<string> GenerateContent(
        string projectId = "your-project-id",
        string location = "us-central1",
        string publisher = "google",
        string model = "gemini-1.5-flash-001"
    )
    {
        var predictionServiceClient = new PredictionServiceClientBuilder
        {
            Endpoint = $"{location}-aiplatform.googleapis.com"
        }.Build();


        var generateContentRequest = new GenerateContentRequest
        {
            Model = $"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}",
            Contents =
            {
                new Content
                {
                    Role = "USER",
                    Parts =
                    {
                        new Part { Text = "Hello!" }
                    }
                }
            },
            SafetySettings =
            {
                new SafetySetting
                {
                    Category = HarmCategory.HateSpeech,
                    Threshold = HarmBlockThreshold.BlockLowAndAbove
                },
                new SafetySetting
                {
                    Category = HarmCategory.DangerousContent,
                    Threshold = HarmBlockThreshold.BlockMediumAndAbove
                }
            }
        };

        using PredictionServiceClient.StreamGenerateContentStream response = predictionServiceClient.StreamGenerateContent(generateContentRequest);

        StringBuilder fullText = new();

        AsyncResponseStream<GenerateContentResponse> responseStream = response.GetResponseStream();
        await foreach (GenerateContentResponse responseItem in responseStream)
        {
            // Check if the content has been blocked for safety reasons.
            bool blockForSafetyReason = responseItem.Candidates[0].FinishReason == Candidate.Types.FinishReason.Safety;
            if (blockForSafetyReason)
            {
                fullText.Append("Blocked for safety reasons");
            }
            else
            {
                fullText.Append(responseItem.Candidates[0].Content.Parts[0].Text);
            }
        }

        return fullText.ToString();
    }
}

REST

After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • LOCATION: The region to process the request. Available options include the following:

    • us-central1
    • us-west4
    • northamerica-northeast1
    • us-east4
    • us-west1
    • asia-northeast3
    • asia-southeast1
    • asia-northeast1
  • PROJECT_ID: Your project ID.
  • MODEL_ID: The model ID of the multimodal model that you want to use. The options are:
    • gemini-1.0-pro
    • gemini-1.0-pro-vision
  • ROLE: The role in a conversation associated with the content. Specifying a role is required even in single-turn use cases. Acceptable values include the following:
    • USER: Specifies content that's sent by you.
    • MODEL: Specifies the model's response.
  • TEXT: The text instructions to include in the prompt.
  • SAFETY_CATEGORY: The safety category to configure a threshold for. Acceptable values include the following:

    • HARM_CATEGORY_SEXUALLY_EXPLICIT
    • HARM_CATEGORY_HATE_SPEECH
    • HARM_CATEGORY_HARASSMENT
    • HARM_CATEGORY_DANGEROUS_CONTENT
  • THRESHOLD: The threshold for blocking responses that could belong to the specified safety category based on probability. Acceptable values include the following:

    • BLOCK_NONE
    • BLOCK_ONLY_HIGH
    • BLOCK_MEDIUM_AND_ABOVE (default)
    • BLOCK_LOW_AND_ABOVE
    BLOCK_LOW_AND_ABOVE blocks the most while BLOCK_ONLY_HIGH blocks the least.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent

Request JSON body:

{
  "contents": {
    "role": "ROLE",
    "parts": { "text": "TEXT" }
  },
  "safety_settings": {
    "category": "SAFETY_CATEGORY",
    "threshold": "THRESHOLD"
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent" | Select-Object -Expand Content

You should receive a streaming JSON response that includes the generated content and the safety ratings for each candidate.

Example curl command

LOCATION="us-central1"
MODEL_ID="gemini-1.0-pro"
PROJECT_ID="test-project"

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent -d \
$'{
  "contents": {
    "role": "user",
    "parts": { "text": "Hello!" }
  },
  "safety_settings": [
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_NONE"
    },
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_ONLY_HIGH"
    }
  ]
}'

Citation filter

The generative AI features of Vertex AI are intended to produce original content. By design, Gemini limits the likelihood that existing content is replicated at length. If a Gemini feature does make an extensive quotation from a web page, Gemini cites that page.

Sometimes the same content can be found on multiple web pages. Gemini attempts to point you to a popular source. In the case of citations to code repositories, the citation might also reference an applicable open source license. Complying with any license requirements is your own responsibility.

To learn about the metadata of the citation filter, see the Citation API reference.
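
For example, the following Python sketch prints any sources that the citation filter attached to a candidate. It assumes a response object from the Vertex AI SDK for Python, as in the earlier examples, and uses the citation fields described in the Citation API reference:

# Inspect citation metadata attached to each candidate, if any.
for candidate in response.candidates:
    for citation in candidate.citation_metadata.citations:
        print(
            f"Cited source: {citation.uri} "
            f"(characters {citation.start_index}-{citation.end_index})"
        )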

What's next