Enhance Gemini model security with content filters and system instructions
Salah Ahmed
Senior Product Manager, Google Cloud
Anand Iyer
Group Product Manager, Google Cloud
As organizations rush to adopt generative AI-driven chatbots and agents, it’s important to reduce the risk of exposure to threat actors who try to coerce AI models into creating harmful content.
We want to highlight two powerful capabilities of Vertex AI that can help manage this risk — content filters and system instructions. Today, we’ll show how you can use them to ensure consistent and trustworthy interactions.
Content filters: Post-response defenses
By analyzing generated text and blocking responses that trigger specific criteria, content filters can help block the output of harmful content. They function independently from Gemini models as part of a layered defense against threat actors who attempt to jailbreak the model.
Gemini models on Vertex AI use two types of content filters:
- Non-configurable safety filters automatically block outputs containing prohibited content, such as child sexual abuse material (CSAM) and personally identifiable information (PII).
- Configurable content filters allow you to define blocking thresholds in four harm categories (hate speech, harassment, sexually explicit, and dangerous content), based on probability and severity scores. These filters are off by default, but you can configure them to match your needs.
It's important to note that, like any automated system, these filters can occasionally produce false positives, incorrectly flagging benign content. This can negatively impact user experience, particularly in conversational settings. System instructions (below) can help mitigate some of these limitations.
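If you call Gemini through the Vertex AI SDK for Python, configuring these thresholds looks roughly like the sketch below. The project ID, model name, and thresholds are illustrative placeholders, and enum names can vary slightly between SDK versions, so treat this as a starting point rather than a drop-in configuration.

```python
import vertexai
from vertexai.generative_models import (
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    SafetySetting,
)

# Illustrative project and location; replace with your own.
vertexai.init(project="your-project-id", location="us-central1")

# Blocking thresholds for the four configurable harm categories.
safety_settings = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,
    ),
]

model = GenerativeModel("gemini-1.5-pro", safety_settings=safety_settings)
response = model.generate_content("Tell me about your return policy.")

# A response blocked by a configurable filter carries a safety-related
# finish reason and per-category safety ratings; inspect them before
# surfacing the answer to the user.
candidate = response.candidates[0]
print(candidate.finish_reason, candidate.safety_ratings)
```

Lowering a threshold (for example, to BLOCK_LOW_AND_ABOVE) blocks more aggressively at the cost of more false positives; raising it does the opposite.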
System instructions: Proactive model steering for custom safety
System instructions for Gemini models in Vertex AI provide direct guidance to the model on how to behave and what type of content to generate. By providing specific instructions, you can proactively steer the model away from generating undesirable content to meet your organization’s unique needs.
You can craft system instructions to define content safety guidelines, such as prohibited and sensitive topics, and disclaimer language, as well as brand safety guidelines to ensure the model's outputs align with your brand's voice, tone, values, and target audience.
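As a rough sketch using the same Vertex AI Python SDK, system instructions are passed when you create the model. The guidelines below are purely illustrative; you would replace them with your own content and brand safety policies.

```python
from vertexai.generative_models import GenerativeModel

# Illustrative safety and brand guidance; tailor to your organization.
system_instruction = [
    "You are a support assistant for a home-electronics retailer.",
    "Only answer questions about our products, orders, and store policies.",
    "Do not provide medical, legal, or financial advice; politely decline and "
    "suggest the user consult a qualified professional.",
    "Never request or repeat personally identifiable information such as "
    "payment card numbers.",
    "Keep the tone friendly, concise, and consistent with our brand voice.",
]

model = GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=system_instruction,
)

response = model.generate_content("Can you help me reset my smart thermostat?")
print(response.text)
```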
System instructions have the following advantages over content filters:
- You can define specific harms and topics you want to avoid, so you’re not restricted to a small set of categories.
- You can be prescriptive and detailed. For example, instead of just saying “avoid nudity,” you can define what you mean by nudity in your cultural context and outline allowed exceptions.
- You can iterate on instructions to meet your needs. For example, if you notice that the instruction “avoid dangerous content” leads to the model being excessively cautious or avoiding a wider range of topics than intended, you can make the instruction more specific, such as “don’t generate violent content” or “avoid discussion of illegal drug use.”
However, system instructions have the following limitations:
- They are theoretically more susceptible to zero-shot and other complex jailbreak techniques.
- They can cause the model to be overly cautious on borderline topics.
- In some situations, a complex system instruction for safety may inadvertently impact overall output quality.
Because each approach has different strengths and limitations, we recommend using content filters and system instructions together.
Evaluate your safety configuration
You can create your own evaluation sets, and test model performance with your specific configurations ahead of time. We recommend creating separate harmful and benign sets, so you can measure how effective your configuration is at catching harmful content and how often it incorrectly blocks benign content.
Investing in an evaluation set can help reduce the time it takes to test the model when implementing changes in the future.
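A lightweight way to do this is to run both sets through your configured model and track how often responses are blocked. The sketch below assumes the Vertex AI Python SDK and uses tiny illustrative prompt lists; real evaluation sets should be larger and curated, and the exact finish-reason values may vary by SDK version.

```python
from vertexai.generative_models import GenerativeModel

# Illustrative evaluation sets; real sets should be larger and curated.
harmful_prompts = [
    "<adversarial prompt from your red-team set>",
    "<another adversarial prompt>",
]
benign_prompts = [
    "How do I track my order?",
    "What is your return policy?",
]

def is_blocked(response) -> bool:
    """Heuristic check for whether a response was blocked for safety reasons."""
    if not response.candidates:
        return True  # the prompt itself was blocked before generation
    return response.candidates[0].finish_reason.name == "SAFETY"

def block_rate(model: GenerativeModel, prompts: list[str]) -> float:
    """Fraction of prompts whose responses were blocked."""
    return sum(is_blocked(model.generate_content(p)) for p in prompts) / len(prompts)

# Instantiate with the same safety settings and system instructions you plan to ship.
model = GenerativeModel("gemini-1.5-pro")

# A high block rate on the harmful set and a low block rate on the benign set
# indicate a well-balanced configuration.
print("harmful prompts blocked:", block_rate(model, harmful_prompts))
print("benign prompts blocked: ", block_rate(model, benign_prompts))
```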
How to get started
Both content filters and system instructions play a role in ensuring safe and responsible use of Gemini. The best approach depends on your specific requirements and risk tolerance. To get started, check out the documentation for content filters and system instructions for safety.