借助 Model Armor 模板,您可以配置 Model Armor 过滤提示和回答的方式。它们充当一组自定义过滤器和阈值,用于设置不同的安全性和安全性置信度,从而控制标记哪些内容。
阈值表示置信度。也就是说,Model Armor 对提示或回答包含冒犯性内容的置信度。例如,您可以创建一个模板,用于过滤包含仇恨内容的提示,并设置 HIGH 阈值,这意味着 Model Armor 会报告提示包含仇恨内容的高置信度。LOW_AND_ABOVE 阈值表示在做出相应声明时具有任何程度的置信度(LOW、MEDIUM 和 HIGH)。
Model Armor 过滤条件
Model Armor 提供各种过滤条件,可帮助您提供安全可靠的 AI 模型。以下是过滤条件类别。
Responsible AI 安全过滤条件
系统可以按照上述置信度级别,针对以下类别过滤提示和回答:
类别
定义
仇恨言论
针对身份和/或受保护属性的负面或有害评论。
骚扰
针对其他人的威胁、恐吓、欺凌或辱骂性评论。
露骨色情内容
包含对性行为或其他淫秽内容的引用。
危险内容
宣传或允许访问有害商品、服务和活动。
系统会默认应用儿童性虐待内容 (CSAM) 过滤条件,且无法关闭。
提示注入和越狱检测
提示注入是一种安全漏洞,攻击者会在文本输入(提示)中编写特殊命令,以欺骗 AI 模型。这可能会导致 AI 忽略其常规指令、泄露敏感信息,或执行其本不应该执行的操作。在 LLM 的背景下,越狱是指绕过模型内置的安全协议和道德准则的行为。这会导致 LLM 生成其最初设计时要避免的回答,例如有害、不道德和危险的内容。
Sensitive Data Protection 是一项 Google Cloud 服务,可帮助您发现、分类和去标识化敏感数据。Sensitive Data Protection 可以识别敏感元素、上下文和文档,帮助您降低 AI 工作负载中数据泄露的风险。您可以直接在 Model Armor 中使用敏感数据保护来转换、词元化和隐去敏感元素,同时保留非敏感上下文。
Model Armor 可以接受现有的检查模板,这些模板是充当蓝图的配置,可简化扫描和识别敏感数据的流程,以满足您的业务和合规性需求。这样一来,您就可以在其他使用 Sensitive Data Protection 的工作负载之间实现一致性和互操作性。
Model Armor 提供两种敏感数据保护配置模式:
基本 Sensitive Data Protection 配置:此模式提供了一种更简单的方式来配置 Sensitive Data Protection,即直接指定要扫描的敏感数据类型。它支持六个类别,分别是 CREDIT_CARD_NUMBER、US_SOCIAL_SECURITY_NUMBER、FINANCIAL_ACCOUNT_NUMBER、US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER、GCP_CREDENTIALS、GCP_API_KEY。基本配置仅允许执行检查操作,不支持使用 Sensitive Data Protection 模板。如需了解详情,请参阅基本 Sensitive Data Protection 配置。
高级 Sensitive Data Protection 配置:此模式支持使用 Sensitive Data Protection 模板,从而提供更高的灵活性和自定义程度。Sensitive Data Protection 模板是预定义的配置,可让您指定更精细的检测规则和去标识化技术。高级配置支持检查和去标识化操作。
[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-09-05。"],[],[],null,["This page provides information about the key concepts for\nModel Armor.\n\nModel Armor templates\n\nModel Armor templates let you configure how Model Armor\nscreens prompts and responses. They function as sets of customized filters and\nthresholds for different safety and security confidence levels, allowing control\nover what content is flagged.\n\nThe thresholds represent confidence levels. That is, how confident Model Armor\nis about the prompt or response including offending content. For example, you\ncan create a template that filters prompts for hateful content with a `HIGH`\nthreshold, meaning Model Armor reports high confidence that the prompt\ncontains hateful content. A `LOW_AND_ABOVE` threshold indicates any level of\nconfidence (`LOW`, `MEDIUM`, and `HIGH`) in making that claim.\n\nModel Armor filters\n\nModel Armor offers a variety of filters to help you provide safe and\nsecure AI models. Here's a breakdown of the filter categories.\n\nResponsible AI safety filter\n\nPrompts and responses can be screened at the aforementioned confidence levels\nfor the following categories:\n\n| Category | Definition |\n|-------------------|----------------------------------------------------------------------------------------|\n| Hate Speech | Negative or harmful comments targeting identity and/or protected attributes. |\n| Harassment | Threatening, intimidating, bullying, or abusive comments targeting another individual. |\n| Sexually Explicit | Contains references to sexual acts or other lewd content. |\n| Dangerous Content | Promotes or enables access to harmful goods, services, and activities. |\n\nThe child sexual abuse material (CSAM) filter is applied by default and\ncannot be turned off.\n\nPrompt injection and jailbreak detection\n\nPrompt injection is a security vulnerability where attackers craft special\ncommands within the text input (the prompt) to trick an AI model. This can\nmake the AI ignore its usual instructions, reveal sensitive information, or\nperform actions it wasn't designed to do. Jailbreaking in the context of LLMs\nrefers to the act of bypassing the safety protocols and ethical guidelines that\nare built into the model. This allows the LLM to generate responses that it was\noriginally designed to avoid, such as harmful, unethical, and dangerous content.\n\nWhen prompt injection and jailbreak detection is enabled, Model Armor\nscans prompts and responses for malicious content. If it is detected,\nModel Armor blocks the prompt or response.\n\nSensitive Data Protection\n\nSensitive data, like a person's name or address, may inadvertently or\nintentionally be sent to a model or provided in a model's response.\n\nSensitive Data Protection is a Google Cloud service to help you discover,\nclassify, and de-identify sensitive data. Sensitive Data Protection\ncan identify sensitive elements, context, and documents to help you reduce the risk of data leakage going into and\nout of AI workloads. You can use Sensitive Data Protection\ndirectly within Model Armor to transform, tokenize, and redact sensitive elements while retaining non-sensitive context.\nModel Armor can accept existing inspection templates,\nwhich are configurations that act like blueprints to streamline the process of\nscanning and identifying sensitive data specific to your business and compliance\nneeds. This way, you can have consistency and interoperability between other\nworkloads that use Sensitive Data Protection.\n\nModel Armor offers two modes for Sensitive Data Protection\nconfiguration:\n\n- Basic Sensitive Data Protection configuration: This mode provides a simpler\n way to configure Sensitive Data Protection by directly specifying the types\n of sensitive data to scan for. It supports six categories, which are,\n `CREDIT_CARD_NUMBER`, `US_SOCIAL_SECURITY_NUMBER`, `FINANCIAL_ACCOUNT_NUMBER`,\n `US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER`, `GCP_CREDENTIALS`, `GCP_API_KEY`.\n Basic configuration only allows for inspection operations and does not support\n the use of Sensitive Data Protection templates. For more information, see\n [Basic Sensitive Data Protection configuration](/security-command-center/docs/sanitize-prompts-responses#basic_sdp_configuration).\n\n- Advanced Sensitive Data Protection configuration: This mode offers more\n flexibility and customization by enabling the use of Sensitive Data Protection\n templates. Sensitive Data Protection templates are predefined configurations\n that allow you to specify more granular detection rules and de-identification\n techniques. Advanced configuration supports both inspection and de-identification\n operations.\n\nWhile confidence levels can be set for Sensitive Data Protection, they operate\nin a slightly different way than confidence levels for other filters. For more\ninformation about confidence levels for Sensitive Data Protection, see\n[Sensitive Data Protection match likelihood](/sensitive-data-protection/docs/likelihood).\nFor more information about Sensitive Data Protection in general, see\n[Sensitive Data Protection overview](/sensitive-data-protection/docs/sensitive-data-protection-overview).\n\nMalicious URL detection\n\nMalicious URLs are often disguised to look legitimate, making them a potent tool\nfor phishing attacks, malware distribution, and other online threats. For\nexample, if a PDF contains an embedded malicious URL, it can be used to\ncompromise any downstream systems processing LLM outputs.\n\nWhen malicious URL detection is enabled, Model Armor scans URLs\nto identify if they're malicious. This lets you to take action and prevent\nmalicious URLs from being returned.\n\nModel Armor confidence levels\n\nConfidence levels can be set for responsible AI safety categories (that is, Sexually Explicit,\nDangerous, Harassment, and Hate Speech), Prompt Injection and Jailbreak, and Sensitive\nData Protection (including topicality).\n| **Note:** While confidence levels can be set for Sensitive Data Protection, they operate in a slightly different way than confidence levels for other filters. For more information about confidence levels for Sensitive Data Protection, see [Sensitive Data Protection match likelihood](/sensitive-data-protection/docs/likelihood).\n\nFor confidence levels that allow granular thresholds, Model Armor\ninterprets them as follows:\n\n- High: Identify if the message has content with a high likelihood.\n- Medium and above: Identify if the message has content with a medium or high likelihood.\n- Low and above: Identify if the message has content with a low, medium, or high likelihood.\n\n| **Note:** Confidence levels are applicable only to [prompt injection and jailbreak detection](#ma-prompt-injection) and [responsible AI safety filters](#ma-responsible-ai-safety-categories).\n\nDefine the enforcement type\n\nEnforcement defines what happens after a violation is detected. To configure how\nModel Armor handles detections, you set the enforcement type.\nModel Armor offers the following enforcement types:\n\n- **Inspect only**: It inspects requests that violate the configured settings, but it doesn't block them.\n- **Inspect and block**: It blocks requests that violate the configured settings.\n\nTo effectively use `Inspect only` and gain valuable insights, enable Cloud Logging.\nWithout Cloud Logging enabled, `Inspect only` won't yield any useful information.\n\nAccess your logs through Cloud Logging. Filter by the service name\n`modelarmor.googleapis.com`. Look for entries related to the operations that you\nenabled in your template. For more information, see\n[View logs by using the Logs Explorer](/logging/docs/view/logs-explorer-interface).\n\nPDF screening\n\nText in PDFs can include malicious and sensitive content. Model Armor\ncan screen PDFs for safety, prompt injection and jailbreak attempts, sensitive data,\nand malicious URLs.\n\nModel Armor floor settings\n\nWhile Model Armor templates provide flexibility for individual\napplications, organizations often need to establish a baseline level of\nprotection across all their AI applications. This is where Model Armor\nfloor settings are used. They act as rules that dictate minimum requirements\nfor all templates created at a specific point in the Google Cloud resource\nhierarchy (that is, at an organization, folder, or project level).\n\nFor more information, see [Model Armor floor settings](/security-command-center/docs/model_armor_floor_settings).\n\nWhat's next\n\n- Learn about [Model Armor overview](/security-command-center/docs/model-armor-overview).\n- Learn about [Model Armor templates](/security-command-center/docs/manage-model-armor-templates).\n- Learn about [Model Armor floor settings](/security-command-center/docs/model_armor_floor_settings).\n- [Sanitize prompts and responses](/security-command-center/docs/sanitize-prompts-responses).\n- Learn about [Model Armor audit logging](/security-command-center/docs/audit-logging-model-armor).\n- [Troubleshoot Model Armor issues](/security-command-center/docs/troubleshooting#ma)."]]