Text moderation analyzes a document against a list of
safety attributes, which include "harmful categories" and topics that may be considered sensitive. To moderate
the text in a document, call the moderateText
method.
The complete list of categories that the moderateText
method can return is as follows:
Safety Attribute | Description |
---|---|
Toxic | Content that is rude, disrespectful, or unreasonable. |
Derogatory | Negative or harmful comments targeting identity and/or protected attributes. |
Violent | Describes scenarios depicting violence against an individual or group, or general descriptions of gore. |
Sexual | Contains references to sexual acts or other lewd content. |
Insult | Insulting, inflammatory, or negative comment towards a person or a group of people. |
Profanity | Obscene or vulgar language such as cursing. |
Death, Harm & Tragedy | Human deaths, tragedies, accidents, disasters, and self-harm. |
Firearms & Weapons | Content that mentions knives, guns, personal weapons, and accessories such as ammunition, holsters, etc. |
Public Safety | Services and organizations that provide relief and ensure public safety. |
Health | Human health, including: health conditions, diseases, and disorders; medical therapies, medication, vaccination, and medical practices; and resources for healing, including support groups. |
Religion & Belief | Belief systems that deal with the possibility of supernatural laws and beings; religion, faith, belief, spiritual practice, churches, and places of worship. Includes astrology and the occult. |
Illicit Drugs | Recreational and illicit drugs; drug paraphernalia and cultivation, headshops, etc. Includes medicinal use of drugs typically used recreationally (e.g. marijuana). |
War & Conflict | War, military conflicts, and major physical conflicts involving large numbers of people. Includes discussion of military services, even if not directly related to a war or conflict. |
Finance | Consumer and business financial services, such as banking, loans, credit, investing, and insurance. |
Politics | Political news and media; discussions of social, governmental, and public policy. |
Legal | Law-related content, including law firms, legal information, primary legal materials, paralegal services, legal publications and technology, expert witnesses, litigation consultants, and other legal service providers. |
Safety attribute confidence scores
Each safety attribute has an associated confidence score between 0.00 and 1.00, reflecting the likelihood of the input or response belonging to a given category.
Sample response
```json
{
  "moderationCategories": [
    {
      "name": "Toxic",
      "confidence": 0.10
    },
    {
      "name": "Insult",
      "confidence": 0.12
    },
    {
      "name": "Profanity",
      "confidence": 0.07
    },
    {
      "name": "Derogatory",
      "confidence": 0.04
    },
    {
      "name": "Sexual",
      "confidence": 0.00
    },
    {
      "name": "Death, Harm & Tragedy",
      "confidence": 0.00
    },
    {
      "name": "Violent",
      "confidence": 0.00
    },
    {
      "name": "Firearms & Weapons",
      "confidence": 0.00
    },
    {
      "name": "Public Safety",
      "confidence": 0.01
    },
    {
      "name": "Health",
      "confidence": 0.01
    },
    {
      "name": "Religion & Belief",
      "confidence": 0.00
    },
    {
      "name": "Illicit Drugs",
      "confidence": 0.01
    },
    {
      "name": "War & Conflict",
      "confidence": 0.02
    },
    {
      "name": "Politics",
      "confidence": 0.01
    },
    {
      "name": "Finance",
      "confidence": 0.00
    },
    {
      "name": "Legal",
      "confidence": 0.00
    }
  ]
}
```
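A response like the one above is straightforward to turn into a name-to-score mapping for downstream filtering. A minimal Python sketch, using only the standard library and an abbreviated copy of the sample response (the `moderationCategories`, `name`, and `confidence` field names come from the sample itself):

```python
import json

# Abbreviated copy of the sample moderateText response above.
response_json = """
{
  "moderationCategories": [
    {"name": "Toxic", "confidence": 0.10},
    {"name": "Insult", "confidence": 0.12},
    {"name": "Profanity", "confidence": 0.07}
  ]
}
"""

def scores_by_category(raw: str) -> dict[str, float]:
    """Map each safety attribute name to its confidence score."""
    data = json.loads(raw)
    return {c["name"]: c["confidence"] for c in data["moderationCategories"]}

scores = scores_by_category(response_json)
print(scores["Insult"])  # 0.12
```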
Test your confidence thresholds
You can test Google's safety filters and define confidence thresholds that are right for your business. By using these thresholds, you can take comprehensive measures to detect content that violates Google's usage policies or terms of service and take appropriate action.
The confidence scores are predictions only. You shouldn't depend on them for reliability or accuracy. Google is not responsible for interpreting or applying these scores in business decisions.
Difference between probability and severity
The confidence scores indicate the likelihood that the content belongs to the specified category, not the severity of that content. This distinction matters because some content can have a low probability of being unsafe even though the severity of harm, if the content is unsafe, could still be high. For example, compare the sentences:
- The robot punched me.
- The robot slashed me up.
Sentence 1 might have a higher probability of being classified as unsafe, but you might consider sentence 2 to be of higher severity in terms of violence.
Therefore, it is important for you to carefully test and consider what the appropriate level of blocking is for your use cases while minimizing harm to end users.
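Per-category thresholds of the kind described above can be sketched in a few lines of Python. The threshold values here are illustrative assumptions only, not recommendations; you would tune them per category against your own test data:

```python
# Illustrative per-category confidence thresholds (example values only,
# not recommendations). Categories not listed fall back to a default.
THRESHOLDS = {"Toxic": 0.5, "Violent": 0.3, "Sexual": 0.4}
DEFAULT_THRESHOLD = 0.8

def flagged_categories(scores: dict[str, float]) -> list[str]:
    """Return the categories whose confidence meets or exceeds its threshold."""
    return [
        name
        for name, confidence in scores.items()
        if confidence >= THRESHOLDS.get(name, DEFAULT_THRESHOLD)
    ]

scores = {"Toxic": 0.62, "Violent": 0.10, "Profanity": 0.85}
print(flagged_categories(scores))  # ['Toxic', 'Profanity']
```

Lowering a category's threshold catches more borderline content at the cost of more false positives, which is why testing against representative data matters.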
Language support
Language | ISO-639-1 | Support |
---|---|---|
English | en | yes |
Spanish | es | yes |
Portuguese | pt | yes |
French | fr | yes |
Italian | it | yes |
German | de | yes |
Chinese | zh | yes |
Japanese | ja | yes |
Korean | ko | yes |
Dutch | nl | limited |
Russian | ru | limited |
Hindi | hi | limited |
Indonesian | id | limited |
Arabic | ar | limited |
How to moderate text
This section demonstrates how to moderate text in a document. You need to submit a separate request for each document.
Here is an example of moderating text provided as a string:
Protocol
To moderate content from a document, make a POST
request to the
documents:moderateText
REST method and provide
the appropriate request body as shown in the following example.
The example uses the gcloud auth application-default print-access-token
command to obtain an access token for a service account set up for the
project using the Google Cloud Platform gcloud CLI.
For instructions on installing the gcloud CLI and
setting up a project with a service account,
see the Quickstart.
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
    'document':{
      'type':'PLAIN_TEXT',
      'content':'Shut up!'
    }
  }" \
  "https://language.googleapis.com/v1/documents:moderateText"
```
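The same request body can also be assembled programmatically. A minimal Python sketch using only the standard library (it emits standard double-quoted JSON, which the REST method accepts; the field names and values match the curl example above, and the access token would still come from gcloud or a client library):

```python
import json

# Build the same request body that the curl example sends.
body = {
    "document": {
        "type": "PLAIN_TEXT",
        "content": "Shut up!",
    }
}

payload = json.dumps(body)
print(payload)
# {"document": {"type": "PLAIN_TEXT", "content": "Shut up!"}}
```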
Moderate text from Cloud Storage
Here is an example of moderating text stored in a text file on Cloud Storage:
Protocol
To moderate text from a document stored in Cloud Storage,
make a POST
request to the
documents:moderateText
REST method and provide
the appropriate request body with the path to the document
as shown in the following example.
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  "https://language.googleapis.com/v1/documents:moderateText" \
  -d "{
    'document':{
      'type':'PLAIN_TEXT',
      'gcsContentUri':'gs://<bucket-name>/<object-name>'
    }
  }"
```