Improving Trust in AI and Online Communities with PaLM-based Moderation

September 7, 2023

Colby Hawker

Product Manager, Vertex AI

Emmanouil Koukoumidis

Senior Software Engineering Manager

To empower developers to identify sensitive content in a rapidly changing media environment, we are excited to announce Text Moderation powered by PaLM 2, available through the Cloud Natural Language API. Built in collaboration with Jigsaw and Google Research, Text Moderation helps organizations scan for sensitive or harmful content. Here are some examples of how the Text Moderation service can be used:

Brand safety: Protect against user-generated and publisher content that is not considered “brand safe” for the advertiser
User protection: Scan for potentially offensive or harmful content
Generative AI risk mitigation: Help safeguard against the generation of inappropriate content in outputs from generative models

Promote brand safety

Brand safety is a set of procedures that aim to protect the reputation and trustworthiness of a brand in the digital age. One of the biggest risks to brand safety is the content that ads appear alongside. If an ad appears on a website whose content does not conform with the sponsoring brand’s values, it can reflect poorly on the brand and the organization. It is therefore important for companies to identify and remove content that isn’t aligned with their brand guidelines.

Text Moderation can be used by our customers to identify content that they determine is offensive or harmful, sensitive in context, or otherwise inappropriate for their brand. Once an organization has identified this content, teams can take steps to remove it from advertising campaigns or prevent it from being associated with the brand in the future, helping ensure that advertising campaigns are effective and that the brand is associated with positive and trustworthy content.

Protect users from harmful content

Digital media platforms, gaming publishers, and online marketplaces all have a vested interest in mitigating the risks of user-generated content. They want to provide a safe and welcoming environment for their users while also maintaining an open and free exchange of ideas. Text Moderation can help them achieve this goal by using artificial neural networks to detect harmful content, such as harassment or abuse, so that it can be reviewed and removed. These efforts can help reduce harm, improve customer experience, and increase customer retention.

Mitigate risks of generative models

Over the last year, progress in AI has enabled software to more reliably generate text, images, and video, leading to new products and services that use machine learning, including text generators, to create content. However, with any AI content generation, there is a risk of producing offensive material, even inadvertently.

To address this risk, we have trained and evaluated the Text Moderation service on real prompts and responses from large generative models. Text Moderation is versatile and covers a broad range of content types, making it a powerful tool for protecting users from harmful content.
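As a rough illustration of how a moderation check might sit around a generative model, the sketch below screens both the prompt and the generated response before anything is returned. The names guarded_generate, generate, and moderate, the 0.8 threshold, and the fallback message are all hypothetical; moderate stands in for any function that returns per-category confidence scores and is not the Text Moderation client itself.

```python
from typing import Callable, Dict

# Hypothetical type: a function mapping text to {category name: confidence}.
Moderator = Callable[[str], Dict[str, float]]


def exceeds(scores: Dict[str, float], threshold: float) -> bool:
    """True if any category score is at or above the threshold."""
    return any(score >= threshold for score in scores.values())


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    moderate: Moderator,
    threshold: float = 0.8,  # assumed cut-off; tune per category in practice
    fallback: str = "Sorry, I can't help with that.",
) -> str:
    """Screen the prompt, generate a response, then screen the response."""
    if exceeds(moderate(prompt), threshold):
        return fallback
    response = generate(prompt)
    if exceeds(moderate(response), threshold):
        return fallback
    return response


# Stand-in model and moderator with made-up behavior, for illustration only.
def fake_generate(prompt: str) -> str:
    return f"Echo: {prompt}"


def fake_moderate(text: str) -> Dict[str, float]:
    return {"Toxic": 0.95 if "badword" in text else 0.01}


if __name__ == "__main__":
    print(guarded_generate("hello", fake_generate, fake_moderate))
    print(guarded_generate("badword", fake_generate, fake_moderate))
```

Checking the prompt before calling the model avoids spending a generation call on input that would be rejected anyway. A real deployment would likely use different thresholds for different categories.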

Getting started with Text Moderation using the Natural Language API

Text Moderation uses Google’s latest PaLM 2 foundation model to identify a wide range of harmful content, including hate speech, bullying, and sexual harassment. The API is easy to use and to integrate with existing systems, can be accessed from almost any programming language, and returns confidence scores across 16 different “safety attributes.”
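To make the request and response flow concrete, here is a minimal sketch using only the Python standard library. It assumes the REST method is exposed at documents:moderateText on language.googleapis.com, that the response carries a moderationCategories list of name and confidence pairs, and that you already hold an OAuth 2.0 access token; check the Text Moderation documentation for the exact request and response shapes. The helper names, the 0.5 threshold, and the sample response are illustrative and not part of the API.

```python
import json
import urllib.request

# Assumed REST endpoint for the Natural Language API's moderateText method.
ENDPOINT = "https://language.googleapis.com/v1/documents:moderateText"


def build_request_body(text: str) -> dict:
    """Build the JSON body for a plain-text moderation request."""
    return {"document": {"type": "PLAIN_TEXT", "content": text}}


def moderate_text(text: str, access_token: str) -> dict:
    """POST the text to the endpoint and return the parsed JSON response.

    Needs network access and a valid access token, so it is not called below.
    """
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request_body(text)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flagged_categories(response: dict, threshold: float = 0.5) -> dict:
    """Return {category name: confidence} for scores at or above threshold."""
    return {
        category["name"]: category["confidence"]
        for category in response.get("moderationCategories", [])
        if category.get("confidence", 0.0) >= threshold
    }


# Made-up response in the assumed shape, for illustration only.
SAMPLE_RESPONSE = {
    "moderationCategories": [
        {"name": "Toxic", "confidence": 0.91},
        {"name": "Insult", "confidence": 0.62},
        {"name": "Finance", "confidence": 0.03},
    ]
}

if __name__ == "__main__":
    print(flagged_categories(SAMPLE_RESPONSE, threshold=0.5))
```

Because the service returns confidence scores and not a yes/no verdict, each organization chooses its own threshold. A brand-safety workflow might flag content at a lower score than a general community forum would.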

Visit the Natural Language AI website to give it a try, and refer to the “Text Moderation” page for details. You can also try out the Text Moderation codelab.
