Responsible AI

Large language models (LLMs) can translate language, summarize text, generate creative writing, generate code, power chatbots and virtual assistants, and complement search engines and recommendation systems. At the same time, as an early-stage technology, their evolving capabilities and uses create potential for misapplication, misuse, and unintended or unforeseen consequences. Large language models can generate output that you don't expect, including text that's offensive, insensitive, or factually incorrect.

The incredible versatility of LLMs is also what makes it difficult to predict exactly what kinds of unintended or unforeseen outputs they might produce. Given these risks and complexities, the PaLM API is designed with Google's AI Principles in mind. However, it is important for developers to understand and test their models so they can deploy them safely and responsibly. To aid developers, Generative AI Studio has built-in content filtering, and the PaLM API has safety attribute scoring to help customers test Google's safety filters and define confidence thresholds that are right for their use case and business. Refer to the Safety filters and attributes section to learn more.

When the PaLM API is integrated into a customer's unique use case and context, additional responsible AI considerations and PaLM API limitations may come into play. We encourage customers to follow recommended practices for fairness, interpretability, privacy, and security.

Safety filters and attributes

The following topics show you how to use safety filters and attributes to deploy generative AI responsibly.

Generative AI Studio

Fallback responses

If the model responds to a request with a scripted response like "I'm not able to help with that, as I'm only a language model," it means that either the input or the output is triggering a safety filter. If you feel that a safety filter is being inappropriately triggered, please click Report Inappropriate Responses on the Generative AI Studio Overview page in the Google Cloud console to report the issue.

Safety filter threshold

The adjustable safety filter threshold lets you control how likely you are to see responses that could be harmful. Model responses are blocked based on the probability that they contain violent, sexual, toxic, or derogatory content. The safety filter setting is located on the right side of the prompt box in Generative AI Studio. You can choose from three options: block more, block some, and block less.

Vertex AI PaLM API

Safety attribute confidence scoring

Content processed through the Vertex AI PaLM API is assessed against a list of safety attributes, which include "harmful categories" and topics that may be considered sensitive.

Each safety attribute has an associated confidence score between 0.0 and 1.0, rounded to one decimal place, reflecting the likelihood of the input or response belonging to a given category.

Sample response

{
  "predictions": [
    {
      "safetyAttributes": {
        "categories": [
          "Derogatory",
          "Toxic",
          "Violent",
          "Sexual",
          "Insult",
          "Profanity",
          "Death, Harm & Tragedy",
          "Firearms & Weapons",
          "Public Safety",
          "Health",
          "Religion & Belief",
          "Illicit Drugs",
          "War & Conflict",
          "Politics",
          "Finance",
          "Legal"
        ],
        "scores": [
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
          0.1,
        ],
        "blocked": false
      },
      "content": "<>"
    }
  ]
}

Note: Categories with a score that rounds to 0.0 are omitted in the response. This sample response is for illustrative purposes only.
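As a minimal sketch of working with this response shape in Python (the helper name is hypothetical and the response is an abbreviated stand-in for the sample above), you can pair each returned category with its score like this:

def safety_scores(prediction: dict) -> dict:
    # Pair each returned safety attribute category with its confidence
    # score. Categories whose scores round to 0.0 are omitted by the
    # API, so only the returned categories appear in the result.
    attrs = prediction["safetyAttributes"]
    return dict(zip(attrs["categories"], attrs["scores"]))

# Abbreviated stand-in for the sample response above:
response = {
    "predictions": [{
        "safetyAttributes": {
            "categories": ["Derogatory", "Toxic", "Violent"],
            "scores": [0.1, 0.1, 0.1],
            "blocked": False,
        },
        "content": "<>",
    }]
}

scores = safety_scores(response["predictions"][0])
print(scores.get("Violent"))  # 0.1 in the sample response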

Safety attribute descriptions

  • Derogatory: Negative or harmful comments targeting identity and/or protected attributes.
  • Toxic: Content that is rude, disrespectful, or profane.
  • Sexual: Contains references to sexual acts or other lewd content.
  • Violent: Describes scenarios depicting violence against an individual or group, or general descriptions of gore.
  • Insult: Insulting, inflammatory, or negative comments towards a person or a group of people.
  • Profanity: Obscene or vulgar language such as cursing.
  • Death, Harm & Tragedy: Human deaths, tragedies, accidents, disasters, and self-harm.
  • Firearms & Weapons: Content that mentions knives, guns, personal weapons, and accessories such as ammunition, holsters, etc.
  • Public Safety: Services and organizations that provide relief and ensure public safety.
  • Health: Human health, including health conditions, diseases, and disorders; medical therapies, medication, vaccination, and medical practices; and resources for healing, including support groups.
  • Religion & Belief: Belief systems that deal with the possibility of supernatural laws and beings; religion, faith, belief, spiritual practice, churches, and places of worship. Includes astrology and the occult.
  • Illicit Drugs: Recreational and illicit drugs; drug paraphernalia and cultivation; headshops; etc. Includes medicinal use of drugs typically used recreationally (e.g., marijuana).
  • War & Conflict: War, military conflicts, and major physical conflicts involving large numbers of people. Includes discussion of military services, even if not directly related to a war or conflict.
  • Finance: Consumer and business financial services, such as banking, loans, credit, investing, insurance, etc.
  • Politics: Political news and media; discussions of social, governmental, and public policy.
  • Legal: Law-related content, including law firms, legal information, primary legal materials, paralegal services, legal publications and technology, expert witnesses, litigation consultants, and other legal service providers.

Safety thresholds

Safety thresholds are in place for the following safety attributes:

  • Derogatory
  • Toxic
  • Sexual
  • Violent

Google blocks model responses that exceed the designated confidence score thresholds for these safety attributes. To request the ability to modify a safety threshold, contact your Google Cloud account team.

Testing your confidence thresholds

You can test Google's safety filters and define confidence thresholds that are right for your business. By using these thresholds, you can take comprehensive measures to detect content that violates Google's usage policies or terms of service and take appropriate action.

The confidence scores are predictions only, and you should not depend on the scores for reliability or accuracy. Google is not responsible for interpreting or using these scores for business decisions.
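For illustration only, here is a sketch of enforcing your own, stricter thresholds on top of the returned scores. The threshold values are placeholders to tune through your own testing, and the helper is hypothetical, not a client library feature:

# Illustrative only: apply custom per-category confidence thresholds to
# decide whether your application should suppress a response. Threshold
# values are placeholders to be tuned for your use case.
CUSTOM_THRESHOLDS = {
    "Derogatory": 0.5,
    "Toxic": 0.5,
    "Sexual": 0.6,
    "Violent": 0.4,
}

def exceeds_custom_thresholds(prediction: dict) -> bool:
    attrs = prediction["safetyAttributes"]
    if attrs.get("blocked"):
        # Google's own thresholds already blocked this response.
        return True
    scores = dict(zip(attrs["categories"], attrs["scores"]))
    return any(
        scores.get(category, 0.0) >= threshold
        for category, threshold in CUSTOM_THRESHOLDS.items()
    )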

Important: Probability vs Severity

The PaLM API safety filters' confidence scores are based on the probability of content being unsafe, not on its severity. This is important to keep in mind because some content can have a low probability of being unsafe even though its severity of harm could still be high. For example, compare the sentences:

  1. The robot punched me.
  2. The robot slashed me up.

Sentence 1 might be assigned a higher probability of being unsafe, but you might consider sentence 2 to be of higher severity in terms of violence.

Given this, it is important that customers carefully test and determine the appropriate level of blocking needed to support their key use cases while minimizing harm to end users.

Citation metadata

Our generative code features are intended to produce original content and not replicate existing content at length. We've designed our systems to limit the chances of this occurring, and continuously improve how these systems function. If these features do directly quote at length from a webpage, they cite that page.

Sometimes the same content may be found on multiple webpages and we attempt to point you to a popular source. In the case of citations to code repositories, the citation may also reference an applicable open source license. Complying with any license requirements is your responsibility.

Sample citation metadata

{
  "predictions": [
    {
      "safetyAttributes": {
        "scores": [],
        "categories": [],
        "blocked": false
      },
      "content": "Shall I compare thee to a summer's day?\nThou art more lovely and more temperate.\nRough winds do shake the darling buds of May,\nAnd summer's lease hath all too short a date.\n\nSometime too hot the eye of heaven shines,\nAnd often is his gold complexion dimm'd;\nAnd every fair from fair sometime declines,\nBy chance or nature's changing course, untrimm'd.\n\nBut thy eternal summer shall not fade,\nNor lose possession of that fair thou ow'st,\nNor shall death brag thou wanderest in his shade,\nWhen in eternal lines to time thou grow'st.\n\nSo long as men can breathe or eyes can see,\nSo long lives this and this gives life to thee.",
      "citationMetadata": {
        "citations": [
          {
            "endIndex": 262,
            "publicationDate": "1800",
            "startIndex": 0,
            "title": ""The" Royal Shakspere"
          },
          {
            "title": "Sabrinae corolla in hortulis regiae scholae Salopiensis contextuerunt tres viri floribus legendis ...",
            "publicationDate": "1801",
            "startIndex": 140,
            "endIndex": 417
          },
          {
            "startIndex": 302,
            "publicationDate": "1800",
            "title": ""The" Royal Shakspere",
            "endIndex": 429
          },
          {
            "startIndex": 473,
            "publicationDate": "1847",
            "title": "The Poems of William Shakspeare",
            "endIndex": 618
          }
        ]
      }
    }
  ]
}

Metadata description

The following list describes the citation metadata fields.

  • startIndex: Index in the response where the citation starts (inclusive). Must be greater than or equal to 0 and less than the value of endIndex.
  • endIndex: Index in the response where the citation ends (exclusive). Must be greater than startIndex and less than the length of the response.
  • url: URL associated with this citation. If present, this URL links to the source webpage of this citation.
  • title: Title associated with this citation. If present, it refers to the title of the source of this citation.
  • license: License associated with this citation. If present, it refers to the automatically detected license of the source of this citation. Possible licenses include open source licenses.
  • publicationDate: Publication date associated with this citation. If present, it refers to the date at which the source of this citation was published. Possible formats are YYYY, YYYY-MM, and YYYY-MM-DD.
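To make the index semantics concrete, here is a minimal sketch (the function name is illustrative) that slices the generated content by each citation's startIndex (inclusive) and endIndex (exclusive) and collects the optional source fields:

# Illustrative only: extract each cited span from the generated content,
# using startIndex (inclusive) and endIndex (exclusive), together with
# the optional source metadata fields described above.
def cited_spans(prediction: dict) -> list:
    content = prediction["content"]
    citations = prediction.get("citationMetadata", {}).get("citations", [])
    return [
        {
            "text": content[c["startIndex"]:c["endIndex"]],
            "title": c.get("title"),
            "url": c.get("url"),
            # Complying with any detected license is your responsibility.
            "license": c.get("license"),
            "publicationDate": c.get("publicationDate"),
        }
        for c in citations
    ]

# Usage: spans = cited_spans(response["predictions"][0])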

PaLM API limitations

Limitations you may encounter when using the PaLM API include (but are not limited to):

  • Edge Cases: Edge cases refer to unusual, rare, or exceptional situations that are not well-represented in the training data. These cases can lead to limitations in the performance of the PaLM API, such as model overconfidence, misinterpretation of context, or inappropriate outputs.

  • Model Hallucinations, Grounding, and Factuality: The PaLM API may lack grounding and factuality in real-world knowledge, physical properties, or accurate understanding. This limitation can lead to model hallucinations, which refer to instances where it may generate outputs that are plausible-sounding but factually incorrect, irrelevant, inappropriate, or nonsensical.

  • Data Quality and Tuning: The quality, accuracy, and bias of the prompt and/or data inputted into the PaLM API can have a significant impact on its performance. If users enter inaccurate or incorrect data and/or prompts, the PaLM API may have suboptimal performance or false model outputs.

  • Bias Amplification: Language models can inadvertently amplify existing biases in their training data, leading to outputs that may further reinforce societal prejudices and unequal treatment of certain groups.

  • Language Quality: While the PaLM API yields impressive multilingual capabilities on the benchmarks we evaluated against, the majority of our benchmarks (including all of the fairness evaluations) are in the English language.¹

    • Language models may provide inconsistent service quality to different users. For example, text generation might not be as effective for some dialects or language varieties due to underrepresentation in the training data. Performance may be worse for non-English languages or English language varieties with less representation.

  • Fairness Benchmarks and Subgroups: Google Research's fairness analyses of the PaLM API do not provide an exhaustive account of the various potential risks. For instance, we focus on biases along the axes of gender, race, ethnicity, and religion, but perform the analysis only on English-language data and model outputs.¹

  • Limited Domain Expertise: The PaLM API may lack the depth of knowledge required to provide accurate and detailed responses on highly specialized or technical topics, leading to superficial or incorrect information. For specialized, complex use cases, the PaLM API should be tuned on domain-specific data, and there must be meaningful human supervision in contexts with the potential to materially impact individual rights.

  • Length and Structure of Inputs and Outputs: The PaLM API has a maximum input limit of 8k tokens and a maximum output limit of 1k tokens. If the input or output exceeds these limits, our safety classifiers are not applied, which could ultimately lead to poor model performance. While the PaLM API is designed to handle a wide range of text formats, its performance can be affected if the input data has an unusual or complex structure.

To utilize this technology safely and responsibly, it is also important to consider other risks specific to your use case, users, and business context in addition to built-in technical safeguards.

We recommend taking the following steps:

  1. Assess your application's security risks.
  2. Consider adjustments to mitigate safety risks.
  3. Perform safety testing appropriate to your use case.
  4. Solicit user feedback and monitor content.

Additional resources