Gemini for Google Cloud and responsible AI

This document describes how Gemini for Google Cloud is designed in view of the capabilities, limitations, and risks that are associated with generative AI.

Capabilities and risks of large language models

Large language models (LLMs) can perform many useful tasks such as the following:

  • Translate language.
  • Summarize text.
  • Generate code and creative writing.
  • Power chatbots and virtual assistants.
  • Complement search engines and recommendation systems.

At the same time, the evolving technical capabilities of LLMs create the potential for misapplication, misuse, and unintended or unforeseen consequences.

LLMs can generate output that you don't expect, including text that's offensive, insensitive, or factually incorrect. Because LLMs are incredibly versatile, it can be difficult to predict exactly what kinds of unintended or unforeseen outputs they might produce.

Given these risks and complexities, Gemini for Google Cloud is designed with Google's AI principles in mind. However, it's important that you understand some of the limitations of Gemini for Google Cloud so that you can use it safely and responsibly.

Gemini for Google Cloud limitations

Limitations that you might encounter when using Gemini for Google Cloud include, but aren't limited to, the following:

  • Edge cases. Edge cases refer to unusual, rare, or exceptional situations that aren't well represented in the training data. These cases can lead to limitations in the output of Gemini, such as model overconfidence, misinterpretation of context, or inappropriate outputs.

  • Model hallucinations, grounding, and factuality. Gemini for Google Cloud might lack grounding and factuality in real-world knowledge, physical properties, or accurate understanding. This limitation can lead to model hallucinations, where Gemini might generate outputs that are plausible-sounding but factually incorrect, irrelevant, inappropriate, or nonsensical. Hallucinations can also include fabricating links to web pages that don't exist and have never existed. For more information, see Write better prompts for Gemini for Google Cloud.

  • Data quality and tuning. The quality, accuracy, and bias of the prompt data that you enter into Gemini can have a significant impact on its performance. If you enter inaccurate or incorrect prompts, Gemini might return suboptimal or false responses.

  • Bias amplification. Language models can inadvertently amplify existing biases in their training data, leading to outputs that might further reinforce societal prejudices and unequal treatment of certain groups.

  • Language quality. While Gemini yields impressive multilingual capabilities on the benchmarks that we evaluated against, the majority of our benchmarks (including all of the fairness evaluations) are in American English.

    Language models might provide inconsistent service quality to different users. For example, text generation might not be as effective for some dialects or language varieties because they are underrepresented in the training data. Performance might be worse for non-English languages or English language varieties with less representation.

  • Fairness benchmarks and subgroups. Google Research's fairness analyses of Gemini don't provide an exhaustive account of the various potential risks. For example, we focus on biases along gender, race, ethnicity, and religion axes, but perform the analysis only on the American English language data and model outputs.

  • Limited domain expertise. Gemini has been trained on Google Cloud technology, but it might lack the depth of knowledge that's required to provide accurate and detailed responses on highly specialized or technical topics, leading to superficial or incorrect information.

    When you use the Gemini pane in the Google Cloud console, Gemini isn't aware of the context of your specific environment, so it can't answer questions such as "When was the last time I created a VM?"

    In some cases, Gemini sends a specific segment of your context to the model to receive a context-specific response—for example, when you click the Troubleshooting suggestions button in the Error Reporting service page.

Gemini safety and toxicity filtering

Gemini for Google Cloud prompts and responses are checked against a comprehensive list of safety attributes as applicable for each use case. These safety attributes aim to filter out content that violates our Acceptable Use Policy. If an output is considered harmful, the response is blocked.
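
The filtering pipeline itself is internal to Gemini for Google Cloud and isn't exposed as an API, but the general pattern described above (score each candidate response against a set of safety attributes and withhold it if any score crosses a blocking threshold) can be sketched as follows. This is a minimal illustration under stated assumptions: the attribute names, thresholds, and helpers (`SafetyScore`, `is_blocked`) are hypothetical and aren't part of any Google Cloud API.

```python
from dataclasses import dataclass

# Hypothetical safety attributes; the actual list used by Gemini for
# Google Cloud is internal and varies by use case.
SAFETY_ATTRIBUTES = ("harassment", "hate_speech", "dangerous_content", "sexually_explicit")

# Hypothetical per-attribute blocking thresholds (higher score = more likely harmful).
BLOCK_THRESHOLDS = {attr: 0.5 for attr in SAFETY_ATTRIBUTES}


@dataclass
class SafetyScore:
    """A single attribute score in [0.0, 1.0] produced by an upstream classifier."""
    attribute: str
    score: float


def is_blocked(scores: list[SafetyScore]) -> bool:
    """Return True if any safety attribute score meets or exceeds its threshold."""
    return any(s.score >= BLOCK_THRESHOLDS.get(s.attribute, 1.0) for s in scores)


# Example: decide whether a candidate response is returned or withheld.
candidate_scores = [
    SafetyScore("harassment", 0.12),
    SafetyScore("dangerous_content", 0.71),  # exceeds the 0.5 threshold, so the response is blocked
]

if is_blocked(candidate_scores):
    print("Response blocked by safety filtering.")
else:
    print("Response returned to the user.")
```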

What's next