使用 Gemini 生成图片

Gemini 2.5 Flash Image 支持多种模态的回答生成,包括文本和图片。

图片生成

Gemini 2.5 Flash Image (gemini-2.5-flash-image) 支持生成图片和文本。这可扩展 Gemini 的功能,使其能够执行以下操作:

  • 通过自然语言对话迭代生成图片,在调整图片的同时保持一致性和上下文。
  • 生成具有高质量长文本渲染效果的图片。
  • 生成图文交织的输出。例如,在单个对话轮次中包含文本和图片的博文。以前,这需要将多个模型串联在一起。
  • 利用 Gemini 的世界知识和推理能力生成图片。

在此公开实验版中,Gemini 2.5 Flash Image 可以生成 1,024 像素的图片,支持生成人物图片,并包含更新的安全过滤条件,可提供更灵活、限制更少的用户体验。

它支持以下模态和功能:

  • 文本到图像

    • 提示示例:“生成一张以烟花为背景的埃菲尔铁塔图片。”
  • 文生图(文本渲染)

    • 提示示例:“generate a cinematic photo of a large building with this giant text projection mapped on the front of the building: "Gemini 2.5 can now generate long form text"”
  • 文生图和文本(交织)

    • 提示示例:“生成一份图文并茂的海鲜饭食谱。在生成食谱时,与文本一起创建图片。”
    • 提示示例:“生成一个关于狗狗的故事,采用 3D 卡通动画风格。为每个场景生成一张图片”
  • 图片和文生图及文本(交织)

    • 提示示例:(附带一张带家具的房间的照片)“我的空间还适合放置哪些颜色的沙发?您可以更新图片吗?”
  • 支持根据语言区域生成图片

    • 提示示例:“Generate an image of a breakfast meal.”

最佳实践

如需改善图片生成效果,请遵循以下最佳实践:

  • 内容要具体:提供的信息越详细,您就越能掌控结果。例如,不要使用“奇幻盔甲”,而要使用“华丽的精灵板甲,蚀刻有银叶图案,带有高领和猎鹰翅膀形状的肩甲”。

  • 提供背景信息和意图:说明图片的用途,帮助模型了解背景信息。例如,“为高端极简护肤品牌设计徽标”的效果要好于“设计徽标”。

  • 迭代和优化:不要期望第一次尝试就能生成完美的图片。使用后续提示进行小幅更改,例如“让光线更暖一些”或“让角色的表情更严肃一些”。

  • 使用分步说明:对于复杂的场景,请将请求拆分为多个步骤。例如,“首先,创建一个背景,其中包含清晨宁静而雾气缭绕的森林。然后,在前景中添加一个长满苔藓的古老石制祭坛。最后,在祭坛上放置一把发光的剑。”

  • 描述您想要的内容,而不是不想要的内容:不要说“没有汽车”,而要积极地描述场景,例如“一条空旷的街道,没有交通迹象”。

  • 控制摄像头:引导摄像头视图。使用摄影和电影术语来描述构图,例如“广角镜头”“微距镜头”或“低角度透视”。

  • 图片提示:使用“创建一张…的图片”或“生成一张…的图片”等短语来描述意图。否则,多模态模型可能会以文本而非图片的形式做出回答。

限制:

使用 Gemini 2.5 Flash Image 生成图片存在以下限制:

  • 为获得最佳性能,请使用以下语言:英语、西班牙语(墨西哥)、日语(日本)、中文(中国)、印地语(印度)。

  • 图片生成不支持音频或视频输入。

  • 模型可能不会生成您要求的确切数量的图片。

  • 为获得最佳效果,请在输入中最多添加三张图片。

  • 生成包含文字的图片时,先生成文字,然后生成包含该文字的图片。

  • 在以下情况下,图片或文字生成功能可能无法按预期运行:

    • 模型可能只能生成文本。如果您想要图片,请在请求中明确要求生成图片。例如,“在您操作过程中提供图片”。

    • 模型可能会以图片形式生成文本。如需生成文本,请明确要求输出文本。例如,“生成叙事文本及插图”。

    • 即使模型尚未完成生成,也可能会停止生成内容。如果出现这种情况,请重试或使用其他提示。

生成图片

以下部分介绍了如何使用 Vertex AI Studio 或 API 生成图片。

如需了解提示方面的指南和最佳实践,请参阅设计多模态提示

控制台

如需使用图片生成,请执行以下操作:

  1. 依次打开 Vertex AI Studio > 创建提示
  2. 点击切换模型,然后从菜单中选择 gemini-2.5-flash-image
  3. 输出面板中,从下拉菜单中选择图片和文本
  4. 编写提示文本区域中,撰写要生成的图片的说明。
  5. 点击提示 () 按钮。

Gemini 将根据您的说明生成图片。此过程应需要几秒钟,但可能会相对较慢,具体取决于容量。

Python

安装

pip install --upgrade google-genai

如需了解详情,请参阅 SDK 参考文档

设置环境变量以将 Gen AI SDK 与 Vertex AI 搭配使用:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import GenerateContentConfig, Modality
from PIL import Image
from io import BytesIO

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=("Generate an image of the Eiffel tower with fireworks in the background."),
    config=GenerateContentConfig(
        response_modalities=[Modality.TEXT, Modality.IMAGE],
        candidate_count=1,
        safety_settings=[
            {"method": "PROBABILITY"},
            {"category": "HARM_CATEGORY_DANGEROUS_CONTENT"},
            {"threshold": "BLOCK_MEDIUM_AND_ABOVE"},
        ],
    ),
)
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = Image.open(BytesIO((part.inline_data.data)))
        image.save("output_folder/example-image-eiffel-tower.png")
# Example response:
#   I will generate an image of the Eiffel Tower at night, with a vibrant display of
#   colorful fireworks exploding in the dark sky behind it. The tower will be
#   illuminated, standing tall as the focal point of the scene, with the bursts of
#   light from the fireworks creating a festive atmosphere.

Node.js

安装

npm install @google/genai

如需了解详情,请参阅 SDK 参考文档

设置环境变量以将 Gen AI SDK 与 Vertex AI 搭配使用:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

const fs = require('fs');
const {GoogleGenAI, Modality} = require('@google/genai');

const GOOGLE_CLOUD_PROJECT = process.env.GOOGLE_CLOUD_PROJECT;
const GOOGLE_CLOUD_LOCATION =
  process.env.GOOGLE_CLOUD_LOCATION || 'us-central1';

async function generateContent(
  projectId = GOOGLE_CLOUD_PROJECT,
  location = GOOGLE_CLOUD_LOCATION
) {
  const client = new GoogleGenAI({
    vertexai: true,
    project: projectId,
    location: location,
  });

  const response = await client.models.generateContentStream({
    model: 'gemini-2.5-flash-image',
    contents:
      'Generate an image of the Eiffel tower with fireworks in the background.',
    config: {
      responseModalities: [Modality.TEXT, Modality.IMAGE],
    },
  });

  const generatedFileNames = [];
  let imageIndex = 0;
  for await (const chunk of response) {
    const text = chunk.text;
    const data = chunk.data;
    if (text) {
      console.debug(text);
    } else if (data) {
      const fileName = `generate_content_streaming_image_${imageIndex++}.png`;
      console.debug(`Writing response image to file: ${fileName}.`);
      try {
        fs.writeFileSync(fileName, data);
        generatedFileNames.push(fileName);
      } catch (error) {
        console.error(`Failed to write image file ${fileName}:`, error);
      }
    }
  }

  return generatedFileNames;
}

Java

了解如何安装或更新 Java

如需了解详情,请参阅 SDK 参考文档

设置环境变量以将 Gen AI SDK 与 Vertex AI 搭配使用:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True


import com.google.genai.Client;
import com.google.genai.types.Blob;
import com.google.genai.types.Candidate;
import com.google.genai.types.Content;
import com.google.genai.types.GenerateContentConfig;
import com.google.genai.types.GenerateContentResponse;
import com.google.genai.types.Part;
import com.google.genai.types.SafetySetting;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

public class ImageGenMmFlashWithText {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String modelId = "gemini-2.5-flash-image";
    String outputFile = "resources/output/example-image-eiffel-tower.png";
    generateContent(modelId, outputFile);
  }

  // Generates an image with text input
  public static void generateContent(String modelId, String outputFile) throws IOException {
    // Client Initialization. Once created, it can be reused for multiple requests.
    try (Client client = Client.builder().location("global").vertexAI(true).build()) {

      GenerateContentConfig contentConfig =
          GenerateContentConfig.builder()
              .responseModalities("TEXT", "IMAGE")
              .candidateCount(1)
              .safetySettings(
                  SafetySetting.builder()
                      .method("PROBABILITY")
                      .category("HARM_CATEGORY_DANGEROUS_CONTENT")
                      .threshold("BLOCK_MEDIUM_AND_ABOVE")
                      .build())
              .build();

      GenerateContentResponse response =
          client.models.generateContent(
              modelId,
              "Generate an image of the Eiffel tower with fireworks in the background.",
              contentConfig);

      // Get parts of the response
      List<Part> parts =
          response
              .candidates()
              .flatMap(candidates -> candidates.stream().findFirst())
              .flatMap(Candidate::content)
              .flatMap(Content::parts)
              .orElse(new ArrayList<>());

      // For each part print text if present, otherwise read image data if present and
      // write it to the output file
      for (Part part : parts) {
        if (part.text().isPresent()) {
          System.out.println(part.text().get());
        } else if (part.inlineData().flatMap(Blob::data).isPresent()) {
          BufferedImage image =
              ImageIO.read(new ByteArrayInputStream(part.inlineData().flatMap(Blob::data).get()));
          ImageIO.write(image, "png", new File(outputFile));
        }
      }

      System.out.println("Content written to: " + outputFile);
      // Example response:
      // Here is the Eiffel Tower with fireworks in the background...
      //
      // Content written to: resources/output/example-image-eiffel-tower.png
    }
  }
}

REST

在终端中运行以下命令,在当前目录中创建或覆盖此文件:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Create a tutorial explaining how to make a peanut butter and jelly sandwich in three easy steps."},
    },
    "generation_config": {
      "response_modalities": ["TEXT", "IMAGE"],
     },
     "safetySettings": {
      "method": "PROBABILITY",
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
  }' 2>/dev/null >response.json

Gemini 将根据您的说明生成图片。此过程应需要几秒钟,但可能会相对较慢,具体取决于容量。

生成图文交织的内容

Gemini 2.5 Flash Image 可以生成图文交织的回答。例如,您可以生成所生成食谱中每个步骤的图片,以便与该步骤的文本搭配使用,而无需单独向模型发出请求。

控制台

如需生成图文交织的回答,请执行以下操作:

  1. 依次打开 Vertex AI Studio > 创建提示
  2. 点击切换模型,然后从菜单中选择 gemini-2.5-flash-image
  3. 输出面板中,从下拉菜单中选择图片和文本
  4. 编写提示文本区域中,撰写要生成的图片的说明。例如,“创建一个教程,说明如何通过三个简单步骤制作花生酱和果酱三明治。对于每个步骤,提供一个包含步骤编号的标题、一段说明,并生成一张图片,每张图片的宽高比为 1:1。”
  5. 点击提示 () 按钮。

Gemini 将根据您的说明生成回答。此过程应需要几秒钟,但可能会相对较慢,具体取决于容量。

Python

安装

pip install --upgrade google-genai

如需了解详情,请参阅 SDK 参考文档

设置环境变量以将 Gen AI SDK 与 Vertex AI 搭配使用:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import GenerateContentConfig, Modality
from PIL import Image
from io import BytesIO

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=(
        "Generate an illustrated recipe for a paella."
        "Create images to go alongside the text as you generate the recipe"
    ),
    config=GenerateContentConfig(response_modalities=[Modality.TEXT, Modality.IMAGE]),
)
with open("output_folder/paella-recipe.md", "w") as fp:
    for i, part in enumerate(response.candidates[0].content.parts):
        if part.text is not None:
            fp.write(part.text)
        elif part.inline_data is not None:
            image = Image.open(BytesIO((part.inline_data.data)))
            image.save(f"output_folder/example-image-{i+1}.png")
            fp.write(f"![image](example-image-{i+1}.png)")
# Example response:
#  A markdown page for a Paella recipe(`paella-recipe.md`) has been generated.
#   It includes detailed steps and several images illustrating the cooking process.

Java

了解如何安装或更新 Java

如需了解详情,请参阅 SDK 参考文档

设置环境变量以将 Gen AI SDK 与 Vertex AI 搭配使用:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True


import com.google.genai.Client;
import com.google.genai.types.Blob;
import com.google.genai.types.Candidate;
import com.google.genai.types.Content;
import com.google.genai.types.GenerateContentConfig;
import com.google.genai.types.GenerateContentResponse;
import com.google.genai.types.Part;
import java.awt.image.BufferedImage;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

public class ImageGenMmFlashTextAndImageWithText {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String modelId = "gemini-2.5-flash-image";
    String outputFile = "resources/output/paella-recipe.md";
    generateContent(modelId, outputFile);
  }

  // Generates text and image with text input
  public static void generateContent(String modelId, String outputFile) throws IOException {
    // Client Initialization. Once created, it can be reused for multiple requests.
    try (Client client = Client.builder().location("global").vertexAI(true).build()) {

      GenerateContentResponse response =
          client.models.generateContent(
              modelId,
              Content.fromParts(
                  Part.fromText("Generate an illustrated recipe for a paella."),
                  Part.fromText(
                      "Create images to go alongside the text as you generate the recipe.")),
              GenerateContentConfig.builder().responseModalities("TEXT", "IMAGE").build());

      try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile))) {

        // Get parts of the response
        List<Part> parts =
            response
                .candidates()
                .flatMap(candidates -> candidates.stream().findFirst())
                .flatMap(Candidate::content)
                .flatMap(Content::parts)
                .orElse(new ArrayList<>());

        int index = 1;
        // For each part print text if present, otherwise read image data if present and
        // write it to the output file
        for (Part part : parts) {
          if (part.text().isPresent()) {
            writer.write(part.text().get());
          } else if (part.inlineData().flatMap(Blob::data).isPresent()) {
            BufferedImage image =
                ImageIO.read(new ByteArrayInputStream(part.inlineData().flatMap(Blob::data).get()));
            ImageIO.write(
                image, "png", new File("resources/output/example-image-" + index + ".png"));
            writer.write("![image](example-image-" + index + ".png)");
          }
          index++;
        }

        System.out.println("Content written to: " + outputFile);

        // Example response:
        // A markdown page for a Paella recipe(`paella-recipe.md`) has been generated.
        // It includes detailed steps and several images illustrating the cooking process.
        //
        // Content written to:  resources/output/paella-recipe.md
      }
    }
  }
}

REST

在终端中运行以下命令,在当前目录中创建或覆盖此文件:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Create a tutorial explaining how to make a peanut butter and jelly sandwich in three easy steps. For each step, provide a title with the number of the step, an explanation, and also generate an image, generate each image in a 1:1 aspect ratio."},
    },
    "generation_config": {
      "response_modalities": ["TEXT", "IMAGE"],
     },
     "safetySettings": {
      "method": "PROBABILITY",
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
  }' 2>/dev/null >response.json

Gemini 将根据您的说明生成图片。此过程应需要几秒钟,但可能会相对较慢,具体取决于容量。

支持根据语言区域生成图片

Gemini 2.5 Flash Image 还可以在提供文本或图片回答时包含有关您位置的信息。例如,您可以生成考虑您当前位置的地点或体验类型的图片,而无需向模型指定您的位置。

控制台

如需使用支持根据语言区域生成图片的功能,请执行以下操作:

  1. 依次打开 Vertex AI Studio > 创建提示
  2. 点击切换模型,然后从菜单中选择 gemini-2.5-flash-image
  3. 输出面板中,从下拉菜单中选择图片和文本
  4. 编写提示文本区域中,撰写要生成的图片的说明。例如,“Generate a photo of a typical breakfast.”
  5. 点击提示 () 按钮。

Gemini 将根据您的说明生成回答。此过程应需要几秒钟,但可能会相对较慢,具体取决于容量。

Python

安装

pip install --upgrade google-genai

如需了解详情,请参阅 SDK 参考文档

设置环境变量以将 Gen AI SDK 与 Vertex AI 搭配使用:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import GenerateContentConfig, Modality
from PIL import Image
from io import BytesIO

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=("Generate a photo of a breakfast meal."),
    config=GenerateContentConfig(response_modalities=[Modality.TEXT, Modality.IMAGE]),
)
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = Image.open(BytesIO((part.inline_data.data)))
        image.save("output_folder/example-breakfast-meal.png")
# Example response:
#   Generates a photo of a vibrant and appetizing breakfast meal.
#   The scene will feature a white plate with golden-brown pancakes
#   stacked neatly, drizzled with rich maple syrup and ...

Java

了解如何安装或更新 Java

如需了解详情,请参阅 SDK 参考文档

设置环境变量以将 Gen AI SDK 与 Vertex AI 搭配使用:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True


import com.google.genai.Client;
import com.google.genai.types.Blob;
import com.google.genai.types.Candidate;
import com.google.genai.types.Content;
import com.google.genai.types.GenerateContentConfig;
import com.google.genai.types.GenerateContentResponse;
import com.google.genai.types.Part;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

public class ImageGenMmFlashLocaleAwareWithText {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String modelId = "gemini-2.5-flash-image";
    String outputFile = "resources/output/example-breakfast-meal.png";
    generateContent(modelId, outputFile);
  }

  // Generates an image with text input
  public static void generateContent(String modelId, String outputFile) throws IOException {
    // Client Initialization. Once created, it can be reused for multiple requests.
    try (Client client = Client.builder().location("global").vertexAI(true).build()) {

      GenerateContentResponse response =
          client.models.generateContent(
              modelId,
              "Generate a photo of a breakfast meal.",
              GenerateContentConfig.builder().responseModalities("TEXT", "IMAGE").build());

      // Get parts of the response
      List<Part> parts =
          response
              .candidates()
              .flatMap(candidates -> candidates.stream().findFirst())
              .flatMap(Candidate::content)
              .flatMap(Content::parts)
              .orElse(new ArrayList<>());

      // For each part print text if present, otherwise read image data if present and
      // write it to the output file
      for (Part part : parts) {
        if (part.text().isPresent()) {
          System.out.println(part.text().get());
        } else if (part.inlineData().flatMap(Blob::data).isPresent()) {
          BufferedImage image =
              ImageIO.read(new ByteArrayInputStream(part.inlineData().flatMap(Blob::data).get()));
          ImageIO.write(image, "png", new File(outputFile));
        }
      }

      System.out.println("Content written to: " + outputFile);

      // Example response:
      // Here is a photo of a breakfast meal for you!
      //
      // Content written to: resources/output/example-breakfast-meal.png
    }
  }
}

REST

在终端中运行以下命令,在当前目录中创建或覆盖此文件:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Generate a photo of a typical breakfast."},
    },
    "generation_config": {
      "response_modalities": ["TEXT", "IMAGE"],
     },
     "safetySettings": {
      "method": "PROBABILITY",
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
  }' 2>/dev/null >response.json

Gemini 将根据您的说明生成图片。此过程应需要几秒钟,但可能会相对较慢,具体取决于容量。