Diese Seite wurde von der Cloud Translation API übersetzt.

Text aus multimodalem Prompt generieren

Dieses Beispiel zeigt, wie Sie mit dem Gemini-Modell Text aus einem multimodalen Prompt generieren. Der Prompt besteht aus drei Bildern und zwei Text-Prompts. Das Modell generiert eine Textantwort, die die Bilder und Text-Prompts beschreibt.

Codebeispiel

Java

Bevor Sie dieses Beispiel anwenden, folgen Sie den Java-Einrichtungsschritten in der Vertex AI-Kurzanleitung zur Verwendung von Clientbibliotheken. Weitere Informationen finden Sie in der Referenzdokumentation zur Vertex AI Java API.

Richten Sie zur Authentifizierung bei Vertex AI Standardanmeldedaten für Anwendungen ein. Weitere Informationen finden Sie unter Authentifizierung für eine lokale Entwicklungsumgebung einrichten.


import com.google.genai.Client;
import com.google.genai.types.Content;
import com.google.genai.types.GenerateContentResponse;
import com.google.genai.types.HttpOptions;
import com.google.genai.types.Part;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TextGenerationWithMultiLocalImage {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String modelId = "gemini-2.5-flash";
    String localImageFilePath1 = "your/local/img1.jpg";
    String localImageFilePath2 = "your/local/img2.jpg";
    generateContent(modelId, localImageFilePath1, localImageFilePath2);
  }

  // Generates text using multiple local images
  public static String generateContent(
      String modelId, String localImageFilePath1, String localImageFilePath2) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (Client client =
        Client.builder()
            .location("global")
            .vertexAI(true)
            .httpOptions(HttpOptions.builder().apiVersion("v1").build())
            .build()) {

      // Read content from local files.
      byte[] localFileImg1Bytes = Files.readAllBytes(Paths.get(localImageFilePath1));
      byte[] localFileImg2Bytes = Files.readAllBytes(Paths.get(localImageFilePath2));

      GenerateContentResponse response =
          client.models.generateContent(
              modelId,
              Content.fromParts(
                  Part.fromBytes(localFileImg1Bytes, "image/jpeg"),
                  Part.fromBytes(localFileImg2Bytes, "image/jpeg"),
                  Part.fromText("Generate a list of all the objects contained in both images")),
              null);

      System.out.print(response.text());
      // Example response:
      // Based on both images, here are the objects contained in both:
      //
      // 1.  **Coffee cups (or mugs)**: Both images feature one or more cups containing a beverage.
      // 2.  **Coffee (or a similar beverage)**: Both images contain a liquid beverage in the cups,
      // appearing to be coffee or a coffee-like drink.
      // 3.  **Table (or a flat surface)**: Both compositions are set on a flat surface, likely a
      // table or countertop.
      return response.text();
    }
  }
}

Node.js

Bevor Sie dieses Beispiel anwenden, folgen Sie den Node.js-Einrichtungsschritten in der Vertex AI-Kurzanleitung zur Verwendung von Clientbibliotheken. Weitere Informationen finden Sie in der Referenzdokumentation zur Vertex AI Node.js API.

Richten Sie zur Authentifizierung bei Vertex AI Standardanmeldedaten für Anwendungen ein. Weitere Informationen finden Sie unter Authentifizierung für eine lokale Entwicklungsumgebung einrichten.

const {GoogleGenAI} = require('@google/genai');
const fs = require('fs');

const GOOGLE_CLOUD_PROJECT = process.env.GOOGLE_CLOUD_PROJECT;
const GOOGLE_CLOUD_LOCATION = process.env.GOOGLE_CLOUD_LOCATION || 'global';

function loadImageAsBase64(path) {
  const bytes = fs.readFileSync(path);
  return bytes.toString('base64');
}

async function generateContent(
  projectId = GOOGLE_CLOUD_PROJECT,
  location = GOOGLE_CLOUD_LOCATION,
  imagePath1,
  imagePath2
) {
  const client = new GoogleGenAI({
    vertexai: true,
    project: projectId,
    location: location,
  });

  // TODO(Developer): Update the below file paths to your images
  const image1 = loadImageAsBase64(imagePath1);
  const image2 = loadImageAsBase64(imagePath2);

  const response = await client.models.generateContent({
    model: 'gemini-2.5-flash',
    contents: [
      {
        role: 'user',
        parts: [
          {
            text: 'Generate a list of all the objects contained in both images.',
          },
          {
            inlineData: {
              data: image1,
              mimeType: 'image/jpeg',
            },
          },
          {
            inlineData: {
              data: image2,
              mimeType: 'image/jpeg',
            },
          },
        ],
      },
    ],
  });

  console.log(response.text);

  // Example response:
  //  Okay, here's a jingle combining the elements of both sets of images, focusing on ...
  //  ...

  return response.text;
}

Python

Bevor Sie dieses Beispiel anwenden, folgen Sie den Python-Einrichtungsschritten in der Vertex AI-Kurzanleitung zur Verwendung von Clientbibliotheken. Weitere Informationen finden Sie in der Referenzdokumentation zur Vertex AI Python API.

Richten Sie zur Authentifizierung bei Vertex AI Standardanmeldedaten für Anwendungen ein. Weitere Informationen finden Sie unter Authentifizierung für eine lokale Entwicklungsumgebung einrichten.

from google import genai
from google.genai.types import HttpOptions, Part

client = genai.Client(http_options=HttpOptions(api_version="v1"))
# TODO(Developer): Update the below file paths to your images
# image_path_1 = "path/to/your/image1.jpg"
# image_path_2 = "path/to/your/image2.jpg"
with open(image_path_1, "rb") as f:
    image_1_bytes = f.read()
with open(image_path_2, "rb") as f:
    image_2_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        "Generate a list of all the objects contained in both images.",
        Part.from_bytes(data=image_1_bytes, mime_type="image/jpeg"),
        Part.from_bytes(data=image_2_bytes, mime_type="image/jpeg"),
    ],
)
print(response.text)
# Example response:
# Okay, here's a jingle combining the elements of both sets of images, focusing on ...
# ...

Weitere Informationen

Wenn Sie nach Codebeispielen für andere Google Cloud -Produkte suchen und filtern möchten, können Sie den Google Cloud -Beispielbrowser verwenden.