Membuat teks dari prompt multimodal

Contoh ini menunjukkan cara membuat teks dari prompt multimodal menggunakan model Gemini. Dialog terdiri dari tiga gambar dan dua perintah teks. Model ini menghasilkan respons teks yang menjelaskan gambar dan perintah teks.

Jelajahi lebih lanjut

Untuk dokumentasi mendetail yang menyertakan contoh kode ini, lihat artikel berikut:

Contoh kode

C#

Sebelum mencoba contoh ini, ikuti petunjuk penyiapan C# di Panduan memulai Vertex AI menggunakan library klien. Untuk mengetahui informasi selengkapnya, lihat Dokumentasi referensi API C# Vertex AI.

Untuk melakukan autentikasi ke Vertex AI, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.


using Google.Api.Gax.Grpc;
using Google.Cloud.AIPlatform.V1;
using Google.Protobuf;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public class MultimodalMultiImage
{
    public async Task<string> GenerateContent(
        string projectId = "your-project-id",
        string location = "us-central1",
        string publisher = "google",
        string model = "gemini-1.0-pro-vision"
    )
    {
        // Create client
        var predictionServiceClient = new PredictionServiceClientBuilder
        {
            Endpoint = $"{location}-aiplatform.googleapis.com"
        }.Build();

        // Images
        ByteString colosseum = await ReadImageFileAsync(
            "https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark1.png");

        ByteString forbiddenCity = await ReadImageFileAsync(
            "https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark2.png");

        ByteString christRedeemer = await ReadImageFileAsync(
            "https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark3.png");

        // Initialize request argument(s)
        var content = new Content
        {
            Role = "USER"
        };
        content.Parts.AddRange(new List<Part>()
        {
            new()
            {
                InlineData = new()
                {
                    MimeType = "image/png",
                    Data = colosseum

                }
            },
            new()
            {
                Text = "city: Rome, Landmark: the Colosseum"
            },
            new()
            {
                InlineData = new()
                {
                    MimeType = "image/png",
                    Data = forbiddenCity
                }
            },
            new()
            {
                Text = "city: Beijing, Landmark: Forbidden City"
            },
            new()
            {
                InlineData = new()
                {
                    MimeType = "image/png",
                    Data = christRedeemer
                }
            }
        });

        var generateContentRequest = new GenerateContentRequest
        {
            Model = $"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}"
        };
        generateContentRequest.Contents.Add(content);

        // Make the request, returning a streaming response
        using PredictionServiceClient.StreamGenerateContentStream response = predictionServiceClient.StreamGenerateContent(generateContentRequest);

        StringBuilder fullText = new();

        // Read streaming responses from server until complete
        AsyncResponseStream<GenerateContentResponse> responseStream = response.GetResponseStream();
        await foreach (GenerateContentResponse responseItem in responseStream)
        {
            fullText.Append(responseItem.Candidates[0].Content.Parts[0].Text);
        }
        return fullText.ToString();
    }

    private static async Task<ByteString> ReadImageFileAsync(string url)
    {
        using HttpClient client = new();
        using var response = await client.GetAsync(url);
        byte[] imageBytes = await response.Content.ReadAsByteArrayAsync();
        return ByteString.CopyFrom(imageBytes);
    }
}

Go

Sebelum mencoba contoh ini, ikuti petunjuk penyiapan Go di Panduan memulai Vertex AI menggunakan library klien. Untuk mengetahui informasi selengkapnya, lihat Dokumentasi referensi API Go Vertex AI.

Untuk melakukan autentikasi ke Vertex AI, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

import (
	"context"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"os"
	"strings"

	"cloud.google.com/go/vertexai/genai"
)

func main() {
	projectID := os.Getenv("GOOGLE_CLOUD_PROJECT")
	location := "us-central1"
	modelName := "gemini-1.0-pro-vision"
	temperature := 0.4

	if projectID == "" {
		log.Fatal("require environment variable GOOGLE_CLOUD_PROJECT")
	}

	// construct this multimodal prompt:
	// [image of colosseum] city: Rome, Landmark: the Colosseum
	// [image of forbidden city]  city: Beijing, Landmark: the Forbidden City
	// [new image]

	// create prompt image parts
	// colosseum
	colosseum, err := partFromImageURL("https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark1.png")
	if err != nil {
		log.Fatalf("unable to read image: %v", err)
	}
	// forbidden city
	forbiddenCity, err := partFromImageURL("https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark2.png")
	if err != nil {
		log.Fatalf("unable to read image: %v", err)
	}
	// new image
	newImage, err := partFromImageURL("https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark3.png")
	if err != nil {
		log.Fatalf("unable to read image: %v", err)
	}

	// create a multimodal (multipart) prompt
	prompt := []genai.Part{
		colosseum,
		genai.Text("city: Rome, Landmark: the Colosseum "),
		forbiddenCity,
		genai.Text("city: Beijing, Landmark: the Forbidden City "),
		newImage,
	}

	// generate the response
	err = generateMultimodalContent(os.Stdout, prompt, projectID, location, modelName, float32(temperature))
	if err != nil {
		log.Fatalf("unable to generate: %v", err)
	}
}

// generateMultimodalContent provide a generated response using multimodal input
func generateMultimodalContent(w io.Writer, parts []genai.Part, projectID, location, modelName string, temperature float32) error {
	ctx := context.Background()

	client, err := genai.NewClient(ctx, projectID, location)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	model := client.GenerativeModel(modelName)
	model.SetTemperature(temperature)

	res, err := model.GenerateContent(ctx, parts...)
	if err != nil {
		return fmt.Errorf("unable to generate contents: %v", err)
	}

	fmt.Fprintf(w, "generated response: %s\n", res.Candidates[0].Content.Parts[0])

	return nil
}

// partFromImageURL create a multimodal prompt part from an image URL
func partFromImageURL(image string) (genai.Part, error) {
	var img genai.Blob

	imageURL, err := url.Parse(image)
	if err != nil {
		return img, err
	}
	res, err := http.Get(image)
	if err != nil || res.StatusCode != 200 {
		return img, err
	}
	defer res.Body.Close()
	data, err := io.ReadAll(res.Body)
	if err != nil {
		return img, fmt.Errorf("unable to read from http: %v", err)
	}

	position := strings.LastIndex(imageURL.Path, ".")
	if position == -1 {
		return img, fmt.Errorf("couldn't find a period to indicate a file extension")
	}
	ext := imageURL.Path[position+1:]

	img = genai.ImageData(ext, data)
	return img, nil
}

Java

Sebelum mencoba contoh ini, ikuti petunjuk penyiapan Java di Panduan memulai Vertex AI menggunakan library klien. Untuk mengetahui informasi selengkapnya, lihat Dokumentasi referensi API Java Vertex AI.

Untuk melakukan autentikasi ke Vertex AI, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.Content;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.ContentMaker;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import com.google.cloud.vertexai.generativeai.PartMaker;
import com.google.cloud.vertexai.generativeai.ResponseHandler;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class MultimodalMultiImage {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-google-cloud-project-id";
    String location = "us-central1";
    String modelName = "gemini-1.0-pro-vision";

    multimodalMultiImage(projectId, location, modelName);
  }

  // Generates content from multiple input images.
  public static void multimodalMultiImage(String projectId, String location, String modelName)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests.
    try (VertexAI vertexAI = new VertexAI(projectId, location)) {
      GenerativeModel model = new GenerativeModel(modelName, vertexAI);

      Content content = ContentMaker.fromMultiModalData(
          PartMaker.fromMimeTypeAndData("image/png", readImageFile(
              "https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark1.png")),
          "city: Rome, Landmark: the Colosseum",
          PartMaker.fromMimeTypeAndData("image/png", readImageFile(
              "https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark2.png")),
          "city: Beijing, Landmark: Forbidden City",
          PartMaker.fromMimeTypeAndData("image/png", readImageFile(
              "https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark3.png"))
      );

      GenerateContentResponse response = model.generateContent(content);

      String output = ResponseHandler.getText(response);
      System.out.println(output);
    }
  }

  // Reads the image data from the given URL.
  public static byte[] readImageFile(String url) throws IOException {
    URL urlObj = new URL(url);
    HttpURLConnection connection = (HttpURLConnection) urlObj.openConnection();
    connection.setRequestMethod("GET");

    int responseCode = connection.getResponseCode();

    if (responseCode == HttpURLConnection.HTTP_OK) {
      InputStream inputStream = connection.getInputStream();
      ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

      byte[] buffer = new byte[1024];
      int bytesRead;
      while ((bytesRead = inputStream.read(buffer)) != -1) {
        outputStream.write(buffer, 0, bytesRead);
      }

      return outputStream.toByteArray();
    } else {
      throw new RuntimeException("Error fetching file: " + responseCode);
    }
  }
}

Node.js

Sebelum mencoba contoh ini, ikuti petunjuk penyiapan Node.js di Panduan memulai Vertex AI menggunakan library klien. Untuk mengetahui informasi selengkapnya, lihat Dokumentasi referensi API Node.js Vertex AI.

Untuk melakukan autentikasi ke Vertex AI, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

const {VertexAI} = require('@google-cloud/vertexai');
const axios = require('axios');

async function getBase64(url) {
  const image = await axios.get(url, {responseType: 'arraybuffer'});
  return Buffer.from(image.data).toString('base64');
}

/**
 * TODO(developer): Update these variables before running the sample.
 */
async function sendMultiModalPromptWithImage(
  projectId = 'PROJECT_ID',
  location = 'us-central1',
  model = 'gemini-1.0-pro-vision'
) {
  // For images, the SDK supports base64 strings
  const landmarkImage1 = await getBase64(
    'https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark1.png'
  );
  const landmarkImage2 = await getBase64(
    'https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark2.png'
  );
  const landmarkImage3 = await getBase64(
    'https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark3.png'
  );

  // Initialize Vertex with your Cloud project and location
  const vertexAI = new VertexAI({project: projectId, location: location});

  const generativeVisionModel = vertexAI.getGenerativeModel({
    model: model,
  });

  // Pass multimodal prompt
  const request = {
    contents: [
      {
        role: 'user',
        parts: [
          {
            inlineData: {
              data: landmarkImage1,
              mimeType: 'image/png',
            },
          },
          {
            text: 'city: Rome, Landmark: the Colosseum',
          },

          {
            inlineData: {
              data: landmarkImage2,
              mimeType: 'image/png',
            },
          },
          {
            text: 'city: Beijing, Landmark: Forbidden City',
          },
          {
            inlineData: {
              data: landmarkImage3,
              mimeType: 'image/png',
            },
          },
        ],
      },
    ],
  };

  // Create the response
  const response = await generativeVisionModel.generateContent(request);
  // Wait for the response to complete
  const aggregatedResponse = await response.response;
  // Select the text from the response
  const fullTextResponse =
    aggregatedResponse.candidates[0].content.parts[0].text;

  console.log(fullTextResponse);
}

Langkah selanjutnya

Untuk menelusuri dan memfilter contoh kode untuk produk Google Cloud lainnya, lihat browser contoh Google Cloud.