Process images, video, audio, and text with Gemini 1.5 Pro

This sample shows how to process images, video, audio, and text at the same time. It works only with Gemini 1.5 Pro.

Code samples

C#

Before trying this sample, follow the C# setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI C# API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
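
In a local development environment, you can typically create these credentials by running gcloud auth application-default login with the Google Cloud CLI; on Google Cloud compute environments, the runtime service account's credentials are usually picked up automatically.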


using Google.Cloud.AIPlatform.V1;
using System;
using System.Threading.Tasks;

public class MultimodalAllInput
{
    public async Task<string> AnswerFromMultimodalInput(
        string projectId = "your-project-id",
        string location = "us-central1",
        string publisher = "google",
        string model = "gemini-1.5-flash-001")
    {

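        // Create a prediction service client that targets the regional Vertex AI endpoint.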
        var predictionServiceClient = new PredictionServiceClientBuilder
        {
            Endpoint = $"{location}-aiplatform.googleapis.com"
        }.Build();

        string prompt = "Watch each frame in the video carefully and answer the questions.\n"
                  + "Only base your answers strictly on what information is available in "
                  + "the video attached. Do not make up any information that is not part "
                  + "of the video and do not be too verbose, be to the point.\n\n"
                  + "Questions:\n"
                  + "- When is the moment in the image happening in the video? "
                  + "Provide a timestamp.\n"
                  + "- What is the context of the moment and what does the narrator say about it?";

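        // Combine the text prompt, the video, and the image into a single user message.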
        var generateContentRequest = new GenerateContentRequest
        {
            Model = $"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}",
            Contents =
            {
                new Content
                {
                    Role = "USER",
                    Parts =
                    {
                        new Part { Text = prompt },
                        new Part { FileData = new() { MimeType = "video/mp4", FileUri = "gs://cloud-samples-data/generative-ai/video/behind_the_scenes_pixel.mp4" } },
                        new Part { FileData = new() { MimeType = "image/png", FileUri = "gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png" } }
                    }
                }
            }
        };

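        // Send the request and read the text of the first candidate response.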
        GenerateContentResponse response = await predictionServiceClient.GenerateContentAsync(generateContentRequest);

        string responseText = response.Candidates[0].Content.Parts[0].Text;
        Console.WriteLine(responseText);

        return responseText;
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"errors"
	"fmt"
	"io"
	"mime"
	"path/filepath"

	"cloud.google.com/go/vertexai/genai"
)

// generateContentFromVideoWithAudio shows how to send a multi-modal prompt to a model, writing the response to
// the provided io.Writer.
func generateContentFromVideoWithAudio(w io.Writer, projectID, location, modelName string) error {
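	// projectID := "your-project-id"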
	// location := "us-central1"
	// modelName := "gemini-1.5-flash-001"
	ctx := context.Background()

	client, err := genai.NewClient(ctx, projectID, location)
	if err != nil {
		return fmt.Errorf("unable to create client: %w", err)
	}
	defer client.Close()

	model := client.GenerativeModel(modelName)

	vidPart := genai.FileData{
		MIMEType: mime.TypeByExtension(filepath.Ext("behind_the_scenes_pixel.mp4")),
		FileURI:  "gs://cloud-samples-data/generative-ai/video/behind_the_scenes_pixel.mp4",
	}

	imgPart := genai.FileData{
		MIMEType: mime.TypeByExtension(filepath.Ext("a-man-and-a-dog.png")),
		FileURI:  "gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png",
	}

	res, err := model.GenerateContent(ctx, vidPart, imgPart, genai.Text(`
		Watch each frame in the video carefully and answer the questions.
		Only base your answers strictly on what information is available in the video attached.
		Do not make up any information that is not part of the video and do not be too
		verbose, be to the point.

		Questions:
		- When is the moment in the image happening in the video? Provide a timestamp.
		- What is the context of the moment and what does the narrator say about it?
	`))
	if err != nil {
		return fmt.Errorf("unable to generate contents: %w", err)
	}

	if len(res.Candidates) == 0 ||
		len(res.Candidates[0].Content.Parts) == 0 {
		return errors.New("empty response from model")
	}

	fmt.Fprintf(w, "generated response: %s\n", res.Candidates[0].Content.Parts[0])
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.ContentMaker;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import com.google.cloud.vertexai.generativeai.PartMaker;
import com.google.cloud.vertexai.generativeai.ResponseHandler;
import java.io.IOException;

public class MultimodalAllInput {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-google-cloud-project-id";
    String location = "us-central1";
    String modelName = "gemini-1.5-flash-001";

    multimodalAllInput(projectId, location, modelName);
  }

  // A request containing a text prompt, a video, and a picture.
  public static String multimodalAllInput(String projectId, String location, String modelName)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests.
    try (VertexAI vertexAI = new VertexAI(projectId, location)) {
      String videoUri = "gs://cloud-samples-data/generative-ai/video/behind_the_scenes_pixel.mp4";
      String imageUri = "gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png";

      GenerativeModel model = new GenerativeModel(modelName, vertexAI);
      GenerateContentResponse response = model.generateContent(
          ContentMaker.fromMultiModalData(
              PartMaker.fromMimeTypeAndData("video/mp4", videoUri),
              PartMaker.fromMimeTypeAndData("image/png", imageUri),
              "Watch each frame in the video carefully and answer the questions.\n"
                  + "Only base your answers strictly on what information is available in "
                  + "the video attached. Do not make up any information that is not part "
                  + "of the video and do not be too verbose, be to the point.\n\n"
                  + "Questions:\n"
                  + "- When is the moment in the image happening in the video? "
                  + "Provide a timestamp.\n"
                  + "- What is the context of the moment and what does the narrator say about it?"
          ));

      String output = ResponseHandler.getText(response);
      System.out.println(output);

      return output;
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

const {VertexAI} = require('@google-cloud/vertexai');

/**
 * TODO(developer): Update these variables before running the sample.
 */
async function analyze_all_modalities(projectId = 'PROJECT_ID') {
  const vertexAI = new VertexAI({project: projectId, location: 'us-central1'});

  const generativeModel = vertexAI.getGenerativeModel({
    model: 'gemini-1.5-flash-001',
  });

  const videoFilePart = {
    file_data: {
      file_uri:
        'gs://cloud-samples-data/generative-ai/video/behind_the_scenes_pixel.mp4',
      mime_type: 'video/mp4',
    },
  };
  const imageFilePart = {
    file_data: {
      file_uri:
        'gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png',
      mime_type: 'image/png',
    },
  };

  const textPart = {
    text: `
    Watch each frame in the video carefully and answer the questions.
    Only base your answers strictly on what information is available in the video attached.
    Do not make up any information that is not part of the video and do not be too
    verbose, be to the point.

    Questions:
    - When is the moment in the image happening in the video? Provide a timestamp.
    - What is the context of the moment and what does the narrator say about it?`,
  };

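  // Send the video, the image, and the question text in a single user turn.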
  const request = {
    contents: [{role: 'user', parts: [videoFilePart, imageFilePart, textPart]}],
  };

  const resp = await generativeModel.generateContent(request);
  const contentResponse = await resp.response;
  console.log(JSON.stringify(contentResponse));
}
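
As a usage sketch, you could call the function at the end of the file (the project ID below is a placeholder; error handling is kept minimal):

// Example invocation (replace the placeholder with your project ID).
analyze_all_modalities('your-project-id').catch(console.error);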

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import vertexai
from vertexai.generative_models import GenerativeModel, Part

# TODO(developer): Update and uncomment the line below
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-002")

video_file_uri = (
    "gs://cloud-samples-data/generative-ai/video/behind_the_scenes_pixel.mp4"
)

image_file_uri = "gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png"

prompt = """
Watch each frame in the video carefully and answer the questions.
Only base your answers strictly on what information is available in the video attached.
Do not make up any information that is not part of the video and do not be too
verbose, be to the point.

Questions:
- When is the moment in the image happening in the video? Provide a timestamp.
- What is the context of the moment and what does the narrator say about it?
"""

contents = [
    Part.from_uri(video_file_uri, mime_type="video/mp4"),
    Part.from_uri(image_file_uri, mime_type="image/png"),
    prompt,
]

response = model.generate_content(contents)
print(response.text)
# Example response:
# Here are the answers to your questions.
# - **Timestamp:** 0:48
# - **Context and Narration:** A man and his dog are sitting on a sofa
# and taking a selfie. The narrator says that the story is about a blind man
# and his girlfriend and follows them on their journey together and growing closer.

Next steps

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.