Esta página foi traduzida pela API Cloud Translation.

Gerar texto com base em uma imagem

Solicitar o modelo do Gemini com uma imagem e um comando de texto e retornar o texto gerado.

Mais informações

Para ver a documentação detalhada que inclui este exemplo de código, consulte:

Compreensão de imagens

Exemplo de código

Antes de testar esse exemplo, siga as instruções de configuração para Go no Guia de início rápido da Vertex AI sobre como usar bibliotecas de cliente. Para mais informações, consulte a documentação de referência da API Vertex AI para Go.

Para autenticar na Vertex AI, configure o Application Default Credentials. Para mais informações, consulte Configurar a autenticação para um ambiente de desenvolvimento local.

import (
	"context"
	"errors"
	"fmt"
	"io"
	"mime"
	"path/filepath"

	"cloud.google.com/go/vertexai/genai"
)

// generateMultimodalContent generates a response into w, based upon the prompt and image.
func generateMultimodalContent(w io.Writer, projectID, location, modelName string) error {
	// location := "us-central1"
	// modelName := "gemini-1.5-flash-001"
	ctx := context.Background()

	client, err := genai.NewClient(ctx, projectID, location)
	if err != nil {
		return fmt.Errorf("unable to create client: %v", err)
	}
	defer client.Close()

	model := client.GenerativeModel(modelName)
	model.SetTemperature(0.4)

	// Given an image file URL, prepare image file as genai.Part
	img := genai.FileData{
		MIMEType: mime.TypeByExtension(filepath.Ext("scones.jpg")),
		FileURI:  "gs://generativeai-downloads/images/scones.jpg",
	}

	res, err := model.GenerateContent(ctx, img, genai.Text("Describe what is in this picture"))
	if err != nil {
		return fmt.Errorf("unable to generate contents: %v", err)
	}

	if len(res.Candidates) == 0 ||
		len(res.Candidates[0].Content.Parts) == 0 {
		return errors.New("empty response from model")
	}

	fmt.Fprintf(w, "generated response: %s\n", res.Candidates[0].Content.Parts[0])
	return nil
}

Antes de testar esse exemplo, siga as instruções de configuração para Java no Guia de início rápido da Vertex AI sobre como usar bibliotecas de cliente. Para mais informações, consulte a documentação de referência da API Vertex AI para Java.

Para autenticar na Vertex AI, configure o Application Default Credentials. Para mais informações, consulte Configurar a autenticação para um ambiente de desenvolvimento local.

import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.ContentMaker;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import com.google.cloud.vertexai.generativeai.PartMaker;
import com.google.cloud.vertexai.generativeai.ResponseHandler;
import java.util.Base64;

public class MultimodalQuery {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-google-cloud-project-id";
    String location = "us-central1";
    String modelName = "gemini-1.5-flash-001";
    String dataImageBase64 = "your-base64-encoded-image";

    String output = multimodalQuery(projectId, location, modelName, dataImageBase64);
    System.out.println(output);
  }


  // Ask the model to recognise the brand associated with the logo image.
  public static String multimodalQuery(String projectId, String location, String modelName,
      String dataImageBase64) throws Exception {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests.
    try (VertexAI vertexAI = new VertexAI(projectId, location)) {
      String output;
      byte[] imageBytes = Base64.getDecoder().decode(dataImageBase64);

      GenerativeModel model = new GenerativeModel(modelName, vertexAI);
      GenerateContentResponse response = model.generateContent(
          ContentMaker.fromMultiModalData(
              "What is this image?",
              PartMaker.fromMimeTypeAndData("image/png", imageBytes)
          ));

      output = ResponseHandler.getText(response);
      return output;
    }
  }
}

Antes de testar esse exemplo, siga as instruções de configuração para Python no Guia de início rápido da Vertex AI sobre como usar bibliotecas de cliente. Para mais informações, consulte a documentação de referência da API Vertex AI para Python.

Para autenticar na Vertex AI, configure o Application Default Credentials. Para mais informações, consulte Configurar a autenticação para um ambiente de desenvolvimento local.

import vertexai

from vertexai.generative_models import GenerativeModel, Part

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-002")

image_file = Part.from_uri(
    "gs://cloud-samples-data/generative-ai/image/scones.jpg", "image/jpeg"
)

# Query the model
response = model.generate_content([image_file, "what is this image?"])
print(response.text)
# Example response:
# That's a lovely overhead flatlay photograph of blueberry scones.
# The image features:
# * **Several blueberry scones:** These are the main focus,
# arranged on parchment paper with some blueberry juice stains.
# ...

A seguir

Para pesquisar e filtrar exemplos de código de outros Google Cloud produtos, consulte a pesquisa de exemplos de código doGoogle Cloud .