Supported models:
- multimodalembedding@001
Syntax
- PROJECT_ID = PROJECT_ID
- REGION = us-central1
curl
```sh
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
  -d '{
    "instances": [
      ...
    ]
  }'
```
Python
```python
from vertexai.vision_models import MultiModalEmbeddingModel

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
model.get_embeddings(...)
```
Parameter list
Request body
Parameter | Description
---|---
`text` | Optional. The text for which you want to generate an embedding.
`image` | Optional. The image for which you want to generate embeddings.
`video` | Optional. The video segment for which you want to generate embeddings.
`dimension` | Optional. Accepts one of the following values: 128, 256, 512, or 1408. The response includes embeddings of that dimension. Applies only to text and image input.
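As a sketch of how these fields fit together (the `gs://` URIs below are placeholders, not files from this guide), a combined request body might look like the following; note that `dimension` goes under `parameters`, not inside the instance:

```python
import json

# Sketch of a combined request body; the gs:// URIs are placeholders.
request_body = {
    "instances": [{
        "text": "white shoes",
        "image": {"gcsUri": "gs://your-bucket/flower.jpg"},
        "video": {"gcsUri": "gs://your-bucket/video.mp4"},
    }],
    # "dimension" is a request-level parameter, not an instance field;
    # it applies only to the text and image embeddings.
    "parameters": {"dimension": 256},
}
print(json.dumps(request_body, indent=2))
```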
Image
Parameter | Description
---|---
`bytesBase64Encoded` | Optional. Image bytes encoded in a base64 string. One of `bytesBase64Encoded` or `gcsUri` must be set.
`gcsUri` | Optional. The Cloud Storage location of the image to generate embeddings for. One of `bytesBase64Encoded` or `gcsUri` must be set.
`mimeType` | Optional. The MIME type of the image content.
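For inline images, a minimal sketch of building the `image` object from a local file (the file path is hypothetical):

```python
import base64

# Sketch: build an inline "image" object from a hypothetical local file.
with open("my_image.jpg", "rb") as f:
    image_bytes = f.read()

image_field = {
    # Set exactly one of bytesBase64Encoded or gcsUri.
    "bytesBase64Encoded": base64.b64encode(image_bytes).decode("utf-8"),
    "mimeType": "image/jpeg",
}
```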
VideoSegmentConfig
Parameter | Description
---|---
`startOffsetSec` | Optional. The start offset of the video segment, in seconds. If not specified, it is calculated as max(0, endOffsetSec - 120).
`endOffsetSec` | Optional. The end offset of the video segment, in seconds. If not specified, it is calculated as min(video length, startOffsetSec + 120).
`intervalSec` | Optional. The interval of the video over which embeddings are generated. The minimum value for `intervalSec` is 4.
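Since one embedding is produced per `intervalSec` window within the configured segment, the number of video embeddings a request returns can be estimated up front. A back-of-the-envelope sketch (an estimate, not an API guarantee):

```python
import math

def approx_num_video_embeddings(start_offset_sec: int,
                                end_offset_sec: int,
                                interval_sec: int) -> int:
    """Rough count of intervalSec windows in [start, end)."""
    return math.ceil((end_offset_sec - start_offset_sec) / interval_sec)

# The advanced video example further below uses start=10, end=60, interval=10,
# which yields roughly 5 per-segment embeddings.
print(approx_num_video_embeddings(10, 60, 10))  # 5
```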
Video
Parameter | Description
---|---
`bytesBase64Encoded` | Optional. Video bytes encoded in a base64 string. One of `bytesBase64Encoded` or `gcsUri` must be set.
`gcsUri` | Optional. The Cloud Storage location of the video to generate embeddings for. One of `bytesBase64Encoded` or `gcsUri` must be set.
`videoSegmentConfig` | Optional. The video segment config.
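Putting the video fields together, a `video` object with an explicit segment config might look like the following sketch (the URI is a placeholder):

```python
# Sketch: a "video" object with an explicit segment config; placeholder URI.
video_field = {
    # Set exactly one of bytesBase64Encoded or gcsUri.
    "gcsUri": "gs://your-bucket/video.mp4",
    "videoSegmentConfig": {
        "startOffsetSec": 0,
        "endOffsetSec": 120,
        "intervalSec": 16,
    },
}
```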
Examples
- PROJECT_ID = PROJECT_ID
- REGION = us-central1
- MODEL_ID = multimodalembedding@001
Basic use case
The multimodal embedding model generates vectors based on the input you provide, which can include a combination of image, text, and video data.
curl
```sh
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
  -d '{
    "instances": [
      {
        "image": {
          "gcsUri": "gs://your-public-uri-test/flower.jpg"
        },
        "text": "white shoes",
        "video": {
          "gcsUri": "gs://your-public-uri-test/Okabashi.mp4"
        }
      }
    ]
  }'
```
Python
```python
# @title Client for multimodal embedding
import time
import typing
from dataclasses import dataclass

# Requires: pip install google-cloud-aiplatform
# Also run: gcloud auth application-default login
from google.cloud import aiplatform
from google.protobuf import struct_pb2

PROJECT_ID = "PROJECT_ID"  # Replace with your project ID.
IMAGE_URI = "gs://your-public-uri-test/flower.jpg"  # @param {type:"string"}
TEXT = "white shoes"  # @param {type:"string"}
VIDEO_URI = "gs://your-public-uri-test/Okabashi.mp4"  # @param {type:"string"}
VIDEO_START_OFFSET_SEC = 0
VIDEO_END_OFFSET_SEC = 120
VIDEO_EMBEDDING_INTERVAL_SEC = 16


# Inspired by https://stackoverflow.com/questions/34269772/type-hints-in-namedtuple.
class EmbeddingResponse(typing.NamedTuple):

    @dataclass
    class VideoEmbedding:
        start_offset_sec: int
        end_offset_sec: int
        embedding: typing.Sequence[float]

    text_embedding: typing.Sequence[float]
    image_embedding: typing.Sequence[float]
    video_embeddings: typing.Sequence[VideoEmbedding]


class EmbeddingPredictionClient:
    """Wrapper around Prediction Service Client."""

    def __init__(self, project: str,
                 location: str = "us-central1",
                 api_regional_endpoint: str = "us-central1-aiplatform.googleapis.com"):
        client_options = {"api_endpoint": api_regional_endpoint}
        # Initialize the client once; it can be reused for multiple requests.
        self.client = aiplatform.gapic.PredictionServiceClient(
            client_options=client_options)
        self.location = location
        self.project = project

    def get_embedding(self, text: str = None, image_uri: str = None,
                      video_uri: str = None, start_offset_sec: int = 0,
                      end_offset_sec: int = 120, interval_sec: int = 16):
        if not text and not image_uri and not video_uri:
            raise ValueError(
                'At least one of text or image_uri or video_uri must be specified.')

        # Build the instance as a protobuf Struct mirroring the JSON request body.
        instance = struct_pb2.Struct()
        if text:
            instance.fields['text'].string_value = text

        if image_uri:
            image_struct = instance.fields['image'].struct_value
            image_struct.fields['gcsUri'].string_value = image_uri

        if video_uri:
            video_struct = instance.fields['video'].struct_value
            video_struct.fields['gcsUri'].string_value = video_uri
            video_config_struct = video_struct.fields['videoSegmentConfig'].struct_value
            video_config_struct.fields['startOffsetSec'].number_value = start_offset_sec
            video_config_struct.fields['endOffsetSec'].number_value = end_offset_sec
            video_config_struct.fields['intervalSec'].number_value = interval_sec

        instances = [instance]
        endpoint = (f"projects/{self.project}/locations/{self.location}"
                    "/publishers/google/models/multimodalembedding@001")
        response = self.client.predict(endpoint=endpoint, instances=instances)

        text_embedding = None
        if text:
            text_emb_value = response.predictions[0]['textEmbedding']
            text_embedding = [v for v in text_emb_value]

        image_embedding = None
        if image_uri:
            image_emb_value = response.predictions[0]['imageEmbedding']
            image_embedding = [v for v in image_emb_value]

        video_embeddings = None
        if video_uri:
            video_emb_values = response.predictions[0]['videoEmbeddings']
            video_embeddings = [
                EmbeddingResponse.VideoEmbedding(
                    start_offset_sec=v['startOffsetSec'],
                    end_offset_sec=v['endOffsetSec'],
                    embedding=[x for x in v['embedding']])
                for v in video_emb_values]

        return EmbeddingResponse(
            text_embedding=text_embedding,
            image_embedding=image_embedding,
            video_embeddings=video_embeddings)


# The client can be reused across requests.
client = EmbeddingPredictionClient(project=PROJECT_ID)

start = time.time()
response = client.get_embedding(
    text=TEXT, image_uri=IMAGE_URI, video_uri=VIDEO_URI,
    start_offset_sec=VIDEO_START_OFFSET_SEC,
    end_offset_sec=VIDEO_END_OFFSET_SEC,
    interval_sec=VIDEO_EMBEDDING_INTERVAL_SEC)
end = time.time()

print(response)
print('Time taken: ', end - start)
```
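Because the text, image, and video embeddings live in the same semantic space, a typical next step is to compare them directly. A minimal sketch, reusing the `response` from the call above (in the basic case no `dimension` is set, so all vectors have the same length):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# How close is the text "white shoes" to the image embedding?
print(cosine_similarity(response.text_embedding, response.image_embedding))

# Score each video segment against the text query.
for seg in response.video_embeddings:
    score = cosine_similarity(response.text_embedding, seg.embedding)
    print(f"{seg.start_offset_sec}s-{seg.end_offset_sec}s: {score:.3f}")
```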
Advanced use case
You can specify the dimension for text and image embeddings. For video embeddings, you can specify the video segment and the embedding density (the `intervalSec` sampling interval).
curl - image
```sh
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
  -d '{
    "instances": [
      {
        "image": {
          "gcsUri": "gs://your-public-uri-test/flower.jpg"
        },
        "text": "white shoes"
      }
    ],
    "parameters": {
      "dimension": 128
    }
  }'
```
curl - video
```sh
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
  -d '{
    "instances": [
      {
        "video": {
          "gcsUri": "gs://your-public-uri-test/Okabashi.mp4",
          "videoSegmentConfig": {
            "startOffsetSec": 10,
            "endOffsetSec": 60,
            "intervalSec": 10
          }
        }
      }
    ]
  }'
```
Python
```python
# @title Client for multimodal embedding
import time
import typing
from dataclasses import dataclass

# Requires: pip install google-cloud-aiplatform
# Also run: gcloud auth application-default login
from google.cloud import aiplatform
from google.protobuf import struct_pb2

PROJECT_ID = "PROJECT_ID"  # Replace with your project ID.
IMAGE_URI = "gs://your-public-uri-test/flower.jpg"
TEXT = "white shoes"
VIDEO_URI = "gs://your-public-uri-test/brahms.mp4"
VIDEO_START_OFFSET_SEC = 10
VIDEO_END_OFFSET_SEC = 60
VIDEO_EMBEDDING_INTERVAL_SEC = 10
DIMENSION = 128


# Inspired by https://stackoverflow.com/questions/34269772/type-hints-in-namedtuple.
class EmbeddingResponse(typing.NamedTuple):

    @dataclass
    class VideoEmbedding:
        start_offset_sec: int
        end_offset_sec: int
        embedding: typing.Sequence[float]

    text_embedding: typing.Sequence[float]
    image_embedding: typing.Sequence[float]
    video_embeddings: typing.Sequence[VideoEmbedding]


class EmbeddingPredictionClient:
    """Wrapper around Prediction Service Client."""

    def __init__(self, project: str,
                 location: str = "us-central1",
                 api_regional_endpoint: str = "us-central1-aiplatform.googleapis.com"):
        client_options = {"api_endpoint": api_regional_endpoint}
        # Initialize the client once; it can be reused for multiple requests.
        self.client = aiplatform.gapic.PredictionServiceClient(
            client_options=client_options)
        self.location = location
        self.project = project

    def get_embedding(self, text: str = None, image_uri: str = None,
                      video_uri: str = None, start_offset_sec: int = 0,
                      end_offset_sec: int = 120, interval_sec: int = 16,
                      dimension=1408):
        if not text and not image_uri and not video_uri:
            raise ValueError(
                'At least one of text or image_uri or video_uri must be specified.')

        # Build the instance as a protobuf Struct mirroring the JSON request body.
        instance = struct_pb2.Struct()
        if text:
            instance.fields['text'].string_value = text

        if image_uri:
            image_struct = instance.fields['image'].struct_value
            image_struct.fields['gcsUri'].string_value = image_uri

        if video_uri:
            video_struct = instance.fields['video'].struct_value
            video_struct.fields['gcsUri'].string_value = video_uri
            video_config_struct = video_struct.fields['videoSegmentConfig'].struct_value
            video_config_struct.fields['startOffsetSec'].number_value = start_offset_sec
            video_config_struct.fields['endOffsetSec'].number_value = end_offset_sec
            video_config_struct.fields['intervalSec'].number_value = interval_sec

        # Request-level parameters: the embedding dimension for text and image.
        parameters = struct_pb2.Struct()
        parameters.fields['dimension'].number_value = dimension

        instances = [instance]
        endpoint = (f"projects/{self.project}/locations/{self.location}"
                    "/publishers/google/models/multimodalembedding@001")
        response = self.client.predict(
            endpoint=endpoint, instances=instances, parameters=parameters)

        text_embedding = None
        if text:
            text_emb_value = response.predictions[0]['textEmbedding']
            text_embedding = [v for v in text_emb_value]

        image_embedding = None
        if image_uri:
            image_emb_value = response.predictions[0]['imageEmbedding']
            image_embedding = [v for v in image_emb_value]

        video_embeddings = None
        if video_uri:
            video_emb_values = response.predictions[0]['videoEmbeddings']
            video_embeddings = [
                EmbeddingResponse.VideoEmbedding(
                    start_offset_sec=v['startOffsetSec'],
                    end_offset_sec=v['endOffsetSec'],
                    embedding=[x for x in v['embedding']])
                for v in video_emb_values]

        return EmbeddingResponse(
            text_embedding=text_embedding,
            image_embedding=image_embedding,
            video_embeddings=video_embeddings)


# The client can be reused across requests.
client = EmbeddingPredictionClient(project=PROJECT_ID)

start = time.time()
response = client.get_embedding(
    text=TEXT, image_uri=IMAGE_URI, video_uri=VIDEO_URI,
    start_offset_sec=VIDEO_START_OFFSET_SEC,
    end_offset_sec=VIDEO_END_OFFSET_SEC,
    interval_sec=VIDEO_EMBEDDING_INTERVAL_SEC,
    dimension=DIMENSION)
end = time.time()

print(response)
print('Time taken: ', end - start)
```
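To see the effect of `videoSegmentConfig` and the `dimension` parameter, you can inspect the per-segment results. Note that `dimension` applies only to the text and image embeddings, so the video vectors keep their default size; a short sketch:

```python
# Sketch: inspect the advanced response. The reduced dimension (128) applies
# to text and image embeddings only; video embeddings keep their default size.
print(len(response.text_embedding))   # 128
print(len(response.image_embedding))  # 128

for seg in response.video_embeddings:
    print(f"segment {seg.start_offset_sec}s-{seg.end_offset_sec}s: "
          f"{len(seg.embedding)}-dim embedding")
```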
Explore further
For detailed documentation, see the following: