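The Multimodal embeddings API generates vectors based on the input you provide, which can include a combination of image, text, and video data. The embedding vectors can then be used for subsequent tasks like image classification or video content moderation.

For additional conceptual information, see Multimodal embeddings.

The sections that follow list the supported models, show the syntax to send a multimodal embeddings API request, and describe the request parameters. See examples for implementation details.

Request fields:
- image: Optional. The image to generate embeddings for.
- text: Optional. The text to generate embeddings for.
- video: Optional. The video segment to generate embeddings for.
- dimension: Optional. The dimension of the embedding, included in the response. Only applies to text and image input. Accepted values: 128, 256, 512, or 1408.

Image fields:
- bytesBase64Encoded: Optional. Image bytes encoded in a base64 string. Must be one of bytesBase64Encoded or gcsUri.
- gcsUri: Optional. The Cloud Storage location of the image to perform the embedding on. One of bytesBase64Encoded or gcsUri.
- mimeType: Optional. The MIME type of the content of the image. Supported values: image/jpeg and image/png.

Video fields:
- bytesBase64Encoded: Optional. Video bytes encoded in a base64 string. One of bytesBase64Encoded or gcsUri.
- gcsUri: Optional. The Cloud Storage location of the video on which to perform the embedding. One of bytesBase64Encoded or gcsUri.
- videoSegmentConfig: Optional. The video segment config.

Video segment config fields:
- startOffsetSec: Optional. The start offset of the video segment in seconds. If not specified, it's calculated with max(0, endOffsetSec - 120).
- endOffsetSec: Optional. The end offset of the video segment in seconds. If not specified, it's calculated with min(video length, startOffsetSec + 120). If both startOffsetSec and endOffsetSec are specified, endOffsetSec is adjusted to min(startOffsetSec + 120, endOffsetSec).
- intervalSec: Optional. The interval of the video for which each embedding is generated. The minimum value for intervalSec is 4. If the interval is less than 4, an InvalidArgumentError is returned. There are no limitations on the maximum value of the interval. However, if the interval is larger than min(video length, 120s), it impacts the quality of the generated embeddings. Default value: 16.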
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Model                        Code
Embeddings for Multimodal    multimodalembedding@001
Example syntax
curl
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:predict \
-d '{
"instances": [
...
]
}'
Python
from vertexai.vision_models import MultiModalEmbeddingModel
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
model.get_embeddings(...)
Parameter list
Request Body
{
"instances": [
{
"text": string,
"image": {
// Union field can be only one of the following:
"bytesBase64Encoded": string,
"gcsUri": string,
// End of list of possible types for union field.
"mimeType": string
},
"video": {
// Union field can be only one of the following:
"bytesBase64Encoded": string,
"gcsUri": string,
// End of list of possible types for union field.
"videoSegmentConfig": {
"startOffsetSec": integer,
"endOffsetSec": integer,
"intervalSec": integer
}
},
"parameters": {
"dimension": integer
}
}
]
}
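The union-field rule in the request body above (exactly one of bytesBase64Encoded or gcsUri per image or video) is easy to get wrong when building requests by hand. The following is a minimal sketch of a request builder that enforces it. The helper names media_part and build_request are hypothetical, not part of any SDK, and the dimension parameter is deliberately left out.

```python
from typing import Optional


def media_part(bytes_b64: Optional[str] = None,
               gcs_uri: Optional[str] = None) -> dict:
    """Build an image or video object with exactly one source field set."""
    # The union field allows bytesBase64Encoded or gcsUri, never both or neither.
    if (bytes_b64 is None) == (gcs_uri is None):
        raise ValueError("Set exactly one of bytesBase64Encoded or gcsUri")
    if bytes_b64 is not None:
        return {"bytesBase64Encoded": bytes_b64}
    return {"gcsUri": gcs_uri}


def build_request(text: Optional[str] = None,
                  image: Optional[dict] = None,
                  video: Optional[dict] = None) -> dict:
    """Assemble a request body with a single instance."""
    instance = {}
    if text is not None:
        instance["text"] = text
    if image is not None:
        instance["image"] = image
    if video is not None:
        instance["video"] = video
    if not instance:
        raise ValueError("Provide at least one of text, image, or video")
    return {"instances": [instance]}
```

For example, build_request(text="a cat", image=media_part(gcs_uri="gs://my-bucket/embeddings/supermarket-img.png")) returns a dictionary that can be serialized with json.dumps and sent as the request body.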
Parameters
image (Image)
text (String)
video (Video)
dimension (Int): Accepted values: 128, 256, 512, or 1408.

Image
Parameters
bytesBase64Encoded (String): One of bytesBase64Encoded or gcsUri.
gcsUri (String): One of bytesBase64Encoded or gcsUri.
mimeType (String): Supported values: image/jpeg and image/png.

Video
Parameters
bytesBase64Encoded (String): One of bytesBase64Encoded or gcsUri.
gcsUri (String): One of bytesBase64Encoded or gcsUri.
videoSegmentConfig (VideoSegmentConfig)

VideoSegmentConfig
Parameters
startOffsetSec (Int): If not specified, calculated with max(0, endOffsetSec - 120).
endOffsetSec (Int): If not specified, calculated with min(video length, startOffsetSec + 120). If both startOffsetSec and endOffsetSec are specified, endOffsetSec is adjusted to min(startOffsetSec + 120, endOffsetSec).
intervalSec (Int): The minimum value for intervalSec is 4. If the interval is less than 4, an InvalidArgumentError is returned. There are no limitations on the maximum value of the interval. However, if the interval is larger than min(video length, 120s), it impacts the quality of the generated embeddings. Default value: 16.

Response body
{
"predictions": [
{
"textEmbedding": [
float,
// array of 128, 256, 512, or 1408 float values
float
],
"imageEmbedding": [
float,
// array of 128, 256, 512, or 1408 float values
float
],
"videoEmbeddings": [
{
"startOffsetSec": integer,
"endOffsetSec": integer,
"embedding": [
float,
// array of 1408 float values
float
]
}
]
}
],
"deployedModelId": string
}
Response element: Description
imageEmbedding: 128, 256, 512, or 1408 dimension list of floats.
textEmbedding: 128, 256, 512, or 1408 dimension list of floats.
videoEmbeddings: 1408 dimension list of floats with the start and end time (in seconds) of the video segment that the embeddings are generated for.
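To illustrate how the response elements fit together, here is a hedged sketch that pulls the embeddings out of a parsed response and checks their lengths against the dimensions listed above. The function name extract_embeddings and the output key videoSegments are hypothetical; the sketch assumes a single prediction and that each video segment carries both offsets, as in the documented response body.

```python
ALLOWED_DIMENSIONS = {128, 256, 512, 1408}
VIDEO_DIMENSION = 1408


def extract_embeddings(response: dict) -> dict:
    """Return text, image, and video embeddings from the first prediction."""
    prediction = response["predictions"][0]
    result = {}

    # Text and image embeddings are optional and share the same allowed sizes.
    for key in ("textEmbedding", "imageEmbedding"):
        vector = prediction.get(key)
        if vector is None:
            continue
        if len(vector) not in ALLOWED_DIMENSIONS:
            raise ValueError(f"{key} has unexpected length {len(vector)}")
        result[key] = vector

    # Each video segment becomes a (start, end, embedding) tuple.
    segments = []
    for segment in prediction.get("videoEmbeddings", []):
        vector = segment["embedding"]
        if len(vector) != VIDEO_DIMENSION:
            raise ValueError(f"video embedding has unexpected length {len(vector)}")
        segments.append((segment["startOffsetSec"], segment["endOffsetSec"], vector))
    result["videoSegments"] = segments
    return result
```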
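Examples
Basic use case
Generate embeddings from image
Use the following sample to generate embeddings for an image.
REST
Before using any of the request data, make the following replacements:
- LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
- PROJECT_ID: Your Google Cloud project ID.
- TEXT: The target text to get embeddings for. For example, a cat.
- B64_ENCODED_IMG: The target image to get embeddings for, specified as a base64-encoded byte string.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict
Request JSON body: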
{
"instances": [
{
"text": "TEXT",
"image": {
"bytesBase64Encoded": "B64_ENCODED_IMG"
}
}
]
}
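The bytesBase64Encoded value in the request body above has to be produced from the raw image bytes. The following sketch shows one way to do that with the Python standard library and to write the result to request.json. The function names and the example file path are hypothetical.

```python
import base64
import json
from pathlib import Path


def build_image_request(image_bytes: bytes, text: str) -> dict:
    """Build the request body for a text plus base64-encoded image instance."""
    # b64encode returns bytes; the JSON body needs a plain ASCII string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "instances": [
            {"text": text, "image": {"bytesBase64Encoded": encoded}}
        ]
    }


def write_image_request(image_path: str, text: str,
                        out_path: str = "request.json") -> None:
    """Read an image file and save the request body as JSON."""
    body = build_image_request(Path(image_path).read_bytes(), text)
    Path(out_path).write_text(json.dumps(body), encoding="utf-8")
```

Calling write_image_request("cat.png", "a cat") would produce a request.json file in the current directory, assuming a file named cat.png exists there.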
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
{
"predictions": [
{
"textEmbedding": [
0.010477379,
-0.00399621,
0.00576670747,
[...]
-0.00823613815,
-0.0169572588,
-0.00472954148
],
"imageEmbedding": [
0.00262696808,
-0.00198890246,
0.0152047109,
-0.0103145819,
[...]
0.0324628279,
0.0284924973,
0.011650892,
-0.00452344026
]
}
],
"deployedModelId": "DEPLOYED_MODEL_ID"
}
Generate embeddings from video
Use the following sample to generate embeddings for video content.
REST
The following example uses a video located in Cloud Storage. You can
also use the video.bytesBase64Encoded
field to provide a
base64-encoded string representation of the
video.
Before using any of the request data, make the following replacements:
- LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
- PROJECT_ID: Your Google Cloud project ID.
- VIDEO_URI: The Cloud Storage URI of the target video to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-video.mp4.
  You can also provide the video as a base64-encoded byte string:
  [...] "video": { "bytesBase64Encoded": "B64_ENCODED_VIDEO" } [...]
- videoSegmentConfig (START_SECOND, END_SECOND, INTERVAL_SECONDS): Optional. The specific video segments (in seconds) the embeddings are generated for.
  For example:
  [...] "videoSegmentConfig": { "startOffsetSec": 10, "endOffsetSec": 60, "intervalSec": 10 } [...]
  Using this config specifies video data from 10 seconds to 60 seconds and generates embeddings for the following 10 second video intervals: [10, 20), [20, 30), [30, 40), [40, 50), [50, 60). This video interval ("intervalSec": 10) falls in the Standard video embedding mode, and the user is charged at the Standard mode pricing rate.
  If you omit videoSegmentConfig, the service uses the following default values: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120, "intervalSec": 16 }. This video interval ("intervalSec": 16) falls in the Essential video embedding mode, and the user is charged at the Essential mode pricing rate.
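The way a segment config turns into a list of intervals can be sketched in a few lines of Python. This is an illustration of the documented offset rules, not the service's implementation: the function names are hypothetical, and clamping the end offset to the video length when both offsets are given is an assumption based on the sample responses in this section.

```python
from typing import List, Optional, Tuple

MAX_SPAN_SEC = 120      # longest window covered by one request, per the offset rules
MIN_INTERVAL_SEC = 4    # smaller intervals are rejected by the API


def resolve_offsets(video_length_sec: int,
                    start: Optional[int] = None,
                    end: Optional[int] = None) -> Tuple[int, int]:
    """Apply the documented defaults for startOffsetSec and endOffsetSec."""
    if start is None:
        start = 0 if end is None else max(0, end - MAX_SPAN_SEC)
    if end is None:
        end = min(video_length_sec, start + MAX_SPAN_SEC)
    else:
        end = min(start + MAX_SPAN_SEC, end)
    # Assumption: an end offset past the end of the video is clamped.
    return start, min(end, video_length_sec)


def segment_intervals(video_length_sec: int,
                      start: Optional[int] = None,
                      end: Optional[int] = None,
                      interval: int = 16) -> List[Tuple[int, int]]:
    """List the [start, end) intervals that would each get an embedding."""
    if interval < MIN_INTERVAL_SEC:
        raise ValueError("intervalSec must be at least 4")
    start, end = resolve_offsets(video_length_sec, start, end)
    segments = []
    cursor = start
    while cursor < end:
        segments.append((cursor, min(cursor + interval, end)))
        cursor += interval
    return segments
```

With a video of at least 60 seconds, segment_intervals(300, 10, 60, 10) yields the five intervals from the example config, and segment_intervals(7) yields a single (0, 7) interval for a 7 second video with the default config.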
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict
Request JSON body:
{
  "instances": [
    {
      "video": {
        "gcsUri": "VIDEO_URI",
        "videoSegmentConfig": {
          "startOffsetSec": START_SECOND,
          "endOffsetSec": END_SECOND,
          "intervalSec": INTERVAL_SECONDS
        }
      }
    }
  ]
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
Response (7 second video, no videoSegmentConfig specified):
{
  "predictions": [
    {
      "videoEmbeddings": [
        {
          "endOffsetSec": 7,
          "embedding": [ -0.0045467657, 0.0258095954, 0.0146885719, 0.00945400633, [...] -0.0023291884, -0.00493789, 0.00975185353, 0.0168156829 ],
          "startOffsetSec": 0
        }
      ]
    }
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
Response (59 second video, with the following video segment config: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 60, "intervalSec": 10 }):
{
  "predictions": [
    {
      "videoEmbeddings": [
        {
          "endOffsetSec": 10,
          "startOffsetSec": 0,
          "embedding": [ -0.00683252793, 0.0390476175, [...] 0.00657121744, 0.013023301 ]
        },
        {
          "startOffsetSec": 10,
          "endOffsetSec": 20,
          "embedding": [ -0.0104404651, 0.0357737206, [...] 0.00509833824, 0.0131902946 ]
        },
        {
          "startOffsetSec": 20,
          "embedding": [ -0.0113538112, 0.0305239167, [...] -0.00195809244, 0.00941874553 ],
          "endOffsetSec": 30
        },
        {
          "embedding": [ -0.00299320649, 0.0322436653, [...] -0.00993082579, 0.00968887936 ],
          "startOffsetSec": 30,
          "endOffsetSec": 40
        },
        {
          "endOffsetSec": 50,
          "startOffsetSec": 40,
          "embedding": [ -0.00591270532, 0.0368893594, [...] -0.00219071587, 0.0042470959 ]
        },
        {
          "embedding": [ -0.00458270218, 0.0368121453, [...] -0.00317760976, 0.00595594104 ],
          "endOffsetSec": 59,
          "startOffsetSec": 50
        }
      ]
    }
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Advanced use case
Use the following sample to get embeddings for video, text, and image content.
For video embedding, you can specify the video segment and embedding density.
REST
The following example uses image, text, and video data. You can use any combination of these data types in your request body.
This sample uses a video located in Cloud Storage. You can
also use the video.bytesBase64Encoded
field to provide a
base64-encoded string representation of the
video.
Before using any of the request data, make the following replacements:
- LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
- PROJECT_ID: Your Google Cloud project ID.
- TEXT: The target text to get embeddings for. For example, a cat.
- IMAGE_URI: The Cloud Storage URI of the target image to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-img.png.
  You can also provide the image as a base64-encoded byte string:
  [...] "image": { "bytesBase64Encoded": "B64_ENCODED_IMAGE" } [...]
- VIDEO_URI: The Cloud Storage URI of the target video to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-video.mp4.
  You can also provide the video as a base64-encoded byte string:
  [...] "video": { "bytesBase64Encoded": "B64_ENCODED_VIDEO" } [...]
- videoSegmentConfig (START_SECOND, END_SECOND, INTERVAL_SECONDS): Optional. The specific video segments (in seconds) the embeddings are generated for.
  For example:
  [...] "videoSegmentConfig": { "startOffsetSec": 10, "endOffsetSec": 60, "intervalSec": 10 } [...]
  Using this config specifies video data from 10 seconds to 60 seconds and generates embeddings for the following 10 second video intervals: [10, 20), [20, 30), [30, 40), [40, 50), [50, 60). This video interval ("intervalSec": 10) falls in the Standard video embedding mode, and the user is charged at the Standard mode pricing rate.
  If you omit videoSegmentConfig, the service uses the following default values: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120, "intervalSec": 16 }. This video interval ("intervalSec": 16) falls in the Essential video embedding mode, and the user is charged at the Essential mode pricing rate.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict
Request JSON body:
{
  "instances": [
    {
      "text": "TEXT",
      "image": {
        "gcsUri": "IMAGE_URI"
      },
      "video": {
        "gcsUri": "VIDEO_URI",
        "videoSegmentConfig": {
          "startOffsetSec": START_SECOND,
          "endOffsetSec": END_SECOND,
          "intervalSec": INTERVAL_SECONDS
        }
      }
    }
  ]
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
{
  "predictions": [
    {
      "textEmbedding": [ 0.0105433334, -0.00302835181, 0.00656806398, 0.00603460241, [...] 0.00445805816, 0.0139605571, -0.00170318608, -0.00490092579 ],
      "videoEmbeddings": [
        {
          "startOffsetSec": 0,
          "endOffsetSec": 7,
          "embedding": [ -0.00673126569, 0.0248149596, 0.0128901172, 0.0107588246, [...] -0.00180952181, -0.0054573305, 0.0117037306, 0.0169312079 ]
        }
      ],
      "imageEmbedding": [ -0.00728622358, 0.031021487, -0.00206603738, 0.0273937676, [...] -0.00204976718, 0.00321615417, 0.0121978866, 0.0193375275 ]
    }
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
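A response like the one above is typically used by comparing the returned vectors. The following sketch ranks video segments against a query embedding (for example, the textEmbedding) using cosine similarity. It assumes both vectors have the same length, which for video means requesting text or image embeddings at the 1408 dimension. The function names are hypothetical, and cosine similarity is a common choice here rather than something this API prescribes.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_segments(query: Sequence[float],
                  video_embeddings: List[dict]) -> List[Tuple[float, int, int]]:
    """Return (score, startOffsetSec, endOffsetSec), best match first."""
    scored = [
        (cosine_similarity(query, segment["embedding"]),
         segment["startOffsetSec"],
         segment["endOffsetSec"])
        for segment in video_embeddings
    ]
    return sorted(scored, reverse=True)
```

Here video_embeddings is the videoEmbeddings list from a parsed response, so rank_segments(prediction["textEmbedding"], prediction["videoEmbeddings"]) gives the segments ordered by similarity to the text.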
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
What's next
For detailed documentation, see the following: