The multimodal embeddings model generates 1408-dimension vectors based on the input you provide, which can include a combination of image, text, and video data. The embedding vectors can then be used for subsequent tasks like image classification or video content moderation.
The image embedding vector and text embedding vector are in the same semantic space with the same dimensionality. Consequently, these vectors can be used interchangeably for use cases like searching images by text or searching videos by image.
For text-only embedding use cases, we recommend using the Vertex AI text-embeddings API instead. For example, the text-embeddings API might be better for text-based semantic search, clustering, long-form document analysis, and other text retrieval or question-answering use cases. For more information, see Get text embeddings.
Supported models
You can get multimodal embeddings by using the following model:
multimodalembedding
Best practices
Consider the following input aspects when using the multimodal embeddings model:
- Text in images - The model can distinguish text in images, similar to
optical character recognition (OCR). If you need to distinguish between a
description of the image content and the text within an image, consider
using prompt engineering to specify your target content.
For example: instead of just "cat", specify "picture of a cat" or
"the text 'cat'", depending on your use case.
- Embedding similarities - The dot product of embeddings isn't a calibrated probability. The dot product is a similarity metric and might have different score distributions for different use cases. Consequently, avoid using a fixed value threshold to measure quality. Instead, use ranking approaches for retrieval, or use sigmoid for classification, as illustrated in the sketch after this list.
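To make the ranking approach concrete, the following sketch (not part of the official samples) ranks hypothetical image embeddings against a text query embedding by dot product using NumPy. The vectors and file names are placeholders; in practice, each vector would be an embedding returned by the model.

```python
import numpy as np

# Placeholder embeddings for illustration only; real vectors returned by the
# model are 1408-dimension (or 128/256/512 if you request lower dimensions).
query_text_embedding = np.array([0.12, -0.03, 0.45, 0.08])
image_embeddings = {
    "cat.png": np.array([0.10, -0.01, 0.40, 0.11]),
    "dog.png": np.array([-0.20, 0.30, 0.05, -0.07]),
    "sign_with_word_cat.png": np.array([0.08, -0.05, 0.38, 0.02]),
}

# Rank candidates by dot-product similarity. The raw scores aren't calibrated
# probabilities, so use the ordering rather than a fixed threshold.
scores = {
    name: float(np.dot(query_text_embedding, vector))
    for name, vector in image_embeddings.items()
}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.4f}")
```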
API usage
API limits
The following limits apply when you use the multimodalembedding model for text and image embeddings:
Limit | Value and description |
---|---|
Text and image data | |
Maximum number of API requests per minute per project | 120 |
Maximum text length | 32 tokens (approximately 32 words). If the input exceeds 32 tokens, the model internally truncates the input to this length. |
Language | English |
Image formats | BMP, GIF, JPG, PNG |
Image size | Base64-encoded images: 20 MB (when transcoded to PNG). Cloud Storage images: 20 MB (original file format). To avoid increased network latency, use smaller images. The model resizes images to 512 x 512 pixel resolution, so you don't need to provide higher-resolution images. |
Video data | |
Audio supported | N/A - The model doesn't consider audio content when generating video embeddings |
Video formats | AVI, FLV, MKV, MOV, MP4, MPEG, MPG, WEBM, and WMV |
Maximum video length (Cloud Storage) | No limit. However, only 2 minutes of content can be analyzed at a time. |
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Vertex AI API.
- Set up authentication for your environment.
Select the tab for how you plan to use the samples on this page:
Java
To use the Java samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.
- Install the Google Cloud CLI.
-
To initialize the gcloud CLI, run the following command:
gcloud init
-
Update and install gcloud components:
gcloud components update
gcloud components install beta
-
If you're using a local shell, then create local authentication credentials for your user account:
gcloud auth application-default login
You don't need to do this if you're using Cloud Shell.
For more information, see Set up authentication for a local development environment in the Google Cloud authentication documentation.
Node.js
To use the Node.js samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.
- Install the Google Cloud CLI.
-
To initialize the gcloud CLI, run the following command:
gcloud init
-
Update and install gcloud components:
gcloud components update
gcloud components install beta
-
If you're using a local shell, then create local authentication credentials for your user account:
gcloud auth application-default login
You don't need to do this if you're using Cloud Shell.
For more information, see Set up authentication for a local development environment in the Google Cloud authentication documentation.
Python
To use the Python samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.
- Install the Google Cloud CLI.
-
To initialize the gcloud CLI, run the following command:
gcloud init
-
Update and install gcloud components:
gcloud components update
gcloud components install beta
-
If you're using a local shell, then create local authentication credentials for your user account:
gcloud auth application-default login
You don't need to do this if you're using Cloud Shell.
For more information, see Set up authentication for a local development environment in the Google Cloud authentication documentation.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
- Install the Google Cloud CLI.
-
To initialize the gcloud CLI, run the following command:
gcloud init
-
Update and install gcloud components:
gcloud components update
gcloud components install beta
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
- To use the Python SDK, follow the instructions at Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.
- Optional. Review pricing for this feature. Pricing for embeddings depends on the type of data you send (such as image or text), and also depends on the mode you use for certain data types (such as Video Plus, Video Standard, or Video Essential).
Locations
A location is a region you can specify in a request to control where data is stored at rest. For a list of available regions, see Generative AI on Vertex AI locations.
Error messages
Quota exceeded error
google.api_core.exceptions.ResourceExhausted: 429 Quota exceeded for
aiplatform.googleapis.com/online_prediction_requests_per_base_model with base
model: multimodalembedding. Please submit a quota increase request.
If this is the first time you receive this error, use the Google Cloud console to request a quota increase for your project. Use the following filters before requesting your increase:
- Service ID: aiplatform.googleapis.com
- metric: aiplatform.googleapis.com/online_prediction_requests_per_base_model
- base_model: multimodalembedding
If you have already sent a quota increase request, wait before sending another request. If you need to further increase the quota, repeat the quota increase request with your justification for a sustained quota request.
Specify lower-dimension embeddings
By default, an embedding request returns a 1408-dimension float vector for a data type. You can also specify lower-dimension embeddings (128, 256, or 512 float vectors) for text and image data. This option lets you optimize for latency and storage, or for quality, based on how you plan to use the embeddings. Lower-dimension embeddings reduce storage needs and latency for subsequent embedding tasks (like search or recommendation), while higher-dimension embeddings offer greater accuracy for the same tasks.
REST
You can request lower-dimension embeddings by adding the parameters.dimension field. The parameter accepts one of the following values: 128, 256, 512, or 1408. The response includes the embedding of that dimension.
Before using any of the request data, make the following replacements:
- LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
- PROJECT_ID: Your Google Cloud project ID.
- IMAGE_URI: The Cloud Storage URI of the target image to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-img.png. You can also provide the image as a base64-encoded byte string:
  [...] "image": { "bytesBase64Encoded": "B64_ENCODED_IMAGE" } [...]
- TEXT: The target text to get embeddings for. For example, a cat.
- EMBEDDING_DIMENSION: The number of embedding dimensions. Lower values offer decreased latency when using these embeddings for subsequent tasks, while higher values offer better accuracy. Available values: 128, 256, 512, and 1408 (default).
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict
Request JSON body:
{ "instances": [ { "image": { "gcsUri": "IMAGE_URI" }, "text": "TEXT" } ], "parameters": { "dimension": EMBEDDING_DIMENSION } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
128 dimensions:
{ "predictions": [ { "imageEmbedding": [ 0.0279239565, [...128 dimension vector...] 0.00403284049 ], "textEmbedding": [ 0.202921599, [...128 dimension vector...] -0.0365431122 ] } ], "deployedModelId": "DEPLOYED_MODEL_ID" }
256 dimensions:
{ "predictions": [ { "imageEmbedding": [ 0.248620048, [...256 dimension vector...] -0.0646447465 ], "textEmbedding": [ 0.0757875815, [...256 dimension vector...] -0.02749932 ] } ], "deployedModelId": "DEPLOYED_MODEL_ID" }
512 dimensions:
{ "predictions": [ { "imageEmbedding": [ -0.0523675755, [...512 dimension vector...] -0.0444030389 ], "textEmbedding": [ -0.0592851527, [...512 dimension vector...] 0.0350437127 ] } ], "deployedModelId": "DEPLOYED_MODEL_ID" }
Python
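The Python sample isn't reproduced in this extract. The following is a minimal sketch using the Vertex AI SDK for Python, assuming the multimodalembedding@001 model, a get_embeddings method that accepts a dimension argument, and placeholder project, location, and Cloud Storage values.

```python
import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel

# Placeholder project, location, and image URI; replace with your own values.
vertexai.init(project="your-project-id", location="us-central1")

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
image = Image.load_from_file("gs://my-bucket/embeddings/supermarket-img.png")

# Request 128-dimension embeddings instead of the 1408-dimension default.
embeddings = model.get_embeddings(
    image=image,
    contextual_text="a cat",
    dimension=128,
)

print(len(embeddings.image_embedding))  # expected: 128
print(len(embeddings.text_embedding))   # expected: 128
```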
Send an embedding request (image and text)
Use the following code samples to send an embedding request with image and text data. The samples show how to send a request with both data types, but you can also use the service with an individual data type.
Get text and image embeddings
REST
For more information about multimodalembedding model requests, see the multimodalembedding model API reference.
Before using any of the request data, make the following replacements:
- LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
- PROJECT_ID: Your Google Cloud project ID.
- TEXT: The target text to get embeddings for. For example, a cat.
- B64_ENCODED_IMG: The target image to get embeddings for. The image must be specified as a base64-encoded byte string.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict
Request JSON body:
{ "instances": [ { "text": "TEXT", "image": { "bytesBase64Encoded": "B64_ENCODED_IMG" } } ] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
{ "predictions": [ { "textEmbedding": [ 0.010477379, -0.00399621, 0.00576670747, [...] -0.00823613815, -0.0169572588, -0.00472954148 ], "imageEmbedding": [ 0.00262696808, -0.00198890246, 0.0152047109, -0.0103145819, [...] 0.0324628279, 0.0284924973, 0.011650892, -0.00452344026 ] } ], "deployedModelId": "DEPLOYED_MODEL_ID" }
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
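As a complement to the REST example above, here is a minimal Vertex AI SDK for Python sketch for an image-and-text request; it assumes the multimodalembedding@001 model and placeholder project, location, and image values.

```python
import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel

# Placeholder project, location, and image URI; replace with your own values.
vertexai.init(project="your-project-id", location="us-central1")

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
image = Image.load_from_file("gs://my-bucket/embeddings/supermarket-img.png")

embeddings = model.get_embeddings(
    image=image,
    contextual_text="a cat",
)

# Both vectors are 1408-dimension by default and share the same semantic space.
print("Image embedding (first values):", embeddings.image_embedding[:4])
print("Text embedding (first values):", embeddings.text_embedding[:4])
```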
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Send an embedding request (video, image, or text)
When sending an embedding request, you can specify an input video alone, or you can specify a combination of video, image, and text data.
Video embedding modes
There are three modes you can use with video embeddings: Essential, Standard, or Plus. The mode corresponds to the density of the embeddings generated, which can be specified by the interval_sec config in the request. For each video interval of interval_sec length, an embedding is generated. The minimum video interval length is 4 seconds. Interval lengths greater than 120 seconds might negatively affect the quality of the generated embeddings.
Pricing for video embedding depends on the mode you use. For more information, see pricing.
The following table summarizes the three modes you can use for video embeddings:
Mode | Maximum number of embeddings per minute | Video embedding interval (minimum value) |
---|---|---|
Essential | 4 | 15 This corresponds to: intervalSec >= 15 |
Standard | 8 | 8 This corresponds to: 8 <= intervalSec < 15 |
Plus | 15 | 4 This corresponds to: 4 <= intervalSec < 8 |
Video embeddings best practices
Consider the following when you send video embedding requests:
- To generate a single embedding for the first two minutes of an input video of any length, use the following videoSegmentConfig setting in request.json:
  // other request body content "videoSegmentConfig": { "intervalSec": 120 } // other request body content
- To generate embeddings for a video longer than two minutes, you can send multiple requests that specify the start and end times in the videoSegmentConfig:
  request1.json: // other request body content "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120 } // other request body content
  request2.json: // other request body content "videoSegmentConfig": { "startOffsetSec": 120, "endOffsetSec": 240 } // other request body content
Get video embeddings
Use the following sample to get embeddings for video content alone.
REST
For more information about multimodalembedding model requests, see the multimodalembedding model API reference.
The following example uses a video located in Cloud Storage. You can
also use the video.bytesBase64Encoded
field to provide a
base64-encoded string representation of the
video.
Before using any of the request data, make the following replacements:
- LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
- PROJECT_ID: Your Google Cloud project ID.
- VIDEO_URI: The Cloud Storage URI of the target video to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-video.mp4. You can also provide the video as a base64-encoded byte string:
  [...] "video": { "bytesBase64Encoded": "B64_ENCODED_VIDEO" } [...]
- videoSegmentConfig (START_SECOND, END_SECOND, INTERVAL_SECONDS): Optional. The specific video segments (in seconds) the embeddings are generated for. For example:
  [...] "videoSegmentConfig": { "startOffsetSec": 10, "endOffsetSec": 60, "intervalSec": 10 } [...]
  Using this config specifies video data from 10 seconds to 60 seconds and generates embeddings for the following 10-second video intervals: [10, 20), [20, 30), [30, 40), [40, 50), [50, 60). This video interval ("intervalSec": 10) falls in the Standard video embedding mode, and the user is charged at the Standard mode pricing rate.
  If you omit videoSegmentConfig, the service uses the following default values: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120, "intervalSec": 16 }. This video interval ("intervalSec": 16) falls in the Essential video embedding mode, and the user is charged at the Essential mode pricing rate.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict
Request JSON body:
{ "instances": [ { "video": { "gcsUri": "VIDEO_URI", "videoSegmentConfig": { "startOffsetSec": START_SECOND, "endOffsetSec": END_SECOND, "intervalSec": INTERVAL_SECONDS } } } ] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
Response (7 second video, no videoSegmentConfig specified):
{ "predictions": [ { "videoEmbeddings": [ { "endOffsetSec": 7, "embedding": [ -0.0045467657, 0.0258095954, 0.0146885719, 0.00945400633, [...] -0.0023291884, -0.00493789, 0.00975185353, 0.0168156829 ], "startOffsetSec": 0 } ] } ], "deployedModelId": "DEPLOYED_MODEL_ID" }
Response (59 second video, with the following video segment config: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 60, "intervalSec": 10 }):
{ "predictions": [ { "videoEmbeddings": [ { "endOffsetSec": 10, "startOffsetSec": 0, "embedding": [ -0.00683252793, 0.0390476175, [...] 0.00657121744, 0.013023301 ] }, { "startOffsetSec": 10, "endOffsetSec": 20, "embedding": [ -0.0104404651, 0.0357737206, [...] 0.00509833824, 0.0131902946 ] }, { "startOffsetSec": 20, "embedding": [ -0.0113538112, 0.0305239167, [...] -0.00195809244, 0.00941874553 ], "endOffsetSec": 30 }, { "embedding": [ -0.00299320649, 0.0322436653, [...] -0.00993082579, 0.00968887936 ], "startOffsetSec": 30, "endOffsetSec": 40 }, { "endOffsetSec": 50, "startOffsetSec": 40, "embedding": [ -0.00591270532, 0.0368893594, [...] -0.00219071587, 0.0042470959 ] }, { "embedding": [ -0.00458270218, 0.0368121453, [...] -0.00317760976, 0.00595594104 ], "endOffsetSec": 59, "startOffsetSec": 50 } ] } ], "deployedModelId": "DEPLOYED_MODEL_ID" }
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
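As a complement to the REST example above, here is a minimal Vertex AI SDK for Python sketch for a video-only request; it assumes the multimodalembedding@001 model, the VideoSegmentConfig helper, and placeholder project, location, and video values.

```python
import vertexai
from vertexai.vision_models import MultiModalEmbeddingModel, Video, VideoSegmentConfig

# Placeholder project, location, and video URI; replace with your own values.
vertexai.init(project="your-project-id", location="us-central1")

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
video = Video.load_from_file("gs://my-bucket/embeddings/supermarket-video.mp4")

embeddings = model.get_embeddings(
    video=video,
    video_segment_config=VideoSegmentConfig(
        start_offset_sec=10,
        end_offset_sec=60,
        interval_sec=10,  # 10-second intervals fall in the Standard mode
    ),
)

# One embedding is returned for each analyzed interval.
for segment in embeddings.video_embeddings:
    print(segment.start_offset_sec, segment.end_offset_sec, len(segment.embedding))
```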
Get image, text, and video embeddings
Use the following sample to get embeddings for video, text, and image content.
REST
For more information about multimodalembedding model requests, see the multimodalembedding model API reference.
The following example uses image, text, and video data. You can use any combination of these data types in your request body.
Additionally, this
sample uses a video located in Cloud Storage. You can
also use the video.bytesBase64Encoded
field to provide a
base64-encoded string representation of the
video.
Before using any of the request data, make the following replacements:
- LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
- PROJECT_ID: Your Google Cloud project ID.
- TEXT: The target text to get embeddings for. For example, a cat.
- IMAGE_URI: The Cloud Storage URI of the target image to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-img.png. You can also provide the image as a base64-encoded byte string:
  [...] "image": { "bytesBase64Encoded": "B64_ENCODED_IMAGE" } [...]
- VIDEO_URI: The Cloud Storage URI of the target video to get embeddings for. For example, gs://my-bucket/embeddings/supermarket-video.mp4. You can also provide the video as a base64-encoded byte string:
  [...] "video": { "bytesBase64Encoded": "B64_ENCODED_VIDEO" } [...]
- videoSegmentConfig (START_SECOND, END_SECOND, INTERVAL_SECONDS): Optional. The specific video segments (in seconds) the embeddings are generated for. For example:
  [...] "videoSegmentConfig": { "startOffsetSec": 10, "endOffsetSec": 60, "intervalSec": 10 } [...]
  Using this config specifies video data from 10 seconds to 60 seconds and generates embeddings for the following 10-second video intervals: [10, 20), [20, 30), [30, 40), [40, 50), [50, 60). This video interval ("intervalSec": 10) falls in the Standard video embedding mode, and the user is charged at the Standard mode pricing rate.
  If you omit videoSegmentConfig, the service uses the following default values: "videoSegmentConfig": { "startOffsetSec": 0, "endOffsetSec": 120, "intervalSec": 16 }. This video interval ("intervalSec": 16) falls in the Essential video embedding mode, and the user is charged at the Essential mode pricing rate.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict
Request JSON body:
{ "instances": [ { "text": "TEXT", "image": { "gcsUri": "IMAGE_URI" }, "video": { "gcsUri": "VIDEO_URI", "videoSegmentConfig": { "startOffsetSec": START_SECOND, "endOffsetSec": END_SECOND, "intervalSec": INTERVAL_SECONDS } } } ] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict" | Select-Object -Expand Content
{ "predictions": [ { "textEmbedding": [ 0.0105433334, -0.00302835181, 0.00656806398, 0.00603460241, [...] 0.00445805816, 0.0139605571, -0.00170318608, -0.00490092579 ], "videoEmbeddings": [ { "startOffsetSec": 0, "endOffsetSec": 7, "embedding": [ -0.00673126569, 0.0248149596, 0.0128901172, 0.0107588246, [...] -0.00180952181, -0.0054573305, 0.0117037306, 0.0169312079 ] } ], "imageEmbedding": [ -0.00728622358, 0.031021487, -0.00206603738, 0.0273937676, [...] -0.00204976718, 0.00321615417, 0.0121978866, 0.0193375275 ] } ], "deployedModelId": "DEPLOYED_MODEL_ID" }
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
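As a complement to the REST example above, here is a minimal Vertex AI SDK for Python sketch for a combined image, text, and video request; it assumes the multimodalembedding@001 model and placeholder project, location, and Cloud Storage values.

```python
import vertexai
from vertexai.vision_models import (
    Image,
    MultiModalEmbeddingModel,
    Video,
    VideoSegmentConfig,
)

# Placeholder project, location, and Cloud Storage URIs; replace with your own values.
vertexai.init(project="your-project-id", location="us-central1")

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

embeddings = model.get_embeddings(
    image=Image.load_from_file("gs://my-bucket/embeddings/supermarket-img.png"),
    video=Video.load_from_file("gs://my-bucket/embeddings/supermarket-video.mp4"),
    contextual_text="a cat",
    video_segment_config=VideoSegmentConfig(
        start_offset_sec=0,
        end_offset_sec=120,
        interval_sec=16,  # matches the default values shown in the REST example
    ),
)

print("Image embedding length:", len(embeddings.image_embedding))
print("Text embedding length:", len(embeddings.text_embedding))
for segment in embeddings.video_embeddings:
    print("Video segment:", segment.start_offset_sec, "-", segment.end_offset_sec)
```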
What's next
- Read the blog "What is Multimodal Search: 'LLMs with vision' change businesses".
- For information about text-only use cases (text-based semantic search, clustering, long-form document analysis, and other text retrieval or question-answering use cases), read Get text embeddings.
- View all Vertex AI image generative AI offerings in the Imagen on Vertex AI overview.
- Explore more pretrained models in Model Garden.
- Learn about responsible AI best practices and safety filters in Vertex AI.