Data Analytics

Simplify your AI workflow with autonomous embedding generation in BigQuery

February 19, 2026

Andong Li

Software Engineer

Brian Seung

Software Engineer

In the world of generative AI and Retrieval-Augmented Generation (RAG), embeddings are the "secret sauce" that allows machines and AI agents to understand the semantic meaning of data. As BigQuery extends its autonomous data-to-AI platform, embeddings unblock valuable multimodal use cases. However, for many data engineers, managing embeddings is a headache. Traditionally, users have had to build their own pipelines to track source content updates, generate embeddings, and store the results.

To help BigQuery users with their AI workloads, we’re introducing autonomous embedding generation. This feature allows BigQuery to automatically maintain an embedding column on a table based on a source column. No more manual pipelines, no more synchronization issues, just easy, AI-ready data.

Managing embeddings, the old way

Before autonomous generation, the process of updating your vector search database usually looked like this:

Detect new rows in your source table.
Generate embeddings via functions like AI.EMBED.
Handle rate limits and retries.
Update the destination table with the new vectors.
Monitor the progress of your embedding generations.
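
As a rough illustration of how much plumbing those five steps involve, here is a minimal Python sketch of one pass of such a manual pipeline. Everything in it is hypothetical: the tables are plain dictionaries, `make_flaky_embedder` is a toy stand-in for a remote embedding model, and `RateLimitError`, `embed_with_retry`, and `sync_embeddings` are names invented for this example rather than BigQuery APIs.

```python
import time


class RateLimitError(Exception):
    """Raised by the hypothetical embedding service when it throttles a call."""


def make_flaky_embedder(failures=1):
    """Toy stand-in for a remote model: throttles the first `failures` calls."""
    state = {"remaining": failures}

    def embed(text):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RateLimitError("throttled")
        # Not a real embedding: just two numbers derived from the text.
        return [float(len(text)), float(text.count(" ") + 1)]

    return embed


def embed_with_retry(embed, text, max_attempts=4, base_delay=0.01):
    """Steps 2-3: call the model, retrying with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return embed(text)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


def sync_embeddings(source, destination, embed):
    """Run one pass of the manual pipeline over dict-based tables.

    Returns (rows embedded in this pass, fraction of source rows in sync).
    Rows deleted from the source are not handled in this sketch.
    """
    # Step 1: detect rows that are new or whose content has changed.
    stale = [
        row_id
        for row_id, text in source.items()
        if row_id not in destination or destination[row_id]["content"] != text
    ]
    for row_id in stale:
        vector = embed_with_retry(embed, source[row_id])
        # Step 4: write the new vector to the destination table.
        destination[row_id] = {"content": source[row_id], "embedding": vector}
    # Step 5: report progress.
    in_sync = sum(
        1
        for row_id, text in source.items()
        if row_id in destination and destination[row_id]["content"] == text
    )
    progress = in_sync / len(source) if source else 1.0
    return len(stale), progress


source = {1: "a really fun toy", 2: "wooden chair"}
destination = {}
print(sync_embeddings(source, destination, make_flaky_embedder()))  # (2, 1.0)
```

Even this toy version needs change detection, retry logic, and progress tracking, and it would have to be scheduled and monitored on top of that.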

If your data changes frequently, keeping these vectors in sync can become a full-time job for a user or administrator. With this as the backdrop, we set out to enhance BigQuery with the following capabilities.

1. Help the user directly work with their data

We want to simplify the search experience for the user, so that they can do simple things like AI.SEARCH(TABLE mydataset.products, 'product_description', "A really fun toy"), without having to interact with or understand the embeddings.

2. Automatic synchronization

BigQuery should manage embedding generation on behalf of the user and keep generated embeddings in sync with the source data.

3. Tight integration with vector indexes

BigQuery’s VECTOR_SEARCH has many users, and we want to ensure that managed embeddings are integrated into it.

The solution: autonomously generated embedding columns

We solved this by treating embeddings as a managed part of your table. Using a familiar SQL syntax, you can now define an autonomous embedding column that BigQuery manages for you.

For more information, please refer to the guide.

Integration with vector index and vector search

BigQuery’s vector index and vector search are also integrated with the generated embedding column. You can create a vector index directly on the source data column and query your data without managing embeddings manually. BigQuery automatically applies the base table's model to generate compatible embeddings for your query.

Introducing AI.SEARCH

We also launched a new function, AI.SEARCH, to provide a simplified signature for you to get started with the data-centric search experience. AI.SEARCH automatically uses the embedding model associated with the generated embedding column from the base table, so you don’t need to interact with the embedding configuration when using AI.SEARCH or VECTOR_SEARCH.
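
To make the idea concrete, the following Python sketch shows the general pattern described above: the table remembers which model produced its embedding column, and a search call reuses that same model to embed the query before ranking rows. This is a conceptual toy, not how BigQuery implements AI.SEARCH. The bag-of-words `toy_embed` function, the `EmbeddedTable` class, and the `search` helper are all assumptions made for illustration.

```python
import math
from collections import Counter


def toy_embed(text):
    """Toy bag-of-words 'embedding' standing in for a real model."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class EmbeddedTable:
    """A table that stores the model used for its embedding column."""

    def __init__(self, rows, column, embed):
        self.rows = rows
        self.column = column
        self.embed = embed  # the model is part of the table's definition
        self.embeddings = [embed(row[column]) for row in rows]


def search(table, query, top_k=2):
    """Embed the query with the table's own model, then rank the rows."""
    query_vector = table.embed(query)
    scored = sorted(
        zip(table.rows, table.embeddings),
        key=lambda pair: cosine_similarity(query_vector, pair[1]),
        reverse=True,
    )
    return [row for row, _ in scored[:top_k]]


products = EmbeddedTable(
    rows=[
        {"product_description": "a fun toy robot for kids"},
        {"product_description": "sturdy oak dining table"},
        {"product_description": "really fun board game"},
    ],
    column="product_description",
    embed=toy_embed,
)

for row in search(products, "A really fun toy"):
    print(row["product_description"])
```

Because the caller never passes a model or an embedding, the query and the stored vectors cannot drift apart, which is the property the simplified signature is meant to provide.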

Simple management

Autonomous embedding generation is in preview, ready for you to use as part of your data analytics pipelines today. We’ve also invested in a few features to help make the process simpler to manage end to end:

Embedding status metadata: You can track the progress of embedding generation by querying the percentage of non-null embeddings in your table.
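
A progress check of this kind can be sketched as follows. The example runs locally against SQLite as a stand-in, so it is not BigQuery syntax verbatim. The table name `products`, the column name `description_embedding`, and the helper `embedding_progress` are hypothetical, and the real generated column in BigQuery may have a different layout than the plain nullable column assumed here.

```python
import sqlite3

# COUNT(column) skips NULLs, so the ratio is the share of embedded rows.
PROGRESS_SQL = """
SELECT 100.0 * COUNT(description_embedding) / COUNT(*) AS pct_embedded
FROM products
"""


def embedding_progress(conn):
    """Return the percentage of rows with a non-null embedding.

    Returns None for an empty table, because SQLite yields NULL on
    division by zero.
    """
    (pct,) = conn.execute(PROGRESS_SQL).fetchone()
    return pct


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_description TEXT, description_embedding TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("a fun toy robot", "[0.1, 0.2]"),
        ("oak dining table", "[0.3, 0.4]"),
        ("board game", "[0.5, 0.6]"),
        ("new arrival", None),  # embedding not generated yet
    ],
)
print(embedding_progress(conn))  # 75.0
```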

While you can initiate the creation of the vector index at any time, BigQuery only builds the index model once the table reaches a scale at which it improves performance.

Native access to Vertex AI models: Once your BigQuery connection has the Vertex AI User role, embedding generation can securely "talk" to remote, state-of-the-art Vertex AI embedding models on your behalf.
Native error monitoring: If any step in the embedding generation pipeline fails, you can view the status of recent background jobs through the INFORMATION_SCHEMA jobs view, where you can find detailed error information to help you resolve the issue.

What’s next

Autonomous embedding generation represents a shift toward an AI-native, multimodal data foundation that’s built for processing and activating all data types. By automating embedding generation and coupling it to the data platform, we’re helping developers spend less time on plumbing and more time on building intelligent applications. We’re not done yet, and are hard at work building:

Simpler connection creation via Data Definition Language (DDL) and Data Control Language (DCL)
The ability to add a generated embedding column to existing tables via ALTER TABLE ADD COLUMN DDL
API and UI support for managing generated embedding columns
Direct support for multimodal data using ObjectRef