Introducing BigQuery managed and SQL-native inference for open models

January 15, 2026
Jiashang Liu

Software Engineer

Yunmeng Xie

Software Engineer

BigQuery provides access to a variety of LLMs for text and embedding generation, including Google's Gemini models and Google-managed models from partners like Anthropic and Mistral. Using Gemini models and Google-managed partner models in BigQuery is simple — just create the model with the foundation model name and run inference directly in SQL queries. Today, we are bringing this same simplicity and power to any model you choose from Hugging Face or Vertex AI Model Garden.

A SQL-native workflow with automated management

With the launch of managed third-party generative AI inference in BigQuery (Preview), you can now run open models using just two SQL statements. 

This new capability delivers four key benefits:

  1. Simplified deployment: Deploy open models using a single CREATE MODEL SQL statement with the model ID string (e.g., google/gemma-3-1b-it). BigQuery automatically provisions the compute resources with default configurations.
  2. Automated resource management: BigQuery automatically releases idle compute resources, preventing unintended costs. You can configure the idle time via the endpoint_idle_ttl option.
  3. Granular resource control: You can customize backend compute resources (like machine types and min/max replicas) directly within your CREATE MODEL statement to meet your performance and cost needs.
  4. Unified SQL interface: The entire workflow — from model creation and inference to cost management and cleanup — is managed directly in BigQuery using SQL.

How it works: A practical example

Let’s walk through the process of creating and using an open model.

Step 1: Create a BigQuery managed open model
To use an open model from Hugging Face or Vertex AI Model Garden, use a CREATE MODEL statement along with the open model ID. It typically takes a few minutes for the query to complete, depending on the model size and machine type.

Hugging Face models
Specify the hugging_face_model_id option in the format provider_name/model_name; for example, sentence-transformers/all-MiniLM-L6-v2.

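A minimal sketch of the statement (the dataset, model, and connection names are placeholders, and the connection clause follows BigQuery's usual remote-model pattern; check the CREATE MODEL reference for the exact requirements in your project):

-- Deploy an open embedding model from Hugging Face with default
-- compute settings; BigQuery provisions the Vertex AI endpoint for you.
CREATE OR REPLACE MODEL `mydataset.mini_lm_model`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (
    hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2'
  );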

Vertex AI Model Garden models
Specify the model_garden_model_name option in the format publishers/publisher/models/model_name@model_version; for example, publishers/google/models/gemma3@gemma-3-1b-it.

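A matching sketch for a Model Garden model, again with placeholder names:

-- Deploy Gemma 3 1B from Vertex AI Model Garden with default settings.
CREATE OR REPLACE MODEL `mydataset.gemma_model`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (
    model_garden_model_name = 'publishers/google/models/gemma3@gemma-3-1b-it'
  );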

For demanding workloads, you can customize deployment settings (machine types, replica counts, endpoint idle time) to improve scalability and manage costs. You can also use Compute Engine reservations to secure GPU instances for consistent performance. See CREATE MODEL syntax for all the options.
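For illustration, a customized deployment might look like the sketch below. The endpoint_idle_ttl option is described later in this post; the option names for the machine type and replica counts are assumptions here, so check the CREATE MODEL syntax reference for the exact spellings:

CREATE OR REPLACE MODEL `mydataset.gemma_model`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (
    model_garden_model_name = 'publishers/google/models/gemma3@gemma-3-1b-it',
    endpoint_idle_ttl = INTERVAL 10 HOUR,  -- release idle compute after 10 hours
    -- The option names below are assumptions for illustration:
    machine_type = 'g2-standard-8',
    min_replica_count = 1,
    max_replica_count = 4
  );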

Step 2: Run batch inference
Once the above CREATE MODEL job finishes, you can use the model with AI.GENERATE_TEXT (for LLM inference) or AI.GENERATE_EMBEDDING (for embedding generation) on your data in BigQuery.

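A minimal sketch, assuming a table mydataset.prompts with a STRING column named prompt and a table-valued call signature similar to BigQuery's other generation functions (check the AI.GENERATE_TEXT reference for the exact argument names):

-- Generate text for every row of an input table using the deployed model.
-- The input table and the 'prompt' column name are assumptions for illustration.
SELECT *
FROM AI.GENERATE_TEXT(
  MODEL `mydataset.gemma_model`,
  TABLE `mydataset.prompts`,
  STRUCT(256 AS max_output_tokens)  -- optional generation parameters
);

AI.GENERATE_EMBEDDING follows the same pattern for an embedding model such as the MiniLM model created above.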

Vertex AI endpoint lifecycle management and cost control
BigQuery offers flexible controls over Vertex AI endpoint lifecycle and costs through both automated and manual options.

  • Automated control: The endpoint_idle_ttl option enables automated resource recycling. If the model isn't used for the specified duration (e.g., INTERVAL 10 HOUR), BigQuery automatically “undeploys” the Vertex AI endpoint for you, stopping all endpoint charges.

  • Manual control: You can also manually “undeploy” an endpoint to stop charges immediately, or redeploy it using a simple ALTER MODEL statement, as sketched below.
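A hypothetical sketch of the manual controls; the deploy_model option name below is an assumption rather than confirmed syntax, so see the ALTER MODEL reference for the exact option that controls endpoint deployment:

-- Undeploy the Vertex AI endpoint to stop charges immediately.
-- 'deploy_model' is an assumed option name for illustration.
ALTER MODEL `mydataset.gemma_model`
  SET OPTIONS (deploy_model = FALSE);

-- Redeploy the endpoint when you need the model again.
ALTER MODEL `mydataset.gemma_model`
  SET OPTIONS (deploy_model = TRUE);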

Easy resource cleanup
When you are done using a model, simply drop it. BigQuery automatically cleans up all associated Vertex AI resources (like the endpoint and model) for you, so you are no longer charged for them.

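For example, with the placeholder model from the earlier steps:

-- Dropping the model also removes the managed Vertex AI endpoint
-- and the deployed model, so no further charges accrue.
DROP MODEL IF EXISTS `mydataset.gemma_model`;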

Get started today

BigQuery’s new managed inference capability fundamentally changes how data teams access and use third-party gen AI models. By consolidating the entire model lifecycle — creation, inference, cost management, and cleanup — into a familiar SQL interface, we're removing operational friction and making powerful open models accessible to every BigQuery user, from data analysts to AI/ML engineers. For comprehensive documentation and tutorials, refer to the BigQuery documentation.

We look forward to seeing what you build!
