Introducing BigQuery managed and SQL-native inference for open models

January 15, 2026
Jiashang Liu

Software Engineer

Yunmeng Xie

Software Engineer

BigQuery provides access to a variety of LLMs for text and embedding generation, including Google's Gemini models and Google-managed models from partners like Anthropic and Mistral. Using Gemini models and Google-managed partner models in BigQuery is simple — just create the model with the foundation model name and run inference directly in SQL queries. Today, we are bringing this same simplicity and power to any model you choose from Hugging Face or Vertex AI Model Garden.

A SQL-native workflow with automated management

With the launch of managed third-party generative AI inference in BigQuery (Preview), you can now run open models using just two SQL statements. 

This new capability delivers four key benefits:

  1. Simplified deployment: Deploy open models using a single CREATE MODEL SQL statement with the model ID string (e.g., google/gemma-3-1b-it). BigQuery automatically provisions the compute resources with default configurations.
  2. Automated resource management: BigQuery automatically releases idle compute resources, preventing unintended costs. You can configure the idle time via the endpoint_idle_ttl option.
  3. Granular resource control: You can customize backend compute resources (like machine types and min/max replicas) directly within your CREATE MODEL statement to meet your performance and cost needs.
  4. Unified SQL interface: The entire workflow — from model creation and inference to cost management and cleanup — is managed directly in BigQuery using SQL.

How it works: A practical example

Let’s walk through the process of creating and using an open model.

Step 1: Create a BigQuery managed open model
To use an open model from Hugging Face or Vertex AI Model Garden, use a CREATE MODEL statement along with the open model ID. It typically takes a few minutes for the query to complete, depending on the model size and machine type.

Hugging Face models
Specify the hugging_face_model_id option in the format provider_name/model_name; for example, sentence-transformers/all-MiniLM-L6-v2.

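A minimal sketch of the statement (the dataset, model, and connection names are placeholders, and the connection clause follows BigQuery's usual remote-model pattern; check the CREATE MODEL reference for the exact requirements in your project):

-- Deploy an open embedding model from Hugging Face with default
-- compute settings; BigQuery provisions the Vertex AI endpoint for you.
CREATE OR REPLACE MODEL `mydataset.mini_lm_model`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (
    hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2'
  );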

Vertex AI Model Garden models
Specify the model_garden_model_name option in the format publishers/publisher/models/model_name@model_version; for example, publishers/google/models/gemma3@gemma-3-1b-it.

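A matching sketch for a Model Garden model, again with placeholder names:

-- Deploy Gemma 3 1B from Vertex AI Model Garden with default settings.
CREATE OR REPLACE MODEL `mydataset.gemma_model`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (
    model_garden_model_name = 'publishers/google/models/gemma3@gemma-3-1b-it'
  );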

For demanding workloads, you can customize deployment settings (machine types, replica counts, endpoint idle time) to improve scalability and manage costs. You can also use Compute Engine reservations to secure GPU instances for consistent performance. See CREATE MODEL syntax for all the options.
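For illustration, a customized deployment might look like the sketch below. The endpoint_idle_ttl option is described later in this post; the option names for the machine type and replica counts are assumptions here, so check the CREATE MODEL syntax reference for the exact spellings:

CREATE OR REPLACE MODEL `mydataset.gemma_model`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (
    model_garden_model_name = 'publishers/google/models/gemma3@gemma-3-1b-it',
    endpoint_idle_ttl = INTERVAL 10 HOUR,  -- release idle compute after 10 hours
    -- The option names below are assumptions for illustration:
    machine_type = 'g2-standard-8',
    min_replica_count = 1,
    max_replica_count = 4
  );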

Step 2: Run batch inference
Once the above CREATE MODEL job finishes, you can use the model with AI.GENERATE_TEXT (for LLM inference) or AI.GENERATE_EMBEDDING (for embedding generation) on your data in BigQuery.

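A minimal sketch, assuming a table mydataset.prompts with a STRING column named prompt and a table-valued call signature similar to BigQuery's other generation functions (check the AI.GENERATE_TEXT reference for the exact argument names):

-- Generate text for every row of an input table using the deployed model.
-- The input table and the 'prompt' column name are assumptions for illustration.
SELECT *
FROM AI.GENERATE_TEXT(
  MODEL `mydataset.gemma_model`,
  TABLE `mydataset.prompts`,
  STRUCT(256 AS max_output_tokens)  -- optional generation parameters
);

AI.GENERATE_EMBEDDING follows the same pattern for an embedding model such as the MiniLM model created above.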

Vertex AI endpoint lifecycle management and cost control
BigQuery offers flexible controls over Vertex AI endpoint lifecycle and costs through both automated and manual options.

  • Automated control: The endpoint_idle_ttl option enables automated resource recycling. If the model isn't used for the specified duration (e.g., INTERVAL 10 HOUR), BigQuery automatically “undeploys” the Vertex AI endpoint for you, stopping all endpoint charges.

  • Manual control: You can also manually “undeploy” an endpoint to stop charges immediately, or redeploy it using a simple ALTER MODEL statement, as sketched below.
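A hypothetical sketch of the manual controls; the deploy_model option name below is an assumption rather than confirmed syntax, so see the ALTER MODEL reference for the exact option that controls endpoint deployment:

-- Undeploy the Vertex AI endpoint to stop charges immediately.
-- 'deploy_model' is an assumed option name for illustration.
ALTER MODEL `mydataset.gemma_model`
  SET OPTIONS (deploy_model = FALSE);

-- Redeploy the endpoint when you need the model again.
ALTER MODEL `mydataset.gemma_model`
  SET OPTIONS (deploy_model = TRUE);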

Easy resource cleanup
When you are done using a model, simply drop it. BigQuery automatically cleans up all associated Vertex AI resources (like the endpoint and model) for you, so you are no longer charged for them.

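For example, with the placeholder model from the earlier steps:

-- Dropping the model also removes the managed Vertex AI endpoint
-- and the deployed model, so no further charges accrue.
DROP MODEL IF EXISTS `mydataset.gemma_model`;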

Get started today

BigQuery’s new managed inference capability fundamentally changes how data teams access and use third-party gen AI models. By consolidating the entire model lifecycle — creation, inference, cost management, and cleanup — into a familiar SQL interface, we're removing operational friction and making powerful open models accessible to every BigQuery user, from data analysts to AI/ML engineers. For comprehensive documentation and tutorials, refer to the BigQuery documentation.

We look forward to seeing what you build!
