Data Analytics

Analyze images and videos in BigQuery using Gemini 1.0 Pro Vision

April 10, 2024

https://storage.googleapis.com/gweb-cloudblog-publish/images/Next24_Blog_blank_2-08.max-2500x2500.jpg

Honglin Zheng

Software Engineer

Alicia Williams

Developer Advocate, Google Cloud

Editor's note: Since this post's publication, new model versions have become available for analyzing multimodal content in BigQuery. Please refer to this documentation for the most up-to-date information.

With the proliferation of digital devices and platforms including social media, mobile devices and IoT sensors, organizations are increasingly generating unstructured data in the form of images, audio files, videos, and documents etc. Over the last few months, we launched BigQuery integrations with Vertex AI to leverage Gemini 1.0 Pro, PaLM , Vision AI, Speech AI, Doc AI, Natural Language AI and more to help you interpret and extract meaningful insights from unstructured data.

While Vision AI provides image classification and object recognition capabilities, large language models (LLMs) unlock new visual use cases. To that end, we are expanding BigQuery and Vertex AI integrations to support multimodal generative AI use cases with Gemini 1.0 Pro Vision. Using familiar SQL statements, you can take advantage of Gemini 1.0 Pro Vision directly in BigQuery to analyze both images and videos by combining them with your own text prompts.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_capabilities_updated.max-1300x1300.png

A birds-eye view of Vertex AI integration capabilities for analyzing unstructured data in BigQuery

Within a data warehouse setting, multimodal capabilities can help enhance your unstructured data analysis across a variety of use cases:

Object recognition: Answer questions related to fine-grained identification of the objects in images and videos.
Info seeking: Combine world knowledge with information extracted from the images and videos.
Captioning/description: Generate descriptions of images and videos with varying levels of detail.
Digital content understanding: Answer questions by extracting information from content like infographics, charts, figures, tables, and web pages.
Structured content generation: Generate responses in formats like HTML and JSON based on provided prompt instructions.

Turning unstructured data into structured data

With minimal prompt adjustments, Gemini 1.0 Pro Vision can produce structured responses in convenient formats like HTML or JSON, making them easy to consume in downstream tasks. In a data warehouse such as BigQuery, having structured data means you can use the results in SQL operations and combine it with other structured datasets for deeper analysis.

For example, imagine you have a large dataset that contains images of cars. You want to understand a few basic details about the car in each image. This is a use case that Gemini 1.0 Pro Vision can help with!

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_-_prompt_without_json.max-1400x1400.png

Combining text and image into a prompt for Gemini 1.0 Pro Vision, with a sample response.

Dataset from: 3D Object Representations for Fine-Grained Categorization Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei 4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.

As you can see, Gemini’s response is very thorough! But while the format and extra information are great if you’re a person, they’re not so great if you’re a data warehouse. Rather than turning unstructured data into more unstructured data, you can make changes to the prompt to direct the model on how to return a structured response.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_-_prompt_with_json.max-1400x1400.png

Adjusting the text portion of the prompt to indicate a structured response from Gemini 1.0 Pro Vision, with a sample result.

You can see how this response would be much more useful in an environment like BigQuery.

Now let's see how to prompt Gemini 1.0 Pro Vision directly in BigQuery to perform this analysis over thousands of images!

Accessing Gemini 1.0 Pro Vision from BigQuery ML

Gemini 1.0 Pro Vision is integrated with BigQuery through the ML.GENERATE_TEXT() function. To unlock this function in your BigQuery project, you will need to create a remote model that represents a hosted Vertex AI large language model. Fortunately, it's just a few lines of SQL:

Once the model is created, you can combine your data with the ML.GENERATE_TEXT() function in your SQL queries to generate text.

A few notes on the ML.GENERATE_TEXT() function syntax when it is pointing to a gemini-pro-vision model endpoint, as is the case in this example:

TABLE: takes an object table as input, where it can contain different types of unstructured objects (e.g. images, videos).
PROMPT: takes a single string text prompt that is placed as part of the option STRUCT (dissimilar to the case when using the gemini-pro model) and applies this prompt to each object, row-by-row, contained in the object TABLE.

Let's take a peek at the results.

https://storage.googleapis.com/gweb-cloudblog-publish/images/4_-_initial_results.max-1500x1500.png

We can add some SQL to this query to extract each of the values for brand, model, and year into new fields for use downstream.

Now the responses have been parsed into new, structured columns.

https://storage.googleapis.com/gweb-cloudblog-publish/images/5_-_parsed_results.max-1100x1100.png

And there you have it. We've just turned a collection of unlabeled, raw images into structured data, fit for analysis in a data warehouse. Imagine joining this new table with other relevant enterprise data. With a dataset of historical car sales, for example, you could determine the average or median sale price for similar cars in a recent time period. This is just a taste of the possibilities that are uncovered by bringing unstructured data into your data workflows!

When getting started with Gemini 1.0 Pro Vision in BigQuery, there are a few important items to note:

You need an enterprise or enterprise plus reservation to run Gemini 1.0 Pro Vision model inference over an object table. For reference see the BigQuery editions documentation.
Limits apply to functions that use Vertex AI large language models (LLMs) and Cloud AI services, so review the current quota in place for the Gemini 1.0 Pro Vision model.

Next steps

Bringing generative AI directly into BigQuery has enormous benefits. Instead of writing custom Python code and building data pipelines between BigQuery and the generative AI model APIs, you can now just write a few lines of SQL! BigQuery manages the infrastructure and helps you scale from one prompt to thousands. Check out the overview and demo video, and the documentation to see more example queries using ML.GENERATE_TEXT() with Gemini 1.0 Pro Vision.

Coming to Next ‘24? Check out session Power data analytics with generative AI using BigQuery and Gemini, where you can see Gemini Vision Pro and BigQuery in action.

Posted in

Data Analytics

How to reduce costs with Managed Service for Apache Kafka: CUDs, compression and more

By Qiqi Wu • 5-minute read

Data Analytics

How to use gen AI for better data schema handling, data quality, and data generation

By Deb Lee • 9-minute read

Data Analytics

BigQuery ML is now compatible with open-source gen AI models

By Vaibhav Sethi • 3-minute read

Data Analytics

Introducing BigQuery metastore, a unified metadata service with Apache Iceberg support

By Yuri Volobuev • 4-minute read

Analyze images and videos in BigQuery using Gemini 1.0 Pro Vision

Honglin Zheng

Alicia Williams

Turning unstructured data into structured data

Accessing Gemini 1.0 Pro Vision from BigQuery ML

Next steps

Related articles

How to reduce costs with Managed Service for Apache Kafka: CUDs, compression and more

How to use gen AI for better data schema handling, data quality, and data generation

BigQuery ML is now compatible with open-source gen AI models

Introducing BigQuery metastore, a unified metadata service with Apache Iceberg support