Understand your Cloud Storage footprint with AI-powered queries and insights

October 1, 2024
Manjul Sahay

Group Product Manager, Google Cloud Storage

Jinal Dalal

Engineering Director, Google Cloud Storage

Google Cloud Storage is at the core of many customers’ cloud deployments because of its simplicity, affordability, and near-infinite scale. But managing millions or billions of objects across numerous projects, with hundreds of developers, can be complex, and it often requires a team of analysts to analyze data manually for insights. Earlier this year, we introduced a powerful new experimental capability for Cloud Storage: storage insight generation. Leveraging Gemini, our largest and most capable AI model, storage insight generation lets you ask questions to uncover valuable insights about your Cloud Storage environment.

With storage insight generation you can: 

  • eliminate manual data analysis and get answers rapidly
  • proactively find potential security and compliance risks
  • identify possible cost-savings opportunities to optimize your storage spend

Google Cloud is the first hyperscale cloud provider to generate storage insights specific to an environment by querying object metadata and using the power of large language models (LLMs). These AI-powered insights, generated directly from your Cloud Storage object metadata, give you a new level of control and understanding, even at the scale of billions of objects. In this blog post, we show you how to get started with storage insight generation and give you a sense of what AI assistance can do for your storage operations.

Generating storage insights with Gemini

To get started, you need to set up the Storage Insights Dataset, a new feature that collects and centralizes all bucket and object metadata across your Google Cloud projects and regions in a BigQuery linked dataset. Each dataset is refreshed every 24 hours and can retain up to 90 days of historical data. During the dataset setup, you need to select the option for insight generation with Gemini.

After the initial setup, you’ll be able to access the enhanced user experience, which includes a short summary of your dataset. 

To help you begin, we offer a pre-curated set of prompts with validated responses. We selected these prompts based on customers' most common questions. They are also a great starting point from which to craft custom prompts.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_OverviewPage.max-2200x2200.png

Storage Insights start page in the Google Cloud Console UI

When you select any of these curated prompts, you can see verified responses. These responses also include charts, allowing you to translate complex data into clear, visual representations, so you can easily understand, analyze, and share key findings across teams.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_PreCurated.max-2200x2200.png

Storage Insights response page in the Google Cloud Console UI

You can also use multi-turn chat to dive deeper into insights and run your own interactive analyses. The example below shows a natural language prompt being answered by combining metadata across millions of objects. Every response also shows the underlying query, and you can navigate to BigQuery with one click to edit it.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_CustomQuery.max-2200x2200.png

Storage Insights response page in the Google Cloud Console UI
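To give a feel for the kind of query that sits behind such a response, here is a minimal sketch in Python. It is an illustration only: it uses an in-memory SQLite table as a stand-in for the BigQuery linked dataset, and the table name `object_metadata` and its columns are hypothetical, not the actual Storage Insights dataset schema.

```python
import sqlite3

# Hypothetical, simplified stand-in for an object-metadata table.
# The real dataset lives in BigQuery and its schema is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_metadata ("
    "bucket TEXT, name TEXT, storage_class TEXT, size_bytes INTEGER)"
)
rows = [
    ("logs-bucket", "2024/01/app.log", "STANDARD", 1_000),
    ("logs-bucket", "2023/01/app.log", "COLDLINE", 4_000),
    ("media-bucket", "video.mp4", "STANDARD", 9_000),
    ("media-bucket", "archive.tar", "ARCHIVE", 6_000),
]
conn.executemany("INSERT INTO object_metadata VALUES (?, ?, ?, ?)", rows)

# A question such as "How much data do I store in each storage class?"
# could translate into an aggregation like this one.
QUERY = """
SELECT storage_class,
       COUNT(*)        AS object_count,
       SUM(size_bytes) AS total_bytes
FROM object_metadata
GROUP BY storage_class
ORDER BY total_bytes DESC
"""

usage_by_class = conn.execute(QUERY).fetchall()
for storage_class, object_count, total_bytes in usage_by_class:
    print(f"{storage_class}: {object_count} objects, {total_bytes} bytes")
```

Because the query is plain SQL over metadata, it can be read and adjusted by hand, which is what the one-click jump to BigQuery is for.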

Trust, accuracy and safety

Despite generative AI’s powerful capabilities, there may be occasional hallucinations. To help you assess the generated answer, we have included multiple informational indicators: every response includes the SQL query for easy validation, curated prompts show a ‘high accuracy’ tag, and we include helpful information about data freshness. You can also use the thumbs up/down indicator to share your feedback about the response.

We use the AI model to convert natural language into an appropriate SQL query, query the dataset, and summarize responses. Your object and bucket metadata is completely yours and is not used for training Google Cloud’s AI models, and we do not store any customer prompts unless you choose to share them through the feedback option. Datasets only contain object and bucket metadata based on your selection of projects and do not have access to object content. Finally, we follow Google’s Responsible AI approach to validate answers from the model and increase content safety.
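The three steps described above (translate the prompt to SQL, query the metadata, summarize the result) can be sketched as a toy pipeline. This is not how the product is implemented: the lookup table `PROMPT_TO_SQL` is a hypothetical stand-in for the model, SQLite stands in for BigQuery, and all table and column names are assumptions made for illustration. The point it shows is that the SQL travels with the answer, so a reader can validate it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Insight:
    prompt: str
    sql: str      # returned with every answer so the reader can validate it
    summary: str


# Hypothetical stand-in for the model's natural-language-to-SQL step:
# a fixed lookup table instead of an LLM call.
PROMPT_TO_SQL = {
    "which bucket stores the most data?": (
        "SELECT bucket, SUM(size_bytes) AS total_bytes "
        "FROM object_metadata GROUP BY bucket "
        "ORDER BY total_bytes DESC LIMIT 1"
    ),
}


def to_sql(prompt: str) -> str:
    key = prompt.strip().lower()
    if key not in PROMPT_TO_SQL:
        raise ValueError(f"no SQL mapping for prompt: {prompt!r}")
    return PROMPT_TO_SQL[key]


def answer(conn: sqlite3.Connection, prompt: str) -> Insight:
    sql = to_sql(prompt)                          # 1. natural language -> SQL
    bucket, total = conn.execute(sql).fetchone()  # 2. query the metadata
    summary = f"{bucket} stores the most data ({total} bytes)."  # 3. summarize
    return Insight(prompt, sql, summary)


# Toy metadata only: the table holds names and sizes, never object content.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_metadata (bucket TEXT, name TEXT, size_bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO object_metadata VALUES (?, ?, ?)",
    [
        ("logs-bucket", "a.log", 1_000),
        ("logs-bucket", "b.log", 4_000),
        ("media-bucket", "video.mp4", 9_000),
    ],
)

insight = answer(conn, "Which bucket stores the most data?")
print(insight.summary)
print(insight.sql)
```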

Simpler and easier storage management

Along with this new capability, Cloud Storage offers a comprehensive set of management features to help you manage your storage more easily and efficiently, at scale. To start generating insights with Gemini today, Cloud Storage customers can sign up by reaching out to their Google Cloud account team. To learn more, please watch this video recording of the Google Cloud Next 2024 ‘Managing Cloud Storage at scale with Gemini’ session.
