Introduction to BigQuery metastore

BigQuery metastore is a fully managed metastore for data analytics products on Google Cloud. It provides a single source of truth for managing metadata from multiple sources. The metastore is accessible from BigQuery and from various open data processing engines, so data analysts and engineers can work from the same metadata regardless of the engine they use.

For example, you can use BigQuery metastore as the catalog with open source query engines such as Apache Spark. Tables created using Spark can be queried using BigQuery without requiring you to synchronize your metadata.
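As a minimal sketch of what "using BigQuery metastore as the catalog" can look like from the Spark side, the following Python helper builds the Spark properties that register an Iceberg catalog. The property names and the catalog implementation class reflect the Iceberg catalog settings as commonly documented for BigQuery metastore and should be treated as assumptions to verify against the current documentation. The catalog name, project, location, and warehouse path are hypothetical placeholders.

```python
from __future__ import annotations


def build_catalog_conf(
    catalog: str, project: str, location: str, warehouse: str
) -> dict[str, str]:
    """Return Spark properties for an Iceberg catalog backed by BigQuery metastore.

    Assumption: the class and property names below match the Iceberg
    BigQuery metastore catalog plugin available in your environment.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Register the catalog name with Spark's Iceberg integration.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Assumed implementation class for the BigQuery metastore catalog.
        f"{prefix}.catalog-impl": (
            "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog"
        ),
        # Project and location that hold the metadata (placeholders).
        f"{prefix}.gcp_project": project,
        f"{prefix}.gcp_location": location,
        # Cloud Storage path where table data files are written (placeholder).
        f"{prefix}.warehouse": warehouse,
    }


if __name__ == "__main__":
    conf = build_catalog_conf(
        "my_catalog", "my-project", "US", "gs://my-bucket/warehouse"
    )
    # Print the properties in the form accepted by spark-submit.
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

The same dictionary can be passed entry by entry to `SparkSession.builder.config(key, value)` when a session is created in code rather than through `spark-submit`.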

Benefits

BigQuery metastore offers several advantages for data management and analysis:

  • Serverless architecture. BigQuery metastore is serverless, so there are no servers or clusters to manage. This reduces operational overhead, simplifies deployment, and allows for automatic scaling based on demand.
  • Engine interoperability. BigQuery metastore provides you with direct table access in BigQuery, allowing you to query open-format tables stored in BigQuery without additional configuration. For example, you can create a table in Spark and then query it directly in BigQuery. This helps streamline your analytics workflow and reduces the need for complex data movement or ETL processes.
  • Unified user experience. BigQuery metastore provides a unified workflow across BigQuery and BigQuery Studio. This lets you use Spark directly in BigQuery and BigQuery Studio. For example:

    First, you can create a table in Spark with a BigQuery Studio notebook.

    [Image: Create table in BigQuery metastore]

    Next, you can query the same Spark table in the Google Cloud console.

    [Image: Query table in BigQuery metastore]
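The create-in-Spark, query-in-BigQuery workflow described in the list above can be sketched as two sets of SQL statements. This is an illustration under stated assumptions, not a definitive recipe: it assumes an Iceberg catalog is already registered in Spark and that a Spark namespace surfaces in BigQuery as a dataset of the same name. All catalog, namespace, project, and table names are hypothetical.

```python
from __future__ import annotations


def spark_statements(catalog: str, namespace: str, table: str) -> list[str]:
    """Spark SQL that creates and populates an Iceberg table in the catalog."""
    full_name = f"{catalog}.{namespace}.{table}"
    return [
        f"CREATE NAMESPACE IF NOT EXISTS {catalog}.{namespace}",
        f"CREATE TABLE IF NOT EXISTS {full_name} (id INT, data STRING) USING iceberg",
        f"INSERT INTO {full_name} VALUES (1, 'first row')",
    ]


def bigquery_query(project: str, dataset: str, table: str) -> str:
    """BigQuery SQL that reads the same table.

    Assumption: the Spark namespace appears as the BigQuery dataset.
    """
    return f"SELECT * FROM `{project}.{dataset}.{table}`"


if __name__ == "__main__":
    # Statements to run in a Spark session, for example with spark.sql(...).
    for statement in spark_statements("my_catalog", "my_namespace", "my_table"):
        print(statement)
    # Statement to run in BigQuery, with no metadata synchronization step.
    print(bigquery_query("my-project", "my_namespace", "my_table"))
```

No export or synchronization step appears between the two halves, which is the point of sharing one metastore across engines.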

Supported integrations

You can use BigQuery metastore with the Google Cloud console, gcloud CLI, or the BigQuery REST APIs.

BigQuery metastore supports the following integrations:

Differences with BigLake Metastore

BigQuery metastore is the recommended metastore on Google Cloud.

The core differences between BigQuery metastore and BigLake Metastore include the following details:

  • BigLake Metastore is a standalone metastore service that is distinct from BigQuery and only supports Iceberg tables. It has a different three-part resource model. Tables in BigLake Metastore are not automatically discoverable from BigQuery.

  • BigQuery metastore is based on the BigQuery catalog and directly integrates with BigQuery. Tables in BigQuery metastore are mutable from multiple open source engines, and the same tables can be queried from BigQuery. When you use BigQuery metastore, there is only one source of truth for your metadata. For example, BigQuery metastore supports direct integration with Spark. This integration provides a more seamless workflow and helps reduce redundancy when storing metadata and running jobs.

What's next