Create and query metastore tables from Spark
You can query Apache Iceberg tables from a BigQuery notebook using open-source engines such as Apache Spark. These are regular Iceberg tables whose metadata is stored in the BigQuery metastore, so the same table can be queried from both BigQuery and Spark.
Before you begin
- Create an Iceberg table using Spark so that its schema is stored in the BigQuery metastore. You can create the table with a BigQuery notebook, Dataproc, Dataproc Serverless, or a stored procedure.
Considerations
- Currently only Iceberg tables are supported.
When running your queries, map your namespace and database names to BigQuery dataset names. The following mappings apply:
- Namespace → BigQuery dataset
- Database → BigQuery dataset
- Table (partitioned or unpartitioned) → BigQuery table
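For example, because a Spark namespace maps to a BigQuery dataset, a table created from Spark can be queried in BigQuery with a standard three-part table reference (PROJECT_ID, NAMESPACE_NAME, and TABLE_NAME below are illustrative placeholders):

```sql
-- BigQuery SQL: the Spark namespace appears as a BigQuery dataset.
-- PROJECT_ID, NAMESPACE_NAME, and TABLE_NAME are placeholders.
SELECT * FROM `PROJECT_ID.NAMESPACE_NAME.TABLE_NAME`;
```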
View and query a table
After creating your BigQuery resources in Spark, you can view and query them in the Google Cloud console. The following example shows you the general steps to query a metastore table using interactive Spark:
Use the custom Iceberg catalog:
USE `CATALOG_NAME`;
Replace the following:
CATALOG_NAME
: the name of the Spark catalog that you're using with your SQL job.
Create a namespace:
CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;
Replace the following:
NAMESPACE_NAME
: the namespace name that references your Spark table.
Use the created namespace:
USE NAMESPACE_NAME;
Create an Iceberg table:
CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG;
Replace the following:
TABLE_NAME
: a name for your Iceberg table.
Insert a table row:
INSERT INTO TABLE_NAME VALUES (1, "first row");
Query the table:
SELECT * FROM `TABLE_NAME`;
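Put together, the steps above form a single Spark SQL session (the catalog, namespace, and table names are illustrative placeholders):

```sql
-- Run in an interactive Spark session that is configured with an
-- Iceberg catalog backed by the BigQuery metastore.
-- CATALOG_NAME, NAMESPACE_NAME, and TABLE_NAME are placeholders.
USE `CATALOG_NAME`;
CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;
USE NAMESPACE_NAME;
CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG;
INSERT INTO TABLE_NAME VALUES (1, "first row");
SELECT * FROM TABLE_NAME;
```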