Create and query metastore tables from Spark

You can query Apache Spark Iceberg tables in a BigQuery notebook using open-source engines, such as Spark. These tables are regular Iceberg tables with metadata stored in the BigQuery metastore. The same table can be queried from both BigQuery and Spark.

Before you begin

Considerations

  • Currently only Iceberg tables are supported.
  • When running your queries, remember to map your namespace and database names as BigQuery dataset names. The following table lists the applicable key mappings to use:

    Open source BigQuery
    Namespace BigQuery dataset
    Database BigQuery dataset
    Table (partitioned or unpartitioned) BigQuery table

View and query a table

After creating your BigQuery resources in Spark, you can view and query them in the Google Cloud console. The following example shows you the general steps to query a metastore table using interactive Spark:

  1. Use the custom Iceberg catalog:

    USE `CATALOG_NAME`;

    Replace the following:

    • CATALOG_NAME: the name of the Spark catalog to that you're using with your SQL job.
  2. Create a namespace:

    CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;

    Replace the following:

    • NAMESPACE_NAME: the namespace name that references your Spark table.
  3. Use the created namespace:

    USE NAMESPACE_NAME;
  4. Create an Iceberg table:

    CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG;

    Replace the following:

    • TABLE_NAME: a name for your Iceberg table.
  5. Insert a table row:

    INSERT INTO TABLE_NAME VALUES (1, "first row");
  6. Use the Google Cloud console to do one of the following:

     SELECT * FROM `TABLE_NAME`;

What's next