Create a table

This document shows you how to define tables with Dataform core in Dataform.

About table definitions

To define a table, you define the table type and write a SELECT statement in a type: "table" SQLX file. Dataform then compiles your Dataform core code into SQL, executes the SQL code, and creates your defined tables in BigQuery.

In a Dataform core SELECT statement, you define table structure and reference other objects of your SQL workflow.

In addition to defining tables in a type: "table" SLQX file, you can create empty tables by defining a custom SQL query in a type: "operations" SQLX file. For more information, see Create an empty table.

Reference dependencies with ref

To reference a SQL workflow object in a SELECT statement and automatically add it as a dependency, use the ref function. Dataform executes dependencies before tables that depend on them to ensure correct pipeline ordering.

The ref function is a Dataform core built-in function that is critical to dependency management in Dataform. The ref function lets you reference and automatically depend on the following objects defined in your Dataform SQL workflow instead of hard coding the schema and table names:

Dataform uses the ref function to build a dependency tree of all the tables to be created or updated.

After compiling, Dataform adds boilerplate statements like CREATE, REPLACE, INSERT, or MERGE to the SQL statement.

The following code sample shows a table definition with the use of the ref function:

config { type: "table" }

SELECT
  order_date AS date,
  order_id AS order_id,
  order_status AS order_status,
  SUM(item_count) AS item_count,
  SUM(amount) AS revenue

FROM ${ref("store_clean")}

GROUP BY 1, 2

In the ref function, you provide the name of the table or data source declaration that you want to depend on. This is typically the filename of the SQLX file in which that table or data source declaration is defined.

If a table name is overridden, use the overridden name in the ref function. For example, reference a table with config { name: "overridden_name" } as ref("overridden_name"). For more information about overriding table names, see Configure additional table settings.

When you have multiple tables of the same name in different schemas, you can reference a specific table by providing two arguments to the ref function: schema name and table name.

The following code sample shows the ref function with two arguments to specify a table within a specific schema:

config { type: "table" }
SELECT * FROM ${ref("schema", "store_clean")}

You can also add table dependencies manually to the config block for tables, assertions, data source declarations, or custom SQL operations that are not referenced in a ref function in the SELECT statement. Dataform executes these dependencies before dependent tables.

The following code sample shows a table dependency in the config block:

config { dependencies: [ "unreferenced_table" ] }
SELECT * FROM ...

For more information on dependency management in your SQL workflow, see Declare dependencies.

Reference other tables with resolve

The resolve function lets you reference a table or data source declaration in a SELECT statement like the ref function, but does not add the reference as a dependency. This means that the object referenced using resolve does not affect execution of the table that uses resolve.

For more information on built-in Dataform core functions, see the Dataform core reference.

Before you begin

  1. Create and initialize a development workspace in your repository.
  2. Optional: Declare a data source.

Required roles

To get the permissions that you need to define a table, ask your administrator to grant you the Dataform Editor (roles/dataform.editor) IAM role on workspaces. For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a SQLX file for table definition

Store table definition SQLX files in the definitions/ directory. To create a new SQLX file in the definitions/ directory, follow these steps:

  1. In the Cloud Console, go to the Dataform page.

    Go to the Dataform page

  2. To open a repository, click the repository name.

  3. To open a development workspace, click the workspace name.

  4. In the Files pane, next to definitions/, click the More menu.

  5. Click Create file.

  6. In the Add a file path field, enter the name of the file followed by .sqlx after definitions/. For example, definitions/my-table.sqlx.

    Filenames can only include numbers, letters, hyphens, and underscores.

  7. Click Create file.

Define the table type

To create a new table type definition, follow these steps:

  1. In your development workspace, in the Files pane, expand the definitions/ directory.
  2. Select the table definition SQLX file that you want to edit.
  3. In the file, enter the following code snippet:

    config { type: "TABLE_TYPE" }
    

    Replace TABLE_TYPE with one of the following table types:

    • table
    • incremental
    • view
  4. Optional: To define a materialized view, enter the materialized property under type: "view" in the following format:

    config {
      type: "view",
      materialized: true
    }
    

    For more information, see ITableConfig Dataform core reference.

  5. Optional: Click Format.

Define table structure and dependencies

To write a table definition SELECT statement and define table structure and dependencies, follow these steps:

  1. In your development workspace, in the Files pane, expand the definitions/ directory.
  2. Select the table definition SQLX file that you want to edit.
  3. Below the config block, write a SELECT statement.
  4. Optional: Click Format.

The following code sample shows a table definition with a SELECT statement and the ref function:

config { type: "table" }
SELECT
  customers.id AS id,
  customers.first_name AS first_name,
  customers.last_name AS last_name,
  customers.email AS email,
  customers.country AS country,
  COUNT(orders.id) AS order_count,
  SUM(orders.amount) AS total_spent
FROM
  dataform-samples.dataform_sample.crm_customers AS customers
  LEFT JOIN ${ref('order_stats')} orders
    ON customers.id = orders.customer_id

WHERE
  customers.id IS NOT NULL
  AND customers.first_name <> 'Internal account'
  AND country IN ('UK', 'US', 'FR', 'ES', 'NG', 'JP')

GROUP BY 1, 2, 3, 4, 5

Add manual table dependencies

To add table dependencies that are not referenced in the SELECT statement, but need to be executed before the current table, follow these steps:

  1. In your development workspace, in the Files pane, expand the definitions/ directory.
  2. Select the table definition SQLX file that you want to edit.
  3. In the config block of the table, enter the following code snippet:

    dependencies: [ "DEPENDENCY_TABLE", ]
    

    Replace DEPENDENCY_TABLE with the filename of the table you want to add as a dependency. You can enter multiple filenames.

  4. Optional: Click Format.

The following code sample shows two tables added as manual table dependencies to the config block of a table definition file:

config { dependencies: [ "some_table", "some_other_table" ] }

What's next