Declare a data source

This document shows you how to declare BigQuery data sources with Dataform core.

You can declare any BigQuery table type as a data source in Dataform. Declaring BigQuery data sources that are external to Dataform lets you treat those data sources as first-class Dataform objects. After you declare a data source, you can reference or resolve it in the same way as any other table in Dataform.

Before you begin

Before you declare a data source, create and initialize a development workspace in your repository.

Required roles

To get the permissions that you need to declare a data source, ask your administrator to grant you the Dataform Editor (roles/dataform.editor) IAM role on workspaces. For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a SQLX file for data source declaration

Store SQLX files for data source declarations in the definitions/ directory. To create a new SQLX file in the definitions/ directory, follow these steps:

  1. In the Google Cloud console, go to the Dataform page.


  2. Select a repository.

  3. Select a development workspace.

  4. In the Files pane, next to definitions/, click the More menu.

  5. Click Create file.

  6. In the Create new file pane, do the following:

    1. In the Add a file path field, after definitions/, enter the name of the file followed by .sqlx. For example, definitions/dataset-declaration.sqlx.

      Filenames can only include numbers, letters, hyphens, and underscores.

    2. Click Create file.

Declare a data source

You can declare one data source per SQLX declaration file. To declare a data source in the config block of a SQLX file, follow these steps:

  1. In your development workspace, in the Files pane, click your SQLX file for data source declaration.
  2. In the file, enter the following code snippet:

    config {
      type: "declaration",
      database: "DATABASE",
      schema: "SCHEMA",
      name: "NAME",
    }
    

    Replace the following:

    • DATABASE: the ID of the Google Cloud project that contains the data source.
    • SCHEMA: the BigQuery dataset in which the data source exists.
    • NAME: the name of the table or view that you want to use as the data source. You can later use that name to reference the data source in Dataform.
  3. Optional: Click Format.

The following code sample declares the shakespeare table in the samples dataset of the bigquery-public-data project as a data source:

    config {
      type: "declaration",
      database: "bigquery-public-data",
      schema: "samples",
      name: "shakespeare",
    }
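
After the declaration is in place, other workflow files can refer to the table by its declared name. The following is a minimal sketch, assuming a hypothetical file named definitions/word_totals.sqlx in the same repository. The view and its query are illustrative only and aren't part of the original sample:

    config {
      type: "view"
    }

    -- Hypothetical view that totals word counts across all works.
    -- ref() points at the data source declared as "shakespeare".
    SELECT
      word,
      SUM(word_count) AS total_count
    FROM
      ${ref("shakespeare")}
    GROUP BY
      word

When the workflow compiles, ${ref("shakespeare")} should resolve to the fully qualified table bigquery-public-data.samples.shakespeare, so the query doesn't need to hardcode the project or dataset.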

What's next