Create and execute a SQL workflow in Dataform
This quickstart walks you through the following process in Dataform to create a SQL workflow and execute it in BigQuery:
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
-
Enable the BigQuery and Dataform APIs.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
-
Enable the BigQuery and Dataform APIs.
Required roles
To get the permissions that you need to perform all tasks in this tutorial, ask your administrator to grant you the following IAM roles on your project:
-
Dataform Admin (
roles/dataform.admin
) - repositories -
Dataform Editor (
roles/dataform.editor
) - workspaces and workflow invocations
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Create a Dataform repository
In the Google Cloud console, go to the Dataform page.
Click
Create repository.On the Create repository page, do the following:
In the Repository ID field, enter
quickstart-repository
.In the Region list, select
europe-west4
.Click Create.
Create and initialize a Dataform development workspace
In the Google Cloud console, go to the Dataform page.
Click
quickstart-repository
.Click
Create development workspace.In the Create development workspace window, do the following:
In the Workspace ID field, enter
quickstart-workspace
.Click Create.
The development workspace page appears.
Click Initialize workspace.
Create a view
In the following sections, you define a view that you will later use as a data source for a table.
Create a SQLX file for defining a view
In the Files pane, next to
definitions/
, click the More menu.Click Create file.
In the Create new file pane, do the following:
In the Add a file path field, enter
definitions/quickstart-source.sqlx
.Click Create file.
Define a view
In the Files pane, expand the definitions folder.
Click
definitions/quickstart-source.sqlx
.In the file, enter the following code snippet:
config { type: "view" } SELECT "apples" AS fruit, 2 AS count UNION ALL SELECT "oranges" AS fruit, 5 AS count UNION ALL SELECT "pears" AS fruit, 1 AS count UNION ALL SELECT "bananas" AS fruit, 0 AS count
Click Format.
Create a table
In the following sections, you define the table type in a SQLX file, and then
write a SELECT
statement to define the table structure within the same file.
Create a SQLX file for table definition
In the Files pane, next to
definitions/
, click the More menu, and then select Create file.In the Add a file path field, enter
definitions/quickstart-table.sqlx
.Click Create file.
Define the table type, structure and dependencies
In the Files pane, expand the
definitions/
directory.Select
quickstart-table.sqlx
, and then enter the following table type andSELECT
statement:config { type: "table" } SELECT fruit, SUM(count) as count FROM ${ref("quickstart-source")} GROUP BY 1
Click Format.
After defining the table type, Dataform throws a query validation error
because quickstart-source
does not exist in BigQuery yet. This
error is resolved when you execute the SQL workflow later in this tutorial.
Grant Dataform access to BigQuery
To execute workflows in BigQuery, the Dataform service account must have the following required roles:
- BigQuery Data Editor on projects to which Dataform needs both read and write access. They usually include the project hosting your Dataform repository.
- BigQuery Data Viewer on projects to which Dataform needs read-only access.
- BigQuery Job User on the project hosting your Dataform repository.
To grant these roles, follow these steps:
In the Google Cloud console, go to the IAM page.
Click Add.
In the New principals field, enter your Dataform service account ID.
In the Select a role drop-down list, select the BigQuery Job User role.
Click Add another role, and then in the Select a role drop-down list, select the BigQuery Data Editor role.
Click Add another role, and then in the Select a role drop-down list, select the BigQuery Data Viewer role.
Click Save.
Execute the workflow
In the Google Cloud console, go to the Dataform page.
On the
quickstart-workspace
page, click Start execution.Click All actions.
In the Execute pane, click Start execution.
Dataform uses the default repository settings to create the contents of your workflow in a BigQuery dataset called
dataform
.
View execution logs in Dataform
On the
quickstart-repository
page, click Workflow Execution Logs.To view details of your execution, click the latest execution.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the dataset created in BigQuery
To avoid incurring charges for BigQuery assets, delete the
dataset called dataform
.
In the Google Cloud console, go to the BigQuery page.
In the Explorer panel, expand your project and select
dataform
.Click the
Actions menu, and then select Delete.In the Delete dataset dialog, enter
delete
into the field, and then click Delete.
Delete the Dataform development workspace
Dataform development workspace creation incurs no costs, but to delete the development workspace you can follow these steps:
In the Google Cloud console, go to the Dataform page.
Click
quickstart-repository
.In the Development workspaces tab, click the
More menu byquickstart-workspace
, and then select Delete.To confirm, click Delete.
Delete the Dataform repository
Dataform repository creation incurs no costs, but to delete the repository you can follow these steps:
In the Google Cloud console, go to the Dataform page.
By
quickstart-repository
, click the More menu, and then select Delete.In the Delete repository window, enter the name of the repository to confirm deletion.
To confirm, click Delete.
What's next
To learn more about Dataform, see Dataform overview.
To learn more about Dataform features, see Overview of Dataform features.
To learn more about Dataform core, see Overview of Dataform core.
To learn how to override default Dataform settings of your repository, see Configure Dataform settings.
To learn more about managing datasets in BigQuery, see Managing datasets.
To learn more about managing tables in BigQuery, see Manage tables.