About differences between legacy Dataform and Dataform in Google Cloud
Dataform is a serverless service for data analysts to develop and deploy tables, incremental tables, or views to BigQuery. Dataform offers a web environment for SQL workflow development, connection with GitHub and GitLab, continuous integration, continuous deployment, and workflow execution.
For more information about features of Dataform in Google Cloud, see Overview of Dataform features.
Legacy Dataform features not supported in Google Cloud at this time
- Configuring consistent egress IP addresses, which are required to allowlist Dataform IP addresses in GitHub or GitLab.
- Configuring environments in environments.json. You can use the REST API to configure compilation overrides.
- Configuring isolated development environments.
- Configuring schedules in environments.json. You can use the REST API and Dataform Client Libraries to schedule Dataform in different environments with products such as Workflows or Cloud Composer.
- Manually running unit tests.
- Running compiled SQL during development.
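As a rough illustration of replacing environments.json with REST API compilation overrides, the following sketch only builds the request URL and JSON body for creating a compilation result; it does not send anything. The endpoint path and the gitCommitish, codeCompilationConfig, and schemaSuffix field names are assumptions based on the v1beta1 API and should be checked against the current API reference. The project, location, and repository values are hypothetical placeholders.

```python
import json

# Assumed base URL for the Dataform v1beta1 REST API; verify before use.
API_ROOT = "https://dataform.googleapis.com/v1beta1"


def build_compilation_request(project, location, repository,
                              git_commitish, schema_suffix):
    """Return (url, body) for a compilation result with a schema suffix override.

    The body field names are assumptions, not confirmed by this page.
    """
    parent = f"projects/{project}/locations/{location}/repositories/{repository}"
    url = f"{API_ROOT}/{parent}/compilationResults"
    body = {
        "gitCommitish": git_commitish,
        # Plays the role of a per-environment override from environments.json.
        "codeCompilationConfig": {"schemaSuffix": schema_suffix},
    }
    return url, body


# Hypothetical values, for illustration only.
url, body = build_compilation_request(
    "my-project", "us-central1", "my-repo", "main", "staging"
)
print(url)
print(json.dumps(body, indent=2))
```

A scheduler such as Workflows or Cloud Composer would send a request of this shape with authenticated credentials, then start a workflow invocation from the resulting compilation result.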
Known limitations
Dataform in Google Cloud runs on a plain V8 runtime and does not support additional functionality and modules provided by Node.js.
Projects without a name field in package.json generate diffs on package-lock.json every time packages are installed. To avoid this, you need to add a name property in package.json.
git+https:// URLs for dependencies in package.json are not currently supported. Convert such URLs to plain https:// archive URLs. For example, convert git+https://github.com/dataform-co/dataform-segment.git#1.5 to https://github.com/dataform-co/dataform-segment/archive/1.5.tar.gz.
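The URL conversion described above can be sketched as a small helper. This is a minimal example, assuming GitHub-style archive URLs of the form REPO/archive/REF.tar.gz, as in the example above; other Git hosts may use a different archive path. The function name to_archive_url is hypothetical.

```python
import re

# Matches git+https://HOST/PATH(.git)#REF and captures the repository URL and ref.
_GIT_URL = re.compile(r"^git\+(https://.+?)(?:\.git)?#(.+)$")


def to_archive_url(dep_url):
    """Convert a git+https:// dependency URL to a plain https:// archive URL.

    URLs that do not match the git+https://...#REF shape are returned unchanged.
    """
    match = _GIT_URL.match(dep_url)
    if match is None:
        return dep_url
    repo_url, ref = match.groups()
    return f"{repo_url}/archive/{ref}.tar.gz"


print(to_archive_url("git+https://github.com/dataform-co/dataform-segment.git#1.5"))
```

A git+https:// URL without a #REF suffix is left as is, because the archive URL needs an explicit tag or branch name.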
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
- Enable the BigQuery and Dataform APIs.
Required roles
To get the permissions that you need to import a legacy project, ask your administrator to grant you the Dataform Admin (roles/dataform.admin) IAM role on repositories. For more information about granting roles, see Manage access.
Import a legacy project
To import a legacy project into Dataform in Google Cloud, follow these steps in the Google Cloud console:
- Ensure that your Dataform project in app.dataform.co is connected to GitHub or GitLab.
- In the Google Cloud console, go to the Dataform page.
- Connect the repository to the remote Git repository that houses your legacy project.
- Optional: Configure execution schedules with Workflows and Cloud Scheduler, Cloud Composer, or the REST API.
Configure your imported Dataform project
To adjust your legacy project to Dataform in Google Cloud, follow these steps:
- In the Google Cloud console, go to the Dataform page.
- Select your repository.
- Go to the development workspace.
- In dataform.json, add the defaultLocation parameter. This parameter is ignored by app.dataform.co.
    "defaultLocation": "DATASET_LOCATION",
  Replace DATASET_LOCATION with the default location of your BigQuery dataset, for example, US, EU, or us-east1.
- In package.json, do the following:
  - Upgrade @dataform/core to 2.0.0-beta.1 or later.
  - Add a package name in the following format:
      { "name": "PACKAGE_NAME", "dependencies": { "@dataform/core": "2.0.0-beta.1" } }
    Replace PACKAGE_NAME with a name for your Dataform package, for example, your project name.
  - Convert git+https:// URLs in package.json dependencies to plain https:// archive URLs. For example, convert git+https://github.com/dataform-co/dataform-segment.git#1.5 to https://github.com/dataform-co/dataform-segment/archive/1.5.tar.gz. If you are using git+https:// URLs in pre-built Dataform packages, check the updated installation instructions for these packages on their release pages, for example, the dataform-segment release page.
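The dataform.json and package.json changes in the steps above can also be scripted. The following is a minimal sketch that works on already-parsed JSON objects; the function name migrate_configs and the sample input values are hypothetical. It sets @dataform/core to the version you pass in and overwrites whatever was there, so pass a newer version if your project already uses one.

```python
import json


def migrate_configs(dataform_cfg, package_cfg, location, package_name,
                    core_version="2.0.0-beta.1"):
    """Return updated copies of the parsed dataform.json and package.json."""
    # Add defaultLocation to dataform.json without mutating the input.
    dataform_out = {**dataform_cfg, "defaultLocation": location}

    package_out = dict(package_cfg)
    # Add a package name only when the project does not define one yet.
    package_out.setdefault("name", package_name)
    # Copy the dependencies so the caller's dictionary stays untouched,
    # then set @dataform/core to the requested version.
    dependencies = dict(package_out.get("dependencies", {}))
    dependencies["@dataform/core"] = core_version
    package_out["dependencies"] = dependencies
    return dataform_out, package_out


# Hypothetical legacy configuration, for illustration only.
legacy_dataform = {"warehouse": "bigquery", "defaultSchema": "dataform"}
legacy_package = {"dependencies": {"@dataform/core": "1.21.0"}}

new_dataform, new_package = migrate_configs(
    legacy_dataform, legacy_package, location="US", package_name="my_project"
)
print(json.dumps(new_dataform, indent=2))
print(json.dumps(new_package, indent=2))
```

To apply this to real files, load each file with json.load, pass the results to the function, and write the returned objects back with json.dump.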
What's next
- To learn more about Dataform in Google Cloud, see Overview of Dataform.
- To learn more about features of Dataform in Google Cloud, see Overview of Dataform features.
- To learn how to create and initialize a development workspace, see Create a workspace.
- To learn how to manually trigger an execution, see Trigger execution.
- To learn how to configure execution schedules with Workflows and Cloud Scheduler, see Schedule executions with Workflows and Cloud Scheduler.
- To learn how to configure execution schedules with Cloud Composer, see Schedule executions with Cloud Composer.