Migrate from legacy Dataform

This document shows you how to import a legacy Dataform project into Dataform in Google Cloud.

About differences between legacy Dataform and Dataform in Google Cloud

Dataform is a serverless service for data analysts to develop and deploy tables, incremental tables, or views to BigQuery. Dataform offers a web environment for SQL workflow development, connection with GitHub and GitLab, continuous integration, continuous deployment, and workflow execution.

For more information about features of Dataform in Google Cloud, see Overview of Dataform features.

Legacy Dataform features not supported in Google Cloud at this time

  • Configuring consistent egress IP addresses, which are required to allowlist Dataform IP addresses in GitHub or GitLab.
  • Configuring environments in environments.json. You can use the REST API to configure compilation overrides.
  • Configuring isolated development environments.
  • Configuring schedules in environments.json. You can use the REST API and Dataform Client Libraries to schedule Dataform in different environments with products such as Workflows or Cloud Composer.
  • Manually running unit tests.
  • Running compiled SQL during development.
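
As a rough illustration of the REST API alternative to environments.json mentioned in the list above, the following Python sketch builds the URL and JSON body for a request that creates a compilation result with a schema suffix override. The API version (v1beta1), the gitCommitish and codeCompilationConfig.schemaSuffix field names, and the helper name compilation_request are assumptions, not taken from this document; verify them against the current Dataform API reference. The sketch only constructs the request and does not send it.

```python
import json

# Assumed API root and version; confirm against the Dataform API reference.
API_ROOT = "https://dataform.googleapis.com/v1beta1"


def compilation_request(project, location, repository, git_commitish, schema_suffix):
    """Return (url, body) for a hypothetical compilationResults create call.

    The override plays the role that an environment in environments.json
    played in legacy Dataform.
    """
    parent = f"projects/{project}/locations/{location}/repositories/{repository}"
    body = {
        # Branch, tag, or commit SHA to compile (assumed field name).
        "gitCommitish": git_commitish,
        # Compilation override, here a suffix appended to schema names (assumed field name).
        "codeCompilationConfig": {"schemaSuffix": schema_suffix},
    }
    return f"{API_ROOT}/{parent}/compilationResults", body


# Placeholder project, location, and repository names for illustration only.
url, body = compilation_request("my-project", "us-central1", "my-repo", "main", "staging")
print(url)
print(json.dumps(body, indent=2))
```

To send the request, you would POST the body to the URL with an OAuth 2.0 access token; a scheduler such as Workflows or Cloud Composer could issue the same call on a schedule.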

Known limitations

Dataform in Google Cloud runs on a plain V8 runtime and does not support the additional functionality and modules provided by Node.js.

Projects without a name field in package.json generate diffs on package-lock.json every time packages are installed. To avoid this, add a name property to package.json.

git+https:// URLs for dependencies in package.json are not currently supported. Convert such URLs to plain https:// archive URLs. For example, convert git+https://github.com/dataform-co/dataform-segment.git#1.5 to https://github.com/dataform-co/dataform-segment/archive/1.5.tar.gz.
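
The conversion above can be sketched as a small Python helper. This is a minimal sketch that assumes GitHub-style archive paths (REPOSITORY_URL/archive/REF.tar.gz), which is the only shape this document shows; other Git hosts may use a different archive path. The function name to_archive_url and the regular expression are illustrative, not part of Dataform.

```python
import re

# Matches git+https://HOST/OWNER/REPO(.git)#REF. The pattern is an assumption
# derived from the single example in this document.
_GIT_URL = re.compile(r"^git\+(https://[^#]+?)(?:\.git)?#(.+)$")


def to_archive_url(url):
    """Convert a git+https:// dependency URL to a plain https:// archive URL.

    URLs that do not match the expected shape are returned unchanged.
    """
    match = _GIT_URL.match(url)
    if match is None:
        return url
    base, ref = match.groups()
    return f"{base}/archive/{ref}.tar.gz"


print(to_archive_url("git+https://github.com/dataform-co/dataform-segment.git#1.5"))
# prints https://github.com/dataform-co/dataform-segment/archive/1.5.tar.gz
```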

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

  4. Enable the BigQuery and Dataform APIs.

    Enable the APIs

Required roles

To get the permissions that you need to import a legacy project, ask your administrator to grant you the Dataform Admin (roles/dataform.admin) IAM role on repositories. For more information about granting roles, see Manage access.

Import a legacy project

To import a legacy project into Dataform in Google Cloud, follow these steps in the Google Cloud console:

  1. Ensure that your Dataform project in app.dataform.co is connected to GitHub or GitLab.
  2. In the Google Cloud console, go to the Dataform page.

    Go to the Dataform page

  3. Create a new repository.

  4. Connect the repository to the remote Git repository that houses your legacy project.

  5. Optional: Configure execution schedules with Workflows and Cloud Scheduler, Cloud Composer, or the REST API.

Configure your imported Dataform project

To adjust your legacy project to Dataform in Google Cloud, follow these steps:

  1. In the Google Cloud console, go to the Dataform page.

    Go to the Dataform page

  2. Select your repository.

  3. Create a development workspace.

  4. Go to the development workspace.

  5. In dataform.json, add the defaultLocation parameter. This parameter is ignored by app.dataform.co.

    "defaultLocation": "DATASET_LOCATION",
    

    Replace DATASET_LOCATION with the default location of your BigQuery dataset, for example, US, EU, or us-east1.

  6. In package.json, do the following:

    1. Upgrade @dataform/core to 2.0.0-beta.1 or later.
    2. Add a package name in the following format:

      {
          "name": "PACKAGE_NAME",
          "dependencies": {
              "@dataform/core": "2.0.0-beta.1"
          }
      }
      

      Replace PACKAGE_NAME with a name for your Dataform package, for example, your project name.

    3. Convert git+https:// URLs in package.json dependencies to plain https:// archive URLs.

      For example, convert git+https://github.com/dataform-co/dataform-segment.git#1.5 to https://github.com/dataform-co/dataform-segment/archive/1.5.tar.gz.

      If you use git+https:// URLs in pre-built Dataform packages, check the updated installation instructions for these packages on their release pages, for example, the dataform-segment release page.
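
The configuration steps above can be checked before you commit with a short script. The following Python sketch is one possible approach, not an official Dataform tool: it takes the parsed contents of dataform.json and package.json and lists the adjustments that are still missing. The function name migration_issues is hypothetical, and the version check is simplified to compare only the major version of @dataform/core, which is an assumption about how your versions are written.

```python
import re


def migration_issues(dataform_json, package_json):
    """List the migration steps from this page that a project has not applied yet."""
    issues = []
    if not dataform_json.get("defaultLocation"):
        issues.append("dataform.json: defaultLocation is missing")
    if not package_json.get("name"):
        issues.append("package.json: name is missing")
    deps = package_json.get("dependencies", {})
    # Simplified check: treat any major version below 2 as too old.
    major = re.match(r"\D*(\d+)", deps.get("@dataform/core", ""))
    if major is None or int(major.group(1)) < 2:
        issues.append("package.json: upgrade @dataform/core to 2.0.0-beta.1 or later")
    for name, spec in deps.items():
        if spec.startswith("git+https://"):
            issues.append(f"package.json: {name} uses a git+https:// URL")
    return issues


# Example legacy configuration; the core version shown is illustrative.
legacy_dataform = {"warehouse": "bigquery"}
legacy_package = {
    "dependencies": {
        "@dataform/core": "1.21.1",
        "dataform-segment": "git+https://github.com/dataform-co/dataform-segment.git#1.5",
    }
}
for issue in migration_issues(legacy_dataform, legacy_package):
    print(issue)
```

In a real workspace you would load both files with json.load and pass the resulting dictionaries to the function; an empty list means that all of the checks passed.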

What's next