Getting Started with Cloud Dataprep

Feature Availability: This feature is available in the following editions:


  • Cloud Dataprep Standard by TRIFACTA® INC.
  • Cloud Dataprep Premium by TRIFACTA INC.

Cloud Dataprep by TRIFACTA® INC. enables you to rapidly transform disparate datasets of any size into usable data for the entire enterprise. Ingest, explore, and transform your data through a leading-edge interface, reducing the time to prepare your data from weeks to minutes. Cloud Dataprep by TRIFACTA INC. is integrated with the Google Cloud Platform and operated by partner Trifacta.

Applicable Product Editions

These setup instructions apply to the following editions of the product:

NOTE: If you are an existing Cloud Dataprep by TRIFACTA INC. customer, you can use the Marketplace to upgrade to one of the supported Marketplace editions or to enable your current product edition for a new project. You can also choose to continue using Cloud Dataprep by TRIFACTA INC.

  • Cloud Dataprep Premium by TRIFACTA INC.

    NOTE: If you are purchasing Cloud Dataprep Premium by TRIFACTA INC., you must contact Support before you purchase the product. Additional information is available below.

  • Cloud Dataprep Standard by TRIFACTA INC.

NOTE: These product editions are licensed through the Google Marketplace from Trifacta. For more information on licensing or upgrading from Cloud Dataprep by TRIFACTA INC., please see the Google Marketplace listing.

For more information on getting started with the base product edition for Cloud Dataprep by TRIFACTA INC., see https://cloud.google.com/dataprep/docs/quickstarts/quickstart-dataprep.

Before You Begin

Before you begin, please review the following prerequisites.

General

To use either product edition, you must have the following already set up in the Google Cloud Platform.

NOTE: If you are upgrading from Cloud Dataprep by TRIFACTA INC., you should already have these services enabled.

  1. Create or set up a Google Cloud project.
  2. Enable billing on that project.
  3. Enable the following services:
    1. Cloud Dataflow
    2. BigQuery
    3. Cloud Storage APIs

For more information, see https://cloud.google.com/dataprep/docs/quickstarts/quickstart-dataprep#before-you-begin.
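
The service-enablement step can also be scripted. The following is a minimal sketch that only assembles a gcloud services enable command and prints it; it does not run anything. The project ID my-dataprep-project is a placeholder, and the service identifiers are assumptions based on the usual Google Cloud API names for Cloud Dataflow, BigQuery, and Cloud Storage. Confirm them for your project before use.

```python
import shlex

# Assumed API identifiers for the services listed above; verify them in the
# Google Cloud console before relying on this list.
REQUIRED_SERVICES = [
    "dataflow.googleapis.com",
    "bigquery.googleapis.com",
    "storage-api.googleapis.com",
]


def build_enable_command(project_id, services=REQUIRED_SERVICES):
    """Return the gcloud command line that would enable the given services."""
    args = ["gcloud", "services", "enable", *services, "--project", project_id]
    # Quote each argument so the printed command is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # "my-dataprep-project" is a hypothetical project ID.
    print(build_enable_command("my-dataprep-project"))
```

Printing the command instead of executing it lets you review it before running it in a shell where the gcloud CLI is installed and authenticated.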

Set up your storage bucket

On Google Cloud Storage, you must have a bucket set up for use with your project. For more information, see https://cloud.google.com/dataprep/docs/quickstarts/quickstart-dataprep#create_a_cloud_storage_bucket_in_your_project.

Premium-only requirements

Whitelist the IP address of the Cloud Dataprep Service

Feature Availability: This feature is available in Cloud Dataprep Premium by TRIFACTA® INC.

If you are connecting to relational sources, you must whitelist the IP address of the Cloud Dataprep service for your database instances. The IP addresses of the Cloud Dataprep service are the following:

NOTE: On the database server for each relational source type (Oracle, SQL Server, etc.), you must whitelist these IP addresses.

104.198.44.13/32
34.71.238.145/32
104.198.217.74/32
34.68.178.136/32
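
If you keep your database firewall rules as a list of CIDR blocks, a quick check like the following sketch can confirm that every address above is covered. It uses only the Python standard library. The function name missing_from_allowlist and the example rule list are hypothetical and are not part of the product.

```python
import ipaddress

# Cloud Dataprep service addresses, copied from the list above.
DATAPREP_SERVICE_CIDRS = [
    "104.198.44.13/32",
    "34.71.238.145/32",
    "104.198.217.74/32",
    "34.68.178.136/32",
]


def missing_from_allowlist(allowed_cidrs, required=DATAPREP_SERVICE_CIDRS):
    """Return the required CIDR blocks not covered by any allowed entry."""
    allowed = [ipaddress.ip_network(c, strict=False) for c in allowed_cidrs]
    missing = []
    for cidr in required:
        net = ipaddress.ip_network(cidr)
        # Compare only networks of the same IP version; subnet_of raises otherwise.
        covered = any(a.version == net.version and net.subnet_of(a) for a in allowed)
        if not covered:
            missing.append(cidr)
    return missing


if __name__ == "__main__":
    # Hypothetical firewall rules exported from a database server.
    current_rules = ["104.198.44.13/32", "34.68.0.0/16"]
    print(missing_from_allowlist(current_rules))
```

An empty result means that every service address is covered by at least one rule. This check only inspects the rule list that you give it; the connection test described later remains the authoritative verification.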

Tip: To verify that you have whitelisted the IP addresses appropriately, you can create a connection of the relational connection type from inside the Cloud Dataprep application. This step is described later.

For more information, please contact Support.

Contact Support

By default, the Google Marketplace is configured to enable licensing of Cloud Dataprep Standard by TRIFACTA INC.

NOTE: If you are purchasing Cloud Dataprep Premium by TRIFACTA INC., you must contact Support, which can provide you with a customized URL to the solution in the Google Marketplace.

Purchase and enable through the Google Marketplace

After you have completed the above steps, please proceed through the Google Marketplace to complete your purchase. Your purchase covers:

  • Basic entitlement
  • Licensing for each Google Cloud project

For more information, see https://console.cloud.google.com/marketplace/product/endpoints/cloud-dataprep-editions-v2.

Setup

After the product has been licensed for your project, please complete the following steps for your account.

Required additional permissions for Cloud Dataprep Premium by TRIFACTA INC.

Feature Availability: This feature is available in Cloud Dataprep Premium by TRIFACTA INC.

Cloud Dataprep Premium by TRIFACTA INC. requires special permissions to use the project. For more information, see Create IAM Role for Cloud Dataprep.

Set up directories

Each user must configure the directories on Google Cloud Storage for use with the product. For more information, see https://cloud.google.com/dataprep/docs/quickstarts/quickstart-dataprep#set_up.

Project settings

You should review the settings for your project. See Project Settings Page.

Verify operations

Before inviting other users, you should run a simple job through the product.

Prepare Your Sample Dataset

To complete this test, you should locate or create a simple dataset. Your dataset should be created in the format that you wish to test.

Tip: The simplest way to test is to create a two-column CSV file with at least 25 non-empty rows of data. This data can be uploaded through the application.

Characteristics:

  • Two or more columns.
  • If there are specific data types that you would like to test, please be sure to include them in the dataset.
  • A minimum of 25 rows is needed for the best type inference results.
  • Ideally, your dataset is a single file or sheet.
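
If you do not have a suitable file at hand, the following sketch generates one that matches these characteristics. The file name dataprep_sample.csv and the column names id and amount are arbitrary choices for illustration. Add columns for any specific data types that you want to test.

```python
import csv
import random


def write_sample_csv(path, rows=25, seed=0):
    """Write a two-column CSV (integer id, decimal amount) with a header row."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same file
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "amount"])
        for i in range(1, rows + 1):
            # Every cell is non-empty: a sequential id and a two-decimal amount.
            writer.writerow([i, f"{rng.uniform(1, 500):.2f}"])
    return path


if __name__ == "__main__":
    write_sample_csv("dataprep_sample.csv")
```

The resulting file can be uploaded through the application as described in the verification steps.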

Verification Steps

Steps:

  1. Log in to the application. For Cloud Dataprep by TRIFACTA INC. editions, your login is your Gmail address.

  2. In the application menu bar, click Library.
  3. Click Import Data. See Import Data Page.
    1. Select the connection where the dataset is stored. For datasets stored on your local desktop, click Upload.
    2. Select the dataset.
    3. In the right panel, click the Add Dataset to a Flow checkbox. Enter a name for the new flow.
    4. Click Import and Add to Flow.

  4. In the left menu bar, click the Flows icon. In the Flows page, open the flow you just created. See Flows Page.
  5. In the Flows page, click the dataset you just imported. Click Add new Recipe.
  6. Select the recipe. Click Edit Recipe.
  7. The initial sample of the dataset is opened in the Transformer page, where you can edit your recipe to transform the dataset.
    1. In the Transformer page, some steps are automatically added to the recipe for you, so you can run the job immediately.
    2. You can add additional steps if desired. See Transformer Page.
  8. Click Run Job.
    1. If options are presented, select the defaults.

    2. To generate results in other formats or output locations, click Add Publishing Destination. Configure the output formats and locations.
    3. To test dataset profiling, click the Profile Results checkbox. Note that profiling runs as a separate job and may take considerably longer.
    4. See Run Job Page.

  9. When the job completes, you should see a success message under the Jobs tab in the Flow View page.
    1. Troubleshooting: Either the Transform job or the Profiling job may fail. To localize the problem, try re-running the job with the failed job type deselected, or run the job on a different running environment (if available). You can also download the log files to try to identify the problem. See Job Details Page.
  10. Click View Results from the context menu for the job listing. In the Job Details page, you can see a visual profile of the generated results. See Job Details Page.
  11. In the Output Destinations tab, click a link to download the results to your local desktop.
  12. Load these results into a local application to verify that the content looks correct.
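
For the final check, any spreadsheet or text editor works. As a scripted alternative, the following sketch summarizes a downloaded result file. It assumes that the results were generated as CSV with a header row. The function name summarize_csv and the file name results.csv are hypothetical.

```python
import csv


def summarize_csv(path):
    """Return (column names, data-row count, rows whose width differs from the header)."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # an empty file yields an empty header
        row_count = 0
        ragged = 0
        for row in reader:
            row_count += 1
            if len(row) != len(header):
                ragged += 1
    return header, row_count, ragged


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py results.csv
    if len(sys.argv) > 1:
        columns, rows, ragged = summarize_csv(sys.argv[1])
        print(f"columns={columns} rows={rows} ragged_rows={ragged}")
```

Compare the reported columns and row count with what you expect from your sample dataset and recipe. A non-zero count of ragged rows suggests that the file is not the delimited format you expected.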

Checkpoint: You have verified importing from the selected datastore and transforming a dataset. If your job was successfully executed, you have verified that the product is connected to the job running environment and can write results to the defined output location. Optionally, you may have tested profiling of job results. If all of the above tasks completed, the product is operational end-to-end.

Verify IP address whitelisting

If you have whitelisted the Cloud Dataprep service IP addresses for your database server, you can create a connection to the database from inside the Cloud Dataprep application. If you are able to successfully read data into the application from your database, then the whitelist has been specified correctly. For more information, see Connection Types.

Invite Users

You can invite other people to join your project at this time. For more information, see https://cloud.google.com/iam/docs/quickstart.

Resources

The following resources can assist users in getting started with wrangling.

Access documentation: To access the full customer documentation, from the left nav bar, select Help menu > Documentation.

For a basic summary of each step of the wrangling process, see Workflow Basics.