Profile Your Source Data

After you import data from a source, consider profiling it. As soon as you create a recipe from the source, you can run a job that profiles the dataset.

By profiling the data as soon as you load it into the Transformer page, you can do the following:

  • Identify problems in the source and potentially correct them in the source system.
  • Create a baseline to evaluate the data wrangling work you do in Cloud Dataprep by TRIFACTA®.
  • Identify mismatched or missing values.

Steps:

  1. Create an imported dataset from your source. Add it to a flow. See Import Data Page.
  2. In Flow view, create a recipe for your imported dataset. See Flow View Page.
  3. In Flow view, edit the newly created recipe. The recipe opens in the Transformer page. See Transformer Page.
  4. If needed, add a header step to your dataset.
  5. Click Run Job.
  6. In the Run Job page, select the following options:
    1. Choose the default running environment.
    2. Select CSV format. You need at least one output format to generate your dataset's profile.
    3. Select the option to profile results.
  7. Click Run Job.
  8. When the results are generated, click View Results.

    Tip: For record keeping, click View Recipe to copy and paste the recipe used to create the profile. You can save this recipe information into a text file.

  9. A profile of your dataset is displayed.

In the generated profile, you can identify:

  • Missing or mismatched values in each column
  • Statistical break-out by quartile
  • Beginning dataset size and baseline job execution speed
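To make these measures concrete, the following is a minimal sketch of how similar statistics could be computed locally from a downloaded CSV file. It is not how Cloud Dataprep computes its profile. The function names `profile_column` and `profile_csv`, the sample data, and the assumption that every column is expected to be numeric (so any non-numeric value counts as mismatched) are all hypothetical.

```python
import csv
import io
import statistics


def profile_column(values):
    """Summarize one column, assuming it is expected to hold numbers."""
    missing = 0
    mismatched = 0
    numbers = []
    for raw in values:
        text = raw.strip()
        if text == "":
            missing += 1          # empty cell
            continue
        try:
            numbers.append(float(text))
        except ValueError:
            mismatched += 1       # value does not parse as a number
    # Quartile cut points need at least two valid data points.
    quartiles = statistics.quantiles(numbers, n=4) if len(numbers) >= 2 else None
    return {
        "rows": len(values),
        "missing": missing,
        "mismatched": mismatched,
        "quartiles": quartiles,
    }


def profile_csv(text):
    """Profile every column of CSV text whose first row is a header."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            # Short rows yield None for absent fields; treat them as empty.
            columns[name].append(row[name] or "")
    return {name: profile_column(vals) for name, vals in columns.items()}


# Hypothetical sample data: one empty and one non-numeric price.
SAMPLE = "id,price\n1,10\n2,\n3,abc\n4,20\n5,30\n6,40\n"

if __name__ == "__main__":
    for column, stats in profile_csv(SAMPLE).items():
        print(column, stats)
```

A real profile would apply each column's own data type when deciding what counts as mismatched. The numeric-only rule here keeps the sketch short.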

Tip: You might want to write down the overall statistics for the dataset, which may be useful when validating the changes you have applied through your recipe.

You can also download the dataset for record keeping. See Job Results Page.
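If you keep the baseline statistics in a file rather than on paper, a later comparison can be scripted. The sketch below assumes you have recorded the statistics by hand as a simple dictionary. The metric names, the values, and the `compare_profiles` helper are hypothetical and are not part of any Cloud Dataprep export format.

```python
import json


def compare_profiles(baseline, current):
    """Return {metric: (baseline_value, current_value)} for metrics that changed."""
    changes = {}
    for key, before in baseline.items():
        if key in current and current[key] != before:
            changes[key] = (before, current[key])
    return changes


# Hypothetical statistics noted from the initial profile.
baseline = {"rows": 6, "missing": 1, "mismatched": 1}

# Store the baseline as JSON text so it can be saved next to the recipe.
saved = json.dumps(baseline, indent=2)

# Hypothetical statistics noted after applying recipe steps.
after_recipe = {"rows": 5, "missing": 0, "mismatched": 0}

if __name__ == "__main__":
    for metric, (before, after) in compare_profiles(json.loads(saved), after_recipe).items():
        print(f"{metric}: {before} -> {after}")
```

Only metrics present in both dictionaries are compared, so adding a new metric later does not produce a false difference.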
