Flow View Page

The Flow View page lets you access and manage all objects in your flow. For each imported dataset, recipe, or other object in the flow, you can perform actions that manage flow development and job execution from a single page.

Figure: Flow View page

Imported datasets in the flow and reference datasets added to the flow are listed on the left side of the screen. Each dataset can have one or more associated recipes, which are used to transform the source data.

NOTE: A red dot on an object indicates a problem with the object's configuration. Select the object to begin investigating the error. Error information may be displayed in the right panel.

Datasets:

  • To begin working with an imported dataset, select it and click Add new recipe. A new, empty recipe is associated with the dataset. To open the recipe in the Transformer page, click the recipe icon and select Edit Recipe. See Transformer Page.
  • Objects in the flow are connected by lines that show the relationships between them.
  • Any objects on which an object depends are displayed to its left, on the lines leading into it.

    Tip: When you run a job for a recipe, all recipe steps for the preceding datasets are executed as part of the job, but only the results of the terminal dataset are generated.

    • In the example above, the POS-01 recipe depends on all of the objects in the flow.

    • The other datasets have been integrated with the POS-01 dataset and have not yet had a recipe created for them.
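The Tip above describes a dependency walk: a job for one recipe processes everything upstream of it, but writes results only for the terminal object. The following Python sketch models that rule as a topological ordering over a directed graph. It is a conceptual illustration only, not Dataprep's execution engine or API; the function execution_plan and every object name except POS-01 are hypothetical.

```python
from collections import defaultdict

# Hypothetical flow: each key is an upstream object and each value lists the
# objects that consume it. All names except POS-01 are invented.
FLOW_EDGES = {
    "orders (imported)": ["clean-orders (recipe)"],
    "clean-orders (recipe)": ["POS-01 (recipe)"],  # chained recipe
    "stores (imported)": ["POS-01 (recipe)"],      # e.g. integrated by a join
    "regions (imported)": ["POS-01 (recipe)"],
}


def execution_plan(target: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every object a job for `target` touches, upstream objects first.

    A depth-first post-order walk over the reversed edges yields a
    topological order of the target and everything it depends on.
    """
    parents: dict[str, list[str]] = defaultdict(list)
    for src, dsts in edges.items():
        for dst in dsts:
            parents[dst].append(src)

    order: list[str] = []
    seen: set[str] = set()

    def visit(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        for parent in parents[node]:
            visit(parent)
        order.append(node)  # appended only after all of its dependencies

    visit(target)
    return order


plan = execution_plan("POS-01 (recipe)", FLOW_EDGES)
print(plan)
# Upstream recipe steps run as part of the job; only the terminal object's
# results are generated as output.
print("results generated for:", plan[-1])  # results generated for: POS-01 (recipe)
```

A depth-first post-order walk is used because it is a simple way to guarantee that each object is listed after everything it depends on; Python's graphlib.TopologicalSorter would be an equally valid choice.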

Recipes:

A recipe is a set of steps to transform source data into the results you desire.

  • A recipe can be created from the following objects:
    • An imported dataset, as above.
    • A reference dataset. A reference dataset is an object that has been pulled into a flow from another flow. See below.
    • Another recipe. You can chain together recipes. For example, you may have a set of steps that you always apply at the beginning of transforming a specific type of feed. This recipe can be added into each flow as the first recipe chained to an imported dataset of that feed type.
  • The following objects can be created off of a recipe:
    • An output object is a set of publishing targets that you can specify to be executed on an ad-hoc or a scheduled basis. You can also execute ad-hoc jobs from the output object.
    • A reference object is a reference to one of your flow's recipes that can be used in another flow.
      • In the target flow, this object appears as a reference dataset.
      • When a reference dataset is used in a flow, the target flow receives the output of the executed recipe.

For more information on these objects, see Object Overview.

Select an object from your flow to open an object-specific panel on the right side of the screen.

Tip: Right-click any object in Flow View to see the same list of actions that appears in the right panel when you select the object.

Tip: Double-click any recipe to edit it. See Transformer Page.


Actions:

Rename: Select the name of the object to rename it within the platform. Renaming does not change the object's source, if the source exists elsewhere.

Add Datasets: Add new datasets to the flow. Details are below.

Add Schedule: To add a scheduled execution of the recipes in your flow:

  1. Define the scheduled time and interval of execution at the flow level. See Add Schedule Dialog.
    1. After the schedule has been created, you can review, edit, or delete the schedule through the Clock icon.
  2. Define the scheduled destinations for each recipe through its output object. These destinations are targets for the scheduled job. See View for Outputs below.

Make a copy: Create a copy of the flow for another user.

NOTE: The copied flow is independent of the source flow, but it remains connected to the original source datasets.

Edit name and description: (Available to flow owner only) Change the name and description of the flow.

Delete: (Available to flow owner only) Delete the flow.

Deleting a flow removes all recipes that are contained in the flow. If copies of these objects exist in other flows, they are not touched. Imported datasets are not deleted by this action.
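The ownership rules for Make a copy and Delete can be summarized with a small object model: recipes belong to a flow, while imported datasets exist independently and are only pointed to. This Python sketch is a conceptual illustration under that assumption; the classes ImportedDataset, Recipe, and Flow and the functions copy_flow and delete_flow are hypothetical and do not correspond to any Dataprep API.

```python
from dataclasses import dataclass, field


@dataclass
class ImportedDataset:
    name: str


@dataclass
class Recipe:
    name: str
    steps: list[str]
    source: ImportedDataset  # pointed to, not owned, by the flow


@dataclass
class Flow:
    name: str
    recipes: list[Recipe] = field(default_factory=list)


def copy_flow(flow: Flow, new_name: str) -> Flow:
    # Recipes and their step lists are duplicated, so the copy is independent;
    # the source dataset object itself is shared, so the copy stays connected.
    return Flow(new_name, [Recipe(r.name, list(r.steps), r.source) for r in flow.recipes])


def delete_flow(flow: Flow) -> None:
    # Deleting a flow removes its recipes; imported datasets are not deleted.
    flow.recipes.clear()


sales = ImportedDataset("sales.csv")  # hypothetical imported dataset
original = Flow("POS", [Recipe("POS-01", ["step 1", "step 2"], sales)])
clone = copy_flow(original, "POS (copy)")

clone.recipes[0].steps.append("step 3")  # edit the copy only
print(original.recipes[0].steps)         # ['step 1', 'step 2']
print(clone.recipes[0].source is sales)  # True

delete_flow(original)
print(len(original.recipes), len(clone.recipes), sales.name)  # 0 1 sales.csv
```

The copy in another flow keeps its recipe after the original flow is deleted, matching the statement that copies in other flows are not touched.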

Add Datasets to Flow

From the Flow View page, you can add imported or reference datasets to your flow. These datasets are added as independent objects in the flow and can be joined, unioned, or referenced by other datasets in the flow.

Figure: Add datasets to current flow

  1. Search for or select the dataset to add.
    1. Use the page view controls to browse for other datasets, or select the appropriate tab to filter the list to imported or reference datasets.
    2. To import new datasets from external sources, click Import Datasets. See Import Data Page.
  2. When you have made your selections, click Add.
  3. The dataset is added as a new object in flow view.

View for Imported Datasets

When you select an imported dataset, you can preview the data contained in it, replace the source object, and more from the right-side panel.

Figure: Imported Dataset view

Key Fields:

Data Preview: In the Data Preview window, you can see a small section of the data that is contained in the imported dataset. This window can be useful for verifying that you are looking at the proper data.

Tip: Click the preview to open a larger dialog, where you can select and copy data.

Type: Indicates where the data is sourced or the type of file.
Location: Path to the location of the imported dataset.
File Size: Size of the file. Units may vary.
More details: Review details on the flows where the dataset is used.
Column Data Type Inference:
  • enabled - Data types have been applied to the dataset during import.
  • disabled - Data types were not globally applied to the dataset during import. However, some columns may have had overrides applied to them during the import process. See Import Data Page.


Actions:

Replace: Replace the dataset with a different dataset or reference dataset.
Add new Recipe: Add a new recipe for the object. If a recipe already exists for it, this new recipe is created as a branch in the flow.
Edit name and description: (Available to flow owner only) Change the name and description for the object.
Remove structure: Remove the initial parsing structure. When the structure is removed:

  1. The dataset is converted to a raw dataset. A raw dataset is the source data converted into a flat file format, without any initial parsing steps.
  2. All initial parsing steps that shaped the dataset are removed. In any recipe created from the object, you must break up columns through manual steps.

See View for Raw Datasets below.

Remove from Flow: Remove the dataset from the flow.

Dependent flows, outputs, and references are not removed from the flow. You can replace the source for these objects as needed.

NOTE: References to the deleted dataset in other flows remain broken until the dataset is replaced.

View for Recipes

For each recipe, you can review or edit its steps, create references to it, modify its outputs, and create new recipes from it.

When you select a recipe:

  • You can create an output object.
  • You can create a reference object.
  • The following options are available in the context panel.

Figure: Recipe view

Key Fields:

Steps Preview: Preview the first steps in the recipe.
Steps: Total count of the steps in the recipe.
Data Preview: Preview the data as transformed by the recipe.

NOTE: To render this data preview, some of the data must be loaded and all steps in the recipe must be executed against it, so some delay is expected.

Actions:

Edit Recipe: Open the recipe and begin editing. See Transformer Page.
Add new Recipe: Add a new recipe from the recipe. This new recipe operates on the outputs of the original recipe.
Edit name and description: (Available to flow owner only) Change the name and description for the object.
Create Output to run: Create a new output for the recipe. See View for Outputs below.
Create Reference Dataset: Create a reference to the output of this recipe.

This object can then be added as a reference dataset in another flow. See View for Reference Datasets below.

Change input: Change the input dataset associated with the recipe.

NOTE: This action substitutes only the primary input of the recipe. Datasets integrated through joins, unions, lookups, or other multi-dataset operations are not changed.

Make a copy: Create a copy of the recipe and its related objects. You can create the copy with the same inputs or without any inputs.

The copied recipe is owned by the user who copied it.

Move: Move the recipe to a different flow, or create a new flow to contain it.
Download Recipe: Download the recipe in Wrangle format to your local desktop.
Delete: Delete the recipe.

This action cannot be undone.
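The NOTE on Change input distinguishes a recipe's single primary input from datasets it integrates through multi-dataset operations. The Python sketch below models only that distinction; the RecipeInputs class, the change_input function, and the dataset names are hypothetical and are not part of any Dataprep API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RecipeInputs:
    primary: str                       # the recipe's main input dataset
    integrated: tuple[str, ...] = ()   # datasets brought in by joins, unions, lookups


def change_input(inputs: RecipeInputs, new_primary: str) -> RecipeInputs:
    # Only the primary input is swapped; integrated datasets are left unchanged.
    return replace(inputs, primary=new_primary)


before = RecipeInputs("POS-01", integrated=("stores", "regions"))
after = change_input(before, "POS-02")
print(after)  # RecipeInputs(primary='POS-02', integrated=('stores', 'regions'))
```

To swap a joined or unioned dataset, you would edit the corresponding recipe step instead, which this sketch does not model.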

View for Outputs

Each recipe can have one or more outputs, which contain publishing destinations, scheduled publishing destinations, or both. Through outputs, you can execute and track jobs for the related recipe.

Destinations tab

The Destinations tab contains all configured destinations associated with the recipe.

  • Manual destinations are executed when the job is run through the application interface.
  • Scheduled destinations are executed when the job is triggered based on a schedule you have defined.

Figure: Destinations tab

Key Fields:

(Action)-(Format): The field name describes the output action and the file format in which the results are written. The field value is the location where the results are written.
Environment: The running environment where the job is configured to be executed.
Profiling: If profiling is enabled for this destination, this value is set to yes.

For more information, see Run Job Page.

Scheduled destinations:

When a scheduled execution of the flow is triggered, these destinations are populated with the results. If any input datasets are missing, the job is not run.

NOTE: Flow collaborators cannot modify publishing destinations.
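The selection rules in this section (manual destinations on Run Job, scheduled destinations when the schedule fires, and no run when an input dataset is missing) can be modeled in a few lines. This Python sketch is an illustration under stated assumptions, not Dataprep's scheduler: Destination, destinations_to_run, and the gs:// paths are hypothetical, and because this page states the missing-input rule only for scheduled runs, the sketch applies it only there.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    location: str
    scheduled: bool  # True: scheduled destination; False: manual destination


def destinations_to_run(
    destinations: list[Destination],
    trigger: str,
    inputs_available: dict[str, bool],
) -> list[Destination]:
    """Return the destinations a run should populate.

    trigger is "manual" (Run Job) or "scheduled" (flow schedule fired).
    """
    if trigger == "manual":
        # The missing-input rule is stated only for scheduled runs,
        # so it is not applied here.
        return [d for d in destinations if not d.scheduled]
    if trigger == "scheduled":
        if not all(inputs_available.values()):
            return []  # an input dataset is missing: the job is not run
        return [d for d in destinations if d.scheduled]
    raise ValueError(f"unknown trigger: {trigger!r}")


dests = [
    Destination("gs://example-bucket/adhoc.csv", scheduled=False),
    Destination("gs://example-bucket/nightly.csv", scheduled=True),
]
ok = destinations_to_run(dests, "scheduled", {"POS-01": True, "stores": True})
print([d.location for d in ok])  # ['gs://example-bucket/nightly.csv']
print(destinations_to_run(dests, "scheduled", {"POS-01": True, "stores": False}))  # []
```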

Actions:

Run Job: Click Run Job to queue a job for the manual destinations for immediate execution.

You can track the progress and results of this job through the Jobs tab.

Delete Output: Remove this output from the flow. This operation cannot be undone.

Removing an output does not remove the jobs associated with the output. You can continue working with those executed jobs. See Jobs Page.

Edit: Click this link to modify the properties of the selected destination.

Jobs tab

Figure: Jobs tab

Each entry in the Jobs tab identifies a job that has been queued for execution. You can track the progress, success, or failure of execution. When a job has finished, click the link to the job to review its results. For more information, see Job Results Page.

Actions:

View Dataflow Job: View the job on Cloud Dataflow.
View steps and dependencies: View the steps of the recipe being executed and any dependencies referenced in the recipe.
Export Results: Click to export or publish the results from your completed job. For more information, see Export Results Window.

View for References

When you select a recipe, you can create a reference dataset from it. A reference dataset is a reference to the output generated by a recipe contained in another flow. Whenever the upstream recipe or its output data changes, the reference dataset automatically inherits the changes.

NOTE: You cannot select or use a reference dataset until a reference has been created in the source flow from the recipe to use.

To create a reference dataset from a recipe, click the Paper Clip icon. The following options appear in the right panel.

Figure: Reference view

Key Fields:

Used In: Indicates the number of flows where the reference appears. If this number is greater than one, click More details to review the flows. See Dataset Details Page.


Actions:

Add to Flow: Click to add the reference dataset to a new or existing flow.
Edit name and description: (Available to flow owner only) Change the name and description for the object.
Delete Reference Dataset: Remove the reference dataset from the flow.

Deleting a reference dataset in the source flow breaks all references to it in the flows where it is referenced. To fix these broken references, replace them with new sources.
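The inherit-and-break behavior described in this section can be pictured as a lookup that happens on every read rather than a stored copy of the data. In this hypothetical Python sketch, source_outputs stands in for the source flow, and ReferenceDataset, BrokenReferenceError, and the reference IDs are invented for illustration; they are not Dataprep objects or APIs.

```python
from dataclasses import dataclass


class BrokenReferenceError(LookupError):
    """Raised when the upstream reference no longer exists."""


# Hypothetical stand-in for source flows: reference ID -> current recipe output.
source_outputs: dict[str, list[str]] = {"ref-POS-01": ["row a", "row b"]}


@dataclass
class ReferenceDataset:
    ref_id: str  # stores only a pointer to the reference, never a copy of the data

    def read(self) -> list[str]:
        # The lookup happens on every read, so upstream changes are inherited.
        try:
            return source_outputs[self.ref_id]
        except KeyError:
            raise BrokenReferenceError(self.ref_id) from None


downstream = ReferenceDataset("ref-POS-01")
source_outputs["ref-POS-01"].append("row c")  # upstream recipe output changes
print(downstream.read())  # ['row a', 'row b', 'row c']

del source_outputs["ref-POS-01"]  # reference deleted in the source flow
try:
    downstream.read()
except BrokenReferenceError as err:
    print("broken reference:", err)

# Fix the broken reference by swapping in a new source.
source_outputs["ref-POS-02"] = ["row x"]
downstream.ref_id = "ref-POS-02"
print(downstream.read())  # ['row x']
```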

View for Raw Datasets

A raw dataset is an imported dataset that does not contain any initial parsing steps. All parsing steps must be added through recipes that are applied to the dataset.

Tip: You can remove initial parsing during import or through the context menu for an imported dataset. See Initial Parsing Steps.
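Without initial parsing, each record of a raw dataset arrives as one undivided string, and a recipe step must split it into columns. In Dataprep that split is a recipe step; this Python sketch only illustrates the data shape before and after, using hypothetical sample rows and a hypothetical split_columns helper, and does not reproduce any Dataprep transform.

```python
import csv

# Hypothetical raw dataset: with no initial parsing, each record is a single,
# undivided string rather than a set of columns.
raw_rows = [
    "store_id,region,total",
    "101,West,250.00",
    "102,East,175.50",
]


def split_columns(rows: list[str], delimiter: str = ",") -> list[list[str]]:
    """Break each raw line into columns, as a manual recipe step would have to."""
    return list(csv.reader(rows, delimiter=delimiter))


header, *records = split_columns(raw_rows)
print(header)   # ['store_id', 'region', 'total']
print(records)  # [['101', 'West', '250.00'], ['102', 'East', '175.50']]
```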

Figure: Raw Dataset view


Key Fields:

Data Preview: In the Data Preview window, you can see a small section of the data that is contained in the imported dataset. This window can be useful for verifying that you are looking at the proper data.

Tip: Click the preview to open a larger dialog, where you can select and copy data.

Type: Indicates where the data is sourced or the type of file.
File Size: Size of the file. Units may vary.
Location: Path to the location of the imported dataset.

Actions:

Add new Recipe: Add a new recipe for the object. If a recipe already exists for it, this new recipe is created as a branch in the flow.
Edit name and description: (Available to flow owner only) Change the name and description for the object.
Remove from Flow: Remove the dataset from the flow. All dependent flows, outputs, and references are removed from the flow as well.

View for Reference Datasets

A reference dataset is a reference to a recipe's outputs that has been added to a flow other than the one where the recipe is located.

NOTE: A reference dataset is a read-only object in the flow where it is referenced. You cannot select or use a reference dataset until a reference has been created in the source flow from the recipe to use. See View for Recipes above.

To add a reference dataset, you can:

  1. From the source flow, select the reference object for a recipe. In the context panel, click Add to Flow....
  2. Click Add Datasets from the main Flow View page and select one from a different flow.


Figure: View for referenced dataset in a new flow

NOTE: Reference datasets marked with a red dot no longer have a source dataset for them in the other flow. These upstream dependencies should be fixed. See Fix Dependency Issues.

When you select a reference dataset in flow view, the following are available in the right-hand panel.

Key Fields:

Source Flow: The flow that contains the source dataset. Click the link to open the Flow View page for that flow.

Actions:

Replace: Replace the dataset with a different dataset or reference dataset.
Add new Recipe: Add a new recipe for the object. If a recipe already exists for it, this new recipe is created as a branch in the flow.
Remove: Remove the reference dataset from the flow. The source dataset in the other flow is untouched.

Google Cloud Dataprep Documentation