Change Dataset Dialog

Through the Flow View page, you can change the source that is used for your dataset. In this manner, you can apply the same recipe across datasets with the same schema. When the source dataset has been changed, a new sample is automatically generated for you.

For example, suppose you build your recipe on a week's worth of sales data, sourced from an imported dataset based on a CSV file named Week01-Sales.csv. When the next week's source data is dropped into the appropriate directory, you can:

  1. Import the new dataset,
  2. Edit the recipe,
  3. Change the source to the new file, and
  4. Execute a job immediately to process the new week of data.

NOTE: A dataset source can be an imported dataset, a reference dataset, or a recipe. Subsequent changes to the source data affect your dataset in development.

Notes and Limitations:

  • If the schema of the new source differs from the schema of the original source, your recipe is likely to break when the new dataset is selected.

  • You can swap your original source dataset with an imported dataset, reference dataset, or a recipe. If needed, you can swap back to the original source at any time.

  • Data-dependent transforms, such as header and valuestocols, use the data that was present in the sample at the time that they were added to the recipe. As a result, applying the recipe to another source can cause unexpected changes or breakages.

  • You cannot undo or redo source swaps.
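
Because a schema mismatch is the most common reason a recipe breaks after a swap, it can help to compare column headers before changing the source. The following is a minimal sketch in plain Python, outside of the product. The function names read_header and schema_diff and the sample rows are hypothetical and used only for illustration. The sketch compares column names and order only, not data types.

```python
import csv
import io


def read_header(csv_text):
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def schema_diff(old_header, new_header):
    """Describe how new_header differs from old_header (names and order only)."""
    old_set, new_set = set(old_header), set(new_header)
    return {
        # Columns the recipe may reference that the new source lacks.
        "missing": [c for c in old_header if c not in new_set],
        # Columns that exist only in the new source.
        "added": [c for c in new_header if c not in old_set],
        # Same columns, different order.
        "reordered": old_set == new_set and list(old_header) != list(new_header),
    }


# Made-up sample rows standing in for two weekly files (illustrative only).
week01 = "date,store,sku,units\n2024-01-01,A,123,4\n"
week02 = "date,store,sku,qty\n2024-01-08,A,123,7\n"

diff = schema_diff(read_header(week01), read_header(week02))
print(diff)
```

If "missing" is not empty, any recipe step that references those columns is a likely point of failure after the swap.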

Steps:

  1. To change a data source, open the flow containing it.

  2. In Flow View, do one of the following:

    1. Click the imported dataset icon. Then, click Replace.

      NOTE: This action removes the imported dataset and all links (edges) coming out of it. The replacement must be reconnected with any downstream objects.

    2. Click the recipe icon. Then, click Change input.

      NOTE: This action substitutes only the primary input of the recipe. It does not affect any datasets that are integrated through joins, unions, lookups, or other multi-dataset options.

  3. Select the new source:

    NOTE: You can select data from any flow to which you have access. Changes to the source are inherited.

    Figure: Change Dataset Dialog

    1. To import new data, click Import Datasets. For more information, see Import Data Page.

  4. Click Replace or Change.
  5. Your dataset is now using the selected dataset as its source, and the current recipe in the Transformer page is applied to the new source.
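
The difference between the two actions in step 2 can be pictured with a toy graph model. This is only a sketch of the behavior described in the notes above, not the product's API. The Flow class, its method names, and the dataset names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    """Toy flow: nodes are object names, edges are (source, target, role)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def replace_dataset(self, old, new):
        """Mimic Replace: remove `old` and its outgoing edges; `new` starts unconnected."""
        dropped = {e for e in self.edges if e[0] == old}
        self.edges -= dropped
        self.nodes.discard(old)
        self.nodes.add(new)
        return sorted(dropped)  # links that must be recreated from `new`

    def change_input(self, recipe, new):
        """Mimic Change input: swap only the primary input of `recipe`."""
        primary = {e for e in self.edges if e[1] == recipe and e[2] == "primary"}
        self.edges -= primary
        self.nodes.add(new)
        self.edges.add((new, recipe, "primary"))


flow = Flow(
    nodes={"week01", "stores", "clean"},
    edges={("week01", "clean", "primary"), ("stores", "clean", "secondary")},
)

# Change input: the join input "stores" stays connected.
flow.change_input("clean", "week02")
after_change = sorted(flow.edges)

# Replace: the outgoing link from "stores" is dropped and must be rebuilt.
dropped = flow.replace_dataset("stores", "stores_v2")
print(after_change, dropped)
```

In this model, Change input leaves secondary inputs untouched, while Replace returns the list of links that need to be reconnected by hand.
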