In the Transformer page, you identify the data that you need to transform and build your transformation recipes on samples taken from your currently selected dataset. When you make changes to your transformation recipe, those changes are immediately applied to your sample, so that you can preview the results of your recipe before you run it against the entire dataset. In this manner, you can quickly build and iterate on the transformations applied to your data.
- In the Datasets page, click the name of the dataset. See Datasets Page.
By default, Cloud Dataprep selects the first N rows of data as the head sample. The number of rows depends on the number of columns, data density, and other factors. Depending on the size of the dataset, this sample may be the full dataset. For more information, see Overview of Sampling.
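As a rough illustration of what a head sample is, the following Python sketch reads only the first rows of a CSV source. The function name `head_sample`, the row limit, and the inline data are assumptions made for illustration; this is not how Cloud Dataprep determines the number of rows.

```python
import csv
import io
from itertools import islice


def head_sample(lines, max_rows):
    """Return the header and at most max_rows data rows from the start of a CSV source."""
    reader = csv.reader(lines)
    header = next(reader)
    # islice stops early, so the rest of a large source is never read.
    rows = list(islice(reader, max_rows))
    return header, rows


# Hypothetical data standing in for a dataset.
source = io.StringIO("id,city\n1,Oslo\n2,Lima\n3,Pune\n")
header, rows = head_sample(source, max_rows=2)
print(header, rows)
```

When the source holds fewer rows than the limit, the sample returned is the full dataset, which mirrors the behavior described above for small datasets.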
Figure: Transformer Page
Identify and Select Data
In the main panel of the Transformer page, you can select one or more elements of sampled data, which prompts suggestions for steps that you can apply to transform them. Each of these views provides a different perspective on your data, and the results of any subsequent steps that you select or configure are previewed by default in the data grid:
NOTE: Before your job is run, profiling information such as column statistics reflects exact counts for the sample that is currently loaded. After the job is run, profiled results in the Job Results page might include estimates for some metrics and counts, depending on the scale of the dataset.
|Data Grid||By default, the Transformer page displays previews in a columnar grid. Click Grid.||Use for examining values in a column with appropriate surrounding context. How do missing values in one column compare to values in another column?|
|Column Details||For additional statistical information on individual columns, select Column Details from the drop-down next to the column title.||Explore values in an individual column when their context in other rows is not necessary. Useful for managing outliers and reviewing mean, min, and max values.|
|Column Browser||Use the Column Browser to select the columns to display and review data across columns. Click Columns.||Navigate between columns and toggle their display in the data grid. Good for a high-level perspective. Use histograms to select ranges of values.|
|Context Panel||Depending on the state or the current selection of the data grid, the right side of the page displays one of several contextual panels. These panels cover recipes, suggestions, steps, and more. See below.||Review the recipe and edit, create, or delete recipe steps. Review and create samples.|
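The kind of per-column information that a column detail view surfaces (missing values, min, max, and mean) can be sketched in a few lines of Python. The function `column_summary` and the sample values are hypothetical, and the sketch assumes missing values are represented as `None`; the product computes its own statistics.

```python
from statistics import mean


def column_summary(values):
    """Summarize one column of a sample; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


# Hypothetical sampled column with one missing value and one large value.
print(column_summary([10, 12, None, 11, 95]))
```

Comparing the max against the mean in a summary like this is one quick way to notice a possible outlier before deciding on a recipe step.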
You can use the following methods for acquiring statistics on your dataset sample or individual columns in your sample:
- Sample bar: At the top of the data grid, you can see the name of the sample currently displayed in the grid. For smaller datasets, this sample is the entire dataset.
- Status bar: At the bottom of the page, you can review the counts of rows, columns, and data types for the sample currently displayed in the data grid. These metrics are updated based on the recipe steps that you apply to the sample.
- Click the Eye icon to toggle display of individual columns. See Visible Columns Panel.
- Column statistics: You can review basic statistics on individual columns through the column details in Column Browser. Click Columns at the top of the data grid. See Column Browser Panel.
- Profile your data: When you run a job on your dataset, you can optionally generate a visual profile of the resulting output. A visual profile can be useful for identifying key metrics on individual columns. See Job Results Page.
- Computed statistical functions: As needed, you can generate aggregated statistics as part of your recipe. See Aggregate Functions.
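To make the idea of aggregated statistics concrete, here is a minimal Python sketch that groups sample rows by one column and computes a count, sum, and average of another. The function `aggregate_by` and the column names are assumptions for illustration; this is not the recipe syntax of the product's aggregate functions.

```python
from collections import defaultdict


def aggregate_by(rows, key, value):
    """Group rows by `key` and compute count, sum, and average of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {"count": len(v), "sum": sum(v), "average": sum(v) / len(v)}
        for k, v in groups.items()
    }


# Hypothetical sample rows.
sample = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 4},
    {"region": "east", "sales": 20},
]
print(aggregate_by(sample, "region", "sales"))
```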
The following actions are applied through the context panel on the right side of the screen. See Context Panel.
Use the following methods to add or modify recipe steps in the Transformer page:
- Suggestion Cards: By default, when you select data in the Transformer page, a set of suggested transformations is displayed in cards. Select the appropriate one to preview the results in the data grid. See Suggestion Cards Panel.
- Transform Builder: Build complete transform steps by selecting from context-aware menus. See Transform Builder.
- Recipe Panel: After recipe steps have been created, you can review and edit them through the Recipe panel. See Recipe Panel.
- Transform Preview: Before a transform step has been added to the recipe, a preview of the transform is displayed in the data grid. See Transform Preview.
For larger datasets, the Transformer page displays a sample of the data, which you use as representative data to build your recipe. As needed, you can generate a new sample, which is useful for polishing your recipe. For more information, see Samples Panel.
Run jobs: To run a job that executes the transform recipe currently in the Transformer page across the entire dataset, click Run Job. See Run Job Page.
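The overall workflow (build a recipe against a sample, preview the result, then run the same recipe across the entire dataset) can be sketched conceptually in Python. Everything here is hypothetical: the recipe is modeled as an ordered list of plain functions, and `apply_recipe`, the step names, and the data are assumptions rather than product APIs.

```python
def apply_recipe(rows, recipe):
    """Apply each recipe step, in order, to a list of row dictionaries."""
    for step in recipe:
        rows = step(rows)
    return rows


# Hypothetical recipe steps; each takes rows and returns new rows.
def drop_missing_city(rows):
    return [r for r in rows if r.get("city")]


def uppercase_city(rows):
    return [{**r, "city": r["city"].upper()} for r in rows]


recipe = [drop_missing_city, uppercase_city]

full_dataset = [
    {"id": 1, "city": "oslo"},
    {"id": 2, "city": ""},
    {"id": 3, "city": "lima"},
    {"id": 4, "city": "pune"},
]
sample = full_dataset[:2]  # head sample used while building the recipe

preview = apply_recipe(sample, recipe)       # fast feedback on the sample
result = apply_recipe(full_dataset, recipe)  # analogous to running the job
print(preview)
print(result)
```

Because each step returns new rows instead of modifying its input, previewing on the sample leaves the underlying data unchanged, which is the property that makes iterating on a recipe safe.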
The Transformer page contains menus that are different from the standard Cloud Dataprep menu bar.
- Cloud Dataprep icon: Click to return to the Flows page. See Flows Page.
- Flow name: Click to review flow details. See Flow View Page.
- Dataset menu: Click the caret next to the flow name to open.
- Review the datasets in the flow or open a different wrangled dataset.
- See a mini-map of flow view for the flow.
- See Recipe Navigator.
- Recipe icon: Click the Recipe icon to display the current recipe. See Recipe Panel.
- Samples icon: Click this icon to review the current sample and create new ones. See Samples Panel.
- Run Job: Runs the currently specified recipe on the dataset. See Run Job Page.
- User menu: Review your user profile and access other resources. See User Profile Page.
- Google Cloud Platform icon: Click to return to the Google Cloud Platform console.