[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Translation issue","translationIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-09-05 UTC."],[[["\u003cp\u003eDataplex data profiling identifies statistical characteristics of BigQuery table columns, such as data distribution and null counts, to enhance data understanding and analysis.\u003c/p\u003e\n"],["\u003cp\u003eData profiling scans can be configured for full or incremental table analysis, allowing for the specification of row and column filters to optimize execution time and cost.\u003c/p\u003e\n"],["\u003cp\u003eScan results provide insights like null value percentages, unique value counts, top common values, and statistical measures like average and standard deviation for various column types.\u003c/p\u003e\n"],["\u003cp\u003eDataplex enables the export of scan results to BigQuery tables, integration with Looker for custom reporting, and detailed monitoring of profile jobs.\u003c/p\u003e\n"],["\u003cp\u003eData profiling scans are charged via the premium processing SKU, and can be optimized for cost reduction via methods like sampling, incremental scans, and using filters.\u003c/p\u003e\n"]]],[],null,["# About data profiling\n\nDataplex Universal Catalog data profiling lets you identify common\nstatistical characteristics of the columns in your BigQuery\ntables. This information helps you to understand and analyze your data\nmore effectively.\n\nInformation like typical data values, data distribution, and null counts can\naccelerate analysis. 
When combined with data classification, data profiling can\ndetect data classes or sensitive information that, in turn, can enable access\ncontrol policies.\n\nDataplex Universal Catalog also uses this information to\n[recommend rules for data quality checks](/dataplex/docs/auto-data-quality-overview).\n\nConceptual model\n----------------\n\nDataplex Universal Catalog lets you better understand the profile of your data by\ncreating a data profile scan.\n\nThe following diagram shows how Dataplex Universal Catalog scans data to report on\nstatistical characteristics.\n\nA data profile scan is associated with one BigQuery table\nand scans the table to generate the data profiling results. A data profile\nscan supports several [configuration options](#configuration-options).\n| **Note:** Dataplex Universal Catalog runs scans on resources in a Google tenant project, so you don't need to set up your own infrastructure.\n\nConfiguration options\n---------------------\n\nThis section describes the configuration options available for running\ndata profile scans.\n\n### Scheduling options\n\nYou can schedule a data profile scan with a defined frequency, or run the scan\non demand.\n\n### Scope\n\nYou can specify the scope of the data to scan:\n\n- **Full table**: The entire table is scanned in the data profile scan.\n Sampling, row filters, and column filters are applied on the entire table\n before calculating the profiling statistics.\n\n- **Incremental** : Incremental data that you specify is scanned in the data\n profile scan. Specify a `Date` or `Timestamp` column in the table to be\n used as an increment. Typically, this is the column on which the table is\n partitioned. Sampling, row filters, and column filters are applied on the\n incremental data before calculating the profiling statistics.\n\n### Filter data\n\nYou can filter data to be scanned for profiling by using row filters and\ncolumn filters. 
Using filters helps you reduce run time and cost,\nand exclude sensitive or irrelevant data.\n\n- **Row filters**: Row filters let you focus on data within a specific time\n period or from a specific segment, such as region. For example, you can filter\n out data with a timestamp before a certain date.\n\n- **Column filters**: Column filters let you include or exclude specific\n columns in your table when running the data profile scan.\n\n### Sample data\n\nYou can specify a percentage of records from your data\nto sample for running a data profile scan. Running data profile scans on a\nsmaller sample of data can reduce the run time and cost compared with scanning the entire dataset.\n\nMultiple data profile scans\n---------------------------\n\nYou can create multiple data profile scans at a time\nusing the Google Cloud console. You can select up to 100 tables from one dataset\nand create a data profile scan for each table. For more information, see\n[Create multiple data profile scans](/dataplex/docs/use-data-profiling#multiple-scans).\n\n### Export scan results to a BigQuery table\n\nYou can export the data profile scan results to a BigQuery table\nfor further analysis. To customize reporting, you can connect the\nBigQuery table data to a Looker dashboard. 
You can\nbuild an aggregated report by using the same results table across multiple scans.\n\nData profiling results\n----------------------\n\nThe data profiling results include column-level statistics such as null value\npercentages, unique value counts, top common values, and, for applicable column\ntypes, measures like average and standard deviation.\n\nThe results also include the number of records scanned in every job.\n| **Note:** To improve performance, some values are approximated and might differ from the actual values by 1-2%.\n\nReporting and monitoring\n------------------------\n\nYou can monitor and analyze the data profiling results using the following\nreports and methods:\n\n- **Reports published with the source table in the BigQuery and Dataplex Universal Catalog pages**\n\n If you have configured a data profile scan to publish the results in the\n BigQuery and Dataplex Universal Catalog pages in the\n Google Cloud console, then you can view the latest data profile scan\n results on these pages, on the source table's **Data profile** tab, from any project.\n\n- **Historical, per-job report**\n\n On the **Data profiling \\& quality \\\u003e Data profile scan** page in\n Dataplex Universal Catalog and BigQuery, you can view the\n detailed reports for the latest and historical jobs. These reports\n include column-level profile information and the configuration that was used.\n\n- **Analysis tab**\n\n On the **Data profiling \\& quality \\\u003e Data profile scan** page in\n Dataplex Universal Catalog and BigQuery, you can use the **Analysis**\n tab to view the trends for a given statistic of a column over multiple\n profile jobs. For example, if you have an incremental scan, you can view how\n the average of a value has been trending over time.\n\n- **Build your own dashboard or analytics**\n\n If you have configured a data profile scan to export results to a\n BigQuery table, then you can build your own dashboards using\n tools such as Looker Studio.\n\nLimitations\n-----------\n\n- Data profiling is supported for BigQuery tables with all column types except `BIGNUMERIC`. 
A scan created for a table with a `BIGNUMERIC` column fails with a validation error and isn't created.\n\nPricing\n-------\n\n- Dataplex Universal Catalog uses the premium processing SKU to charge for data\n profiling. For more information, see [Pricing](/dataplex/pricing).\n\n- Dataplex Universal Catalog premium processing for data profiling is billed per\n second with a one-minute minimum.\n\n- You aren't charged for failed data profile scans.\n\n- The charge depends on the number of rows, the number of columns, the amount of\n data scanned, partitioning and clustering settings on the table, and the\n frequency of the scan.\n\n- There are several options to reduce the cost of data profile scans:\n\n - Sampling\n - Incremental scans\n - Column filtering\n - Row filtering\n- To separate data profiling charges from other charges in the Dataplex Universal Catalog\n premium processing SKU, on the\n [Cloud Billing report](/billing/docs/how-to/reports), use the label\n `goog-dataplex-workload-type` with value `DATA_PROFILE`.\n\n- To filter aggregate charges, use the following labels:\n\n - `goog-dataplex-datascan-data-source-dataplex-entity`\n - `goog-dataplex-datascan-data-source-dataplex-lake`\n - `goog-dataplex-datascan-data-source-dataplex-zone`\n - `goog-dataplex-datascan-data-source-project`\n - `goog-dataplex-datascan-data-source-region`\n - `goog-dataplex-datascan-id`\n - `goog-dataplex-datascan-job-id`\n\nWhat's next?\n------------\n\n- Learn how to [use data profiling](/dataplex/docs/use-data-profiling).\n- Learn about [auto data quality](/dataplex/docs/auto-data-quality-overview).\n- Learn how to [use auto data quality](/dataplex/docs/use-auto-data-quality).\n- Learn how to [explore your data by generating data insights](/bigquery/docs/data-insights)."]]