This page lists and describes all metrics that are gathered in data profiles.
There are three types of data profiles: project data profiles, table data profiles, and column data profiles.
Project data profiles
Each project data profile has the following fields. The values for these fields are aggregated based on the resources profiled within the project.
- Data risk
- Level of risk associated with the data at its current state. For more information, see Sensitivity and data risk levels.
- Last profile generated
- Date and time the profile was last generated.
- Project ID
- ID of the project that was profiled.
- Resource name
- Fully qualified name of the data profile.
- Sensitivity
- Score indicating the sensitivity level for this project. For more information, see Sensitivity and data risk levels.
- Status
- Icon that indicates the status of the profiling operation.
Table data profiles
Each table data profile has the following fields:
- Data risk
- Level of risk associated with the data at its current state. For more information, see Sensitivity and data risk levels.
- Dataset ID
- ID of the dataset that contains this table.
- Encryption
- Whether encryption for this table is managed by Google or by your organization.
- Expiration time
- Optional. The time when this table expires.
- Failed column count
- The number of columns skipped in this table because of an error.
- Group user
- Number of groups with Identity and Access Management (IAM) permissions to access this table.
- Individual user
- Number of users with IAM permissions to access this table.
- Inspect config snapshot
- Snapshot of the inspection template that was used when the profile was generated. For more information, see Data profile snapshots.
- Last profile generated
- Date and time the profile was last generated.
- Latest update in BigQuery
- Date and time this table was last modified.
- Project ID
- ID of the project that contains this table.
- Public
- Whether this table is available to all users or restricted to certain users.
- Resource labels
- Labels that the table had at the time the profile was generated.
- Resource name
- Fully qualified name of the data profile.
- Row count
- Number of rows in this table when the profile was generated.
- Scanned column count
- The number of columns profiled in this table.
- Sensitivity
- Score indicating the sensitivity level for this table. For more information, see Sensitivity and data risk levels.
- Service account
- Number of service accounts with IAM permissions to access this table.
- Status
- Icon that indicates the status of the profiling operation.
- Table ID
- ID of this table.
- Table creation time
- Date and time the table was created.
- Table size
- The size of this table when the profile was generated.
Column data profiles
Each column data profile has the following fields:
- Data risk
- Level of risk associated with the data at its current state. For more information, see Sensitivity and data risk levels.
- Data type
- The data type of the contents of this column.
- Dataset ID
- ID of the dataset that contains this table column.
- Field ID
- Name of the column.
- Free text score
The probability that this column contains freeform text. A value close to 1 indicates the column is likely to contain freeform or natural-language text. Possible values range from 0 through 1.
A high free text score can increase a column's data risk and sensitivity levels.
- Last profile generated
Date and time the profile was last generated.
- Other infoTypes
InfoTypes detected in the column that don't have a strong enough signal to be considered that column's predicted infoType. In this document, see Predicted infoType.
For data profiles generated after October 13, 2022, each infoType listed in this field has an estimated prevalence. The estimated prevalence is an approximate percentage of non-null rows in which the infoType was detected.
For example, suppose you have a column that has the following metrics:
- Predicted infoType: FDA_CODE
- Other infoTypes: PERSON_NAME (2%), STREET_ADDRESS (1%)
In this example, there is a strong indication that the column contains FDA codes. Cloud DLP also determined that approximately 2% of non-null rows in the column might contain person names and 1% might contain street addresses.
Cloud DLP scans for only the infoTypes that you specified in the inspection template. Thus, only those infoTypes can appear in the Other infoTypes field. For example, if the column has email addresses, but you didn't include the EMAIL_ADDRESS infoType detector in your inspection template, then this field doesn't contain EMAIL_ADDRESS.
- Policy tags
Indicates if a policy tag is applied to the column. For information on best practices for using policy tags, see Using policy tags in BigQuery.
- Predicted infoType
If a single built-in or custom infoType clearly predominates over others in the column, Cloud DLP sets this field to that infoType. Otherwise, this field has no value.
To view a list of all infoTypes detected in the column, refer to the Other infoTypes field.
Cloud DLP scans for only the infoTypes that you specified in the inspection template. Thus, only those infoTypes can appear in the Predicted infoType field. For example, if the column has email addresses, but you didn't include the EMAIL_ADDRESS infoType detector in your inspection template, then this field doesn't contain EMAIL_ADDRESS.
In this document, see Other infoTypes.
- Project ID
ID of the project that contains this table column.
- Resource name
Fully qualified name of the data profile.
- Sensitivity
Score indicating the sensitivity level for this column. For more information, see Sensitivity and data risk levels.
- Status
Icon that indicates the status of the profiling operation.
- Table ID
ID of the table that contains this column.
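The estimated prevalence described under Other infoTypes is an approximate percentage of non-null rows in which an infoType was detected. The following Python sketch illustrates that definition only. It is not how Cloud DLP computes the value, which this page does not describe. The function name `estimated_prevalence` and the `detector` callable are hypothetical, and the toy email check stands in for a real infoType detector.

```python
from typing import Callable, Optional, Sequence


def estimated_prevalence(
    values: Sequence[Optional[str]],
    detector: Callable[[str], bool],
) -> float:
    """Return the percentage of non-null values for which detector() is true.

    Illustrative only: `detector` is a hypothetical stand-in for an
    infoType detector, and null rows are excluded from the denominator,
    matching the "non-null rows" wording in the field description.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        # No non-null rows to measure against, so report 0%.
        return 0.0
    hits = sum(1 for v in non_null if detector(v))
    return 100.0 * hits / len(non_null)


def looks_like_email(value: str) -> bool:
    # Toy check used only for this example, not a real detector.
    return "@" in value


column = ["a@example.com", None, "x", "y", "z"]
print(estimated_prevalence(column, looks_like_email))  # 25.0
```

In this example, one of the four non-null values matches, so the prevalence is 25%. The null row does not count toward the total.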