This page describes the steps you can take to remediate findings from data profiles.
High data risk
Columns or tables with high data risk have evidence of sensitive information without additional protections. To lower the data risk score, consider doing the following:
For columns that contain sensitive data, apply a BigQuery policy tag to restrict access to accounts with specific access rights.
Before you make this change, make sure your service agent has the permissions required to profile tables with column-level restrictions. Otherwise, Cloud DLP shows an error. For more information, see Troubleshoot issues with the data profiler.
De-identify the raw sensitive data using de-identification techniques like masking and tokenization.
If the high-risk data is not needed, consider removing the sensitive columns.
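As a rough illustration of the first option, the sketch below attaches a policy tag to sensitive columns in a table schema and wraps the result in a body that could be sent as a BigQuery `tables.patch` request. It assumes the schema is held as a list of dicts in the BigQuery REST field shape (`name`, `type`, `policyTags`). The project, taxonomy, and tag IDs are hypothetical placeholders, and `tag_sensitive_columns` is an illustrative helper, not part of any Google library. The sketch handles top-level columns only, not nested RECORD fields.

```python
import copy

# Hypothetical policy tag resource name; replace it with a tag from your own taxonomy.
EXAMPLE_POLICY_TAG = (
    "projects/my-project/locations/us/taxonomies/1234567890/policyTags/0987654321"
)


def tag_sensitive_columns(schema_fields, sensitive_columns, policy_tag):
    """Return a copy of a BigQuery schema with a policy tag on the named columns.

    schema_fields: list of dicts in the BigQuery REST field shape,
        for example {"name": "ssn", "type": "STRING"}.
    sensitive_columns: set of top-level column names to restrict.
    """
    tagged = copy.deepcopy(schema_fields)  # leave the caller's schema untouched
    for field in tagged:
        if field["name"] in sensitive_columns:
            field["policyTags"] = {"names": [policy_tag]}
    return tagged


# Illustrative schema; the column names are assumptions.
schema = [
    {"name": "customer_id", "type": "INTEGER"},
    {"name": "ssn", "type": "STRING"},
]

# Body for a tables.patch call; only the "ssn" column gains the policy tag.
patch_body = {
    "schema": {"fields": tag_sensitive_columns(schema, {"ssn"}, EXAMPLE_POLICY_TAG)}
}
```

Building the patch body separately from the API call makes it easy to review which columns will be restricted before you apply the change.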
High free-text score
A column with a high free-text score, especially one that has evidence of multiple infoTypes (like PHONE_NUMBER, US_SOCIAL_SECURITY_NUMBER, and DATE_OF_BIRTH), might contain unstructured data and instances of personally identifiable information (PII). This column can be a note or comment field. Freeform text presents a potential risk. For example, in such fields, someone might enter "Customer was born on January 1, 1985".
Cloud DLP is built to handle unstructured data. To better understand this kind of data, consider doing the following:
To identify the rows or cells where PII might exist, run an on-demand inspection on the BigQuery table.
De-identify the raw sensitive data using techniques like masking and tokenization.
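The on-demand inspection mentioned above could be configured along the following lines. This is a minimal sketch that only builds the inspection job configuration as a plain dict, following the field names of the Cloud DLP `InspectJobConfig` message. The project, dataset, table, and column names are hypothetical, and `build_inspect_job` is an illustrative helper rather than a library function.

```python
def build_inspect_job(project_id, dataset_id, table_id, info_types,
                      column=None, rows_limit=1000):
    """Build an inspection job configuration dict for one BigQuery table.

    info_types: infoType names to look for, such as "PHONE_NUMBER".
    column: optional column name to limit scanning to (for example, a notes field).
    rows_limit: maximum number of rows to scan, to keep the job small.
    """
    big_query_options = {
        "table_reference": {
            "project_id": project_id,
            "dataset_id": dataset_id,
            "table_id": table_id,
        },
        "rows_limit": rows_limit,
    }
    if column is not None:
        # Scan only the free-text column instead of the whole table.
        big_query_options["included_fields"] = [{"name": column}]
    return {
        "storage_config": {"big_query_options": big_query_options},
        "inspect_config": {
            "info_types": [{"name": name} for name in info_types],
            "include_quote": True,  # include the matched text in each finding
        },
    }


# Hypothetical table and column names, for illustration only.
job = build_inspect_job(
    "my-project", "crm", "tickets",
    ["PHONE_NUMBER", "US_SOCIAL_SECURITY_NUMBER", "DATE_OF_BIRTH"],
    column="notes",
)
```

With the `google-cloud-dlp` Python client, a dict like this would typically be passed as the `inspect_job` field of a `create_dlp_job` request. To see the individual findings rather than only summary counts, you would likely also add an action to the job, such as saving findings to a BigQuery table.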
What's next
Learn how Cloud DLP calculates the data risk and sensitivity levels of your tables and columns.
Learn how tokenization makes data usable without sacrificing privacy.
Learn how Forrester named Google Cloud a leader in unstructured data security platforms.