This page lists known issues with Cloud DLP, along with ways you can avoid or recover from them.
BigQuery scanning
This section describes issues you might encounter when inspecting or profiling BigQuery data.
Issues common to inspection and profiling operations
The following issues are applicable to both BigQuery inspection and profiling operations.
Rows with row-level security can't be scanned
Row-level security policies can prevent Cloud DLP from inspecting and profiling the protected BigQuery tables. If you have row-level security policies applied to your BigQuery tables, we recommend that you set a TRUE filter and include the service agent in the grantee list:
- If you're profiling data at the organization or folder level, include the service agent of the container project in the grantee list.
- If you're profiling data at the project level or running an inspection job on a table, include the service agent of the project in the grantee list.
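As a rough sketch of the recommendation above, the following Python helper assembles a BigQuery CREATE ROW ACCESS POLICY statement with a TRUE filter. The policy name, table path, and service agent address are hypothetical placeholders, not values from this page; substitute the service agent of the appropriate project.

```python
def build_true_filter_policy(table: str, policy_name: str, service_agent_email: str) -> str:
    """Return DDL that grants one service agent access to every row of a table."""
    return (
        f"CREATE ROW ACCESS POLICY {policy_name}\n"
        f"ON `{table}`\n"
        f'GRANT TO ("serviceAccount:{service_agent_email}")\n'
        "FILTER USING (TRUE);"
    )


# All three arguments are placeholders chosen for illustration only.
ddl = build_true_filter_policy(
    table="my-project.my_dataset.my_table",
    policy_name="dlp_scan_all_rows",
    service_agent_email="SERVICE_AGENT_EMAIL",
)
print(ddl)
```

You would then run the printed statement in BigQuery. If the table already has other row access policies, this one is added alongside them rather than replacing them.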
Duplicate rows
When writing data to a BigQuery table, Cloud DLP might write duplicate rows.
Recently streamed data
Cloud DLP doesn't scan recently streamed data (formerly known as streaming buffer). For more information, see Streaming data availability in the BigQuery documentation.
BigQuery inspection issues
The following issues are only applicable to inspection operations on BigQuery data. They don't affect data profiles.
Exported findings do not have values for the row_number field
When you configure Cloud DLP to save findings to BigQuery, the location.content_locations.record_location.record_key.big_query_key.row_number field in the generated BigQuery table is inferred at the time the input table is scanned. Its value is nondeterministic, can't be queried, and can be null for inspection jobs.
If you need to identify specific rows where findings are present, specify inspectJob.storageConfig.bigQueryOptions.identifyingFields at job creation time.
Identifying fields can be found in the generated BigQuery table, in the location.content_locations.record_location.record_key.id_values field.
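The following minimal sketch shows where identifyingFields sits in an inspection job request body, expressed as a Python dictionary. The project, dataset, table, and column names are hypothetical, and the body is deliberately incomplete: a real job would also need an inspection configuration and an action that saves findings.

```python
import json


def build_inspect_job(project_id, dataset_id, table_id, identifying_fields):
    """Build a partial job body that asks for identifying fields on each finding."""
    return {
        "inspectJob": {
            "storageConfig": {
                "bigQueryOptions": {
                    "tableReference": {
                        "projectId": project_id,
                        "datasetId": dataset_id,
                        "tableId": table_id,
                    },
                    # Columns whose values are copied into id_values for each finding.
                    "identifyingFields": [{"name": name} for name in identifying_fields],
                }
            }
        }
    }


# Hypothetical identifiers, for illustration only.
job = build_inspect_job("my-project", "my_dataset", "my_table", ["customer_id"])
print(json.dumps(job, indent=2))
```

Choosing a column that uniquely identifies a row, such as a primary key, makes it possible to join findings back to the input table without relying on row_number.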
Limiting scans to new BigQuery content
If you're limiting scans to only new content, and you use the BigQuery Storage Write API to populate the input table, Cloud DLP might skip scanning some rows.
To mitigate this issue, in your inspection job, make sure the timestampField of the TimespanConfig object is a commit timestamp that BigQuery auto-generates. However, there's still no guarantee that no rows are skipped, because Cloud DLP doesn't read from recently streamed data.
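As an illustration, the fragment below builds the timespanConfig portion of a job's storage configuration in Python. The column name commit_time_stamp is only an example, and pairing timestampField with enableAutoPopulationOfTimespanConfig to limit scans to new content is an assumption about how the job is set up, not something this page states.

```python
def build_timespan_config(timestamp_column: str) -> dict:
    """Build a timespanConfig fragment keyed on a commit timestamp column."""
    return {
        "timespanConfig": {
            # Column that holds the commit timestamp BigQuery auto-generates.
            "timestampField": {"name": timestamp_column},
            # Assumption: this flag is how the job restricts scans to new content.
            "enableAutoPopulationOfTimespanConfig": True,
        }
    }


fragment = build_timespan_config("commit_time_stamp")
print(fragment)
```

The fragment would be merged into the job's storageConfig next to bigQueryOptions.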
If you want to auto-generate commit timestamps for a column, and you use the legacy streaming API to populate your input table, do the following:
- In the input table's schema, make sure that the timestamp column is of type TIMESTAMP.
  Example schema: The following example defines the commit_time_stamp field and sets its type to TIMESTAMP:
  ... { "name": "commit_time_stamp", "type": "TIMESTAMP" } ...
- In the rows[].json field of the tabledata.insertAll method, make sure that the values in the timestamp column are set to AUTO.
  Example JSON: The following example sets the value of the commit_time_stamp field to AUTO:
  { ... "commit_time_stamp": "AUTO", ... }
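The two steps above can be sketched in Python as a helper that prepares a tabledata.insertAll request body. The row contents and the helper name are hypothetical; only the rows[].json layout and the AUTO value come from the steps above. The optional insertId is included here as a per-row identifier.

```python
import json
import uuid


def build_insert_all_body(rows, timestamp_column="commit_time_stamp"):
    """Wrap plain row dicts for insertAll, setting the timestamp column to AUTO."""
    return {
        "rows": [
            {
                "insertId": str(uuid.uuid4()),
                # Copy the row and overwrite the timestamp column with AUTO.
                "json": {**row, timestamp_column: "AUTO"},
            }
            for row in rows
        ]
    }


# Hypothetical row data, for illustration only.
body = build_insert_all_body([{"name": "example", "email": "user@example.com"}])
print(json.dumps(body, indent=2))
```

Because the helper overwrites the timestamp column, callers can't accidentally send their own timestamp value in place of AUTO.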
BigQuery profiling issues
The following issues are only applicable to profiling operations on BigQuery data. For more information, see Data profiles for BigQuery data.
Organizations or projects with more than 500 million tables
Cloud DLP returns an error if you attempt to profile an organization or project that has more than 500 million tables. If you encounter this error, you can send your feedback through email to cloud-dlp-feedback@google.com.
If your organization has more than 500 million tables, and you have a project with a lower table count, try to do a project-level scan instead.
For information about table and column limits, see Data profiling limits.
Inspection templates
The inspection template must be in the same region as the data to be profiled. If you have data in multiple regions, use multiple inspection templates, one for each region where you have data. You can also use an inspection template that is stored in the global region. If you include a template in the global region, Cloud DLP uses it for any data that doesn't have a region-specific template. For more information, see Data residency considerations.
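The fallback rule described above can be modeled with a small Python function. This is only an illustration of the documented behavior, not how Cloud DLP is implemented, and the region names and template resource names are hypothetical.

```python
from typing import Dict, Optional


def pick_template(data_region: str, templates: Dict[str, str]) -> Optional[str]:
    """Return the template for a region, falling back to the global one if present."""
    if data_region in templates:
        return templates[data_region]
    return templates.get("global")


# Hypothetical mapping of region to inspection template name.
templates = {
    "us": "projects/my-project/locations/us/inspectTemplates/us-template",
    "global": "projects/my-project/locations/global/inspectTemplates/fallback",
}

print(pick_template("us", templates))            # region-specific template
print(pick_template("europe-west1", templates))  # falls back to the global template
```

Without a global entry, the function returns None for any region that lacks its own template, which corresponds to data that no template covers.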
Stored infoTypes
A stored infoType (also known as a stored custom dictionary detector) that is referenced in your inspection template must be stored in either of the following:
- The global region.
- The same region as the inspection template.
Otherwise, the profiling operation fails with the error Resource not found.
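To catch this error before running a profiling operation, you could check the location rule above yourself. The sketch below assumes resource names carry a locations/LOCATION segment and treats names without one as global; both points are assumptions, and the resource names are hypothetical.

```python
import re

_LOCATION = re.compile(r"/locations/([^/]+)/")


def location_of(resource_name: str) -> str:
    """Extract the location from a resource name (assumed global when absent)."""
    match = _LOCATION.search(resource_name)
    return match.group(1) if match else "global"


def stored_info_type_is_usable(stored_info_type: str, template: str) -> bool:
    """True if the stored infoType is global or shares the template's region."""
    location = location_of(stored_info_type)
    return location == "global" or location == location_of(template)


template = "projects/my-project/locations/us/inspectTemplates/my-template"
print(stored_info_type_is_usable(
    "projects/my-project/locations/global/storedInfoTypes/my-dictionary", template))
print(stored_info_type_is_usable(
    "projects/my-project/locations/europe-west1/storedInfoTypes/my-dictionary", template))
```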
VPC Service Controls
Using data profiling with VPC Service Controls zones is not officially supported. If you try scanning data inside a VPC Service Controls zone, let us know what issues you run into by sending an email to cloud-dlp-feedback@google.com.
Intelligent document parsing
This section contains known issues related to document parsing.
The DocumentLocation object isn't populated
The location.content_locations.document_location.file_offset field isn't populated for Intelligent Document Parsing scanning mode.