Known issues

This page lists known issues with Cloud DLP, along with ways to avoid or recover from them.

BigQuery scanning

This section describes issues you might encounter when inspecting or profiling BigQuery data.

Issues common to inspection and profiling operations

The following issues are applicable to both BigQuery inspection and profiling operations.

Rows with row-level security can't be scanned

Row-level security policies can prevent Cloud DLP from inspecting and profiling the protected BigQuery tables. If row-level security policies are applied to your BigQuery tables, we recommend that you set a TRUE filter and include the Cloud DLP service agent in the grantee list.
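As a minimal sketch of that recommendation, the following Python helper builds a BigQuery CREATE ROW ACCESS POLICY statement with a TRUE filter. The table name, policy name, and project number are placeholders, and the service agent address format used here is an assumption that you should confirm for your own project before running the statement.

```python
def row_access_policy_ddl(table: str, policy_name: str, project_number: str) -> str:
    """Build a DDL statement that grants a service agent access to all rows."""
    # Assumption: the service agent address follows this pattern.
    # Verify the actual address in your project's IAM settings.
    service_agent = f"service-{project_number}@dlp-api.iam.gserviceaccount.com"
    return (
        f"CREATE ROW ACCESS POLICY {policy_name}\n"
        f"ON `{table}`\n"
        f'GRANT TO ("serviceAccount:{service_agent}")\n'
        "FILTER USING (TRUE);"
    )


# Placeholder values for illustration only.
ddl = row_access_policy_ddl(
    table="my-project.my_dataset.my_table",
    policy_name="dlp_scan_policy",
    project_number="123456789012",
)
print(ddl)
```

The helper only produces the statement text; you would still run it in BigQuery with an account that is allowed to manage row access policies on the table.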

Duplicate rows

When writing data to a BigQuery table, Cloud DLP might write duplicate rows.

Recently streamed data

Cloud DLP doesn't scan recently streamed data (formerly known as streaming buffer). For more information, see Streaming data availability in the BigQuery documentation.

BigQuery inspection issues

The following issues are only applicable to inspection operations on BigQuery data. They don't affect data profiles.

Exported findings do not have values for the row_number field

When you configure Cloud DLP to save findings to BigQuery, the location.content_locations.record_location.record_key.big_query_key.row_number field in the generated BigQuery table is inferred at the time the input table is scanned. Its value is nondeterministic, can't be queried, and can be null for inspection jobs.

If you need to identify specific rows where findings are present, specify inspectJob.storageConfig.bigQueryOptions.identifyingFields at job creation time.

Identifying fields can be found in the generated BigQuery table, in the location.content_locations.record_location.record_key.id_values field.
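The following Python sketch shows where identifyingFields sits in the body of a job creation request. The project, dataset, table, and column names are placeholders, and the body is intentionally incomplete: a real job would also need an inspection configuration and an action that saves findings to BigQuery.

```python
import json


def inspect_job_body(project_id, dataset_id, table_id, id_columns):
    """Build a partial job creation body that sets identifying fields."""
    return {
        "inspectJob": {
            "storageConfig": {
                "bigQueryOptions": {
                    "tableReference": {
                        "projectId": project_id,
                        "datasetId": dataset_id,
                        "tableId": table_id,
                    },
                    # Each identifying field is referenced by column name.
                    "identifyingFields": [{"name": col} for col in id_columns],
                }
            }
        }
    }


# Placeholder names for illustration only.
body = inspect_job_body("my-project", "my_dataset", "my_table", ["customer_id"])
print(json.dumps(body, indent=2))
```

With a column such as customer_id set as an identifying field, its value for each finding would appear in the id_values field of the exported findings table, which gives you a stable way to locate the row.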

Limiting scans to new BigQuery content

If you're limiting scans to only new content, and you use the BigQuery Storage Write API to populate the input table, Cloud DLP might skip scanning some rows.

To mitigate this issue, in your inspection job, make sure the timestampField of the TimespanConfig object is a commit timestamp that BigQuery auto-generates. However, rows can still be skipped, because Cloud DLP doesn't read from recently streamed data.
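As an illustration, the following Python sketch builds a TimespanConfig fragment that points timestampField at a commit timestamp column. The column name commit_time_stamp is a placeholder, and enabling enableAutoPopulationOfTimespanConfig is an assumption about how a job that scans only new content would typically be configured.

```python
def timespan_config(timestamp_column):
    """Build a TimespanConfig fragment for the storage configuration of a job."""
    return {
        # Column that holds the commit timestamp generated by BigQuery.
        "timestampField": {"name": timestamp_column},
        # Assumption: the job is set up to scan only content added since
        # the previous run.
        "enableAutoPopulationOfTimespanConfig": True,
    }


# Placeholder column name for illustration only.
config = timespan_config("commit_time_stamp")
print(config)
```

This fragment would be placed under storageConfig.timespanConfig in the inspection job.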

If you want to auto-generate commit timestamps for a column, and you use the legacy streaming API to populate your input table, do the following:

  1. In the input table's schema, make sure that the timestamp column is of type TIMESTAMP.

    Example schema

    The following example defines the commit_time_stamp field and sets its type to TIMESTAMP:

    ...
    {
     "name": "commit_time_stamp",
     "type": "TIMESTAMP"
    }
    ...
    
  2. In the rows[].json field of the tabledata.insertAll method, make sure that the values in the timestamp column are set to AUTO.

    Example JSON

    The following example sets the value of the commit_time_stamp field to AUTO:

    {
      ...
      "commit_time_stamp": "AUTO",
      ...
    }
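The two steps above can be sketched in Python as a helper that builds a tabledata.insertAll request body and sets the timestamp column to AUTO in every row. The row contents and the column name are placeholders, and the helper only constructs the body; sending it to BigQuery is left out.

```python
def insert_all_body(rows, timestamp_column="commit_time_stamp"):
    """Build an insertAll request body with an auto-generated commit timestamp."""
    return {
        "rows": [
            # Copy each row and set the timestamp column to AUTO so that
            # BigQuery fills in the commit timestamp.
            {"json": {**row, timestamp_column: "AUTO"}}
            for row in rows
        ]
    }


# Placeholder rows for illustration only.
request_body = insert_all_body([{"name": "Ada"}, {"name": "Grace"}])
print(request_body)
```

Because the helper overwrites any value that a caller supplies for the timestamp column, every row ends up with AUTO regardless of its input.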
    

BigQuery profiling issues

The following issues are only applicable to profiling operations on BigQuery data. For more information, see Data profiles for BigQuery data.

Organizations or projects with more than 500 million tables

Cloud DLP returns an error if you attempt to profile an organization or project that has more than 500 million tables. If you encounter this error, you can send feedback by email to cloud-dlp-feedback@google.com.

If your organization has more than 500 million tables, and you have a project with a lower table count, try a project-level scan instead.

For information about table and column limits, see Data profiling limits.

Inspection templates

The inspection template must be in the same region as the data to be profiled. If you have data in multiple regions, use multiple inspection templates—one for each region where you have data. You can also use an inspection template that is stored in the global region. If you include a template in the global region, Cloud DLP uses it for any data that doesn't have a region-specific template. For more information, see Data residency considerations.
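The fallback behavior described above can be modeled with a short Python sketch. The function and the template names are hypothetical and only illustrate the selection rule; they aren't part of the Cloud DLP API.

```python
def pick_inspection_template(data_region, templates_by_region):
    """Return the template for a data region, falling back to the global one.

    Returns None if there is neither a region-specific nor a global template.
    """
    if data_region in templates_by_region:
        return templates_by_region[data_region]
    return templates_by_region.get("global")


# Hypothetical template names for illustration only.
templates = {
    "us-west1": "projects/my-project/locations/us-west1/inspectTemplates/t-us",
    "global": "projects/my-project/locations/global/inspectTemplates/t-global",
}

print(pick_inspection_template("us-west1", templates))
print(pick_inspection_template("europe-west1", templates))
```

In this model, data in us-west1 uses its region-specific template, while data in europe-west1 falls back to the template in the global region.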

Stored infoTypes

A stored infoType (also known as a stored custom dictionary detector) that is referenced in your inspection template must be stored in either of the following:

  • The global region.
  • The same region as the inspection template.

Otherwise, the profiling operation fails with the error, Resource not found.
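As a rough pre-flight check, the following Python sketch compares the location of a stored infoType with the location of the inspection template that references it. The helper functions are hypothetical, the resource names are placeholders, and treating a name without a locations segment as global is an assumption.

```python
def location_of(resource_name):
    """Extract the location from a resource name such as
    projects/PROJECT/locations/LOCATION/storedInfoTypes/ID."""
    parts = resource_name.split("/")
    if "locations" in parts:
        return parts[parts.index("locations") + 1]
    # Assumption: names without a locations segment are in the global region.
    return "global"


def stored_info_type_is_usable(stored_info_type_name, template_name):
    """Check the rule: global region, or same region as the template."""
    location = location_of(stored_info_type_name)
    return location == "global" or location == location_of(template_name)


# Placeholder resource names for illustration only.
template = "projects/my-project/locations/us-west1/inspectTemplates/t1"
print(stored_info_type_is_usable(
    "projects/my-project/locations/us-west1/storedInfoTypes/s1", template))
print(stored_info_type_is_usable(
    "projects/my-project/locations/europe-west1/storedInfoTypes/s2", template))
```

A check like this only mirrors the rule stated above; it doesn't confirm that the stored infoType actually exists.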

VPC Service Controls

Using data profiling with VPC Service Controls zones is not officially supported. If you try scanning data inside a VPC Service Controls zone, let us know what issues you run into by sending an email to cloud-dlp-feedback@google.com.

Intelligent document parsing

This section contains known issues related to document parsing.

The DocumentLocation object isn't populated

The location.content_locations.document_location.file_offset field isn't populated when you use the Intelligent Document Parsing scanning mode.