Known issues

This page lists known issues with Sensitive Data Protection, along with ways you can avoid or recover from them.

General issues

Storing results to BigQuery

When a job or discovery scan is storing results to BigQuery, an Already exists error appears in the logs. The error does not indicate that there is a problem; your results will be stored as expected.

BigQuery scanning

This section describes issues you might encounter when inspecting or profiling BigQuery data.

Issues common to inspection and profiling operations

The following issues are applicable to both BigQuery inspection and profiling operations.

Rows with row-level security can't be scanned

Row-level security policies can prevent Sensitive Data Protection from inspecting and profiling protected BigQuery tables. If you have row-level security policies applied to your BigQuery tables, we recommend that you set a TRUE filter and include the service agent in the grantee list.

Duplicate rows

When writing data to a BigQuery table, Sensitive Data Protection might write duplicate rows.

Recently streamed data

Sensitive Data Protection doesn't scan recently streamed data (formerly known as streaming buffer). For more information, see Streaming data availability in the BigQuery documentation.

BigQuery inspection issues

The following issues are only applicable to inspection operations on BigQuery data. They don't affect data profiles.

Exported findings do not have values for the row_number field

When you configure Sensitive Data Protection to save findings to BigQuery, the location.content_locations.record_location.record_key.big_query_key.row_number field in the generated BigQuery table is inferred at the time the input table is scanned. Its value is nondeterministic, can't be queried, and can be null for inspection jobs.

If you need to identify specific rows where findings are present, specify inspectJob.storageConfig.bigQueryOptions.identifyingFields at job creation time.

Identifying fields can be found in the generated BigQuery table, in the location.content_locations.record_location.record_key.id_values field.
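
As a sketch, a request to create an inspection job might set identifyingFields as follows. The rest of the job configuration (such as inspectConfig and actions) is omitted, and the project, dataset, table, and column names are placeholders that you would replace with your own values.

    {
      "inspectJob": {
        "storageConfig": {
          "bigQueryOptions": {
            "tableReference": {
              "projectId": "PROJECT_ID",
              "datasetId": "DATASET_ID",
              "tableId": "TABLE_ID"
            },
            "identifyingFields": [
              { "name": "COLUMN_NAME" }
            ]
          }
        }
      }
    }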

Limiting scans to new BigQuery content

If you're limiting scans to only new content, and you use the BigQuery Storage Write API to populate the input table, Sensitive Data Protection might skip scanning some rows.

To mitigate this issue, in your inspection job, make sure that the timestampField of the TimespanConfig object is a commit timestamp that BigQuery auto-generates. Even then, some rows might still be skipped, because Sensitive Data Protection doesn't read from recently streamed data.
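
For example, assuming the auto-generated commit timestamp column is named commit_time_stamp (as in the example schema later in this section), the relevant part of the inspection job configuration might look like the following sketch. The table reference values are placeholders, and enableAutoPopulationOfTimespanConfig is shown only to illustrate limiting the scan to new content.

    {
      "inspectJob": {
        "storageConfig": {
          "bigQueryOptions": {
            "tableReference": {
              "projectId": "PROJECT_ID",
              "datasetId": "DATASET_ID",
              "tableId": "TABLE_ID"
            }
          },
          "timespanConfig": {
            "timestampField": {
              "name": "commit_time_stamp"
            },
            "enableAutoPopulationOfTimespanConfig": true
          }
        }
      }
    }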

If you want to auto-generate commit timestamps for a column, and you use the legacy streaming API to populate your input table, do the following:

  1. In the input table's schema, make sure that the timestamp column is of type TIMESTAMP.

    Example schema

    The following example defines the commit_time_stamp field and sets its type to TIMESTAMP:

    ...
    {
     "name": "commit_time_stamp",
     "type": "TIMESTAMP"
    }
    ...
    
  2. In the rows[].json field of the tabledata.insertAll method, make sure that the values in the timestamp column are set to AUTO.

    Example JSON

    The following example sets the value of the commit_time_stamp field to AUTO:

    {
      ...
      "commit_time_stamp": "AUTO",
      ...
    }
    

Limiting scans by setting a maximum percentage of rows

When you set a sampling limit based on a percentage of the total number of table rows (rowsLimitPercent), Sensitive Data Protection can inspect more rows than expected. If you need to put a hard limit on the number of rows to scan, we recommend setting a maximum number of rows (rowsLimit) instead.
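
For example, the following sketch of the bigQueryOptions part of an inspection job caps the scan at an absolute row count rather than a percentage. The limit and table reference values are placeholders.

    {
      "inspectJob": {
        "storageConfig": {
          "bigQueryOptions": {
            "tableReference": {
              "projectId": "PROJECT_ID",
              "datasetId": "DATASET_ID",
              "tableId": "TABLE_ID"
            },
            "rowsLimit": "10000"
          }
        }
      }
    }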

BigQuery profiling issues

The following issues are only applicable to profiling operations on BigQuery data. For more information, see Data profiles for BigQuery data.

Organizations or projects with more than 500 million tables

Sensitive Data Protection returns an error if you attempt to profile an organization or project that has more than 500 million tables. If you encounter this error, you can send your feedback through email to cloud-dlp-feedback@google.com.

If your organization has more than 500 million tables and you have a project with a lower table count, try running a project-level scan instead.

For information about table and column limits, see Data profiling limits.

Inspection templates

The inspection template must be in the same region as the data to be profiled. If you have data in multiple regions, use multiple inspection templates—one for each region where you have data. You can also use an inspection template that is stored in the global region. If you include a template in the global region, Sensitive Data Protection uses it for any data that doesn't have a region-specific template. For more information, see Data residency considerations.
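
As a sketch, assuming a discovery scan configuration that accepts a list of inspection template names, you could reference one template per region where you have data plus a global fallback, along the following lines. The field name and resource names are illustrative placeholders; adjust them to your setup.

    ...
    "inspectTemplates": [
      "projects/PROJECT_ID/locations/us-west1/inspectTemplates/TEMPLATE_ID_1",
      "projects/PROJECT_ID/locations/europe-west1/inspectTemplates/TEMPLATE_ID_2",
      "projects/PROJECT_ID/locations/global/inspectTemplates/TEMPLATE_ID_3"
    ]
    ...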

Stored infoTypes

A stored infoType (also known as a stored custom dictionary detector) that is referenced in your inspection template must be stored in either of the following:

  • The global region.
  • The same region as the inspection template.

Otherwise, the profiling operation fails with a Resource not found error.
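
For example, if your inspection template is stored in us-west1, the stored infoType that it references should be in us-west1 or in the global region. The following sketch shows the inspectConfig portion of such a template, with a customInfoTypes entry that references a regional stored infoType; the names are placeholders.

    {
      "inspectConfig": {
        "customInfoTypes": [
          {
            "infoType": { "name": "CUSTOM_INFOTYPE_NAME" },
            "storedType": {
              "name": "projects/PROJECT_ID/locations/us-west1/storedInfoTypes/STORED_INFOTYPE_ID"
            }
          }
        ]
      }
    }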

Cloud Storage scanning

This section describes issues you might encounter when inspecting or de-identifying data in Cloud Storage.

Inspection of XLSX files with large custom dictionary detectors

When you use a large custom dictionary detector (also known as a stored custom dictionary detector) to inspect a Microsoft Excel .xlsx file, the inspection job can run slowly, appear to be stuck, and incur a large number of Cloud Storage Class B operations. This happens because Sensitive Data Protection might read the source term list of the large custom dictionary once for each cell in the .xlsx file. The volume of read operations can cause the inspection job to show little progress and therefore appear to be stuck.

For more information about the relevant Cloud Storage billing charges, see the charges for Class B operations in Operation charges.

Structured files being scanned in binary mode

In certain cases, files that are typically scanned in structured parsing mode might be scanned in binary mode, which doesn't include the enhancements of the structured parsing mode. For more information, see Scanning structured files in structured parsing mode.

Cloud SQL discovery

Certain errors related to Cloud SQL connections might appear only on the Service connections page and might not appear when you view the associated scan configuration.

To check for connection errors, in the Google Cloud console, go to the Service connections page.

Intelligent document parsing

This section contains known issues related to document parsing.

The DocumentLocation object isn't populated

The location.content_locations.document_location.file_offset field isn't populated for Intelligent Document Parsing scanning mode.

Detection

Dictionary words containing characters in the Supplementary Multilingual Plane of the Unicode standard can yield unexpected findings. Examples of such characters are emojis, scientific symbols, and historical scripts.