Inspecting Storage and Databases for Sensitive Data

One of the first steps in managing sensitive data properly is identifying where it is stored and how it is used. For data in Google Cloud Storage and Google BigQuery, this knowledge helps you set access control and sharing permissions correctly, and it can form part of an ongoing monitoring plan.

The DLP API can detect and classify sensitive data stored in Cloud Storage, Google Cloud Datastore, and BigQuery. Instead of streaming the textual data into the API, you specify location and configuration information in your API call. The API returns details about any InfoTypes found in the text, a likelihood value, and more.

You can call the Data Loss Prevention API in several languages or via cURL/REST and JSON to inspect a Cloud Storage location, Cloud Datastore kind, or BigQuery table for sensitive data.

Inspecting a Storage Repository

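To inspect a Cloud Storage location, Cloud Datastore kind, or BigQuery table, send a POST request to https://dlp.googleapis.com/v2beta1/inspect/operations. The request body identifies the repository to scan (storageConfig) and the destination for the results (outputConfig), as in the example request later on this page.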
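As a rough illustration of such a POST request, the following Python sketch assembles one without sending it. The endpoint URL and the storageConfig and outputConfig fields come from this page; the helper name build_inspect_request, the bearer-token header, and the placeholder token are assumptions made for the example, and the API may require further fields that this page does not show.

```python
import json
import urllib.request

# Endpoint taken from this page.
DLP_INSPECT_URL = "https://dlp.googleapis.com/v2beta1/inspect/operations"


def build_inspect_request(kind_name, output_path, access_token):
    """Build (but do not send) a POST request for a Cloud Datastore scan.

    Hypothetical helper: the body mirrors the example request on this page,
    and the Authorization header assumes OAuth 2.0 bearer-token auth.
    """
    body = {
        "storageConfig": {
            "datastoreOptions": {"kind": {"name": kind_name}},
        },
        "outputConfig": {
            "storagePath": {"path": output_path},
        },
    }
    return urllib.request.Request(
        DLP_INSPECT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + access_token,
        },
        method="POST",
    )


# Placeholder values; the token is not a real credential.
request = build_inspect_request(
    "message", "gs://mystorage/temp", "HYPOTHETICAL_TOKEN"
)
# Sending is left out here; urllib.request.urlopen(request) would perform the call.
```

Building the body with json.dumps rather than by hand avoids syntax slips such as a missing comma between top-level fields.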

Obtaining Scan Results

The results of the scan are available in either summary form (counts by info_type) or detailed form (including metadata on specific findings). They can also contain information specific to the scan source:

  • Cloud Storage:
    • filePath
    • startOffset
  • Cloud Datastore:
    • projectId
    • namespaceId
    • path
    • columnName
    • offset
  • BigQuery:
    • rowNumber
    • projectId
    • datasetId
    • tableId

The scan’s results are written either as one or more CSV files to a location in Cloud Storage or to a table in a BigQuery dataset, as specified in the OutputStorageConfig object. The results are readable by all authorized and authenticated API callers on the same project that executed the scan.
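If the detailed results are written as CSV, the summary form (counts by info_type) can be derived from them in a few lines. The sketch below assumes a header row with a column named info_type; this page does not document the actual column layout of the output files, so the column names and sample rows are purely hypothetical.

```python
import csv
import io
from collections import Counter


def count_by_info_type(csv_text, column="info_type"):
    """Count findings per info type in CSV text that has a header row.

    The column name is an assumption; adjust it to the real output layout.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Hypothetical sample data, not real scan output.
sample = (
    "info_type,likelihood\n"
    "EMAIL_ADDRESS,LIKELY\n"
    "PHONE_NUMBER,POSSIBLE\n"
    "EMAIL_ADDRESS,VERY_LIKELY\n"
)
counts = count_by_info_type(sample)
```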

Testing Cloud Repository Inspection

To run a quick scan for yourself, use the API Explorer on the inspect.operations.create method reference page. (Depending on the width of your browser window, the API Explorer will either be on the right side of the page or at the bottom of the page.) To use API Explorer with data that is contained in a storage repository, you'll have to go through a few preliminary steps to authorize access.

First, make sure the DLP API is enabled:

  1. From the Google Cloud Platform Console, click APIs & services.
  2. Look for "DLP API" in the list of enabled APIs and services. If DLP API does not appear in the list, click Enable APIs and Services and enable it.

Next, generate an API key and an OAuth 2.0 Client ID:

  1. From the Google Cloud Platform Console, point to APIs & services, and then click Credentials.
  2. Click Create credentials, and then click API key. Copy the generated API key to a safe place.
  3. Click Create credentials again, and then click OAuth client ID.
  4. From the list of application types, choose Web application.
  5. Enter a name, and then under Authorized JavaScript origins enter https://developers.google.com. Click Create, and then copy the generated OAuth client ID to a safe place.

Now, go to the inspect.operations.create method reference page, and then do the following:

  1. In the API Explorer, click the Settings button (a gear icon), and then click Set API key / OAuth 2.0 Client ID.
  2. In the Custom API key and OAuth 2.0 client ID pane, click Custom credentials.
  3. Enter the API key and OAuth 2.0 Client ID that you saved previously into the corresponding fields, and then click Save.

You can now enter some API parameters and run a scan. For instance, the following parameters create a scan that inspects entities of the message kind in a Cloud Datastore repository, and then writes the results to the Cloud Storage location gs://mystorage/temp.

First, ensure that you have access to Cloud Datastore, and create the Cloud Storage location gs://mystorage/temp. Then, do the following:

  1. Configure the API Explorer so that the request body looks like the following:
    {
      "storageConfig":
      {
        "datastoreOptions":
        {
          "kind":
          {
            "name":"message"
          }
        }
      },
      "outputConfig":
      {
        "storagePath":
        {
          "path":"gs://mystorage/temp"
        }
      }
    }
  2. Click through the authorization dialog box. If you see a warning message about the dangers of accessing data in this way, click Advanced options, and choose Continue (unsafe).

Once the scan has finished, you can view the results in the Cloud Storage location you specified.
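To look at the results programmatically rather than in the console, a script first has to split the gs:// path into a bucket name and an object prefix. The following minimal sketch does only that step, using the standard library; the helper name split_gcs_path is hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_gcs_path(path: str) -> Tuple[str, str]:
    """Split a gs://bucket/prefix path into (bucket, prefix)."""
    parsed = urlparse(path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("not a Cloud Storage path: " + repr(path))
    # The URL path starts with "/", which is not part of the object prefix.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_gcs_path("gs://mystorage/temp")
```

The resulting bucket and prefix could then be passed to a Cloud Storage client to list the result objects, for example the list_blobs method of the google-cloud-storage library, assuming that library is installed and credentials are configured.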
