Inspecting Storage and Databases for Sensitive Data

One of the first steps to properly managing sensitive data is storage classification: identifying where your sensitive data is in your storage repository and how it’s used. For data stored in Cloud Storage, Cloud Datastore, and BigQuery, this knowledge can help you to properly set access control and sharing permissions, and it can be part of an ongoing monitoring plan.

The DLP API can detect and classify sensitive data stored in Cloud Storage, Cloud Datastore, and BigQuery. Instead of streaming the textual data into the API, you specify location and configuration information in your API call. The API returns details about any InfoTypes found in the text, a likelihood value, and more.

You can call the Data Loss Prevention API in several languages or via cURL/REST and JSON to inspect a Cloud Storage location, Cloud Datastore kind, or BigQuery table for sensitive data.

This topic includes several samples for each Google Cloud Platform storage repository type (Cloud Storage, Cloud Datastore, and BigQuery) in several programming languages, plus a detailed overview of the inspection process and the results output you can expect.

Inspecting a Cloud Storage Location

The following code samples demonstrate how to inspect a Cloud Storage location using the DLP API in several languages. The "Protocol" sample includes sample JSON that can be sent in a POST request to the specified DLP API endpoint.

For more information about configuration options, see Configuring Storage Classification, later in this topic.

See the JSON quickstart for more information on using JSON.

URL:

  POST https://dlp.googleapis.com/v2beta2/projects/[YOUR_PROJECT_ID]/dataSource:inspect

Sample Input:

{
  "jobConfig": {
    "storageConfig":{
      "cloudStorageOptions":{
        "fileSet":{
          "url":"gs://[YOUR_BUCKET_NAME]/[YOUR_FILE_PATH]"
        }
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ]
    },
    "outputConfig":{
      "table":{
        "projectId":"[YOUR_PROJECT_ID]",
        "datasetId":"[YOUR_BIGQUERY_DATASET_NAME]",
        "tableId":"[YOUR_BIGQUERY_TABLE_NAME]"
      }
    }
  }
}
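
As a rough illustration, the request body for a Cloud Storage scan can be assembled in Python before it is sent. This is a sketch rather than the client library: the function name build_gcs_inspect_body, its parameters, and the example bucket and table names are hypothetical, and it assumes the location is expressed as cloudStorageOptions.fileSet.url.

```python
import json


def build_gcs_inspect_body(bucket_url, info_types, output_table):
    """Return a JSON-serializable body for a Cloud Storage inspection.

    bucket_url: a gs:// path (the example value below is hypothetical)
    info_types: names of the infoTypes to look for, e.g. ["PHONE_NUMBER"]
    output_table: dict with projectId, datasetId, and tableId keys
    """
    if not bucket_url.startswith("gs://"):
        raise ValueError("Cloud Storage paths must start with gs://")
    return {
        "jobConfig": {
            # Assumption: Cloud Storage locations go under cloudStorageOptions.
            "storageConfig": {
                "cloudStorageOptions": {"fileSet": {"url": bucket_url}}
            },
            "inspectConfig": {
                "infoTypes": [{"name": name} for name in info_types]
            },
            "outputConfig": {"table": dict(output_table)},
        }
    }


body = build_gcs_inspect_body(
    "gs://my-bucket/*",
    ["PHONE_NUMBER"],
    {"projectId": "my-project", "datasetId": "my_dataset", "tableId": "my_table"},
)
print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with json.dumps avoids hand-editing brackets and makes it easy to vary the list of infoTypes.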

Inspecting a Cloud Datastore Kind

The following code samples demonstrate how to inspect a Cloud Datastore kind using the DLP API in several languages. The "Protocol" sample includes sample JSON that can be sent in a POST request to the specified DLP API endpoint.

For more information about configuration options, see Configuring Storage Classification, later in this topic.

See the JSON quickstart for more information on using JSON.

URL:

  POST https://dlp.googleapis.com/v2beta2/projects/[YOUR_PROJECT_ID]/dataSource:inspect

Sample Input:

{
  "jobConfig": {
    "storageConfig":{
      "datastoreOptions":{
        "partitionId":{
          "projectId":"[YOUR_GCLOUD_PROJECT]",
          "namespaceId":"[YOUR_DATASTORE_NAMESPACE]"
        },
        "kind":{
          "name":"[YOUR_DATASTORE_KIND]"
        }
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ]
    },
    "outputConfig":{
      "table":{
        "projectId":"[YOUR_PROJECT_ID]",
        "datasetId":"[YOUR_BIGQUERY_DATASET_NAME]",
        "tableId":"[YOUR_BIGQUERY_TABLE_NAME]"
      }
    }
  }
}
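
The sketch below shows one way the POST request for a Cloud Datastore kind might be prepared with Python's standard library. It only builds the request and does not send it. The function name make_datastore_inspect_request and the example project, namespace, and kind values are hypothetical, and the access token is assumed to be a valid OAuth 2.0 token obtained elsewhere. The output configuration is left out here because it is optional.

```python
import json
import urllib.request

# Endpoint as given in this topic, with the project ID as a placeholder.
DLP_ENDPOINT = (
    "https://dlp.googleapis.com/v2beta2/projects/{project}/dataSource:inspect"
)


def make_datastore_inspect_request(project_id, namespace_id, kind, access_token):
    """Build (but do not send) the POST request for a Datastore inspection."""
    body = {
        "jobConfig": {
            "storageConfig": {
                "datastoreOptions": {
                    "partitionId": {
                        "projectId": project_id,
                        "namespaceId": namespace_id,
                    },
                    "kind": {"name": kind},
                }
            },
            "inspectConfig": {"infoTypes": [{"name": "PHONE_NUMBER"}]},
        }
    }
    return urllib.request.Request(
        DLP_ENDPOINT.format(project=project_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the caller supplies a valid OAuth 2.0 access token.
            "Authorization": "Bearer " + access_token,
        },
        method="POST",
    )


request = make_datastore_inspect_request(
    "my-project", "my-namespace", "MyKind", "TOKEN"
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request; that step is omitted so the example runs without credentials or network access.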

Inspecting a BigQuery Table

The following code samples demonstrate how to inspect a BigQuery table using the DLP API in several languages. The "Protocol" sample includes sample JSON that can be sent in a POST request to the specified DLP API endpoint.

For more information about configuration options, see Configuring Storage Classification, later in this topic.

See the JSON quickstart for more information on using JSON.

URL:

  POST https://dlp.googleapis.com/v2beta2/projects/[YOUR_PROJECT_ID]/dataSource:inspect

Sample Input:

{
  "jobConfig": {
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"[YOUR_PROJECT_ID]",
          "datasetId":"[YOUR_BIGQUERY_DATASET_NAME]",
          "tableId":"[YOUR_BIGQUERY_TABLE_NAME]"
        }
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ]
    },
    "outputConfig":{
      "table":{
        "projectId":"[YOUR_PROJECT_ID]",
        "datasetId":"[YOUR_BIGQUERY_DATASET_NAME]",
        "tableId":"[YOUR_BIGQUERY_TABLE_NAME]"
      }
    }
  }
}
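
BigQuery tables in these requests, including the output table, are identified by three fields: projectId, datasetId, and tableId. The helper below is a small sketch of how a dotted table path might be split into that shape. The function names and the "project.dataset.table" input format are assumptions made for illustration, as is the bigQueryOptions key used to wrap the table to scan.

```python
def table_reference(table_path):
    """Split "project.dataset.table" into projectId, datasetId, tableId.

    Splitting from the right keeps any dots inside the project ID intact.
    """
    parts = table_path.rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected project.dataset.table, got %r" % table_path)
    project_id, dataset_id, table_id = parts
    return {
        "projectId": project_id,
        "datasetId": dataset_id,
        "tableId": table_id,
    }


def bigquery_storage_config(table_path):
    """Wrap a table reference as the storageConfig for a BigQuery scan."""
    # Assumption: the table to scan goes under bigQueryOptions.tableReference.
    return {"bigQueryOptions": {"tableReference": table_reference(table_path)}}


storage_config = bigquery_storage_config("my-project.my_dataset.my_table")
output_config = {"table": table_reference("my-project.my_dataset.dlp_results")}
print(storage_config)
print(output_config)
```

Using separate tables for the scan source and the results, as in the usage lines above, keeps findings from being written into the table under inspection.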

Configuring Storage Classification

To inspect a Cloud Storage location, Cloud Datastore kind, or BigQuery table, you send a request to the DLP API containing:

  • Configuration options (StorageConfig), which must identify the repository to scan, as the storageConfig objects in the samples above do.
  • Optional inspection configuration information (InspectConfig), which lets you configure your query, for example by listing the infoTypes to look for.
  • Optional output configuration information (OutputStorageConfig), which specifies a path to a Cloud Storage location or BigQuery table to store the API's output. This allows you to save the scan results to one or more CSV files at the location you specify.

The results are readable by all authorized and authenticated API callers on the same project that executed the scan, and contain information specific to the scan source (Cloud Storage, Cloud Datastore, or BigQuery).
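
The three parts listed above can be combined as in the following sketch, which includes the optional parts only when they are supplied and checks that the storage configuration names exactly one repository. The function build_inspect_body is hypothetical, and the set of option keys is an assumption about the field names the API accepts.

```python
# Assumption: one of these keys selects the repository inside storageConfig.
STORAGE_OPTION_KEYS = {"cloudStorageOptions", "datastoreOptions", "bigQueryOptions"}


def build_inspect_body(storage_config, inspect_config=None, output_config=None):
    """Combine StorageConfig with the optional inspect and output configs."""
    chosen = STORAGE_OPTION_KEYS & set(storage_config)
    if len(chosen) != 1:
        raise ValueError("storageConfig must name exactly one repository")
    job_config = {"storageConfig": storage_config}
    # Optional parts are omitted entirely when the caller does not set them.
    if inspect_config is not None:
        job_config["inspectConfig"] = inspect_config
    if output_config is not None:
        job_config["outputConfig"] = output_config
    return {"jobConfig": job_config}


minimal = build_inspect_body({"datastoreOptions": {"kind": {"name": "MyKind"}}})
full = build_inspect_body(
    {"datastoreOptions": {"kind": {"name": "MyKind"}}},
    inspect_config={"infoTypes": [{"name": "PHONE_NUMBER"}]},
    output_config={
        "table": {
            "projectId": "my-project",
            "datasetId": "my_dataset",
            "tableId": "my_table",
        }
    },
)
print(sorted(minimal["jobConfig"]))
print(sorted(full["jobConfig"]))
```

Rejecting a storage configuration with zero or several repository keys catches mismatched requests before they reach the API.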
