Inspecting Storage and Databases for Sensitive Data

One of the first steps in managing sensitive data properly is storage classification: identifying where sensitive data resides in your storage repository and how it’s used. For data stored in Google Cloud Storage, Google Cloud Datastore, and Google BigQuery, this knowledge helps you set access control and sharing permissions correctly, and it can be part of an ongoing monitoring plan.

The DLP API can detect and classify sensitive data stored in Cloud Storage, Cloud Datastore, and BigQuery. Instead of streaming the textual data into the API, you specify location and configuration information in your API call. The API returns details about any infoTypes it finds in the data, a likelihood value for each finding, and more.

You can call the Data Loss Prevention API in several languages or via cURL/REST and JSON to inspect a Cloud Storage location, Cloud Datastore kind, or BigQuery table for sensitive data.

This topic includes several samples for each Google Cloud Platform storage repository type (Cloud Storage, Cloud Datastore, and BigQuery) in several programming languages, plus a detailed overview of the inspection process and the results output you can expect.

Inspecting a Cloud Storage Location

The following code samples demonstrate how to inspect a Cloud Storage location using the DLP API in several languages. The "Protocol" tab shows sample JSON that can be sent in a POST request to the specified DLP API endpoint.

For more information about configuration options, see Configuring Storage Classification, later in this topic.

Protocol

See the JSON quickstart for more information on using JSON.

URL:

  POST https://dlp.googleapis.com/v2beta1/inspect/operations

Sample Input:

{
  "storageConfig": {
    "cloudStorageOptions": {
      "fileSet": {
        "url": "gs://[YOUR_BUCKET]/test.txt"
      }
    }
  },
  "inspectConfig": {
    "infoTypes": [
      { "name": "PHONE_NUMBER" }
    ]
  },
  "outputConfig": {
    "storagePath": {
      "path": "gs://[YOUR_BUCKET]/results.csv"
    }
  }
}

Java

For more on installing and creating a DLP API client, refer to DLP API Client Libraries.

// Instantiates a client
try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {
  // The name of the bucket where the file resides.
  // String bucketName = "YOUR-BUCKET";

  // The path to the file within the bucket to inspect.
  // Can contain wildcards, e.g. "my-image.*"
  // String fileName = "my-image.png";

  // The minimum likelihood required before returning a match
  // Likelihood minLikelihood = Likelihood.LIKELIHOOD_UNSPECIFIED;

  // The maximum number of findings to report (0 = server maximum)
  // int maxFindings = 0;

  // The infoTypes of information to match
  // List<InfoType> infoTypes = Arrays.asList(
  //     InfoType.newBuilder().setName("US_MALE_NAME").build(),
  //     InfoType.newBuilder().setName("US_FEMALE_NAME").build());

  CloudStorageOptions cloudStorageOptions =
      CloudStorageOptions.newBuilder()
          .setFileSet(FileSet.newBuilder().setUrl("gs://" + bucketName + "/" + fileName))
          .build();

  StorageConfig storageConfig =
      StorageConfig.newBuilder().setCloudStorageOptions(cloudStorageOptions).build();

  InspectConfig inspectConfig =
      InspectConfig.newBuilder()
          .addAllInfoTypes(infoTypes)
          .setMinLikelihood(minLikelihood)
          .build();

  // optionally provide an output configuration to store results, default : none
  OutputStorageConfig outputConfig = OutputStorageConfig.getDefaultInstance();

  // asynchronously submit an inspect operation
  OperationFuture<InspectOperationResult, InspectOperationMetadata, Operation> responseFuture =
      dlpServiceClient.createInspectOperationAsync(inspectConfig, storageConfig, outputConfig);

  // ...
  // block on response, returning job id of the operation
  InspectOperationResult inspectOperationResult = responseFuture.get();
  ResultName resultName = inspectOperationResult.getNameAsResultName();
  InspectResult inspectResult = dlpServiceClient.listInspectFindings(resultName).getResult();

  if (inspectResult.getFindingsCount() > 0) {
    System.out.println("Findings: ");
    for (Finding finding : inspectResult.getFindingsList()) {
      System.out.print("\tInfo type: " + finding.getInfoType().getName());
      System.out.println("\tLikelihood: " + finding.getLikelihood());
    }
  } else {
    System.out.println("No findings.");
  }
} catch (Exception e) {
  e.printStackTrace();
  System.out.println("Error in inspectGCSFileAsync: " + e.getMessage());
}

Node.js

For more on installing and creating a DLP API client, refer to DLP API Client Libraries.

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The name of the bucket where the file resides.
// const bucketName = 'YOUR-BUCKET';

// The path to the file within the bucket to inspect.
// Can contain wildcards, e.g. "my-image.*"
// const fileName = 'my-image.png';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// Get reference to the file to be inspected
const storageItems = {
  cloudStorageOptions: {
    fileSet: {url: `gs://${bucketName}/${fileName}`},
  },
};

// Construct REST request body for creating an inspect job
const request = {
  inspectConfig: {
    infoTypes: infoTypes,
    minLikelihood: minLikelihood,
    maxFindings: maxFindings,
  },
  storageConfig: storageItems,
};

// Create a GCS File inspection job and wait for it to complete (using promises)
dlp
  .createInspectOperation(request)
  .then(createJobResponse => {
    const operation = createJobResponse[0];

    // Start polling for job completion
    return operation.promise();
  })
  .then(completeJobResponse => {
    // When job is complete, get its results
    const jobName = completeJobResponse[0].name;
    return dlp.listInspectFindings({
      name: jobName,
    });
  })
  .then(results => {
    const findings = results[0].result.findings;
    if (findings.length > 0) {
      console.log(`Findings:`);
      findings.forEach(finding => {
        console.log(`\tInfo type: ${finding.infoType.name}`);
        console.log(`\tLikelihood: ${finding.likelihood}`);
      });
    } else {
      console.log(`No findings.`);
    }
  })
  .catch(err => {
    console.log(`Error in promiseInspectGCSFile: ${err.message || err}`);
  });
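The result-handling step in the sample above prints each finding on its own. When a scan returns many findings, a per-infoType count can be easier to read. The following is a minimal sketch, not part of the DLP client library: summarizeFindings is a hypothetical helper, and it assumes each finding has the shape { infoType: { name }, likelihood } that the sample reads. The example findings are made up for illustration.

```javascript
// Hypothetical helper (not part of the DLP client library):
// counts findings per infoType name.
// Assumes each finding looks like { infoType: { name }, likelihood },
// which is the shape the sample above reads.
function summarizeFindings(findings) {
  const counts = {};
  for (const finding of findings || []) {
    const name = finding.infoType.name;
    counts[name] = (counts[name] || 0) + 1;
  }
  return counts;
}

// Made-up findings for illustration only (not real API output).
const exampleFindings = [
  {infoType: {name: 'PHONE_NUMBER'}, likelihood: 'LIKELY'},
  {infoType: {name: 'EMAIL_ADDRESS'}, likelihood: 'VERY_LIKELY'},
  {infoType: {name: 'PHONE_NUMBER'}, likelihood: 'POSSIBLE'},
];

console.log(summarizeFindings(exampleFindings));
```

In the sample above, this helper could be called on results[0].result.findings inside the final then() callback.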

Inspecting a Cloud Datastore Kind

The following code samples demonstrate how to inspect a Cloud Datastore kind using the DLP API in several languages. The "Protocol" tab shows sample JSON that can be sent in a POST request to the specified DLP API endpoint.

For more information about configuration options, see Configuring Storage Classification, later in this topic.

Protocol

See the JSON quickstart for more information on using JSON.

URL:

  POST https://dlp.googleapis.com/v2beta1/inspect/operations

Sample Input:

{
  "storageConfig": {
    "datastoreOptions": {
      "partitionId": {
        "projectId": "[YOUR_GCLOUD_PROJECT]",
        "namespaceId": "[YOUR_DATASTORE_NAMESPACE]"
      },
      "kind": {
        "name": "[YOUR_DATASTORE_KIND]"
      }
    }
  },
  "inspectConfig": {
    "infoTypes": [
      { "name": "PHONE_NUMBER" }
    ]
  },
  "outputConfig": {
    "storagePath": {
      "path": "gs://[YOUR_BUCKET]/results.csv"
    }
  }
}

Java

For more on installing and creating a DLP API client, refer to DLP API Client Libraries.

// Instantiates a client
try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {

  // (Optional) The project ID containing the target Datastore
  // String projectId = "my-project-id";

  // (Optional) The namespace ID of the Datastore entities to inspect.
  // To ignore Datastore namespaces, set this to an empty string ("")
  // String namespaceId = "";

  // The kind of the Datastore entity to inspect.
  // String kind = "Person";

  // The minimum likelihood required before returning a match
  // Likelihood minLikelihood = Likelihood.LIKELIHOOD_UNSPECIFIED;

  // The infoTypes of information to match
  // List<InfoType> infoTypes = Arrays.asList(
  //     InfoType.newBuilder().setName("US_MALE_NAME").build(),
  //     InfoType.newBuilder().setName("US_FEMALE_NAME").build());

  // Reference to the Datastore namespace
  PartitionId partitionId =
      PartitionId.newBuilder().setProjectId(projectId).setNamespaceId(namespaceId).build();

  // Reference to the Datastore kind
  KindExpression kindExpression = KindExpression.newBuilder().setName(kind).build();
  DatastoreOptions datastoreOptions =
      DatastoreOptions.newBuilder().setKind(kindExpression).setPartitionId(partitionId).build();

  // Construct Datastore configuration to be inspected
  StorageConfig storageConfig =
      StorageConfig.newBuilder().setDatastoreOptions(datastoreOptions).build();

  InspectConfig inspectConfig =
      InspectConfig.newBuilder()
          .addAllInfoTypes(infoTypes)
          .setMinLikelihood(minLikelihood)
          .build();

  // optionally provide an output configuration to store results, default : none
  OutputStorageConfig outputConfig = OutputStorageConfig.getDefaultInstance();

  // asynchronously submit an inspect operation
  OperationFuture<InspectOperationResult, InspectOperationMetadata, Operation> responseFuture =
      dlpServiceClient.createInspectOperationAsync(inspectConfig, storageConfig, outputConfig);

  // ...
  // block on response, returning job id of the operation
  InspectOperationResult inspectOperationResult = responseFuture.get();
  ResultName resultName = inspectOperationResult.getNameAsResultName();
  InspectResult inspectResult = dlpServiceClient.listInspectFindings(resultName).getResult();

  if (inspectResult.getFindingsCount() > 0) {
    System.out.println("Findings: ");
    for (Finding finding : inspectResult.getFindingsList()) {
      System.out.print("\tInfo type: " + finding.getInfoType().getName());
      System.out.println("\tLikelihood: " + finding.getLikelihood());
    }
  } else {
    System.out.println("No findings.");
  }
} catch (Exception e) {
  e.printStackTrace();
  System.out.println("Error in inspectDatastore: " + e.getMessage());
}

Node.js

For more on installing and creating a DLP API client, refer to DLP API Client Libraries.

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// (Optional) The project ID containing the target Datastore
// const projectId = process.env.GCLOUD_PROJECT;

// (Optional) The namespace ID of the Datastore entities to inspect.
// To ignore Datastore namespaces, set this to an empty string ('')
// const namespaceId = '';

// The kind of the Datastore entity to inspect.
// const kind = 'Person';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// Construct items to be inspected
const storageItems = {
  datastoreOptions: {
    partitionId: {
      projectId: projectId,
      namespaceId: namespaceId,
    },
    kind: {
      name: kind,
    },
  },
};

// Construct request for creating an inspect job
const request = {
  inspectConfig: {
    infoTypes: infoTypes,
    minLikelihood: minLikelihood,
    maxFindings: maxFindings,
  },
  storageConfig: storageItems,
};

// Run inspect-job creation request
dlp
  .createInspectOperation(request)
  .then(createJobResponse => {
    const operation = createJobResponse[0];

    // Start polling for job completion
    return operation.promise();
  })
  .then(completeJobResponse => {
    // When job is complete, get its results
    const jobName = completeJobResponse[0].name;
    return dlp.listInspectFindings({
      name: jobName,
    });
  })
  .then(results => {
    const findings = results[0].result.findings;
    if (findings.length > 0) {
      console.log(`Findings:`);
      findings.forEach(finding => {
        console.log(`\tInfo type: ${finding.infoType.name}`);
        console.log(`\tLikelihood: ${finding.likelihood}`);
      });
    } else {
      console.log(`No findings.`);
    }
  })
  .catch(err => {
    console.log(`Error in inspectDatastore: ${err.message || err}`);
  });
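The promise chain above follows three steps: create the inspect operation, wait for it to complete, then list its findings. The same flow can be sketched with async/await. This is an illustrative sketch only: runInspection is a hypothetical wrapper, and fakeClient is a stand-in that returns canned data (including a made-up result name) so the flow can be followed without network access. It assumes the client methods and response shapes are the ones used in the sample above.

```javascript
// Hypothetical wrapper: runs the create -> wait -> list-findings flow.
// `client` is any object exposing createInspectOperation and
// listInspectFindings with the response shapes used in the sample above.
async function runInspection(client, request) {
  // Step 1: submit the inspect operation.
  const [operation] = await client.createInspectOperation(request);
  // Step 2: wait until the operation completes.
  const [completed] = await operation.promise();
  // Step 3: fetch the findings for the completed operation.
  const [response] = await client.listInspectFindings({name: completed.name});
  return response.result.findings || [];
}

// Stand-in client for illustration only. It returns canned data and
// makes no network calls; the result name below is made up.
const fakeClient = {
  createInspectOperation: async () => [
    {promise: async () => [{name: 'inspect/results/example'}]},
  ],
  listInspectFindings: async () => [
    {
      result: {
        findings: [{infoType: {name: 'PHONE_NUMBER'}, likelihood: 'LIKELY'}],
      },
    },
  ],
};

runInspection(fakeClient, {}).then(findings => {
  findings.forEach(finding => {
    console.log(`${finding.infoType.name}: ${finding.likelihood}`);
  });
});
```

With a real client, the dlp instance and the request object from the sample above would take the place of fakeClient and the empty request.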

Inspecting a BigQuery Table

The following code samples demonstrate how to inspect a BigQuery table using the DLP API in several languages. The "Protocol" tab shows sample JSON that can be sent in a POST request to the specified DLP API endpoint.

For more information about configuration options, see Configuring Storage Classification, later in this topic.

Protocol

See the JSON quickstart for more information on using JSON.

URL:

  POST https://dlp.googleapis.com/v2beta1/inspect/operations

Sample Input:

{
  "storageConfig": {
    "bigQueryOptions": {
      "tableReference": {
        "projectId": "[YOUR_GCLOUD_PROJECT]",
        "datasetId": "[YOUR_BIGQUERY_DATASET_NAME]",
        "tableId": "[YOUR_BIGQUERY_TABLE_NAME]"
      }
    }
  },
  "inspectConfig": {
    "infoTypes": [
      { "name": "PHONE_NUMBER" }
    ]
  },
  "outputConfig": {
    "storagePath": {
      "path": "gs://[YOUR_BUCKET]/results.csv"
    }
  }
}

Java

For more on installing and creating a DLP API client, refer to DLP API Client Libraries.

// Instantiates a client
try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {

  // (Optional) The project ID to run the API call under
  // String projectId = "my-project-id";

  // The ID of the dataset to inspect, e.g. "my_dataset"
  // String datasetId = "my_dataset";

  // The ID of the table to inspect, e.g. "my_table"
  // String tableId = "my_table";

  // The minimum likelihood required before returning a match
  // Likelihood minLikelihood = Likelihood.LIKELIHOOD_UNSPECIFIED;

  // The infoTypes of information to match
  // List<InfoType> infoTypes = Arrays.asList(
  //     InfoType.newBuilder().setName("US_MALE_NAME").build(),
  //     InfoType.newBuilder().setName("US_FEMALE_NAME").build());

  // Reference to the BigQuery table
  BigQueryTable tableReference =
          BigQueryTable.newBuilder()
              .setProjectId(projectId)
              .setDatasetId(datasetId)
              .setTableId(tableId)
              .build();
  BigQueryOptions bigQueryOptions =
          BigQueryOptions.newBuilder()
              .setTableReference(tableReference)
              .build();

  // Construct BigQuery configuration to be inspected
  StorageConfig storageConfig =
          StorageConfig.newBuilder()
              .setBigQueryOptions(bigQueryOptions)
              .build();

  InspectConfig inspectConfig =
          InspectConfig.newBuilder()
                  .addAllInfoTypes(infoTypes)
                  .setMinLikelihood(minLikelihood)
                  .build();

  // optionally provide an output configuration to store results, default : none
  OutputStorageConfig outputConfig = OutputStorageConfig.getDefaultInstance();

  // asynchronously submit an inspect operation
  OperationFuture<InspectOperationResult, InspectOperationMetadata, Operation> responseFuture =
          dlpServiceClient.createInspectOperationAsync(
              inspectConfig, storageConfig, outputConfig);

  // ...
  // block on response, returning job id of the operation
  InspectOperationResult inspectOperationResult = responseFuture.get();
  ResultName resultName = inspectOperationResult.getNameAsResultName();
  InspectResult inspectResult = dlpServiceClient.listInspectFindings(resultName).getResult();

  if (inspectResult.getFindingsCount() > 0) {
    System.out.println("Findings: ");
    for (Finding finding : inspectResult.getFindingsList()) {
      System.out.print("\tInfo type: " + finding.getInfoType().getName());
      System.out.println("\tLikelihood: " + finding.getLikelihood());
    }
  } else {
    System.out.println("No findings.");
  }
} catch (Exception e) {
  e.printStackTrace();
  System.out.println("Error in inspectBigquery: " + e.getMessage());
}

Node.js

For more on installing and creating a DLP API client, refer to DLP API Client Libraries.

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// (Optional) The project ID to run the API call under
// const projectId = process.env.GCLOUD_PROJECT;

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// Construct items to be inspected
const storageItems = {
  bigQueryOptions: {
    tableReference: {
      projectId: projectId,
      datasetId: datasetId,
      tableId: tableId,
    },
  },
};

// Construct request for creating an inspect job
const request = {
  inspectConfig: {
    infoTypes: infoTypes,
    minLikelihood: minLikelihood,
    maxFindings: maxFindings,
  },
  storageConfig: storageItems,
};

// Run inspect-job creation request
dlp
  .createInspectOperation(request)
  .then(createJobResponse => {
    const operation = createJobResponse[0];

    // Start polling for job completion
    return operation.promise();
  })
  .then(completeJobResponse => {
    // When job is complete, get its results
    const jobName = completeJobResponse[0].name;
    return dlp.listInspectFindings({
      name: jobName,
    });
  })
  .then(results => {
    const findings = results[0].result.findings;
    if (findings.length > 0) {
      console.log(`Findings:`);
      findings.forEach(finding => {
        console.log(`\tInfo type: ${finding.infoType.name}`);
        console.log(`\tLikelihood: ${finding.likelihood}`);
      });
    } else {
      console.log(`No findings.`);
    }
  })
  .catch(err => {
    console.log(`Error in inspectBigquery: ${err.message || err}`);
  });
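Each sample passes a minLikelihood value with the request and prints the likelihood of every finding. To narrow a set of findings further after they have been returned, a client-side filter is one option. The sketch below is hypothetical and does not describe how the service applies minLikelihood. It assumes findings report likelihood as one of the enum names LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY, ordered from least to most likely. The example findings are made up.

```javascript
// Likelihood names ordered from least to most likely.
// Assumption: findings report likelihood as one of these strings.
const LIKELIHOOD_ORDER = [
  'LIKELIHOOD_UNSPECIFIED',
  'VERY_UNLIKELY',
  'UNLIKELY',
  'POSSIBLE',
  'LIKELY',
  'VERY_LIKELY',
];

// Hypothetical client-side helper: keeps findings whose likelihood is
// at or above the given threshold. Findings with an unrecognized
// likelihood value are dropped.
function filterByLikelihood(findings, minLikelihood) {
  const threshold = LIKELIHOOD_ORDER.indexOf(minLikelihood);
  if (threshold === -1) {
    throw new Error(`Unknown likelihood: ${minLikelihood}`);
  }
  return findings.filter(
    finding => LIKELIHOOD_ORDER.indexOf(finding.likelihood) >= threshold
  );
}

// Made-up findings for illustration only (not real API output).
const sampleFindings = [
  {infoType: {name: 'PHONE_NUMBER'}, likelihood: 'LIKELY'},
  {infoType: {name: 'EMAIL_ADDRESS'}, likelihood: 'VERY_LIKELY'},
  {infoType: {name: 'PHONE_NUMBER'}, likelihood: 'POSSIBLE'},
];

filterByLikelihood(sampleFindings, 'LIKELY').forEach(finding => {
  console.log(`${finding.infoType.name}: ${finding.likelihood}`);
});
```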

Configuring Storage Classification

To inspect a Cloud Storage location, Cloud Datastore kind, or BigQuery table, you send a request to the DLP API containing:

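  • Configuration options (StorageConfig), which must include the location to inspect: a Cloud Storage file set, a Cloud Datastore kind, or a BigQuery table reference.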

  • Optional inspection configuration information (InspectConfig), which lets you configure your query.

  • Optional output configuration information (OutputStorageConfig), which specifies a path to a Cloud Storage location or BigQuery table to store the API's output. This allows you to save the scan results to one or more CSV files at the location you specify.

The results are readable by all authorized and authenticated API callers on the same project that executed the scan, and contain information specific to the scan source (Cloud Storage, Cloud Datastore, or BigQuery).
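The three parts of the request listed above can be assembled as in the following minimal sketch. buildInspectRequest is a hypothetical helper, not part of the DLP client library. The field names (storageConfig, inspectConfig, outputConfig.storagePath.path, and the three storage option keys) are taken from the samples earlier in this topic. The check that exactly one storage option is set is an assumption made for this sketch, and the bucket name is a placeholder.

```javascript
// Storage option keys as they appear in the samples in this topic.
const STORAGE_OPTION_KEYS = [
  'cloudStorageOptions',
  'datastoreOptions',
  'bigQueryOptions',
];

// Hypothetical helper: assembles an inspect request body.
// storageConfig is required; inspectConfig and outputPath are optional.
function buildInspectRequest(storageConfig, inspectConfig, outputPath) {
  const present = STORAGE_OPTION_KEYS.filter(
    key => storageConfig && storageConfig[key]
  );
  // Assumption for this sketch: a request targets exactly one location.
  if (present.length !== 1) {
    throw new Error(
      'storageConfig must set exactly one of: ' + STORAGE_OPTION_KEYS.join(', ')
    );
  }
  const request = {storageConfig: storageConfig};
  if (inspectConfig) {
    request.inspectConfig = inspectConfig;
  }
  if (outputPath) {
    request.outputConfig = {storagePath: {path: outputPath}};
  }
  return request;
}

// Example with placeholder values.
const exampleRequest = buildInspectRequest(
  {cloudStorageOptions: {fileSet: {url: 'gs://my-bucket/test.txt'}}},
  {infoTypes: [{name: 'PHONE_NUMBER'}]},
  'gs://my-bucket/results.csv'
);
console.log(JSON.stringify(exampleRequest, null, 2));
```

Validating the request before sending it catches a missing or ambiguous scan location locally instead of waiting for an API error.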
