Inspecting storage and databases for sensitive data

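Properly managing sensitive data that is stored in a storage repository starts with storage classification: identifying where the sensitive data resides in the repository, what type of sensitive data it is, and how it is being used. This knowledge helps you set access control and sharing permissions correctly, and it can become part of an ongoing monitoring plan.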

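Cloud Data Loss Prevention (DLP) can detect and classify sensitive data stored in a Cloud Storage location, Cloud Datastore kind, or BigQuery table. The API reference page for FileType lists the file extensions of the file types that Cloud DLP can scan in Cloud Storage. Files of unrecognized types are scanned as binary files.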

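Instead of streaming text data directly to the API, you specify location and configuration information in your request. Cloud DLP then starts a job that inspects the data at the given location and makes available details about the infoTypes found in the content, their likelihood values, and more.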

The Cloud DLP API is RESTful. You can also interact with it programmatically by using a Cloud DLP client library in one of several programming languages.
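
As a rough illustration of what "location and configuration information" looks like in practice, the sketch below assembles a request body for a Cloud Storage inspection using only the Python standard library. The function name `build_gcs_inspect_body` and its parameters are hypothetical; the field names follow the JSON sample later in this topic. The code only builds the payload and does not call the API.

```python
import json


def build_gcs_inspect_body(bucket, info_types, min_likelihood="LIKELY",
                           bytes_limit_per_file=2 ** 30):
    """Return a dict shaped like the body of a dlpJobs.create request.

    Hypothetical helper for illustration; it performs no network calls.
    """
    return {
        "inspectJob": {
            "storageConfig": {
                "cloudStorageOptions": {
                    # Same trailing-wildcard URL form as the JSON sample.
                    "fileSet": {"url": "gs://{}/*".format(bucket)},
                    # The JSON sample writes this 64-bit value as a string.
                    "bytesLimitPerFile": str(bytes_limit_per_file),
                }
            },
            "inspectConfig": {
                "infoTypes": [{"name": name} for name in info_types],
                "minLikelihood": min_likelihood,
            },
        }
    }


body = build_gcs_inspect_body("my-bucket", ["PHONE_NUMBER"])
print(json.dumps(body, indent=2))
```

Keeping the payload construction in a plain function like this makes it possible to review or unit test the configuration before any scan job is created.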

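This topic includes:

  • JSON samples and code samples in several programming languages for each Google Cloud Platform storage repository type (Cloud Storage, Cloud Datastore, and BigQuery).
  • A detailed overview of the configuration options for scan jobs.
  • Instructions for retrieving scan results and for managing the scan jobs that each successful request creates.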

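Inspecting a Cloud Storage location

You can set up an inspection of a Cloud Storage location with Cloud DLP through REST requests, or programmatically in several programming languages by using a client library.

Code examples

Following are a JSON sample and code in several programming languages that demonstrate how to use Cloud DLP to inspect a Cloud Storage location. For information about the parameters included with the request, see the "Configuring storage inspection" section later in this topic.

Protocol

Following is sample JSON that can be sent in a POST request to the specified Cloud DLP REST endpoint. This example JSON demonstrates how to use the Cloud DLP API to inspect a Cloud Storage bucket. For information about the parameters included with the request, see the "Configuring storage inspection" section later in this topic.

To try this out quickly, you can use the API Explorer on the reference page for the projects.dlpJobs.create method. Keep in mind that a successful request, even one made in API Explorer, creates a new scan job. For information about how to control scan jobs, see the "Retrieving inspection results" section later in this topic. For general information about using JSON to send requests to the Cloud DLP API, see the JSON quickstart.

JSON input: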

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "cloudStorageOptions":{
        "fileSet":{
          "url":"gs://[GCS_BUCKET_NAME]/*"
        },
        "bytesLimitPerFile":"1073741824"
      },
      "timespanConfig":{
        "startTime":"2017-11-13T12:34:29.965633345Z",
        "endTime":"2018-01-05T04:45:04.240912125Z"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"[DATASET_ID]"
            }
          }
        }
      }
    ]
  }
}
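
The `timespanConfig` timestamps in the request above are UTC strings in RFC 3339 form with nanosecond precision. A request whose `startTime` is later than its `endTime` is unlikely to be what you intend, so a client-side check before submitting can help. The sketch below is one way to do that with the standard library; `parse_rfc3339_utc` and `check_timespan` are hypothetical helper names, and fractional seconds are truncated to microseconds because Python's `datetime` does not store nanoseconds.

```python
from datetime import datetime, timezone


def parse_rfc3339_utc(value):
    """Parse a 'YYYY-MM-DDTHH:MM:SS[.fraction]Z' timestamp into a datetime."""
    if not value.endswith("Z"):
        raise ValueError("expected a UTC timestamp ending in 'Z'")
    base, _, fraction = value[:-1].partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    # Keep at most six fractional digits (microseconds), padding short ones.
    micros = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def check_timespan(timespan_config):
    """Raise ValueError if startTime is not strictly before endTime."""
    start = parse_rfc3339_utc(timespan_config["startTime"])
    end = parse_rfc3339_utc(timespan_config["endTime"])
    if start >= end:
        raise ValueError("startTime must be earlier than endTime")
    return start, end


start, end = check_timespan({
    "startTime": "2017-11-13T12:34:29.965633345Z",
    "endTime": "2018-01-05T04:45:04.240912125Z",
})
print(end - start)
```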

JSON output:

{
  "name":"projects/[PROJECT_ID]/dlpJobs/i-2304647377058311040",
  "type":"INSPECT_JOB",
  "state":"PENDING",
  "inspectDetails":{
    "requestedOptions":{
      "snapshotInspectTemplate":{

      },
      "jobConfig":{
        "storageConfig":{
          "cloudStorageOptions":{
            "fileSet":{
              "url":"gs://[GCS_BUCKET_NAME]/*"
            },
            "bytesLimitPerFile":"1073741824"
          },
          "timespanConfig":{
            "startTime":"2017-11-13T12:34:29.965633345Z",
            "endTime":"2018-01-05T04:45:04.240912125Z"
          }
        },
        "inspectConfig":{
          "infoTypes":[
            {
              "name":"PHONE_NUMBER"
            }
          ],
          "minLikelihood":"LIKELY",
          "limits":{

          },
          "includeQuote":true
        },
        "actions":[
          {
            "saveFindings":{
              "outputConfig":{
                "table":{
                  "projectId":"[PROJECT_ID]",
                  "datasetId":"[DATASET_ID]",
                  "tableId":"[NEW_TABLE_ID]"
                }
              }
            }
          }
        ]
      }
    }
  },
  "createTime":"2018-11-07T18:01:14.225Z"
}
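
The response identifies the new job by its full resource name and reports its initial state. As a small sketch, assuming the response has been received as JSON text shaped like the output above, the following extracts the job ID, state, and type. `summarize_job` is a hypothetical helper name, and `my-project` is a placeholder project ID.

```python
import json


def summarize_job(response_text):
    """Return (job_id, state, job_type) from a dlpJobs.create response."""
    job = json.loads(response_text)
    # The name has the form projects/<project>/dlpJobs/<job id>.
    job_id = job["name"].rsplit("/", 1)[-1]
    return job_id, job["state"], job["type"]


sample = """
{
  "name": "projects/my-project/dlpJobs/i-2304647377058311040",
  "type": "INSPECT_JOB",
  "state": "PENDING"
}
"""
job_id, state, job_type = summarize_job(sample)
print(job_id, state, job_type)
```

A state of `PENDING` only means the job was accepted; the findings are not available until the job has finished.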

Java

/**
 * Inspect GCS file for Info types and wait on job completion using Google Cloud Pub/Sub
 * notification
 *
 * @param bucketName The name of the bucket where the file resides.
 * @param fileName The path to the file within the bucket to inspect (can include wildcards, e.g.
 *     my-image.*)
 * @param minLikelihood The minimum likelihood required before returning a match
 * @param infoTypes The infoTypes of information to match
 * @param customInfoTypes The custom infoTypes of information to match
 * @param maxFindings The maximum number of findings to report (0 = server maximum)
 * @param topicId Google Cloud Pub/Sub topic Id to notify of job status
 * @param subscriptionId Google Cloud Subscription to above topic to listen for job status updates
 * @param projectId Google Cloud project ID
 */
private static void inspectGcsFile(
    String bucketName,
    String fileName,
    Likelihood minLikelihood,
    List<InfoType> infoTypes,
    List<CustomInfoType> customInfoTypes,
    int maxFindings,
    String topicId,
    String subscriptionId,
    String projectId)
    throws Exception {
  // Instantiates a client
  try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {

    CloudStorageOptions cloudStorageOptions =
        CloudStorageOptions.newBuilder()
            .setFileSet(
                CloudStorageOptions.FileSet.newBuilder()
                    .setUrl("gs://" + bucketName + "/" + fileName))
            .build();

    StorageConfig storageConfig =
        StorageConfig.newBuilder().setCloudStorageOptions(cloudStorageOptions).build();

    FindingLimits findingLimits =
        FindingLimits.newBuilder().setMaxFindingsPerRequest(maxFindings).build();

    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addAllInfoTypes(infoTypes)
            .addAllCustomInfoTypes(customInfoTypes)
            .setMinLikelihood(minLikelihood)
            .setLimits(findingLimits)
            .build();

    String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);
    Action.PublishToPubSub publishToPubSub =
        Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();

    Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

    InspectJobConfig inspectJobConfig =
        InspectJobConfig.newBuilder()
            .setStorageConfig(storageConfig)
            .setInspectConfig(inspectConfig)
            .addActions(action)
            .build();

    // Semi-synchronously submit an inspect job, and wait on results
    CreateDlpJobRequest createDlpJobRequest =
        CreateDlpJobRequest.newBuilder()
            .setParent(ProjectName.of(projectId).toString())
            .setInspectJob(inspectJobConfig)
            .build();

    DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

    System.out.println("Job created with ID:" + dlpJob.getName());

    final SettableApiFuture<Boolean> done = SettableApiFuture.create();

    // Set up a Pub/Sub subscriber to listen on the job completion status
    Subscriber subscriber =
        Subscriber.newBuilder(
                ProjectSubscriptionName.of(projectId, subscriptionId),
                (pubsubMessage, ackReplyConsumer) -> {
                  // Compare from the known job name so a missing attribute cannot throw
                  if (dlpJob
                      .getName()
                      .equals(pubsubMessage.getAttributesMap().get("DlpJobName"))) {
                    // notify job completion
                    done.set(true);
                    ackReplyConsumer.ack();
                  }
                })
            .build();
    subscriber.startAsync();

    // Wait for job completion semi-synchronously
    // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
    try {
      done.get(1, TimeUnit.MINUTES);
      Thread.sleep(500); // Wait for the job to become available
    } catch (Exception e) {
      System.out.println("Unable to verify job completion.");
    }

    DlpJob completedJob =
        dlpServiceClient.getDlpJob(
            GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build());

    System.out.println("Job status: " + completedJob.getState());
    InspectDataSourceDetails inspectDataSourceDetails = completedJob.getInspectDetails();
    InspectDataSourceDetails.Result result = inspectDataSourceDetails.getResult();
    if (result.getInfoTypeStatsCount() > 0) {
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    } else {
      System.out.println("No findings.");
    }
  }
}

Node.js

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The name of the bucket where the file resides.
// const bucketName = 'YOUR-BUCKET';

// The path to the file within the bucket to inspect.
// Can contain wildcards, e.g. "my-image.*"
// const fileName = 'my-image.png';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ name: 'DICT_TYPE', dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { name: 'REGEX_TYPE', regex: '\\(\\d{3}\\) \\d{3}-\\d{4}'}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// Get reference to the file to be inspected
const storageItem = {
  cloudStorageOptions: {
    fileSet: {url: `gs://${bucketName}/${fileName}`},
  },
};

// Construct request for creating an inspect job
const request = {
  parent: dlp.projectPath(callingProjectId),
  inspectJob: {
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customInfoTypes,
      minLikelihood: minLikelihood,
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
    },
    storageConfig: storageItem,
    actions: [
      {
        pubSub: {
          topic: `projects/${callingProjectId}/topics/${topicId}`,
        },
      },
    ],
  },
};

try {
  // Create a GCS File inspection job and wait for it to complete
  const [topicResponse] = await pubsub.topic(topicId).get();
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const subscription = await topicResponse.subscription(subscriptionId);
  const [jobsResponse] = await dlp.createDlpJob(request);
  // Get the job's ID
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });

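  // Pause briefly so the job's final status is available before fetching it
  console.log(`Waiting for DLP job to fully complete`);
  await new Promise(resolve => setTimeout(resolve, 500));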
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${
          infoTypeStat.infoType.name
        }.`
      );
    });
  } else {
    console.log(`No findings.`);
  }
} catch (err) {
  console.log(`Error in inspectGCSFile: ${err.message || err}`);
}

Python

def inspect_gcs_file(project, bucket, filename, topic_id, subscription_id,
                     info_types, custom_dictionaries=None,
                     custom_regexes=None, min_likelihood=None,
                     max_findings=None, timeout=300):
    """Uses the Data Loss Prevention API to analyze a file on GCS.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        bucket: The name of the GCS bucket containing the file, as a string.
        filename: The name of the file in the bucket, including the path, as a
            string; e.g. 'images/myfile.png'.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
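        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        custom_dictionaries: Optional list of strings, each a comma-separated
            word list that becomes one custom dictionary info type.
        custom_regexes: Optional list of regex pattern strings, each of which
            becomes one custom regex info type.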
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # This sample additionally uses Cloud Pub/Sub to receive results from
    # potentially long-running operations.
    import google.cloud.pubsub

    # This sample also uses threading.Event() to wait for the job to finish.
    import threading

    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ['FIRST_NAME', 'LAST_NAME', 'EMAIL_ADDRESS']
    info_types = [{'name': info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [{
        'info_type': {'name': 'CUSTOM_DICTIONARY_{}'.format(i)},
        'dictionary': {
            'word_list': {'words': custom_dict.split(',')}
        }
    } for i, custom_dict in enumerate(custom_dictionaries)]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [{
        'info_type': {'name': 'CUSTOM_REGEX_{}'.format(i)},
        'regex': {'pattern': custom_regex}
    } for i, custom_regex in enumerate(custom_regexes)]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'info_types': info_types,
        'custom_info_types': custom_info_types,
        'min_likelihood': min_likelihood,
        'limits': {'max_findings_per_request': max_findings},
    }

    # Construct a storage_config containing the file's URL.
    url = 'gs://{}/{}'.format(bucket, filename)
    storage_config = {
        'cloud_storage_options': {
            'file_set': {'url': url}
        }
    }

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Tell the API where to send a notification when the job is complete.
    actions = [{
        'pub_sub': {'topic': '{}/topics/{}'.format(parent, topic_id)}
    }]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        'inspect_config': inspect_config,
        'storage_config': storage_config,
        'actions': actions,
    }

    operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_id)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message):
        try:
            if (message.attributes['DlpJobName'] == operation.name):
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(operation.name)
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print('Info type: {}; Count: {}'.format(
                            finding.info_type.name, finding.count))
                else:
                    print('No findings.')

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    subscriber.subscribe(subscription_path, callback=callback)
    finished = job_done.wait(timeout=timeout)
    if not finished:
        print('No event received before the timeout. Please verify that the '
              'subscription provided is subscribed to the topic provided.')
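
All of the samples in this section decide whether a Pub/Sub notification belongs to their job by comparing the message's `DlpJobName` attribute with the name returned when the job was created. The sketch below isolates that check so it can be exercised without a live subscription. `is_completion_for` is a hypothetical helper, and plain dictionaries stand in for the attributes of a real Pub/Sub message.

```python
def is_completion_for(attributes, job_name):
    """Return True if the notification attributes refer to job_name.

    Uses .get() so that messages without a DlpJobName attribute are
    treated as non-matching instead of raising KeyError.
    """
    return attributes.get("DlpJobName") == job_name


job_name = "projects/my-project/dlpJobs/i-123"
notifications = [
    {"DlpJobName": "projects/my-project/dlpJobs/i-999"},  # another job
    {},                                                   # no attributes
    {"DlpJobName": job_name},                             # our job
]
matches = [is_completion_for(n, job_name) for n in notifications]
print(matches)  # [False, False, True]
```

Matching on the full job name matters when several jobs publish to the same topic, because a subscription receives the notifications of all of them.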

Go

// inspectGCSFile searches for the given info types in the given file.
func inspectGCSFile(w io.Writer, client *dlp.Client, project string, minLikelihood dlppb.Likelihood, maxFindings int32, includeQuote bool, infoTypes []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, bucketName, fileName string) {
	// Convert the info type strings to a list of InfoTypes.
	var i []*dlppb.InfoType
	for _, it := range infoTypes {
		i = append(i, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	ctx := context.Background()

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pClient, err := pubsub.NewClient(ctx, project)
	if err != nil {
		log.Fatalf("Error creating PubSub client: %v", err)
	}
	defer pClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	s, err := setupPubSub(ctx, pClient, project, pubSubTopic, pubSubSub)
	if err != nil {
		log.Fatalf("Error setting up PubSub: %v\n", err)
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + project + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: "projects/" + project,
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_CloudStorageOptions{
						CloudStorageOptions: &dlppb.CloudStorageOptions{
							FileSet: &dlppb.CloudStorageOptions_FileSet{
								Url: "gs://" + bucketName + "/" + fileName,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       i,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   minLikelihood,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: maxFindings,
					},
					IncludeQuote: includeQuote,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	ctx, cancel := context.WithCancel(ctx)
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()
		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			log.Fatalf("Error getting completed job: %v\n", err)
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results\n")
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
		// Stop listening for more messages.
		cancel()
	})
	if err != nil {
		log.Fatalf("Error receiving from PubSub: %v\n", err)
	}
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStorageOptions;
use Google\Cloud\Dlp\V2\CloudStorageOptions\FileSet;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect a file stored on Google Cloud Storage, using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId The project ID to run the API call under
 * @param string $bucketId The name of the bucket where the file resides
 * @param string $file The path to the file within the bucket to inspect. Can contain wildcards
 *        e.g. "my-image.*"
 * @param string $topicId The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId The name of the Pub/Sub subscription to use when listening for job
 *        completion notifications
 * @param int $maxFindings (Optional) The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_gcs(
    $callingProjectId,
    $bucketId,
    $file,
    $topicId,
    $subscriptionId,
    $maxFindings = 0
) {
    // Instantiate a client.
    $dlp = new DlpServiceClient([
        'projectId' => $callingProjectId,
    ]);
    $pubsub = new PubSubClient([
        'projectId' => $callingProjectId,
    ]);
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $creditCardNumberInfoType = (new InfoType())
        ->setName('CREDIT_CARD_NUMBER');
    $infoTypes = [$personNameInfoType, $creditCardNumberInfoType];

    // The minimum likelihood required before returning a match
    $minLikelihood = Likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected
    $fileSet = (new FileSet())
        ->setUrl('gs://' . $bucketId . '/' . $file);

    $cloudStorageOptions = (new CloudStorageOptions())
        ->setFileSet($fileSet);

    $storageConfig = (new StorageConfig())
        ->setCloudStorageOptions($cloudStorageOptions);

    // Construct the inspect config object
    $inspectConfig = (new InspectConfig())
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits)
        ->setInfoTypes($infoTypes);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = $dlp->projectName($callingProjectId);
    $job = $dlp->createDlpJob($parent, [
        'inspectJob' => $inspectJob
    ]);

    // Poll via Pub/Sub until job finishes
    while (true) {
        foreach ($subscription->pull() as $message) {
            if (isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()) {
                $subscription->acknowledge($message);
                break 2;
            }
        }
    }

    // Sleep for one second to avoid race condition with the job's status.
    usleep(1000000);

    // Get the updated job
    $job = $dlp->getDlpJob($job->getName());

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), $job->getState());
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf('  Found %s instance(s) of infoType %s' . PHP_EOL, $infoTypeStat->getCount(), $infoTypeStat->getInfoType()->getName());
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        default:
            print('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

C#

public static object InspectGCS(
    string projectId,
    string minLikelihood,
    int maxFindings,
    bool includeQuote,
    IEnumerable<InfoType> infoTypes,
    IEnumerable<CustomInfoType> customInfoTypes,
    string bucketName,
    string topicId,
    string subscriptionId)
{
    var inspectJob = new InspectJobConfig
    {
        StorageConfig = new StorageConfig
        {
            CloudStorageOptions = new CloudStorageOptions
            {
                FileSet = new CloudStorageOptions.Types.FileSet { Url = $"gs://{bucketName}/*.txt" },
                BytesLimitPerFile = 1073741824
            },
        },
        InspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoTypes },
            ExcludeInfoTypes = false,
            IncludeQuote = includeQuote,
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings
            },
            MinLikelihood = (Likelihood)System.Enum.Parse(typeof(Likelihood), minLikelihood)
        },
        Actions =
        {
            new Google.Cloud.Dlp.V2.Action
            {
                // Send results to Pub/Sub topic. The API expects the full topic name,
                // in the form projects/{project}/topics/{topic}.
                PubSub = new Google.Cloud.Dlp.V2.Action.Types.PublishToPubSub
                {
                    Topic = topicId,
                }
            }
        }
    };

    // Issue Create Dlp Job Request
    DlpServiceClient client = DlpServiceClient.Create();
    var request = new CreateDlpJobRequest
    {
        InspectJob = inspectJob,
        ParentAsProjectName = new ProjectName(projectId),
    };

    // We need created job name
    var dlpJob = client.CreateDlpJob(request);

    // Get a pub/sub subscription and listen for DLP results
    var fireEvent = new ManualResetEventSlim();

    var subscriptionName = new SubscriptionName(projectId, subscriptionId);
    var subscriber = SubscriberClient.CreateAsync(subscriptionName).Result;
    subscriber.StartAsync(
        (pubSubMessage, cancellationToken) =>
        {
            // Given a message that we receive on this subscription, we should either acknowledge or decline it
            if (pubSubMessage.Attributes["DlpJobName"] == dlpJob.Name)
            {
                fireEvent.Set();
                return Task.FromResult(SubscriberClient.Reply.Ack);
            }

            return Task.FromResult(SubscriberClient.Reply.Nack);
        });

    // We block here until receiving a signal from a separate thread that is waiting on a message indicating receiving a result of Dlp job
    if (fireEvent.Wait(TimeSpan.FromMinutes(1)))
    {
        // Stop the thread that is listening to messages as a result of StartAsync call earlier
        subscriber.StopAsync(CancellationToken.None).Wait();

        // Now we can inspect full job results
        // dlpJob.Name is already the full resource name, so pass it through unchanged
        var job = client.GetDlpJob(new GetDlpJobRequest { Name = dlpJob.Name });

        // Inspect Job details
        Console.WriteLine($"Processed bytes: {job.InspectDetails.Result.ProcessedBytes}");
        Console.WriteLine($"Total estimated bytes: {job.InspectDetails.Result.TotalEstimatedBytes}");
        var stats = job.InspectDetails.Result.InfoTypeStats;
        Console.WriteLine("Found stats:");
        foreach (var stat in stats)
        {
            Console.WriteLine($"{stat.InfoType.Name}");
        }
    }
    else
    {
        Console.WriteLine("Error: The wait failed on timeout");
    }

    return 0;
}

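Inspecting a Cloud Datastore kind

You can set up an inspection of a Cloud Datastore kind with the Cloud DLP API through REST requests, or programmatically in several programming languages by using a client library.

Code examples

Following are a JSON sample and code in several programming languages that demonstrate how to use Cloud DLP to inspect a Cloud Datastore kind. For information about the parameters included with the request, see the "Configuring storage inspection" section later in this topic.

Protocol

Following is sample JSON that can be sent in a POST request to the specified Cloud DLP API REST endpoint. This example JSON demonstrates how to use the Cloud DLP API to inspect a Cloud Datastore kind. For information about the parameters included with the request, see the "Configuring storage inspection" section later in this topic.

To try this out quickly, you can use the API Explorer on the reference page for the projects.dlpJobs.create method. Keep in mind that a successful request, even one made in API Explorer, creates a new scan job. For information about how to control scan jobs, see the "Retrieving inspection results" section later in this topic. For general information about using JSON to send requests to the Cloud DLP API, see the JSON quickstart.

JSON input: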

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "datastoreOptions":{
        "kind":{
          "name":"Example-Kind"
        },
        "partitionId":{
          "namespaceId":"[NAMESPACE_ID]",
          "projectId":"[PROJECT_ID]"
        }
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"[BIGQUERY-DATASET-NAME]",
              "tableId":"[BIGQUERY-TABLE-NAME]"
            }
          }
        }
      }
    ]
  }
}
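
For comparison with the Cloud Storage request, the sketch below builds just the `storageConfig` portion of a Datastore inspection request. `build_datastore_storage_config` is a hypothetical helper, and the field names mirror the JSON above. Leaving out `namespaceId` when no namespace is supplied is a choice made for this sketch, not something the sample above prescribes.

```python
import json


def build_datastore_storage_config(project_id, kind, namespace_id=None):
    """Return the storageConfig dict for inspecting one Datastore kind."""
    partition_id = {"projectId": project_id}
    if namespace_id:
        # Only include the namespace when one was given.
        partition_id["namespaceId"] = namespace_id
    return {
        "datastoreOptions": {
            "kind": {"name": kind},
            "partitionId": partition_id,
        }
    }


config = build_datastore_storage_config(
    "my-project", "Example-Kind", "my-namespace")
print(json.dumps(config, indent=2))
```

The `inspectConfig` and `actions` parts of the request are the same regardless of storage type, which is why only `storageConfig` is built here.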

Java

/**
 * Inspect a Datastore kind
 *
 * @param projectId The project ID containing the target Datastore
 * @param namespaceId The ID namespace of the Datastore document to inspect
 * @param kind The kind of the Datastore entity to inspect
 * @param minLikelihood The minimum likelihood required before returning a match
 * @param infoTypes The infoTypes of information to match
 * @param customInfoTypes The custom infoTypes of information to match
 * @param maxFindings max number of findings
 * @param topicId Google Cloud Pub/Sub topic to notify job status updates
 * @param subscriptionId Google Cloud Pub/Sub subscription to above topic to receive status
 *     updates
 */
private static void inspectDatastore(
    String projectId,
    String namespaceId,
    String kind,
    Likelihood minLikelihood,
    List<InfoType> infoTypes,
    List<CustomInfoType> customInfoTypes,
    int maxFindings,
    String topicId,
    String subscriptionId) {
  // Instantiates a client
  try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {

    // Reference to the Datastore namespace
    PartitionId partitionId =
        PartitionId.newBuilder().setProjectId(projectId).setNamespaceId(namespaceId).build();

    // Reference to the Datastore kind
    KindExpression kindExpression = KindExpression.newBuilder().setName(kind).build();
    DatastoreOptions datastoreOptions =
        DatastoreOptions.newBuilder().setKind(kindExpression).setPartitionId(partitionId).build();

    // Construct Datastore configuration to be inspected
    StorageConfig storageConfig =
        StorageConfig.newBuilder().setDatastoreOptions(datastoreOptions).build();

    FindingLimits findingLimits =
        FindingLimits.newBuilder().setMaxFindingsPerRequest(maxFindings).build();

    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addAllInfoTypes(infoTypes)
            .addAllCustomInfoTypes(customInfoTypes)
            .setMinLikelihood(minLikelihood)
            .setLimits(findingLimits)
            .build();

    String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);
    Action.PublishToPubSub publishToPubSub =
        Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();

    Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

    InspectJobConfig inspectJobConfig =
        InspectJobConfig.newBuilder()
            .setStorageConfig(storageConfig)
            .setInspectConfig(inspectConfig)
            .addActions(action)
            .build();

    // Asynchronously submit an inspect job, and wait on results
    CreateDlpJobRequest createDlpJobRequest =
        CreateDlpJobRequest.newBuilder()
            .setParent(ProjectName.of(projectId).toString())
            .setInspectJob(inspectJobConfig)
            .build();

    DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

    System.out.println("Job created with ID:" + dlpJob.getName());

    final SettableApiFuture<Boolean> done = SettableApiFuture.create();

    // Set up a Pub/Sub subscriber to listen on the job completion status
    Subscriber subscriber =
        Subscriber.newBuilder(
                ProjectSubscriptionName.of(projectId, subscriptionId),
                (pubsubMessage, ackReplyConsumer) -> {
                  // Compare from the job name so a missing attribute cannot throw
                  if (dlpJob.getName()
                      .equals(pubsubMessage.getAttributesMap().get("DlpJobName"))) {
                    // notify job completion
                    done.set(true);
                    ackReplyConsumer.ack();
                  }
                })
            .build();
    subscriber.startAsync();

    // Wait for job completion semi-synchronously
    // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
    try {
      done.get(1, TimeUnit.MINUTES);
      Thread.sleep(500); // Wait for the job to become available
    } catch (Exception e) {
      System.out.println("Unable to verify job completion.");
    }

    DlpJob completedJob =
        dlpServiceClient.getDlpJob(
            GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build());

    System.out.println("Job status: " + completedJob.getState());
    InspectDataSourceDetails inspectDataSourceDetails = completedJob.getInspectDetails();
    InspectDataSourceDetails.Result result = inspectDataSourceDetails.getResult();
    if (result.getInfoTypeStatsCount() > 0) {
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    } else {
      System.out.println("No findings.");
    }
  } catch (Exception e) {
    System.out.println("inspectDatastore Problems: " + e.getMessage());
  }
}

Node.js

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The project ID the target Datastore is stored under
// This may or may not equal the calling project ID
// const dataProjectId = process.env.GCLOUD_PROJECT;

// (Optional) The ID namespace of the Datastore document to inspect.
// To ignore Datastore namespaces, set this to an empty string ('')
// const namespaceId = '';

// The kind of the Datastore entity to inspect.
// const kind = 'Person';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { infoType: { name: 'REGEX_TYPE' }, regex: { pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}' }}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// Construct items to be inspected
const storageItems = {
  datastoreOptions: {
    partitionId: {
      projectId: dataProjectId,
      namespaceId: namespaceId,
    },
    kind: {
      name: kind,
    },
  },
};

// Construct request for creating an inspect job
const request = {
  parent: dlp.projectPath(callingProjectId),
  inspectJob: {
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customInfoTypes,
      minLikelihood: minLikelihood,
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
    },
    storageConfig: storageItems,
    actions: [
      {
        pubSub: {
          topic: `projects/${callingProjectId}/topics/${topicId}`,
        },
      },
    ],
  },
};
try {
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const [topicResponse] = await pubsub.topic(topicId).get();
  const subscription = await topicResponse.subscription(subscriptionId);
  // Run inspect-job creation request
  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
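  // Pause briefly so the completed job's results are available.
  // (A bare setTimeout callback would not delay the following call.)
  console.log(`Waiting for DLP job to fully complete`);
  await new Promise(resolve => setTimeout(resolve, 500));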
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${
          infoTypeStat.infoType.name
        }.`
      );
    });
  } else {
    console.log(`No findings.`);
  }
} catch (err) {
  console.log(`Error in inspectDatastore: ${err.message || err}`);
}

Python

def inspect_datastore(project, datastore_project, kind,
                      topic_id, subscription_id, info_types,
                      custom_dictionaries=None, custom_regexes=None,
                      namespace_id=None, min_likelihood=None,
                      max_findings=None, timeout=300):
    """Uses the Data Loss Prevention API to analyze Datastore data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        datastore_project: The Google Cloud project id of the target Datastore.
        kind: The kind of the Datastore entity to inspect, e.g. 'Person'.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
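        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        custom_dictionaries: A list of comma-separated word lists; each
            string becomes one custom dictionary info type.
        custom_regexes: A list of regex pattern strings; each becomes one
            custom regex info type.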
        namespace_id: The namespace of the Datastore document, if applicable.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # This sample additionally uses Cloud Pub/Sub to receive results from
    # potentially long-running operations.
    import google.cloud.pubsub

    # This sample also uses threading.Event() to wait for the job to finish.
    import threading

    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ['FIRST_NAME', 'LAST_NAME', 'EMAIL_ADDRESS']
    info_types = [{'name': info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [{
        'info_type': {'name': 'CUSTOM_DICTIONARY_{}'.format(i)},
        'dictionary': {
            'word_list': {'words': custom_dict.split(',')}
        }
    } for i, custom_dict in enumerate(custom_dictionaries)]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [{
        'info_type': {'name': 'CUSTOM_REGEX_{}'.format(i)},
        'regex': {'pattern': custom_regex}
    } for i, custom_regex in enumerate(custom_regexes)]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'info_types': info_types,
        'custom_info_types': custom_info_types,
        'min_likelihood': min_likelihood,
        'limits': {'max_findings_per_request': max_findings},
    }

    # Construct a storage_config containing the target Datastore info.
    storage_config = {
        'datastore_options': {
            'partition_id': {
                'project_id': datastore_project,
                'namespace_id': namespace_id,
            },
            'kind': {
                'name': kind
            },
        }
    }

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Tell the API where to send a notification when the job is complete.
    actions = [{
        'pub_sub': {'topic': '{}/topics/{}'.format(parent, topic_id)}
    }]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        'inspect_config': inspect_config,
        'storage_config': storage_config,
        'actions': actions,
    }

    operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_id)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message):
        try:
            if (message.attributes['DlpJobName'] == operation.name):
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(operation.name)
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print('Info type: {}; Count: {}'.format(
                            finding.info_type.name, finding.count))
                else:
                    print('No findings.')

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscriber.subscribe(subscription_path, callback=callback)

    finished = job_done.wait(timeout=timeout)
    if not finished:
        print('No event received before the timeout. Please verify that the '
              'subscription provided is subscribed to the topic provided.')
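
Each sample in this section waits for a Pub/Sub notification whose `DlpJobName` attribute matches the job that was just created. The following minimal sketch isolates that matching logic so it can be run without Pub/Sub or the DLP API. `FakeMessage` and `make_callback` are hypothetical names used only for illustration, and the sketch nacks non-matching messages (as the Node.js and Go samples do) so that they can be redelivered.

```python
import threading


class FakeMessage:
    """Hypothetical stand-in for a Pub/Sub message (illustration only)."""

    def __init__(self, attributes):
        self.attributes = attributes
        self.outcome = None

    def ack(self):
        self.outcome = "ack"

    def nack(self):
        self.outcome = "nack"


def make_callback(job_name, job_done):
    """Return a callback that acks only the notification for job_name."""
    def callback(message):
        # .get() avoids a KeyError when the attribute is missing.
        if message.attributes.get("DlpJobName") == job_name:
            message.ack()
            # Signal the waiting thread that the job has finished.
            job_done.set()
        else:
            # Not our job: leave the message for another listener.
            message.nack()
    return callback


if __name__ == "__main__":
    done = threading.Event()
    callback = make_callback("projects/my-project/dlpJobs/i-123", done)

    other = FakeMessage({"DlpJobName": "projects/my-project/dlpJobs/i-999"})
    callback(other)

    mine = FakeMessage({"DlpJobName": "projects/my-project/dlpJobs/i-123"})
    callback(mine)

    print(other.outcome, mine.outcome, done.is_set())
```

In a real subscriber the same callback shape applies; only the message object and the event that the main thread waits on come from the Pub/Sub client instead of these stand-ins.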

Go

// inspectDatastore searches for the given info types in the given Datastore kind.
func inspectDatastore(w io.Writer, client *dlp.Client, project string, minLikelihood dlppb.Likelihood, maxFindings int32, includeQuote bool, infoTypes []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, dataProject, namespaceID, kind string) {
	// Convert the info type strings to a list of InfoTypes.
	var i []*dlppb.InfoType
	for _, it := range infoTypes {
		i = append(i, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	ctx := context.Background()

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pClient, err := pubsub.NewClient(ctx, project)
	if err != nil {
		log.Fatalf("Error creating PubSub client: %v", err)
	}
	defer pClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	s, err := setupPubSub(ctx, pClient, project, pubSubTopic, pubSubSub)
	if err != nil {
		log.Fatalf("Error setting up PubSub: %v\n", err)
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + project + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: "projects/" + project,
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_DatastoreOptions{
						DatastoreOptions: &dlppb.DatastoreOptions{
							PartitionId: &dlppb.PartitionId{
								ProjectId:   dataProject,
								NamespaceId: namespaceID,
							},
							Kind: &dlppb.KindExpression{
								Name: kind,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       i,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   minLikelihood,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: maxFindings,
					},
					IncludeQuote: includeQuote,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()
		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			log.Fatalf("Error getting completed job: %v\n", err)
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results\n")
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
		// Stop listening for more messages.
		cancel()
	})
	if err != nil {
		log.Fatalf("Error receiving from PubSub: %v\n", err)
	}
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\DatastoreOptions;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\Dlp\V2\KindExpression;
use Google\Cloud\Dlp\V2\PartitionId;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect Datastore, using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId The project ID to run the API call under
 * @param string $dataProjectId The project ID containing the target Datastore
 *        (This may or may not be equal to $callingProjectId)
 * @param string $topicId The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId The name of the Pub/Sub subscription to use when listening for job
 *        completion notifications
 * @param string $kind The datastore kind to inspect
 * @param string $namespaceId The ID namespace of the Datastore document to inspect
 * @param int $maxFindings (Optional) The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_datastore(
    $callingProjectId,
    $dataProjectId,
    $topicId,
    $subscriptionId,
    $kind,
    $namespaceId,
    $maxFindings = 0
) {
    // Instantiate clients
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $phoneNumberInfoType = (new InfoType())
        ->setName('PHONE_NUMBER');
    $infoTypes = [$personNameInfoType, $phoneNumberInfoType];

    // The minimum likelihood required before returning a match
    $minLikelihood = Likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected
    $partitionId = (new PartitionId())
        ->setProjectId($dataProjectId)
        ->setNamespaceId($namespaceId);

    $kindExpression = (new KindExpression())
        ->setName($kind);

    $datastoreOptions = (new DatastoreOptions())
        ->setPartitionId($partitionId)
        ->setKind($kindExpression);

    // Construct the inspect config object
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes($infoTypes)
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits);

    // Construct the storage config object
    $storageConfig = (new StorageConfig())
        ->setDatastoreOptions($datastoreOptions);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = $dlp->projectName($callingProjectId);
    $job = $dlp->createDlpJob($parent, [
        'inspectJob' => $inspectJob
    ]);

    // Poll via Pub/Sub until job finishes
    while (true) {
        foreach ($subscription->pull() as $message) {
            if (isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()) {
                $subscription->acknowledge($message);
                break 2;
            }
        }
    }

    // Sleep for one second to avoid race condition with the job's status.
    usleep(1000000);

    // Get the updated job
    $job = $dlp->getDlpJob($job->getName());

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), $job->getState());
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf('  Found %s instance(s) of infoType %s' . PHP_EOL, $infoTypeStat->getCount(), $infoTypeStat->getInfoType()->getName());
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        default:
            print('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

C#

public static object InspectCloudDataStore(
    string projectId,
    string minLikelihood,
    int maxFindings,
    bool includeQuote,
    string kindName,
    string namespaceId,
    IEnumerable<InfoType> infoTypes,
    IEnumerable<CustomInfoType> customInfoTypes,
    string datasetId,
    string tableId)
{
    var inspectJob = new InspectJobConfig
    {
        StorageConfig = new StorageConfig
        {
            DatastoreOptions = new DatastoreOptions
            {
                Kind = new KindExpression { Name = kindName },
                PartitionId = new PartitionId
                {
                    NamespaceId = namespaceId,
                    ProjectId = projectId,
                }
            },
            TimespanConfig = new StorageConfig.Types.TimespanConfig
            {
                StartTime = Timestamp.FromDateTime(System.DateTime.UtcNow.AddYears(-1)),
                EndTime = Timestamp.FromDateTime(System.DateTime.UtcNow)
            }
        },

        InspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoTypes },
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings
            },
            ExcludeInfoTypes = false,
            IncludeQuote = includeQuote,
            MinLikelihood = (Likelihood)System.Enum.Parse(typeof(Likelihood), minLikelihood)
        },
        Actions =
        {
            new Google.Cloud.Dlp.V2.Action
            {
                // Save results in BigQuery Table
                SaveFindings = new Google.Cloud.Dlp.V2.Action.Types.SaveFindings
                {
                    OutputConfig = new OutputStorageConfig
                    {
                        Table = new Google.Cloud.Dlp.V2.BigQueryTable
                        {
                            ProjectId = projectId,
                            DatasetId = datasetId,
                            TableId = tableId
                        }
                    }
                },
            }
        }
    };

    // Issue Create Dlp Job Request
    DlpServiceClient client = DlpServiceClient.Create();
    var request = new CreateDlpJobRequest
    {
        InspectJob = inspectJob,
        ParentAsProjectName = new ProjectName(projectId),
    };

    // We need created job name
    var dlpJob = client.CreateDlpJob(request);
    var jobName = dlpJob.Name;

    // Make sure the job finishes before inspecting the results.
    // Alternatively, we can inspect results opportunistically, but
    // for testing purposes, we want consistent outcome
    bool jobFinished = EnsureJobFinishes(projectId, jobName);
    if (jobFinished)
    {
        var bigQueryClient = BigQueryClient.Create(projectId);
        var table = bigQueryClient.GetTable(datasetId, tableId);

        // Return only first page of 10 rows
        Console.WriteLine("DLP v2 Results:");
        var firstPage = table.ListRows(new ListRowsOptions { StartIndex = 0, PageSize = 10 });
        foreach (var item in firstPage)
        {
            Console.WriteLine($"\t {item[""]}");
        }
    }

    return 0;
}

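Inspecting a BigQuery table

You can set up an inspection of a BigQuery table by sending a REST request to the Cloud DLP API, or programmatically by using a client library in one of several languages.

Code examples

The following JSON example and code samples in several languages demonstrate how to use the Cloud DLP API to inspect a BigQuery table. For details about the parameters included in the request, see "Configuring storage inspection" later in this topic.

Protocol

You can send the following JSON example in a POST request to the specified Cloud DLP API REST endpoint. This JSON example demonstrates how to use the Cloud DLP API to inspect a BigQuery table. For details about the parameters included in the request, see "Configuring storage inspection" later in this topic.

To try this out quickly, you can use the API Explorer on the projects.dlpJobs.create method reference page. Note that a successful request creates a new scan job, even when it is made from API Explorer. For information about managing scan jobs, see "Retrieving inspection results" later in this topic. For general information about using JSON to send requests to the Cloud DLP API, see the JSON quickstart.

JSON input: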

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"[PROJECT_ID]",
          "datasetId":"[BIGQUERY-DATASET-NAME]",
          "tableId":"[BIGQUERY-TABLE-NAME]"
        },
        "identifyingFields":[
          {
            "name":"person.contactinfo"
          }
        ]
      },
      "timespanConfig":{
        "startTime":"2017-11-13T12:34:29.965633345Z",
        "endTime":"2018-01-05T04:45:04.240912125Z"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"[BIGQUERY-DATASET-NAME]",
              "tableId":"[BIGQUERY-TABLE-NAME]"
            }
          },
          "outputSchema": "BASIC_COLUMNS"
        }
      }
    ]
  }
}
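
The `timespanConfig` values in the request are RFC 3339 timestamp strings. As a minimal sketch, assuming you start from timezone-aware `datetime` objects, the following Python code formats them and assembles the `storageConfig` portion of the request. The helper names `rfc3339` and `build_bigquery_storage_config` and the example values are hypothetical, chosen for illustration; the sketch does not call the API.

```python
import json
from datetime import datetime, timezone


def rfc3339(moment):
    """Format an aware datetime as an RFC 3339 UTC timestamp ending in 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def build_bigquery_storage_config(project_id, dataset_id, table_id,
                                  start, end, identifying_fields=()):
    """Return the storageConfig dictionary for a BigQuery inspection.

    Hypothetical helper: start and end must be timezone-aware datetimes.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {
        "bigQueryOptions": {
            "tableReference": {
                "projectId": project_id,
                "datasetId": dataset_id,
                "tableId": table_id,
            },
            "identifyingFields": [
                {"name": field} for field in identifying_fields
            ],
        },
        "timespanConfig": {
            "startTime": rfc3339(start),
            "endTime": rfc3339(end),
        },
    }


if __name__ == "__main__":
    # Example values are placeholders, not real resources.
    config = build_bigquery_storage_config(
        "my-project", "my_dataset", "my_table",
        datetime(2017, 11, 13, 12, 34, 29, tzinfo=timezone.utc),
        datetime(2018, 1, 5, 4, 45, 4, tzinfo=timezone.utc),
        identifying_fields=["person.contactinfo"])
    print(json.dumps(config, indent=2))
```

Generating the timestamps from `datetime` objects, instead of typing them by hand, avoids stray characters in the strings and lets the code check that the start of the time span precedes its end.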

Java

/**
 * Inspect a BigQuery table
 *
 * @param projectId The project ID to run the API call under
 * @param datasetId The ID of the dataset to inspect, e.g. 'my_dataset'
 * @param tableId The ID of the table to inspect, e.g. 'my_table'
 * @param minLikelihood The minimum likelihood required before returning a match
 * @param infoTypes The infoTypes of information to match
 * @param customInfoTypes The custom infoTypes of information to match
 * @param maxFindings The maximum number of findings to report (0 = server maximum)
 * @param topicId Topic ID for pubsub.
 * @param subscriptionId Subscription ID for pubsub.
 */
private static void inspectBigquery(
    String projectId,
    String datasetId,
    String tableId,
    Likelihood minLikelihood,
    List<InfoType> infoTypes,
    List<CustomInfoType> customInfoTypes,
    int maxFindings,
    String topicId,
    String subscriptionId) {
  // Instantiates a client
  try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {
    // Reference to the BigQuery table
    BigQueryTable tableReference =
        BigQueryTable.newBuilder()
            .setProjectId(projectId)
            .setDatasetId(datasetId)
            .setTableId(tableId)
            .build();
    BigQueryOptions bigQueryOptions =
        BigQueryOptions.newBuilder().setTableReference(tableReference).build();

    // Construct BigQuery configuration to be inspected
    StorageConfig storageConfig =
        StorageConfig.newBuilder().setBigQueryOptions(bigQueryOptions).build();

    FindingLimits findingLimits =
        FindingLimits.newBuilder().setMaxFindingsPerRequest(maxFindings).build();

    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addAllInfoTypes(infoTypes)
            .addAllCustomInfoTypes(customInfoTypes)
            .setMinLikelihood(minLikelihood)
            .setLimits(findingLimits)
            .build();

    ProjectTopicName topic = ProjectTopicName.of(projectId, topicId);
    Action.PublishToPubSub publishToPubSub =
        Action.PublishToPubSub.newBuilder().setTopic(topic.toString()).build();

    Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

    InspectJobConfig inspectJobConfig =
        InspectJobConfig.newBuilder()
            .setStorageConfig(storageConfig)
            .setInspectConfig(inspectConfig)
            .addActions(action)
            .build();

    // Asynchronously submit an inspect job, and wait on results
    CreateDlpJobRequest createDlpJobRequest =
        CreateDlpJobRequest.newBuilder()
            .setParent(ProjectName.of(projectId).toString())
            .setInspectJob(inspectJobConfig)
            .build();

    DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

    System.out.println("Job created with ID:" + dlpJob.getName());

    // Wait for job completion semi-synchronously
    // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
    final SettableApiFuture<Boolean> done = SettableApiFuture.create();

    // Set up a Pub/Sub subscriber to listen on the job completion status
    Subscriber subscriber =
        Subscriber.newBuilder(
                ProjectSubscriptionName.of(projectId, subscriptionId),
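                (pubsubMessage, ackReplyConsumer) -> {
                  // Look up the attribute first so a missing "DlpJobName" cannot throw
                  String notifiedJob = pubsubMessage.getAttributesMap().get("DlpJobName");
                  if (dlpJob.getName().equals(notifiedJob)) {
                    // notify job completion
                    done.set(true);
                    ackReplyConsumer.ack();
                  } else {
                    // Not this job's notification: release it for redelivery
                    ackReplyConsumer.nack();
                  }
                })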
            .build();
    subscriber.startAsync();

    try {
      done.get(1, TimeUnit.MINUTES);
      Thread.sleep(500); // Wait for the job to become available
    } catch (Exception e) {
      System.out.println("Unable to verify job completion.");
    }

    DlpJob completedJob =
        dlpServiceClient.getDlpJob(
            GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build());

    System.out.println("Job status: " + completedJob.getState());
    InspectDataSourceDetails inspectDataSourceDetails = completedJob.getInspectDetails();
    InspectDataSourceDetails.Result result = inspectDataSourceDetails.getResult();
    if (result.getInfoTypeStatsCount() > 0) {
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    } else {
      System.out.println("No findings.");
    }
  } catch (Exception e) {
    System.out.println("inspectBigquery Problems: " + e.getMessage());
  }
}

Node.js

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The project ID the table is stored under
// This may or (for public datasets) may not equal the calling project ID
// const dataProjectId = process.env.GCLOUD_PROJECT;

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ name: 'DICT_TYPE', dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { name: 'REGEX_TYPE', regex: '\\(\\d{3}\\) \\d{3}-\\d{4}'}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// Construct item to be inspected
const storageItem = {
  bigQueryOptions: {
    tableReference: {
      projectId: dataProjectId,
      datasetId: datasetId,
      tableId: tableId,
    },
  },
};

// Construct request for creating an inspect job
const request = {
  parent: dlp.projectPath(callingProjectId),
  inspectJob: {
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customInfoTypes,
      minLikelihood: minLikelihood,
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
    },
    storageConfig: storageItem,
    actions: [
      {
        pubSub: {
          topic: `projects/${callingProjectId}/topics/${topicId}`,
        },
      },
    ],
  },
};

try {
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const [topicResponse] = await pubsub.topic(topicId).get();
  const subscription = await topicResponse.subscription(subscriptionId);
  // Run inspect-job creation request
  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
  // Wait briefly so that the job's final state is available before fetching it
  console.log(`Waiting for DLP job to fully complete`);
  await new Promise(resolve => setTimeout(resolve, 500));
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${
          infoTypeStat.infoType.name
        }.`
      );
    });
  } else {
    console.log(`No findings.`);
  }
} catch (err) {
  console.log(`Error in inspectBigquery: ${err.message || err}`);
}

Python

def inspect_bigquery(project, bigquery_project, dataset_id, table_id,
                     topic_id, subscription_id, info_types,
                     custom_dictionaries=None, custom_regexes=None,
                     min_likelihood=None, max_findings=None, timeout=300):
    """Uses the Data Loss Prevention API to analyze BigQuery data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        bigquery_project: The Google Cloud project id of the target table.
        dataset_id: The id of the target BigQuery dataset.
        table_id: The id of the target BigQuery table.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        custom_dictionaries: An optional list of custom dictionaries, each
            given as a single string of comma-separated words.
        custom_regexes: An optional list of regular expression patterns to
            match as custom info types.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for the job completion
            notification before giving up.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # This sample additionally uses Cloud Pub/Sub to receive results from
    # potentially long-running operations.
    import google.cloud.pubsub

    # This sample also uses threading.Event() to wait for the job to finish.
    import threading

    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ['FIRST_NAME', 'LAST_NAME', 'EMAIL_ADDRESS']
    info_types = [{'name': info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [{
        'info_type': {'name': 'CUSTOM_DICTIONARY_{}'.format(i)},
        'dictionary': {
            'word_list': {'words': custom_dict.split(',')}
        }
    } for i, custom_dict in enumerate(custom_dictionaries)]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [{
        'info_type': {'name': 'CUSTOM_REGEX_{}'.format(i)},
        'regex': {'pattern': custom_regex}
    } for i, custom_regex in enumerate(custom_regexes)]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'info_types': info_types,
        'custom_info_types': custom_info_types,
        'min_likelihood': min_likelihood,
        'limits': {'max_findings_per_request': max_findings},
    }

    # Construct a storage_config containing the target Bigquery info.
    storage_config = {
        'big_query_options': {
            'table_reference': {
                'project_id': bigquery_project,
                'dataset_id': dataset_id,
                'table_id': table_id,
            }
        }
    }

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Tell the API where to send a notification when the job is complete.
    actions = [{
        'pub_sub': {'topic': '{}/topics/{}'.format(parent, topic_id)}
    }]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        'inspect_config': inspect_config,
        'storage_config': storage_config,
        'actions': actions,
    }

    operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_id)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message):
        try:
            if (message.attributes['DlpJobName'] == operation.name):
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(operation.name)
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print('Info type: {}; Count: {}'.format(
                            finding.info_type.name, finding.count))
                else:
                    print('No findings.')

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscriber.subscribe(subscription_path, callback=callback)
    finished = job_done.wait(timeout=timeout)
    if not finished:
        print('No event received before the timeout. Please verify that the '
              'subscription provided is subscribed to the topic provided.')

Go

// inspectBigquery searches for the given info types in the given Bigquery dataset table.
func inspectBigquery(w io.Writer, client *dlp.Client, project string, minLikelihood dlppb.Likelihood, maxFindings int32, includeQuote bool, infoTypes []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, dataProject, datasetID, tableID string) {
	// Convert the info type strings to a list of InfoTypes.
	var i []*dlppb.InfoType
	for _, it := range infoTypes {
		i = append(i, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	ctx := context.Background()

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pClient, err := pubsub.NewClient(ctx, project)
	if err != nil {
		log.Fatalf("Error creating PubSub client: %v", err)
	}
	defer pClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	s, err := setupPubSub(ctx, pClient, project, pubSubTopic, pubSubSub)
	if err != nil {
		log.Fatalf("Error setting up PubSub: %v\n", err)
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + project + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: "projects/" + project,
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_BigQueryOptions{
						BigQueryOptions: &dlppb.BigQueryOptions{
							TableReference: &dlppb.BigQueryTable{
								ProjectId: dataProject,
								DatasetId: datasetID,
								TableId:   tableID,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       i,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   minLikelihood,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: maxFindings,
					},
					IncludeQuote: includeQuote,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	ctx, cancel := context.WithCancel(ctx)
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()
		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			log.Fatalf("Error getting completed job: %v\n", err)
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results\n")
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
		// Stop listening for more messages.
		cancel()
	})
	if err != nil {
		log.Fatalf("Error receiving from PubSub: %v\n", err)
	}
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\BigQueryOptions;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect a BigQuery table, using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId The project ID to run the API call under
 * @param string $dataProjectId The project ID containing the target BigQuery table
 * @param string $topicId The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId The name of the Pub/Sub subscription to use when listening for job completion notifications
 * @param string $datasetId The ID of the dataset to inspect
 * @param string $tableId The ID of the table to inspect
 * @param int $maxFindings The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_bigquery(
  $callingProjectId,
  $dataProjectId,
  $topicId,
  $subscriptionId,
  $datasetId,
  $tableId,
  $maxFindings = 0
) {
    // Instantiate a client.
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $creditCardNumberInfoType = (new InfoType())
        ->setName('CREDIT_CARD_NUMBER');
    $infoTypes = [$personNameInfoType, $creditCardNumberInfoType];

    // The minimum likelihood required before returning a match
    $minLikelihood = Likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected
    $bigqueryTable = (new BigQueryTable())
        ->setProjectId($dataProjectId)
        ->setDatasetId($datasetId)
        ->setTableId($tableId);

    $bigQueryOptions = (new BigQueryOptions())
        ->setTableReference($bigqueryTable);

    $storageConfig = (new StorageConfig())
        ->setBigQueryOptions($bigQueryOptions);

    // Construct the inspect config object
    $inspectConfig = (new InspectConfig())
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits)
        ->setInfoTypes($infoTypes);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = $dlp->projectName($callingProjectId);
    $job = $dlp->createDlpJob($parent, [
        'inspectJob' => $inspectJob
    ]);

    // Poll via Pub/Sub until job finishes
    while (true) {
        foreach ($subscription->pull() as $message) {
            if (isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()) {
                $subscription->acknowledge($message);
                break 2;
            }
        }
    }

    // Sleep for one second to avoid race condition with the job's status.
    usleep(1000000);

    // Get the updated job
    $job = $dlp->getDlpJob($job->getName());

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), $job->getState());
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf(
                        '  Found %s instance(s) of infoType %s' . PHP_EOL,
                        $infoTypeStat->getCount(),
                        $infoTypeStat->getInfoType()->getName()
                    );
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        default:
            printf('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

C#

public static object InspectBigQuery(
    string projectId,
    string minLikelihood,
    int maxFindings,
    bool includeQuote,
    IEnumerable<FieldId> identifyingFields,
    IEnumerable<InfoType> infoTypes,
    IEnumerable<CustomInfoType> customInfoTypes,
    string datasetId,
    string tableId)
{
    var inspectJob = new InspectJobConfig
    {
        StorageConfig = new StorageConfig
        {
            BigQueryOptions = new BigQueryOptions
            {
                TableReference = new Google.Cloud.Dlp.V2.BigQueryTable
                {
                    ProjectId = projectId,
                    DatasetId = datasetId,
                    TableId = tableId,
                },
                IdentifyingFields =
                {
                    identifyingFields
                }
            },

            TimespanConfig = new StorageConfig.Types.TimespanConfig
            {
                StartTime = Timestamp.FromDateTime(System.DateTime.UtcNow.AddYears(-1)),
                EndTime = Timestamp.FromDateTime(System.DateTime.UtcNow)
            }
        },

        InspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoTypes },
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings
            },
            ExcludeInfoTypes = false,
            IncludeQuote = includeQuote,
            MinLikelihood = (Likelihood)System.Enum.Parse(typeof(Likelihood), minLikelihood)
        },
        Actions =
        {
            new Google.Cloud.Dlp.V2.Action
            {
                // Save results in BigQuery Table
                SaveFindings = new Google.Cloud.Dlp.V2.Action.Types.SaveFindings
                {
                    OutputConfig = new OutputStorageConfig
                    {
                        Table = new Google.Cloud.Dlp.V2.BigQueryTable
                        {
                            ProjectId = projectId,
                            DatasetId = datasetId,
                            TableId = tableId
                        }
                    }
                },
            }
        }
    };

    // Issue Create Dlp Job Request
    DlpServiceClient client = DlpServiceClient.Create();
    var request = new CreateDlpJobRequest
    {
        InspectJob = inspectJob,
        ParentAsProjectName = new ProjectName(projectId),
    };

    // We need created job name
    var dlpJob = client.CreateDlpJob(request);
    string jobName = dlpJob.Name;

    // Make sure the job finishes before inspecting the results.
    // Alternatively, we can inspect results opportunistically, but
    // for testing purposes, we want consistent outcome
    bool jobFinished = EnsureJobFinishes(projectId, jobName);
    if (jobFinished)
    {
        var bigQueryClient = BigQueryClient.Create(projectId);
        var table = bigQueryClient.GetTable(datasetId, tableId);

        // Return only first page of 10 rows
        Console.WriteLine("DLP v2 Results:");
        var firstPage = table.ListRows(new ListRowsOptions { StartIndex = 0, PageSize = 10 });
        foreach (var item in firstPage)
        {
            Console.WriteLine($"\t {item[""]}");
        }
    }

    return 0;
}

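Configuring storage inspection

To inspect a Cloud Storage location, a Cloud Datastore kind, or a BigQuery table, send a request to the projects.dlpJobs.create method of the Cloud DLP API. The request must contain at least the location of the data to scan and what to scan for. Beyond those required parameters, you can also specify where to write the scan results, size limits, and likelihood thresholds, among other settings. A successful request creates a DlpJob object instance, which is discussed further in the Retrieving inspection results section.

The available configuration options are summarized below:

  • InspectJobConfig object: Contains the configuration information for the inspection job. Note that the InspectJobConfig object is also used by the JobTriggers object to schedule the creation of DlpJobs. This object includes the following:

    • StorageConfig object: Required. Contains details about the storage repository to scan:

      • Depending on the type of storage repository being scanned, the StorageConfig object must include exactly one of the following:

        • CloudStorageOptions object: Contains information about the Cloud Storage bucket to scan.
        • DatastoreOptions object: Contains information about the Cloud Datastore data set to scan.
        • BigQueryOptions object: Contains information about the BigQuery table (and, optionally, its identifying fields) to scan. This object also enables result sampling. For details, see the description of enabling result sampling below.
      • TimespanConfig object: Optional. Specifies the time span of the items to include in the scan.

    • InspectConfig object: Required. Specifies what to scan for, such as infoTypes and likelihood values.

      • InfoType object: Required. One or more infoType values to scan for.
      • Likelihood enumeration: Optional. When set, Cloud DLP returns only findings at or above this likelihood threshold. If this enumeration is omitted, the default value is POSSIBLE.
      • FindingLimits object: Optional. When set, this object lets you specify an upper limit on the number of findings returned.
      • includeQuote parameter: Optional. Defaults to false. When set to true, each finding includes a contextual quote taken from the data that triggered it.
      • excludeInfoTypes parameter: Optional. Defaults to false. When set to true, scan results exclude the type information for the findings.
      • CustomInfoType object: One or more custom, user-created infoTypes. For more information about creating custom infoTypes, see Creating custom infoType detectors.
    • inspectTemplateName string: Optional. Specifies a template to use to populate default values in the InspectConfig object. If you have already specified an InspectConfig, the template values are merged into it.

    • Action object: Optional. One or more actions to execute when the job completes. Each action is executed in the order in which it is listed. This is where you specify where to write results, or whether to publish a notification to a Cloud Pub/Sub topic.

  • jobId: Optional. An identifier for the job returned by Cloud DLP. If jobId is omitted or empty, the system creates an ID for the job. If it is specified, the job is assigned this ID value. The job ID must be unique, and can contain uppercase and lowercase letters, numbers, and hyphens; that is, it must match the following regular expression: [a-zA-Z\\d-]+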
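The option summary above can be sketched as a small Python helper that assembles a request body for projects.dlpJobs.create as a plain dictionary. This is only an illustration, not part of the client library: the function name build_inspect_job_request and its parameters are hypothetical, and the field names are the ones used in the JSON examples on this page. The sketch enforces the two required parts (StorageConfig and at least one infoType), leaves optional fields out when they are not set, and checks jobId against the pattern quoted above.

```python
import json
import re

# jobId pattern quoted in the option summary, written as a Python raw string.
JOB_ID_PATTERN = re.compile(r"[a-zA-Z\d-]+")


def build_inspect_job_request(storage_config, info_types, min_likelihood=None,
                              max_findings=None, include_quote=False,
                              actions=None, job_id=None):
    """Assemble a projects.dlpJobs.create request body (hypothetical helper)."""
    if not storage_config:
        raise ValueError("storageConfig is required")
    if not info_types:
        raise ValueError("at least one infoType is required")

    inspect_config = {
        "infoTypes": [{"name": name} for name in info_types],
        "includeQuote": include_quote,
    }
    # Optional fields are omitted when unset so the service applies its
    # defaults (for example, POSSIBLE for minLikelihood).
    if min_likelihood is not None:
        inspect_config["minLikelihood"] = min_likelihood
    if max_findings is not None:
        inspect_config["limits"] = {"maxFindingsPerRequest": max_findings}

    body = {
        "inspectJob": {
            "storageConfig": storage_config,
            "inspectConfig": inspect_config,
            "actions": list(actions or []),
        }
    }
    if job_id:
        if not JOB_ID_PATTERN.fullmatch(job_id):
            raise ValueError("jobId may contain only letters, digits, and hyphens")
        body["jobId"] = job_id
    return body


# Example values are placeholders, not real resources.
request_body = build_inspect_job_request(
    storage_config={"cloudStorageOptions": {"fileSet": {"url": "gs://my-bucket/*"}}},
    info_types=["PHONE_NUMBER"],
    min_likelihood="LIKELY",
    job_id="my-inspect-job-1",
)
print(json.dumps(request_body, indent=2))
```

Building the body as a dictionary keeps the sketch independent of any particular client library; the same structure can be serialized and sent to the REST endpoint shown in the JSON examples.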

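Limiting the amount of content inspected

If you are scanning a BigQuery table or a Cloud Storage bucket, Cloud DLP can scan a small subset of the dataset. This gives you a sampling of scan results without incurring the potential cost of scanning the entire dataset.

The following sections describe how to limit the size of both BigQuery scans and Cloud Storage scans.

Limiting BigQuery scans

To enable sampling in BigQuery by limiting the amount of data scanned, specify the following optional fields within BigQueryOptions:

  • rowsLimit: The maximum number of rows to scan. If the table has more rows than this value, the rest of the rows are omitted. If not set, or if set to 0, all rows are scanned.
  • sampleMethod: How to sample rows when not all rows are scanned. If not specified, scanning starts from the top. This field can be set to one of two values:
    • TOP: Scanning starts from the top.
    • RANDOM_START: Scanning starts from a randomly selected row.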
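As a minimal sketch of the two sampling fields, the following Python helper builds a bigQueryOptions dictionary and rejects values the field descriptions above do not allow. The helper name build_bigquery_options is hypothetical; the field names follow the JSON examples on this page, where rowsLimit is written as a string.

```python
SAMPLE_METHODS = {"TOP", "RANDOM_START"}


def build_bigquery_options(project_id, dataset_id, table_id,
                           rows_limit=0, sample_method=None):
    """Build a bigQueryOptions dict with optional sampling (hypothetical helper)."""
    if rows_limit < 0:
        raise ValueError("rowsLimit must not be negative")
    if sample_method is not None and sample_method not in SAMPLE_METHODS:
        raise ValueError("sampleMethod must be TOP or RANDOM_START")

    options = {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        }
    }
    # A limit of 0 means "scan all rows", so the field is simply left out.
    if rows_limit:
        # Written as a string to match the JSON examples on this page.
        options["rowsLimit"] = str(rows_limit)
    if sample_method is not None:
        options["sampleMethod"] = sample_method
    return options


print(build_bigquery_options(
    "bigquery-public-data", "usa_names", "usa_1910_current",
    rows_limit=1000, sample_method="RANDOM_START"))
```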

The following JSON example demonstrates using the Cloud DLP API to scan a 1000-row subset of a BigQuery table. The scan starts from a random row.

JSON input:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"bigquery-public-data",
          "datasetId":"usa_names",
          "tableId":"usa_1910_current"
        },
        "rowsLimit":"1000",
        "sampleMethod":"RANDOM_START",
        "identifyingFields":[
          {
            "name":"name"
          }
        ]
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"FIRST_NAME"
        }
      ],
      "includeQuote":true
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"testingdlp",
              "tableId":"bqsample3"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}

After you send the JSON input in a POST request to the specified URL, a DLP job is created and the API returns the following JSON response.

JSON output:

{
  "name":"projects/[PROJECT_ID]/dlpJobs/[JOB_ID]",
  "type":"INSPECT_JOB",
  "state":"PENDING",
  "inspectDetails":{
    "requestedOptions":{
      "snapshotInspectTemplate":{

      },
      "jobConfig":{
        "storageConfig":{
          "bigQueryOptions":{
            "tableReference":{
              "projectId":"bigquery-public-data",
              "datasetId":"usa_names",
              "tableId":"usa_1910_current"
            },
            "rowsLimit":"1000",
            "sampleMethod":"RANDOM_START"
          }
        },
        "inspectConfig":{
          "infoTypes":[
            {
              "name":"FIRST_NAME"
            }
          ],
          "minLikelihood":"POSSIBLE",
          "limits":{

          },
          "includeQuote":true
        },
        "actions":[
          {
            "saveFindings":{
              "outputConfig":{
                "table":{
                  "projectId":"[PROJECT_ID]",
                  "datasetId":"testingdlp",
                  "tableId":"bqsample3"
                },
                "outputSchema":"BASIC_COLUMNS"
              }
            }
          }
        ]
      }
    }
  },
  "createTime":"2018-05-25T21:02:50.655Z"
}

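After the inspection job finishes running and its results have been processed by BigQuery, the scan results are available in the specified BigQuery table. For more information about retrieving inspection results, see Retrieving inspection results later in this topic.

Limiting Cloud Storage scans

You can enable sampling in Cloud Storage by limiting the amount of data scanned. You can instruct the Cloud DLP API to scan only up to a certain number of bytes of each file, only certain file types, and only a certain percentage of the total number of files in the input file set. To do so, specify the following optional fields within CloudStorageOptions:

  • bytesLimitPerFile: Sets the maximum number of bytes to scan from a file. If a scanned file is larger than this value, the rest of its bytes are omitted.
  • fileTypes[]: Lists the file type groups to include in the scan. This can be set to one or more of the following FileType enumerated types:
    • FILE_TYPE_UNSPECIFIED: All files.
    • BINARY_FILE: All file extensions that are not covered by TEXT_FILE.
    • TEXT_FILE: Multiple text file formats. For the current list, see FileType.
  • filesLimitPercent: Limits the number of files to scan to the specified percentage of the input FileSet. Specifying either 0 or 100 means there is no limit.
  • sampleMethod: How to sample bytes when not all bytes are scanned. Specifying this value is meaningful only when it is used together with bytesLimitPerFile. If not specified, scanning starts from the top. This field can be set to one of two values:
    • TOP: Scanning starts from the top.
    • RANDOM_START: For each file larger than the size specified in bytesLimitPerFile, the offset at which to start scanning is picked at random. The scanned bytes are contiguous.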
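The field rules above can be sketched in Python as follows. The helper name build_cloud_storage_options is hypothetical and the field names follow the JSON examples on this page. Rejecting a sampleMethod that is given without bytesLimitPerFile is a choice made by this sketch because the combination has no effect; it is not a statement about how the API itself responds.

```python
FILE_TYPES = {"FILE_TYPE_UNSPECIFIED", "BINARY_FILE", "TEXT_FILE"}
SAMPLE_METHODS = {"TOP", "RANDOM_START"}


def build_cloud_storage_options(url, bytes_limit_per_file=None, file_types=None,
                                files_limit_percent=None, sample_method=None):
    """Build a cloudStorageOptions dict with optional sampling (hypothetical helper)."""
    options = {"fileSet": {"url": url}}

    if bytes_limit_per_file is not None:
        if bytes_limit_per_file <= 0:
            raise ValueError("bytesLimitPerFile must be positive")
        # Written as a string to match the JSON examples on this page.
        options["bytesLimitPerFile"] = str(bytes_limit_per_file)

    if file_types:
        unknown = set(file_types) - FILE_TYPES
        if unknown:
            raise ValueError("unknown file types: {}".format(sorted(unknown)))
        options["fileTypes"] = list(file_types)

    if files_limit_percent is not None:
        # 0 and 100 both mean "no limit".
        if not 0 <= files_limit_percent <= 100:
            raise ValueError("filesLimitPercent must be between 0 and 100")
        options["filesLimitPercent"] = files_limit_percent

    if sample_method is not None:
        if sample_method not in SAMPLE_METHODS:
            raise ValueError("sampleMethod must be TOP or RANDOM_START")
        if bytes_limit_per_file is None:
            raise ValueError("sampleMethod only has an effect with bytesLimitPerFile")
        options["sampleMethod"] = sample_method

    return options


print(build_cloud_storage_options(
    "gs://my-bucket/*", bytes_limit_per_file=200, file_types=["TEXT_FILE"],
    files_limit_percent=90, sample_method="RANDOM_START"))
```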

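The following JSON example demonstrates using the Cloud DLP API to scan a 90% subset of the files in a Cloud Storage bucket for person names. Only text files are included, at most 200 bytes are scanned from each file, and for files larger than that the scan starts at a random offset.

JSON input:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "cloudStorageOptions":{
        "fileSet":{
          "url":"gs://[BUCKET_NAME]/*"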
        },
        "bytesLimitPerFile":"200",
        "fileTypes":[
          "TEXT_FILE"
        ],
        "filesLimitPercent":90,
        "sampleMethod":"RANDOM_START"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PERSON_NAME"
        }
      ],
      "excludeInfoTypes":true,
      "includeQuote":true,
      "minLikelihood":"POSSIBLE"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"testingdlp"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}

After you send the JSON input in a POST request to the specified URL, a DLP job is created and the API returns the following JSON response.

JSON output:

{
  "name":"projects/[PROJECT_ID]/dlpJobs/[JOB_ID]",
  "type":"INSPECT_JOB",
  "state":"PENDING",
  "inspectDetails":{
    "requestedOptions":{
      "snapshotInspectTemplate":{

      },
      "jobConfig":{
        "storageConfig":{
          "cloudStorageOptions":{
            "fileSet":{
              "url":"gs://[BUCKET_NAME]/*"
            },
            "bytesLimitPerFile":"200",
            "fileTypes":[
              "TEXT_FILE"
            ],
            "sampleMethod":"RANDOM_START",
            "filesLimitPercent":90
          }
        },
        "inspectConfig":{
          "infoTypes":[
            {
              "name":"PERSON_NAME"
            }
          ],
          "minLikelihood":"POSSIBLE",
          "limits":{

          },
          "includeQuote":true,
          "excludeInfoTypes":true
        },
        "actions":[
          {
            "saveFindings":{
              "outputConfig":{
                "table":{
                  "projectId":"[PROJECT_ID]",
                  "datasetId":"[DATASET_ID]",
                  "tableId":"[TABLE_ID]"
                },
                "outputSchema":"BASIC_COLUMNS"
              }
            }
          }
        ]
      }
    }
  },
  "createTime":"2018-05-30T22:22:08.279Z"
}

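Retrieving inspection results

You can retrieve a summary of a DlpJob by using the projects.dlpJobs.get method. The returned DlpJob includes its InspectDataSourceDetails object, which contains both a summary of the job's configuration (RequestedOptions) and a summary of the job's outcome (Result). The outcome summary includes the following:

  • processedBytes: The total size, in bytes, that has been processed.
  • totalEstimatedBytes: An estimate of the number of bytes remaining to process.
  • InfoTypeStats object: Statistics of how many instances of each infoType were found during the inspection job.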
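To illustrate how the outcome summary might be read, the sketch below condenses a DlpJob, given as a dictionary parsed from the JSON response of projects.dlpJobs.get, into the fields listed above. The helper name summarize_inspect_job and the sample values are hypothetical; the field names (inspectDetails, result, infoTypeStats) follow the code samples earlier in this topic. The sketch assumes that 64-bit counters may arrive as strings in JSON, as rowsLimit and bytesLimitPerFile do in the examples on this page, and converts them to integers.

```python
def summarize_inspect_job(job):
    """Condense a DlpJob dict into its outcome summary (hypothetical helper)."""
    result = job.get("inspectDetails", {}).get("result", {})
    findings = {
        stat["infoType"]["name"]: int(stat.get("count", 0))
        for stat in result.get("infoTypeStats", [])
    }
    return {
        "state": job.get("state"),
        "processedBytes": int(result.get("processedBytes", 0)),
        "totalEstimatedBytes": int(result.get("totalEstimatedBytes", 0)),
        "findings": findings,
    }


# Illustrative response; the numbers are made up for this example.
sample_job = {
    "name": "projects/[PROJECT_ID]/dlpJobs/[JOB_ID]",
    "type": "INSPECT_JOB",
    "state": "DONE",
    "inspectDetails": {
        "result": {
            "processedBytes": "2048",
            "totalEstimatedBytes": "2048",
            "infoTypeStats": [
                {"infoType": {"name": "PHONE_NUMBER"}, "count": "3"},
                {"infoType": {"name": "EMAIL_ADDRESS"}, "count": "1"},
            ],
        }
    },
}

summary = summarize_inspect_job(sample_job)
for name, count in sorted(summary["findings"].items()):
    print("Info type: {}; Count: {}".format(name, count))
```

A job that is still PENDING has no result yet; in that case the helper returns zero counts and an empty findings dictionary rather than raising.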

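For complete inspection job results, you have two options. Depending on the Action you have chosen, the results of an inspection job are:

  • Saved to BigQuery (the SaveFindings object) in the table specified. Before viewing or analyzing the results, first make sure that the job has completed by using the projects.dlpJobs.get method, which is described below. Note that you can specify a schema for storing findings by using the OutputSchema object.
  • Published to a Cloud Pub/Sub topic (the PublishToPubSub object). The topic must grant publishing rights to the Cloud DLP service account that runs the DlpJob and sends the notifications.

To help sift through the large amounts of data generated by Cloud DLP, you can use built-in BigQuery tools to run rich SQL analytics, or tools such as Google Data Studio to generate reports. For more information, see Analyzing and reporting on Cloud DLP findings. For sample queries, see Querying findings in BigQuery.

When you send a storage repository inspection request to Cloud DLP, it creates and runs a DlpJob object instance in response. These jobs can take seconds, minutes, or hours to run, depending on the size of your data and the configuration that you have specified. If you choose to publish to a Cloud Pub/Sub topic (by specifying PublishToPubSub in Action), notifications are sent automatically to the topic with the specified name whenever the job's status changes. The Cloud Pub/Sub topic name is specified in the form projects/[PROJECT_ID]/topics/[PUBSUB-TOPIC-NAME].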
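The topic name format and the notification check used by the code samples earlier in this topic can be sketched as follows. The helper names are hypothetical. The pubSub action key and the DlpJobName message attribute are taken from those samples; everything else is an assumption of this sketch.

```python
def topic_path(project_id, topic_name):
    """Format a topic as projects/[PROJECT_ID]/topics/[PUBSUB-TOPIC-NAME]."""
    return "projects/{}/topics/{}".format(project_id, topic_name)


def pubsub_action(project_id, topic_name):
    """Build the Action entry that publishes job notifications to a topic."""
    return {"pubSub": {"topic": topic_path(project_id, topic_name)}}


def is_notification_for(attributes, job_name):
    """Return True if a message's attributes refer to the given DlpJob.

    The samples in this topic compare the DlpJobName attribute with the
    name of the job they created; a message without that attribute is
    treated here as belonging to some other job.
    """
    return (attributes or {}).get("DlpJobName") == job_name


action = pubsub_action("my-project", "dlp-notifications")
print(action)

job_name = "projects/my-project/dlpJobs/i-1234"  # placeholder job name
print(is_notification_for({"DlpJobName": job_name}, job_name))
print(is_notification_for({}, job_name))
```

Because other jobs may publish to the same topic, checking the attribute before acknowledging a message avoids treating another job's notification as this job's completion.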

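You have full control over the jobs you create by using the following management methods:

  • projects.dlpJobs.cancel method: Stops a job that is currently in progress. The server makes a best effort to cancel the job, but success is not guaranteed. The job and its configuration remain until you delete it.
  • projects.dlpJobs.delete method: Deletes a job and its configuration.
  • projects.dlpJobs.get method: Retrieves a single job and returns its status, its configuration, and, if the job is done, its summary results.
  • projects.dlpJobs.list method: Retrieves a list of all jobs, with the ability to filter the results.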
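
As a rough sketch of how these four methods map onto REST calls, the helper below returns the HTTP verb and URL for each one. The base URL and the projects/[PROJECT_ID]/dlpJobs/[JOB_ID] name format come from the JSON examples in this topic; the function name management_call is hypothetical, and the verb and path for each method should be checked against the method reference pages before use.

```python
BASE_URL = "https://dlp.googleapis.com/v2"


def job_name(project_id, job_id):
    """Format a job resource name as shown in the JSON responses above."""
    return "projects/{}/dlpJobs/{}".format(project_id, job_id)


def management_call(method, project_id, job_id=None):
    """Return the (HTTP verb, URL) pair for a dlpJobs management method."""
    if method == "list":
        return ("GET", "{}/projects/{}/dlpJobs".format(BASE_URL, project_id))
    if job_id is None:
        raise ValueError("{} requires a job ID".format(method))

    url = "{}/{}".format(BASE_URL, job_name(project_id, job_id))
    if method == "get":
        return ("GET", url)
    if method == "delete":
        return ("DELETE", url)
    if method == "cancel":
        # Cancel is a custom method, addressed with a ":cancel" suffix.
        return ("POST", url + ":cancel")
    raise ValueError("unknown method: {}".format(method))


for name in ("cancel", "delete", "get"):
    print(name, management_call(name, "my-project", "i-1234"))
print("list", management_call("list", "my-project"))
```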