Inspecting storage and databases for sensitive data

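The first step in properly managing sensitive data held in a storage repository is storage classification: identifying where in the repository the sensitive data resides, what type of sensitive data it is, and how it is used. Knowing this helps you set access control and sharing permissions appropriately, and it can become part of an ongoing monitoring plan.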

Cloud Data Loss Prevention (DLP) can detect and classify sensitive data stored in a Cloud Storage location, a Cloud Datastore kind, or a BigQuery table. The file extensions of the file types that Cloud DLP can scan in Cloud Storage are listed on the FileType page of the API reference. Files of unrecognized types are scanned as binary files.

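Instead of streaming text data directly to the API, you specify the location and configuration information in your request. Cloud DLP then starts a job that inspects the data at the given location and makes details available, such as the infoTypes found in the content and their likelihood values.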
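
As a rough sketch of what such a request carries, the Python below assembles a request body that points Cloud DLP at a Cloud Storage location rather than embedding the text itself. The helper name `build_gcs_inspect_job`, its parameters, and the bucket name are hypothetical; the field names mirror the JSON example later in this topic.

```python
import json


def build_gcs_inspect_job(bucket, info_types, min_likelihood="LIKELY"):
    """Return a request body (dict) for an inspection job over a bucket.

    Hypothetical helper for illustration: it only builds the payload
    and never calls the API.
    """
    return {
        "inspectJob": {
            # Where the data lives: every object in the bucket.
            "storageConfig": {
                "cloudStorageOptions": {
                    "fileSet": {"url": "gs://{}/*".format(bucket)}
                }
            },
            # What to look for and how confident a match must be.
            "inspectConfig": {
                "infoTypes": [{"name": name} for name in info_types],
                "minLikelihood": min_likelihood,
            },
        }
    }


body = build_gcs_inspect_job("example-bucket", ["PHONE_NUMBER"])
print(json.dumps(body, indent=2))
```

Note that the payload contains only a location and a configuration; the data to be scanned never passes through the caller.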

The Cloud DLP API is RESTful. You can also work with it programmatically by using a Cloud DLP client library in one of several languages.
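
Because the API is RESTful, a job-creation call is an ordinary HTTP POST. The following minimal sketch builds such a request with the Python standard library and deliberately does not send it. The function name `make_create_job_request`, the project ID, and the API key are placeholders; the URL pattern follows the protocol example later in this topic.

```python
import json
import urllib.parse
import urllib.request

DLP_ENDPOINT = "https://dlp.googleapis.com/v2"


def make_create_job_request(project_id, body, api_key):
    """Build (but do not send) the POST request that creates a DLP job."""
    query = urllib.parse.urlencode({"key": api_key})
    url = "{}/projects/{}/dlpJobs?{}".format(DLP_ENDPOINT, project_id, query)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; a real body would carry a full inspectJob.
request = make_create_job_request("my-project", {"inspectJob": {}}, "MY_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would issue the call, and a successful call creates a real scan job.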

This topic includes:

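  • JSON examples and code samples in several programming languages for each Google Cloud Platform storage repository type (Cloud Storage, Cloud Datastore, and BigQuery).
  • Details of the configuration options for scan jobs.
  • Instructions for retrieving scan results and for managing the scan jobs created by each successful request.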

Inspecting a Cloud Storage location

You can set up an inspection of a Cloud Storage location by using Cloud DLP through REST requests, or programmatically in several languages by using a client library.

Code examples

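The following JSON and code examples in several languages show how to use Cloud DLP to inspect a Cloud Storage location. For details about the parameters included in the request, see "Configuring storage inspection" later in this topic.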

Protocol

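The following sample JSON can be sent in a POST request to the specified Cloud DLP REST endpoint. It shows how to inspect a Cloud Storage bucket by using the Cloud DLP API. For details about the parameters included in the request, see "Configuring storage inspection" later in this topic.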

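You can try this out quickly in the API Explorer on the reference page for the projects.dlpJobs.create method. Keep in mind that a successful request creates a new scan job, even when it is made from API Explorer. For information about how to control scan jobs, see "Retrieving inspection results" later in this topic. For information about using JSON to send requests to the Cloud DLP API, see the JSON quickstart.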

JSON input:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "cloudStorageOptions":{
        "fileSet":{
          "url":"gs://[GCS_BUCKET_NAME]/*"
        },
        "bytesLimitPerFile":"1073741824"
      },
      "timespanConfig":{
        "startTime":"2017-11-13T12:34:29.965633345Z",
        "endTime":"2018-01-05T04:45:04.240912125Z"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"[DATASET_ID]"
            }
          }
        }
      }
    ]
  }
}

JSON output:

{
  "name":"projects/[PROJECT_ID]/dlpJobs/i-2304647377058311040",
  "type":"INSPECT_JOB",
  "state":"PENDING",
  "inspectDetails":{
    "requestedOptions":{
      "snapshotInspectTemplate":{

      },
      "jobConfig":{
        "storageConfig":{
          "cloudStorageOptions":{
            "fileSet":{
              "url":"gs://[GCS_BUCKET_NAME]/*"
            },
            "bytesLimitPerFile":"1073741824"
          },
          "timespanConfig":{
            "startTime":"2017-11-13T12:34:29.965633345Z",
            "endTime":"2018-01-05T04:45:04.240912125Z"
          }
        },
        "inspectConfig":{
          "infoTypes":[
            {
              "name":"PHONE_NUMBER"
            }
          ],
          "minLikelihood":"LIKELY",
          "limits":{

          },
          "includeQuote":true
        },
        "actions":[
          {
            "saveFindings":{
              "outputConfig":{
                "table":{
                  "projectId":"[PROJECT_ID]",
                  "datasetId":"[DATASET_ID]",
                  "tableId":"[NEW_TABLE_ID]"
                }
              }
            }
          }
        ]
      }
    }
  },
  "createTime":"2018-11-07T18:01:14.225Z"
}
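
The response is the newly created job resource, and its `name` is what later calls use to look the job up. A minimal sketch of pulling the job ID and state out of such a response is shown below; the helper `summarize_job` is hypothetical, and the response text is a trimmed copy of the output above with a placeholder project ID.

```python
import json

# Trimmed copy of the response shown above; "my-project" is a placeholder.
response_text = """
{
  "name": "projects/my-project/dlpJobs/i-2304647377058311040",
  "type": "INSPECT_JOB",
  "state": "PENDING",
  "createTime": "2018-11-07T18:01:14.225Z"
}
"""


def summarize_job(job):
    """Pull the job ID, type, and state out of a parsed job resource."""
    # The job ID is the last path segment of the full resource name.
    job_id = job["name"].rsplit("/", 1)[-1]
    return {"job_id": job_id, "type": job["type"], "state": job["state"]}


summary = summarize_job(json.loads(response_text))
print(summary)
```

A state of PENDING, as here, means the job has been accepted but has not produced results yet.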

Java

/**
 * Inspect GCS file for Info types and wait on job completion using Google Cloud Pub/Sub
 * notification
 *
 * @param bucketName The name of the bucket where the file resides.
 * @param fileName The path to the file within the bucket to inspect (can include wildcards, eg.
 *     my-image.*)
 * @param minLikelihood The minimum likelihood required before returning a match
 * @param infoTypes The infoTypes of information to match
 * @param customInfoTypes The custom infoTypes (such as dictionaries or regexes) to match
 * @param maxFindings The maximum number of findings to report (0 = server maximum)
 * @param topicId Google Cloud Pub/Sub topic Id to notify of job status
 * @param subscriptionId Google Cloud Subscription to above topic to listen for job status updates
 * @param projectId Google Cloud project ID
 */
private static void inspectGcsFile(
    String bucketName,
    String fileName,
    Likelihood minLikelihood,
    List<InfoType> infoTypes,
    List<CustomInfoType> customInfoTypes,
    int maxFindings,
    String topicId,
    String subscriptionId,
    String projectId)
    throws Exception {
  // Instantiates a client
  try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {

    CloudStorageOptions cloudStorageOptions =
        CloudStorageOptions.newBuilder()
            .setFileSet(
                CloudStorageOptions.FileSet.newBuilder()
                    .setUrl("gs://" + bucketName + "/" + fileName))
            .build();

    StorageConfig storageConfig =
        StorageConfig.newBuilder().setCloudStorageOptions(cloudStorageOptions).build();

    FindingLimits findingLimits =
        FindingLimits.newBuilder().setMaxFindingsPerRequest(maxFindings).build();

    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addAllInfoTypes(infoTypes)
            .addAllCustomInfoTypes(customInfoTypes)
            .setMinLikelihood(minLikelihood)
            .setLimits(findingLimits)
            .build();

    String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);
    Action.PublishToPubSub publishToPubSub =
        Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();

    Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

    InspectJobConfig inspectJobConfig =
        InspectJobConfig.newBuilder()
            .setStorageConfig(storageConfig)
            .setInspectConfig(inspectConfig)
            .addActions(action)
            .build();

    // Semi-synchronously submit an inspect job, and wait on results
    CreateDlpJobRequest createDlpJobRequest =
        CreateDlpJobRequest.newBuilder()
            .setParent(ProjectName.of(projectId).toString())
            .setInspectJob(inspectJobConfig)
            .build();

    DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

    System.out.println("Job created with ID:" + dlpJob.getName());

    final SettableApiFuture<Boolean> done = SettableApiFuture.create();

    // Set up a Pub/Sub subscriber to listen on the job completion status
    Subscriber subscriber =
        Subscriber.newBuilder(
                ProjectSubscriptionName.of(projectId, subscriptionId),
                (pubsubMessage, ackReplyConsumer) -> {
                  // Compare from the non-null side so that a message without the
                  // DlpJobName attribute cannot cause a NullPointerException.
                  if (dlpJob.getName().equals(pubsubMessage.getAttributesMap().get("DlpJobName"))) {
                    // notify job completion
                    done.set(true);
                    ackReplyConsumer.ack();
                  }
                })
            .build();
    subscriber.startAsync();

    // Wait for job completion semi-synchronously
    // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
    try {
      done.get(1, TimeUnit.MINUTES);
      Thread.sleep(500); // Wait for the job to become available
    } catch (Exception e) {
      System.out.println("Unable to verify job completion.");
    }

    DlpJob completedJob =
        dlpServiceClient.getDlpJob(
            GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build());

    System.out.println("Job status: " + completedJob.getState());
    InspectDataSourceDetails inspectDataSourceDetails = completedJob.getInspectDetails();
    InspectDataSourceDetails.Result result = inspectDataSourceDetails.getResult();
    if (result.getInfoTypeStatsCount() > 0) {
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    } else {
      System.out.println("No findings.");
    }
  }
}

Node.js

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The name of the bucket where the file resides.
// const bucketName = 'YOUR-BUCKET';

// The path to the file within the bucket to inspect.
// Can contain wildcards, e.g. "my-image.*"
// const fileName = 'my-image.png';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { infoType: { name: 'REGEX_TYPE' }, regex: { pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}' }}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// Get reference to the file to be inspected
const storageItem = {
  cloudStorageOptions: {
    fileSet: {url: `gs://${bucketName}/${fileName}`},
  },
};

// Construct request for creating an inspect job
const request = {
  parent: dlp.projectPath(callingProjectId),
  inspectJob: {
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customInfoTypes,
      minLikelihood: minLikelihood,
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
    },
    storageConfig: storageItem,
    actions: [
      {
        pubSub: {
          topic: `projects/${callingProjectId}/topics/${topicId}`,
        },
      },
    ],
  },
};

try {
  // Create a GCS File inspection job and wait for it to complete
  const [topicResponse] = await pubsub.topic(topicId).get();
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const subscription = await topicResponse.subscription(subscriptionId);
  const [jobsResponse] = await dlp.createDlpJob(request);
  // Get the job's ID
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });

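  // Pause briefly so the finished job is available before fetching it.
  // (A bare setTimeout would not delay the getDlpJob call below.)
  console.log(`Waiting for DLP job to fully complete`);
  await new Promise(resolve => setTimeout(resolve, 500));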
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${
          infoTypeStat.infoType.name
        }.`
      );
    });
  } else {
    console.log(`No findings.`);
  }
} catch (err) {
  console.log(`Error in inspectGCSFile: ${err.message || err}`);
}

Python

def inspect_gcs_file(project, bucket, filename, topic_id, subscription_id,
                     info_types, custom_dictionaries=None,
                     custom_regexes=None, min_likelihood=None,
                     max_findings=None, timeout=300):
    """Uses the Data Loss Prevention API to analyze a file on GCS.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        bucket: The name of the GCS bucket containing the file, as a string.
        filename: The name of the file in the bucket, including the path, as a
            string; e.g. 'images/myfile.png'.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
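        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        custom_dictionaries: Optional list of strings, each a comma-separated
            word list that defines one custom dictionary info type.
        custom_regexes: Optional list of regular expression patterns, each
            defining one custom regex info type.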
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # This sample additionally uses Cloud Pub/Sub to receive results from
    # potentially long-running operations.
    import google.cloud.pubsub

    # This sample also uses threading.Event() to wait for the job to finish.
    import threading

    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ['FIRST_NAME', 'LAST_NAME', 'EMAIL_ADDRESS']
    info_types = [{'name': info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [{
        'info_type': {'name': 'CUSTOM_DICTIONARY_{}'.format(i)},
        'dictionary': {
            'word_list': {'words': custom_dict.split(',')}
        }
    } for i, custom_dict in enumerate(custom_dictionaries)]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [{
        'info_type': {'name': 'CUSTOM_REGEX_{}'.format(i)},
        'regex': {'pattern': custom_regex}
    } for i, custom_regex in enumerate(custom_regexes)]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'info_types': info_types,
        'custom_info_types': custom_info_types,
        'min_likelihood': min_likelihood,
        'limits': {'max_findings_per_request': max_findings},
    }

    # Construct a storage_config containing the file's URL.
    url = 'gs://{}/{}'.format(bucket, filename)
    storage_config = {
        'cloud_storage_options': {
            'file_set': {'url': url}
        }
    }

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Tell the API where to send a notification when the job is complete.
    actions = [{
        'pub_sub': {'topic': '{}/topics/{}'.format(parent, topic_id)}
    }]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        'inspect_config': inspect_config,
        'storage_config': storage_config,
        'actions': actions,
    }

    operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_id)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message):
        try:
            if (message.attributes['DlpJobName'] == operation.name):
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(operation.name)
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print('Info type: {}; Count: {}'.format(
                            finding.info_type.name, finding.count))
                else:
                    print('No findings.')

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    subscriber.subscribe(subscription_path, callback=callback)
    finished = job_done.wait(timeout=timeout)
    if not finished:
        print('No event received before the timeout. Please verify that the '
              'subscription provided is subscribed to the topic provided.')
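
The samples in this section all pick out their own job's notification the same way: they compare the message's `DlpJobName` attribute with the name of the job they created. The sketch below isolates that check in plain Python. `FakeMessage` is a hypothetical stand-in for a Pub/Sub message and the job names are placeholders, so nothing here touches the Pub/Sub client library.

```python
from collections import namedtuple

# Hypothetical stand-in for a Pub/Sub message; only `attributes` is modeled.
FakeMessage = namedtuple("FakeMessage", ["attributes"])


def is_completion_for(message, job_name):
    """Return True if the message announces completion of `job_name`."""
    # .get() avoids a KeyError when the attribute is missing entirely.
    return message.attributes.get("DlpJobName") == job_name


job_name = "projects/my-project/dlpJobs/i-123"
messages = [
    FakeMessage({}),  # unrelated message with no attributes
    FakeMessage({"DlpJobName": "projects/my-project/dlpJobs/i-999"}),
    FakeMessage({"DlpJobName": job_name}),
]
matches = [is_completion_for(m, job_name) for m in messages]
print(matches)
```

Looking the attribute up with `.get()` keeps a message that lacks the attribute from raising inside the callback, which matters when a subscription is shared with other publishers.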

Go

// inspectGCSFile searches for the given info types in the given file.
func inspectGCSFile(w io.Writer, client *dlp.Client, project string, minLikelihood dlppb.Likelihood, maxFindings int32, includeQuote bool, infoTypes []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, bucketName, fileName string) {
	// Convert the info type strings to a list of InfoTypes.
	var i []*dlppb.InfoType
	for _, it := range infoTypes {
		i = append(i, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	ctx := context.Background()

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pClient, err := pubsub.NewClient(ctx, project)
	if err != nil {
		log.Fatalf("Error creating PubSub client: %v", err)
	}
	defer pClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	s, err := setupPubSub(ctx, pClient, project, pubSubTopic, pubSubSub)
	if err != nil {
		log.Fatalf("Error setting up PubSub: %v\n", err)
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + project + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: "projects/" + project,
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_CloudStorageOptions{
						CloudStorageOptions: &dlppb.CloudStorageOptions{
							FileSet: &dlppb.CloudStorageOptions_FileSet{
								Url: "gs://" + bucketName + "/" + fileName,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       i,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   minLikelihood,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: maxFindings,
					},
					IncludeQuote: includeQuote,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	ctx, cancel := context.WithCancel(ctx)
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()
		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			log.Fatalf("Error getting completed job: %v\n", err)
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results\n")
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
		// Stop listening for more messages.
		cancel()
	})
	if err != nil {
		log.Fatalf("Error receiving from PubSub: %v\n", err)
	}
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStorageOptions;
use Google\Cloud\Dlp\V2\CloudStorageOptions\FileSet;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect a file stored on Google Cloud Storage, using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId The project ID to run the API call under
 * @param string $bucketId The name of the bucket where the file resides
 * @param string $file The path to the file within the bucket to inspect. Can contain wildcards
 *        e.g. "my-image.*"
 * @param string $topicId The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId The name of the Pub/Sub subscription to use when listening for job
 *        completion notifications
 * @param int $maxFindings (Optional) The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_gcs(
    $callingProjectId,
    $bucketId,
    $file,
    $topicId,
    $subscriptionId,
    $maxFindings = 0
) {
    // Instantiate a client.
    $dlp = new DlpServiceClient([
        'projectId' => $callingProjectId,
    ]);
    $pubsub = new PubSubClient([
        'projectId' => $callingProjectId,
    ]);
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $creditCardNumberInfoType = (new InfoType())
        ->setName('CREDIT_CARD_NUMBER');
    $infoTypes = [$personNameInfoType, $creditCardNumberInfoType];

    // The minimum likelihood required before returning a match
    $minLikelihood = Likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected
    $fileSet = (new FileSet())
        ->setUrl('gs://' . $bucketId . '/' . $file);

    $cloudStorageOptions = (new CloudStorageOptions())
        ->setFileSet($fileSet);

    $storageConfig = (new StorageConfig())
        ->setCloudStorageOptions($cloudStorageOptions);

    // Construct the inspect config object
    $inspectConfig = (new InspectConfig())
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits)
        ->setInfoTypes($infoTypes);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = $dlp->projectName($callingProjectId);
    $job = $dlp->createDlpJob($parent, [
        'inspectJob' => $inspectJob
    ]);

    // Poll via Pub/Sub until job finishes
    while (true) {
        foreach ($subscription->pull() as $message) {
            if (isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()) {
                $subscription->acknowledge($message);
                break 2;
            }
        }
    }

    // Sleep for one second to avoid race condition with the job's status.
    usleep(1000000);

    // Get the updated job
    $job = $dlp->getDlpJob($job->getName());

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), $job->getState());
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf('  Found %s instance(s) of infoType %s' . PHP_EOL, $infoTypeStat->getCount(), $infoTypeStat->getInfoType()->getName());
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        default:
            print('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

C#

public static object InspectGCS(
    string projectId,
    string minLikelihood,
    int maxFindings,
    bool includeQuote,
    IEnumerable<InfoType> infoTypes,
    IEnumerable<CustomInfoType> customInfoTypes,
    string bucketName,
    string topicId,
    string subscriptionId)
{
    var inspectJob = new InspectJobConfig
    {
        StorageConfig = new StorageConfig
        {
            CloudStorageOptions = new CloudStorageOptions
            {
                FileSet = new CloudStorageOptions.Types.FileSet { Url = $"gs://{bucketName}/*.txt" },
                BytesLimitPerFile = 1073741824
            },
        },
        InspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoTypes },
            ExcludeInfoTypes = false,
            IncludeQuote = includeQuote,
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings
            },
            MinLikelihood = (Likelihood)System.Enum.Parse(typeof(Likelihood), minLikelihood)
        },
        Actions =
        {
            new Google.Cloud.Dlp.V2.Action
            {
                // Send results to Pub/Sub topic
                PubSub = new Google.Cloud.Dlp.V2.Action.Types.PublishToPubSub
                {
                    // The API expects the full topic resource name, not the bare ID
                    Topic = $"projects/{projectId}/topics/{topicId}",
                }
            }
        }
    };

    // Issue Create Dlp Job Request
    DlpServiceClient client = DlpServiceClient.Create();
    var request = new CreateDlpJobRequest
    {
        InspectJob = inspectJob,
        ParentAsProjectName = new ProjectName(projectId),
    };

    // We need created job name
    var dlpJob = client.CreateDlpJob(request);

    // Get a pub/sub subscription and listen for DLP results
    var fireEvent = new ManualResetEventSlim();

    var subscriptionName = new SubscriptionName(projectId, subscriptionId);
    var subscriber = SubscriberClient.CreateAsync(subscriptionName).Result;
    subscriber.StartAsync(
        (pubSubMessage, cancellationToken) =>
        {
            // Given a message that we receive on this subscription, we should either acknowledge or decline it
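            // TryGetValue avoids an exception when the attribute is absent
            if (pubSubMessage.Attributes.TryGetValue("DlpJobName", out var jobName) && jobName == dlpJob.Name)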
            {
                fireEvent.Set();
                return Task.FromResult(SubscriberClient.Reply.Ack);
            }

            return Task.FromResult(SubscriberClient.Reply.Nack);
        });

    // We block here until receiving a signal from a separate thread that is waiting on a message indicating receiving a result of Dlp job
    if (fireEvent.Wait(TimeSpan.FromMinutes(1)))
    {
        // Stop the thread that is listening to messages as a result of StartAsync call earlier
        subscriber.StopAsync(CancellationToken.None).Wait();

        // Now we can inspect full job results
        // dlpJob.Name is already the full resource name, so pass it through unchanged
        var job = client.GetDlpJob(new GetDlpJobRequest { Name = dlpJob.Name });

        // Inspect Job details
        Console.WriteLine($"Processed bytes: {job.InspectDetails.Result.ProcessedBytes}");
        Console.WriteLine($"Total estimated bytes: {job.InspectDetails.Result.TotalEstimatedBytes}");
        var stats = job.InspectDetails.Result.InfoTypeStats;
        Console.WriteLine("Found stats:");
        foreach (var stat in stats)
        {
            Console.WriteLine($"{stat.InfoType.Name}");
        }
    }
    else
    {
        Console.WriteLine("Error: The wait failed on timeout");
    }

    return 0;
}

Inspecting a Cloud Datastore kind

You can set up an inspection of a Cloud Datastore kind by using the Cloud DLP API through REST requests, or programmatically in several languages by using a client library.

Code examples

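The following JSON and code examples in several languages show how to use Cloud DLP to inspect a Cloud Datastore kind. For details about the parameters included in the request, see "Configuring storage inspection" later in this topic.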

Protocol

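The following sample JSON can be sent in a POST request to the specified Cloud DLP API REST endpoint. It shows how to inspect a Cloud Datastore kind by using the Cloud DLP API. For details about the parameters included in the request, see "Configuring storage inspection" later in this topic.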

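You can try this out quickly in the API Explorer on the reference page for the projects.dlpJobs.create method. Keep in mind that a successful request creates a new scan job, even when it is made from API Explorer. For information about how to control scan jobs, see "Retrieving inspection results" later in this topic. For information about using JSON to send requests to the Cloud DLP API, see the JSON quickstart.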

JSON input:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "datastoreOptions":{
        "kind":{
          "name":"Example-Kind"
        },
        "partitionId":{
          "namespaceId":"[NAMESPACE_ID]",
          "projectId":"[PROJECT_ID]"
        }
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"[BIGQUERY-DATASET-NAME]",
              "tableId":"[BIGQUERY-TABLE-NAME]"
            }
          }
        }
      }
    ]
  }
}
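
The request shown above can also be assembled in code before it is sent. The following is a minimal sketch that uses only the Python standard library. The helper names `build_datastore_inspect_body` and `build_create_job_request`, and the sample argument values, are hypothetical and are not part of any Cloud DLP client library. The sketch builds the same JSON body and a POST request for the endpoint shown above, but does not send it.

```python
import json
import urllib.parse
import urllib.request


def build_datastore_inspect_body(kind, namespace_id, data_project_id,
                                 dataset_id, table_id, info_types,
                                 min_likelihood="LIKELY"):
    """Return the inspectJob request body for a Datastore kind as a dict."""
    return {
        "inspectJob": {
            "storageConfig": {
                "datastoreOptions": {
                    "kind": {"name": kind},
                    "partitionId": {
                        "namespaceId": namespace_id,
                        "projectId": data_project_id,
                    },
                }
            },
            "inspectConfig": {
                "infoTypes": [{"name": name} for name in info_types],
                "excludeInfoTypes": False,
                "includeQuote": True,
                "minLikelihood": min_likelihood,
            },
            # Save findings to a BigQuery table, as in the JSON sample.
            "actions": [{
                "saveFindings": {
                    "outputConfig": {
                        "table": {
                            "projectId": data_project_id,
                            "datasetId": dataset_id,
                            "tableId": table_id,
                        }
                    }
                }
            }],
        }
    }


def build_create_job_request(project, api_key, body):
    """Build (but do not send) the POST request for projects.dlpJobs.create."""
    url = "https://dlp.googleapis.com/v2/projects/{}/dlpJobs?key={}".format(
        urllib.parse.quote(project, safe=""),
        urllib.parse.quote(api_key, safe=""))
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST")


# Placeholder values for illustration only.
body = build_datastore_inspect_body(
    "Example-Kind", "my-namespace", "my-project",
    "my_dataset", "my_table", ["PHONE_NUMBER"])
request = build_create_job_request("my-project", "MY_API_KEY", body)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would submit the job. Every successful request creates a new scan job, so the sketch stops short of sending it.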

Java

/**
 * Inspect a Datastore kind
 *
 * @param projectId The project ID containing the target Datastore
 * @param namespaceId The ID namespace of the Datastore document to inspect
 * @param kind The kind of the Datastore entity to inspect
 * @param minLikelihood The minimum likelihood required before returning a match
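 * @param infoTypes The infoTypes of information to match
 * @param customInfoTypes The custom infoTypes of information to match
 * @param maxFindings The maximum number of findings to report per request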
 * @param topicId Google Cloud Pub/Sub topic to notify job status updates
 * @param subscriptionId Google Cloud Pub/Sub subscription to above topic to receive status
 *     updates
 */
private static void inspectDatastore(
    String projectId,
    String namespaceId,
    String kind,
    Likelihood minLikelihood,
    List<InfoType> infoTypes,
    List<CustomInfoType> customInfoTypes,
    int maxFindings,
    String topicId,
    String subscriptionId) {
  // Instantiates a client
  try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {

    // Reference to the Datastore namespace
    PartitionId partitionId =
        PartitionId.newBuilder().setProjectId(projectId).setNamespaceId(namespaceId).build();

    // Reference to the Datastore kind
    KindExpression kindExpression = KindExpression.newBuilder().setName(kind).build();
    DatastoreOptions datastoreOptions =
        DatastoreOptions.newBuilder().setKind(kindExpression).setPartitionId(partitionId).build();

    // Construct Datastore configuration to be inspected
    StorageConfig storageConfig =
        StorageConfig.newBuilder().setDatastoreOptions(datastoreOptions).build();

    FindingLimits findingLimits =
        FindingLimits.newBuilder().setMaxFindingsPerRequest(maxFindings).build();

    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addAllInfoTypes(infoTypes)
            .addAllCustomInfoTypes(customInfoTypes)
            .setMinLikelihood(minLikelihood)
            .setLimits(findingLimits)
            .build();

    String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);
    Action.PublishToPubSub publishToPubSub =
        Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();

    Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

    InspectJobConfig inspectJobConfig =
        InspectJobConfig.newBuilder()
            .setStorageConfig(storageConfig)
            .setInspectConfig(inspectConfig)
            .addActions(action)
            .build();

    // Asynchronously submit an inspect job, and wait on results
    CreateDlpJobRequest createDlpJobRequest =
        CreateDlpJobRequest.newBuilder()
            .setParent(ProjectName.of(projectId).toString())
            .setInspectJob(inspectJobConfig)
            .build();

    DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

    System.out.println("Job created with ID:" + dlpJob.getName());

    final SettableApiFuture<Boolean> done = SettableApiFuture.create();

    // Set up a Pub/Sub subscriber to listen on the job completion status
    Subscriber subscriber =
        Subscriber.newBuilder(
                ProjectSubscriptionName.of(projectId, subscriptionId),
          (pubsubMessage, ackReplyConsumer) -> {
            // Compare from the job name so that a missing attribute cannot throw
            if (dlpJob
                .getName()
                .equals(pubsubMessage.getAttributesMap().get("DlpJobName"))) {
              // notify job completion
              done.set(true);
              ackReplyConsumer.ack();
            }
          })
            .build();
    subscriber.startAsync();

    // Wait for job completion semi-synchronously
    // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
    try {
      done.get(1, TimeUnit.MINUTES);
      Thread.sleep(500); // Wait for the job to become available
    } catch (Exception e) {
      System.out.println("Unable to verify job completion.");
    }

    DlpJob completedJob =
        dlpServiceClient.getDlpJob(
            GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build());

    System.out.println("Job status: " + completedJob.getState());
    InspectDataSourceDetails inspectDataSourceDetails = completedJob.getInspectDetails();
    InspectDataSourceDetails.Result result = inspectDataSourceDetails.getResult();
    if (result.getInfoTypeStatsCount() > 0) {
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    } else {
      System.out.println("No findings.");
    }
  } catch (Exception e) {
    System.out.println("inspectDatastore Problems: " + e.getMessage());
  }
}

Node.js

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The project ID the target Datastore is stored under
// This may or may not equal the calling project ID
// const dataProjectId = process.env.GCLOUD_PROJECT;

// (Optional) The ID namespace of the Datastore document to inspect.
// To ignore Datastore namespaces, set this to an empty string ('')
// const namespaceId = '';

// The kind of the Datastore entity to inspect.
// const kind = 'Person';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { infoType: { name: 'REGEX_TYPE' }, regex: { pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}' }}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// Construct items to be inspected
const storageItems = {
  datastoreOptions: {
    partitionId: {
      projectId: dataProjectId,
      namespaceId: namespaceId,
    },
    kind: {
      name: kind,
    },
  },
};

// Construct request for creating an inspect job
const request = {
  parent: dlp.projectPath(callingProjectId),
  inspectJob: {
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customInfoTypes,
      minLikelihood: minLikelihood,
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
    },
    storageConfig: storageItems,
    actions: [
      {
        pubSub: {
          topic: `projects/${callingProjectId}/topics/${topicId}`,
        },
      },
    ],
  },
};
try {
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const [topicResponse] = await pubsub.topic(topicId).get();
  const subscription = await topicResponse.subscription(subscriptionId);
  // Run inspect-job creation request
  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
  // Pause briefly so the job's final state is available before fetching it
  console.log(`Waiting for DLP job to fully complete`);
  await new Promise(resolve => setTimeout(resolve, 500));
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${
          infoTypeStat.infoType.name
        }.`
      );
    });
  } else {
    console.log(`No findings.`);
  }
} catch (err) {
  console.log(`Error in inspectDatastore: ${err.message || err}`);
}

Python

def inspect_datastore(project, datastore_project, kind,
                      topic_id, subscription_id, info_types,
                      custom_dictionaries=None, custom_regexes=None,
                      namespace_id=None, min_likelihood=None,
                      max_findings=None, timeout=300):
    """Uses the Data Loss Prevention API to analyze Datastore data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        datastore_project: The Google Cloud project id of the target Datastore.
        kind: The kind of the Datastore entity to inspect, e.g. 'Person'.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
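        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        custom_dictionaries: An optional list of comma-separated word lists;
            each one becomes a custom dictionary info type.
        custom_regexes: An optional list of regex patterns; each one becomes
            a custom regex info type.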
        namespace_id: The namespace of the Datastore document, if applicable.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # This sample additionally uses Cloud Pub/Sub to receive results from
    # potentially long-running operations.
    import google.cloud.pubsub

    # This sample also uses threading.Event() to wait for the job to finish.
    import threading

    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ['FIRST_NAME', 'LAST_NAME', 'EMAIL_ADDRESS']
    info_types = [{'name': info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [{
        'info_type': {'name': 'CUSTOM_DICTIONARY_{}'.format(i)},
        'dictionary': {
            'word_list': {'words': custom_dict.split(',')}
        }
    } for i, custom_dict in enumerate(custom_dictionaries)]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [{
        'info_type': {'name': 'CUSTOM_REGEX_{}'.format(i)},
        'regex': {'pattern': custom_regex}
    } for i, custom_regex in enumerate(custom_regexes)]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'info_types': info_types,
        'custom_info_types': custom_info_types,
        'min_likelihood': min_likelihood,
        'limits': {'max_findings_per_request': max_findings},
    }

    # Construct a storage_config containing the target Datastore info.
    storage_config = {
        'datastore_options': {
            'partition_id': {
                'project_id': datastore_project,
                'namespace_id': namespace_id,
            },
            'kind': {
                'name': kind
            },
        }
    }

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Tell the API where to send a notification when the job is complete.
    actions = [{
        'pub_sub': {'topic': '{}/topics/{}'.format(parent, topic_id)}
    }]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        'inspect_config': inspect_config,
        'storage_config': storage_config,
        'actions': actions,
    }

    operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_id)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message):
        try:
            if (message.attributes['DlpJobName'] == operation.name):
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(operation.name)
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print('Info type: {}; Count: {}'.format(
                            finding.info_type.name, finding.count))
                else:
                    print('No findings.')

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscriber.subscribe(subscription_path, callback=callback)

    finished = job_done.wait(timeout=timeout)
    if not finished:
        print('No event received before the timeout. Please verify that the '
              'subscription provided is subscribed to the topic provided.')
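
The Pub/Sub-based samples in this section decide that a job has finished by comparing a message's `DlpJobName` attribute with the name of the job they created. As a minimal sketch, that check can be isolated into a small pure function. The function name `is_completion_for` and the job name below are hypothetical and used only for illustration. Reading the attribute with `get` means a message without it is treated as belonging to another job instead of raising `KeyError`.

```python
def is_completion_for(attributes, job_name):
    """Return True if a Pub/Sub message's attributes refer to job_name.

    `attributes` is any mapping of attribute names to values. A missing
    or empty mapping is treated as "not our job" instead of raising.
    """
    if not attributes:
        return False
    return attributes.get("DlpJobName") == job_name


# Made-up job name for illustration only.
job = "projects/my-project/dlpJobs/i-1234"
print(is_completion_for({"DlpJobName": job}, job))
print(is_completion_for({"DlpJobName": "projects/other/dlpJobs/i-9"}, job))
print(is_completion_for({}, job))
```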

Go

// inspectDatastore searches for the given info types in the given dataset kind.
func inspectDatastore(w io.Writer, client *dlp.Client, project string, minLikelihood dlppb.Likelihood, maxFindings int32, includeQuote bool, infoTypes []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, dataProject, namespaceID, kind string) {
	// Convert the info type strings to a list of InfoTypes.
	var i []*dlppb.InfoType
	for _, it := range infoTypes {
		i = append(i, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	ctx := context.Background()

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pClient, err := pubsub.NewClient(ctx, project)
	if err != nil {
		log.Fatalf("Error creating PubSub client: %v", err)
	}
	defer pClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	s, err := setupPubSub(ctx, pClient, project, pubSubTopic, pubSubSub)
	if err != nil {
		log.Fatalf("Error setting up PubSub: %v\n", err)
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + project + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: "projects/" + project,
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_DatastoreOptions{
						DatastoreOptions: &dlppb.DatastoreOptions{
							PartitionId: &dlppb.PartitionId{
								ProjectId:   dataProject,
								NamespaceId: namespaceID,
							},
							Kind: &dlppb.KindExpression{
								Name: kind,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       i,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   minLikelihood,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: maxFindings,
					},
					IncludeQuote: includeQuote,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	ctx, cancel := context.WithCancel(ctx)
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()
		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			log.Fatalf("Error getting completed job: %v\n", err)
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results\n")
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
		// Stop listening for more messages.
		cancel()
	})
	if err != nil {
		log.Fatalf("Error receiving from PubSub: %v\n", err)
	}
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\DatastoreOptions;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\Dlp\V2\KindExpression;
use Google\Cloud\Dlp\V2\PartitionId;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect Datastore, using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId The project ID to run the API call under
 * @param string $dataProjectId The project ID containing the target Datastore
 *        (This may or may not be equal to $callingProjectId)
 * @param string $topicId The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId The name of the Pub/Sub subscription to use when listening for job completion notifications
 * @param string $kind The datastore kind to inspect
 * @param string $namespaceId The ID namespace of the Datastore document to inspect
 * @param int $maxFindings (Optional) The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_datastore(
    $callingProjectId,
    $dataProjectId,
    $topicId,
    $subscriptionId,
    $kind,
    $namespaceId,
    $maxFindings = 0
) {
    // Instantiate clients
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $phoneNumberInfoType = (new InfoType())
        ->setName('PHONE_NUMBER');
    $infoTypes = [$personNameInfoType, $phoneNumberInfoType];

    // The minimum likelihood required before returning a match
    $minLikelihood = Likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected
    $partitionId = (new PartitionId())
        ->setProjectId($dataProjectId)
        ->setNamespaceId($namespaceId);

    $kindExpression = (new KindExpression())
        ->setName($kind);

    $datastoreOptions = (new DatastoreOptions())
        ->setPartitionId($partitionId)
        ->setKind($kindExpression);

    // Construct the inspect config object
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes($infoTypes)
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits);

    // Construct the storage config object
    $storageConfig = (new StorageConfig())
        ->setDatastoreOptions($datastoreOptions);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = $dlp->projectName($callingProjectId);
    $job = $dlp->createDlpJob($parent, [
        'inspectJob' => $inspectJob
    ]);

    // Poll via Pub/Sub until job finishes
    while (true) {
        foreach ($subscription->pull() as $message) {
            if (isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()) {
                $subscription->acknowledge($message);
                break 2;
            }
        }
    }

    // Sleep for one second to avoid race condition with the job's status.
    usleep(1000000);

    // Get the updated job
    $job = $dlp->getDlpJob($job->getName());

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), $job->getState());
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf('  Found %s instance(s) of infoType %s' . PHP_EOL, $infoTypeStat->getCount(), $infoTypeStat->getInfoType()->getName());
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        default:
            print('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

C#

public static object InspectCloudDataStore(
    string projectId,
    string minLikelihood,
    int maxFindings,
    bool includeQuote,
    string kindName,
    string namespaceId,
    IEnumerable<InfoType> infoTypes,
    IEnumerable<CustomInfoType> customInfoTypes,
    string datasetId,
    string tableId)
{
    var inspectJob = new InspectJobConfig
    {
        StorageConfig = new StorageConfig
        {
            DatastoreOptions = new DatastoreOptions
            {
                Kind = new KindExpression { Name = kindName },
                PartitionId = new PartitionId
                {
                    NamespaceId = namespaceId,
                    ProjectId = projectId,
                }
            },
            TimespanConfig = new StorageConfig.Types.TimespanConfig
            {
                StartTime = Timestamp.FromDateTime(System.DateTime.UtcNow.AddYears(-1)),
                EndTime = Timestamp.FromDateTime(System.DateTime.UtcNow)
            }
        },

        InspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoTypes },
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings
            },
            ExcludeInfoTypes = false,
            IncludeQuote = includeQuote,
            MinLikelihood = (Likelihood)System.Enum.Parse(typeof(Likelihood), minLikelihood)
        },
        Actions =
        {
            new Google.Cloud.Dlp.V2.Action
            {
                // Save results in BigQuery Table
                SaveFindings = new Google.Cloud.Dlp.V2.Action.Types.SaveFindings
                {
                    OutputConfig = new OutputStorageConfig
                    {
                        Table = new Google.Cloud.Dlp.V2.BigQueryTable
                        {
                            ProjectId = projectId,
                            DatasetId = datasetId,
                            TableId = tableId
                        }
                    }
                },
            }
        }
    };

    // Issue Create Dlp Job Request
    DlpServiceClient client = DlpServiceClient.Create();
    var request = new CreateDlpJobRequest
    {
        InspectJob = inspectJob,
        ParentAsProjectName = new ProjectName(projectId),
    };

    // We need created job name
    var dlpJob = client.CreateDlpJob(request);
    var jobName = dlpJob.Name;

    // Make sure the job finishes before inspecting the results.
    // Alternatively, we can inspect results opportunistically, but
    // for testing purposes, we want consistent outcome
    bool jobFinished = EnsureJobFinishes(projectId, jobName);
    if (jobFinished)
    {
        var bigQueryClient = BigQueryClient.Create(projectId);
        var table = bigQueryClient.GetTable(datasetId, tableId);

        // Return only first page of 10 rows
        Console.WriteLine("DLP v2 Results:");
        var firstPage = table.ListRows(new ListRowsOptions { StartIndex = 0, PageSize = 10 });
        foreach (var item in firstPage)
        {
            Console.WriteLine($"\t {item[""]}");
        }
    }

    return 0;
}

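Inspecting a BigQuery table

You can set up an inspection of a BigQuery table either by sending REST requests to the Cloud DLP API or programmatically, using a client library in one of several languages.

Code examples

The following JSON and code samples in several languages show how to use the Cloud DLP API to inspect a BigQuery table. For details about the parameters included in the request, see "Configure storage inspection" later in this topic.

Protocol

The following sample JSON can be sent in a POST request to the specified Cloud DLP API REST endpoint. It shows how to use the Cloud DLP API to inspect a BigQuery table. For details about the parameters included in the request, see "Configure storage inspection" later in this topic.

You can try this out quickly using the API Explorer on the reference page for the projects.dlpJobs.create method. Keep in mind that a successful request, even one sent from the API Explorer, creates a new scan job. For information about how to control scan jobs, see "Retrieve inspection results" later in this topic. For general information about using JSON to send requests to the Cloud DLP API, see the JSON quickstart.

JSON input: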

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"[PROJECT_ID]",
          "datasetId":"[BIGQUERY-DATASET-NAME]",
          "tableId":"[BIGQUERY-TABLE-NAME]"
        },
        "identifyingFields":[
          {
            "name":"person.contactinfo"
          }
        ]
      },
      "timespanConfig":{
        "startTime":"2017-11-13T12:34:29.965633345Z",
        "endTime":"2018-01-05T04:45:04.240912125Z"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"[BIGQUERY-DATASET-NAME]",
              "tableId":"[BIGQUERY-TABLE-NAME]"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}
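
The `timespanConfig` values in the request are RFC 3339 timestamp strings. Generating them from datetime values, instead of typing them by hand, helps avoid formatting slips such as stray whitespace or a missing `Z`. The following is a minimal sketch using only the Python standard library. The helper names `to_rfc3339_utc` and `build_bigquery_storage_config` and the sample argument values are hypothetical. The check that the start precedes the end is an assumption of this sketch, not a documented API rule.

```python
from datetime import datetime, timezone


def to_rfc3339_utc(moment):
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def build_bigquery_storage_config(project_id, dataset_id, table_id,
                                  start, end, identifying_fields=()):
    """Return the storageConfig portion of a BigQuery inspect job."""
    # Sanity check chosen for this sketch: reject an empty or reversed span.
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {
        "bigQueryOptions": {
            "tableReference": {
                "projectId": project_id,
                "datasetId": dataset_id,
                "tableId": table_id,
            },
            "identifyingFields": [
                {"name": field} for field in identifying_fields
            ],
        },
        "timespanConfig": {
            "startTime": to_rfc3339_utc(start),
            "endTime": to_rfc3339_utc(end),
        },
    }


# Placeholder values for illustration only.
config = build_bigquery_storage_config(
    "my-project", "my_dataset", "my_table",
    datetime(2017, 11, 13, 12, 34, 29, 965633, tzinfo=timezone.utc),
    datetime(2018, 1, 5, 4, 45, 4, 240912, tzinfo=timezone.utc),
    identifying_fields=["person.contactinfo"])
print(config["timespanConfig"])
```

The resulting dictionary can be placed under `inspectJob.storageConfig` in the request body.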

Java

/**
 * Inspect a BigQuery table
 *
 * @param projectId The project ID to run the API call under
 * @param datasetId The ID of the dataset to inspect, e.g. 'my_dataset'
 * @param tableId The ID of the table to inspect, e.g. 'my_table'
 * @param minLikelihood The minimum likelihood required before returning a match
 * @param infoTypes The infoTypes of information to match
 * @param customInfoTypes The custom infoTypes of information to match
 * @param maxFindings The maximum number of findings to report (0 = server maximum)
 * @param topicId Topic ID for pubsub.
 * @param subscriptionId Subscription ID for pubsub.
 */
private static void inspectBigquery(
    String projectId,
    String datasetId,
    String tableId,
    Likelihood minLikelihood,
    List<InfoType> infoTypes,
    List<CustomInfoType> customInfoTypes,
    int maxFindings,
    String topicId,
    String subscriptionId) {
  // Instantiates a client
  try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {
    // Reference to the BigQuery table
    BigQueryTable tableReference =
        BigQueryTable.newBuilder()
            .setProjectId(projectId)
            .setDatasetId(datasetId)
            .setTableId(tableId)
            .build();
    BigQueryOptions bigQueryOptions =
        BigQueryOptions.newBuilder().setTableReference(tableReference).build();

    // Construct BigQuery configuration to be inspected
    StorageConfig storageConfig =
        StorageConfig.newBuilder().setBigQueryOptions(bigQueryOptions).build();

    FindingLimits findingLimits =
        FindingLimits.newBuilder().setMaxFindingsPerRequest(maxFindings).build();

    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addAllInfoTypes(infoTypes)
            .addAllCustomInfoTypes(customInfoTypes)
            .setMinLikelihood(minLikelihood)
            .setLimits(findingLimits)
            .build();

    ProjectTopicName topic = ProjectTopicName.of(projectId, topicId);
    Action.PublishToPubSub publishToPubSub =
        Action.PublishToPubSub.newBuilder().setTopic(topic.toString()).build();

    Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

    InspectJobConfig inspectJobConfig =
        InspectJobConfig.newBuilder()
            .setStorageConfig(storageConfig)
            .setInspectConfig(inspectConfig)
            .addActions(action)
            .build();

    // Asynchronously submit an inspect job, and wait on results
    CreateDlpJobRequest createDlpJobRequest =
        CreateDlpJobRequest.newBuilder()
            .setParent(ProjectName.of(projectId).toString())
            .setInspectJob(inspectJobConfig)
            .build();

    DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

    System.out.println("Job created with ID:" + dlpJob.getName());

    // Wait for job completion semi-synchronously
    // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
    final SettableApiFuture<Boolean> done = SettableApiFuture.create();

    // Set up a Pub/Sub subscriber to listen on the job completion status
    Subscriber subscriber =
        Subscriber.newBuilder(
                ProjectSubscriptionName.of(projectId, subscriptionId),
                (pubsubMessage, ackReplyConsumer) -> {
                  // Compare from the known job name so that a message without the
                  // DlpJobName attribute cannot cause a NullPointerException
                  if (dlpJob.getName()
                      .equals(pubsubMessage.getAttributesMap().get("DlpJobName"))) {
                    // notify job completion
                    done.set(true);
                    ackReplyConsumer.ack();
                  }
                })
            .build();
    subscriber.startAsync();

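    try {
      done.get(1, TimeUnit.MINUTES);
      Thread.sleep(500); // Wait for the job to become available
    } catch (Exception e) {
      System.out.println("Unable to verify job completion.");
    } finally {
      // Stop listening once the wait is over so the subscriber is not left running
      subscriber.stopAsync();
    }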

    DlpJob completedJob =
        dlpServiceClient.getDlpJob(
            GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build());

    System.out.println("Job status: " + completedJob.getState());
    InspectDataSourceDetails inspectDataSourceDetails = completedJob.getInspectDetails();
    InspectDataSourceDetails.Result result = inspectDataSourceDetails.getResult();
    if (result.getInfoTypeStatsCount() > 0) {
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    } else {
      System.out.println("No findings.");
    }
  } catch (Exception e) {
    System.out.println("inspectBigquery Problems: " + e.getMessage());
  }
}

Node.js

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The project ID the table is stored under
// This may or (for public datasets) may not equal the calling project ID
// const dataProjectId = process.env.GCLOUD_PROJECT;

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { infoType: { name: 'REGEX_TYPE' }, regex: { pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}' }}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// Construct item to be inspected
const storageItem = {
  bigQueryOptions: {
    tableReference: {
      projectId: dataProjectId,
      datasetId: datasetId,
      tableId: tableId,
    },
  },
};

// Construct request for creating an inspect job
const request = {
  parent: dlp.projectPath(callingProjectId),
  inspectJob: {
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customInfoTypes,
      minLikelihood: minLikelihood,
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
    },
    storageConfig: storageItem,
    actions: [
      {
        pubSub: {
          topic: `projects/${callingProjectId}/topics/${topicId}`,
        },
      },
    ],
  },
};

try {
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const [topicResponse] = await pubsub.topic(topicId).get();
  const subscription = topicResponse.subscription(subscriptionId);
  // Run inspect-job creation request
  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
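  // Pause briefly so the DLP job can fully complete before it is fetched.
  // (A bare setTimeout callback would not delay the code that follows.)
  console.log(`Waiting for DLP job to fully complete`);
  await new Promise(resolve => setTimeout(resolve, 500));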
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${
          infoTypeStat.infoType.name
        }.`
      );
    });
  } else {
    console.log(`No findings.`);
  }
} catch (err) {
  console.log(`Error in inspectBigquery: ${err.message || err}`);
}

Python

def inspect_bigquery(project, bigquery_project, dataset_id, table_id,
                     topic_id, subscription_id, info_types,
                     custom_dictionaries=None, custom_regexes=None,
                     min_likelihood=None, max_findings=None, timeout=300):
    """Uses the Data Loss Prevention API to analyze BigQuery data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        bigquery_project: The Google Cloud project id of the target table.
        dataset_id: The id of the target BigQuery dataset.
        table_id: The id of the target BigQuery table.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
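        custom_dictionaries: A list of comma-separated word lists; each entry
            becomes one custom dictionary info type. Optional.
        custom_regexes: A list of regular expression patterns; each entry
            becomes one custom regex info type. Optional.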
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # This sample additionally uses Cloud Pub/Sub to receive results from
    # potentially long-running operations.
    import google.cloud.pubsub

    # This sample also uses threading.Event() to wait for the job to finish.
    import threading

    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ['FIRST_NAME', 'LAST_NAME', 'EMAIL_ADDRESS']
    info_types = [{'name': info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [{
        'info_type': {'name': 'CUSTOM_DICTIONARY_{}'.format(i)},
        'dictionary': {
            'word_list': {'words': custom_dict.split(',')}
        }
    } for i, custom_dict in enumerate(custom_dictionaries)]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [{
        'info_type': {'name': 'CUSTOM_REGEX_{}'.format(i)},
        'regex': {'pattern': custom_regex}
    } for i, custom_regex in enumerate(custom_regexes)]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'info_types': info_types,
        'custom_info_types': custom_info_types,
        'min_likelihood': min_likelihood,
        'limits': {'max_findings_per_request': max_findings},
    }

    # Construct a storage_config containing the target Bigquery info.
    storage_config = {
        'big_query_options': {
            'table_reference': {
                'project_id': bigquery_project,
                'dataset_id': dataset_id,
                'table_id': table_id,
            }
        }
    }

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Tell the API where to send a notification when the job is complete.
    actions = [{
        'pub_sub': {'topic': '{}/topics/{}'.format(parent, topic_id)}
    }]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        'inspect_config': inspect_config,
        'storage_config': storage_config,
        'actions': actions,
    }

    operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_id)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message):
        try:
            if (message.attributes['DlpJobName'] == operation.name):
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(operation.name)
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print('Info type: {}; Count: {}'.format(
                            finding.info_type.name, finding.count))
                else:
                    print('No findings.')

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscriber.subscribe(subscription_path, callback=callback)
    finished = job_done.wait(timeout=timeout)
    if not finished:
        print('No event received before the timeout. Please verify that the '
              'subscription provided is subscribed to the topic provided.')

Go

// inspectBigquery searches for the given info types in the given Bigquery dataset table.
func inspectBigquery(w io.Writer, client *dlp.Client, project string, minLikelihood dlppb.Likelihood, maxFindings int32, includeQuote bool, infoTypes []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, dataProject, datasetID, tableID string) {
	// Convert the info type strings to a list of InfoTypes.
	var i []*dlppb.InfoType
	for _, it := range infoTypes {
		i = append(i, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	ctx := context.Background()

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pClient, err := pubsub.NewClient(ctx, project)
	if err != nil {
		log.Fatalf("Error creating PubSub client: %v", err)
	}
	defer pClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	s, err := setupPubSub(ctx, pClient, project, pubSubTopic, pubSubSub)
	if err != nil {
		log.Fatalf("Error setting up PubSub: %v\n", err)
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + project + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: "projects/" + project,
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_BigQueryOptions{
						BigQueryOptions: &dlppb.BigQueryOptions{
							TableReference: &dlppb.BigQueryTable{
								ProjectId: dataProject,
								DatasetId: datasetID,
								TableId:   tableID,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       i,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   minLikelihood,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: maxFindings,
					},
					IncludeQuote: includeQuote,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(ctx, req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	ctx, cancel := context.WithCancel(ctx)
	// Release the context's resources even if Receive returns early.
	defer cancel()
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()
		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			log.Fatalf("Error getting completed job: %v\n", err)
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results\n")
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
		// Stop listening for more messages.
		cancel()
	})
	if err != nil {
		log.Fatalf("Error receiving from PubSub: %v\n", err)
	}
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\BigQueryOptions;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect a BigQuery table, using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId The project ID to run the API call under
 * @param string $dataProjectId The project ID containing the target BigQuery table
 * @param string $topicId The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId The name of the Pub/Sub subscription to use when listening for job completion notifications
 * @param string $datasetId The ID of the dataset to inspect
 * @param string $tableId The ID of the table to inspect
 * @param int $maxFindings The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_bigquery(
  $callingProjectId,
  $dataProjectId,
  $topicId,
  $subscriptionId,
  $datasetId,
  $tableId,
  $maxFindings = 0
) {
    // Instantiate a client.
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $creditCardNumberInfoType = (new InfoType())
        ->setName('CREDIT_CARD_NUMBER');
    $infoTypes = [$personNameInfoType, $creditCardNumberInfoType];

    // The minimum likelihood required before returning a match
    $minLikelihood = Likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected
    $bigqueryTable = (new BigQueryTable())
        ->setProjectId($dataProjectId)
        ->setDatasetId($datasetId)
        ->setTableId($tableId);

    $bigQueryOptions = (new BigQueryOptions())
        ->setTableReference($bigqueryTable);

    $storageConfig = (new StorageConfig())
        ->setBigQueryOptions($bigQueryOptions);

    // Construct the inspect config object
    $inspectConfig = (new InspectConfig())
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits)
        ->setInfoTypes($infoTypes);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = $dlp->projectName($callingProjectId);
    $job = $dlp->createDlpJob($parent, [
        'inspectJob' => $inspectJob
    ]);

    // Poll via Pub/Sub until job finishes
    while (true) {
        foreach ($subscription->pull() as $message) {
            if (isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()) {
                $subscription->acknowledge($message);
                break 2;
            }
        }
    }

    // Sleep for one second to avoid race condition with the job's status.
    usleep(1000000);

    // Get the updated job
    $job = $dlp->getDlpJob($job->getName());

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), $job->getState());
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf(
                        '  Found %s instance(s) of infoType %s' . PHP_EOL,
                        $infoTypeStat->getCount(),
                        $infoTypeStat->getInfoType()->getName()
                    );
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        default:
            printf('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

C#

public static object InspectBigQuery(
    string projectId,
    string minLikelihood,
    int maxFindings,
    bool includeQuote,
    IEnumerable<FieldId> identifyingFields,
    IEnumerable<InfoType> infoTypes,
    IEnumerable<CustomInfoType> customInfoTypes,
    string datasetId,
    string tableId)
{
    var inspectJob = new InspectJobConfig
    {
        StorageConfig = new StorageConfig
        {
            BigQueryOptions = new BigQueryOptions
            {
                TableReference = new Google.Cloud.Dlp.V2.BigQueryTable
                {
                    ProjectId = projectId,
                    DatasetId = datasetId,
                    TableId = tableId,
                },
                IdentifyingFields =
                {
                    identifyingFields
                }
            },

            TimespanConfig = new StorageConfig.Types.TimespanConfig
            {
                StartTime = Timestamp.FromDateTime(System.DateTime.UtcNow.AddYears(-1)),
                EndTime = Timestamp.FromDateTime(System.DateTime.UtcNow)
            }
        },

        InspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoTypes },
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings
            },
            ExcludeInfoTypes = false,
            IncludeQuote = includeQuote,
            MinLikelihood = (Likelihood)System.Enum.Parse(typeof(Likelihood), minLikelihood)
        },
        Actions =
        {
            new Google.Cloud.Dlp.V2.Action
            {
                // Save results in BigQuery Table
                SaveFindings = new Google.Cloud.Dlp.V2.Action.Types.SaveFindings
                {
                    OutputConfig = new OutputStorageConfig
                    {
                        Table = new Google.Cloud.Dlp.V2.BigQueryTable
                        {
                            ProjectId = projectId,
                            DatasetId = datasetId,
                            TableId = tableId
                        }
                    }
                },
            }
        }
    };

    // Issue Create Dlp Job Request
    DlpServiceClient client = DlpServiceClient.Create();
    var request = new CreateDlpJobRequest
    {
        InspectJob = inspectJob,
        ParentAsProjectName = new ProjectName(projectId),
    };

    // We need created job name
    var dlpJob = client.CreateDlpJob(request);
    string jobName = dlpJob.Name;

    // Make sure the job finishes before inspecting the results.
    // Alternatively, we can inspect results opportunistically, but
    // for testing purposes, we want consistent outcome
    bool jobFinished = EnsureJobFinishes(projectId, jobName);
    if (jobFinished)
    {
        var bigQueryClient = BigQueryClient.Create(projectId);
        var table = bigQueryClient.GetTable(datasetId, tableId);

        // Return only first page of 10 rows
        Console.WriteLine("DLP v2 Results:");
        var firstPage = table.ListRows(new ListRowsOptions { StartIndex = 0, PageSize = 10 });
        foreach (var item in firstPage)
        {
            Console.WriteLine($"\t {item[""]}");
        }
    }

    return 0;
}

Configuring storage inspection

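To inspect a Cloud Storage location, a Cloud Datastore kind, or a BigQuery table, send a request to the projects.dlpJobs.create method of the Cloud DLP API that specifies at least the location of the data and what to scan for. Beyond those required parameters, you can also specify where to write the scan results, size limits, and a likelihood threshold. A successful request creates a DlpJob object instance, which is described in Retrieving inspection results.

The available configuration options are summarized below.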

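  • InspectJobConfig object: Contains the configuration information for the inspection job. Note that the InspectJobConfig object is also used by the JobTriggers object to schedule the creation of DlpJobs. This object includes the following:

    • StorageConfig object: Required. Contains details about the storage repository to scan.

      • Depending on the type of storage repository being scanned, the StorageConfig object must include one of the following:

        • CloudStorageOptions object: Contains information about the Cloud Storage bucket to scan.
        • DatastoreOptions object: Contains information about the Cloud Datastore data set to scan.
        • BigQueryOptions object: Contains information about the BigQuery table (and, optionally, identifying fields) to scan. This object also enables sampling of results. For more information, see Limiting the amount of content inspected below.
      • TimespanConfig object: Optional. Specifies the timespan of the items to include in the scan.

    • InspectConfig object: Required. Specifies what to scan for, such as infoTypes and likelihood values.

      • InfoType objects: Required. One or more infoType values to scan for.
      • Likelihood enumeration: Optional. When set, Cloud DLP returns only findings at or above this likelihood threshold. If this enumeration is omitted, the default value is POSSIBLE.
      • FindingLimits object: Optional. When set, this object lets you limit the number of findings returned.
      • includeQuote parameter: Optional. Defaults to false. When set to true, each finding includes a contextual quote from the data that triggered it.
      • excludeInfoTypes parameter: Optional. Defaults to false. When set to true, the scan results omit the type information for each finding.
      • CustomInfoType objects: One or more custom, user-created infoTypes. For more information about creating custom infoTypes, see Creating custom infoType detectors.
    • inspectTemplateName string: Optional. Specifies a template used to populate default values in the InspectConfig object. If you have already specified InspectConfig, the template values are merged in.

    • Action objects: Optional. One or more actions to execute when the job completes. Actions are executed in the order in which they are listed. This is where you specify where to write results and whether to publish a notification to a Cloud Pub/Sub topic.

  • jobId: Optional. An identifier for the job returned by Cloud DLP. If jobId is omitted or empty, the system creates an ID for the job. If it is specified, the job is assigned this ID value. The job ID must be unique and can contain uppercase and lowercase letters, numbers, and hyphens; that is, it must match the regular expression [a-zA-Z\\d-]+.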
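
The options summarized above correspond to the JSON body sent to projects.dlpJobs.create. The following is a minimal Python sketch, using only the standard library, that assembles such a body as a plain dictionary and checks a caller-supplied jobId against the pattern quoted above (reading the doubled backslash as an escaped \d, which is an assumption). The helper name build_inspect_job_request and the placeholder values are hypothetical, and the sketch only builds the body; it does not call the API.

```python
import json
import re

# Pattern for caller-supplied job IDs, as quoted in the text above
# (assumption: the doubled backslash in the text is an escaped \d).
JOB_ID_PATTERN = re.compile(r"[a-zA-Z\d-]+")


def build_inspect_job_request(storage_config, info_types, min_likelihood=None,
                              max_findings=None, include_quote=False,
                              actions=None, job_id=None):
    """Assemble a projects.dlpJobs.create request body (hypothetical helper)."""
    if not storage_config:
        raise ValueError("storageConfig is required")
    if not info_types:
        raise ValueError("at least one infoType is required")

    inspect_config = {
        "infoTypes": [{"name": name} for name in info_types],
        "includeQuote": include_quote,
    }
    # Optional fields are only included when the caller sets them.
    if min_likelihood is not None:
        inspect_config["minLikelihood"] = min_likelihood
    if max_findings is not None:
        inspect_config["limits"] = {"maxFindingsPerRequest": max_findings}

    body = {
        "inspectJob": {
            "storageConfig": storage_config,
            "inspectConfig": inspect_config,
            "actions": actions or [],
        }
    }
    if job_id:
        if not JOB_ID_PATTERN.fullmatch(job_id):
            raise ValueError("jobId may contain only letters, digits, and hyphens")
        body["jobId"] = job_id
    return body


request_body = build_inspect_job_request(
    storage_config={
        "bigQueryOptions": {
            "tableReference": {
                "projectId": "[PROJECT_ID]",
                "datasetId": "[DATASET_ID]",
                "tableId": "[TABLE_ID]",
            }
        }
    },
    info_types=["EMAIL_ADDRESS", "PHONE_NUMBER"],
    min_likelihood="LIKELY",
    max_findings=100,
    job_id="my-inspect-job-01",
)
print(json.dumps(request_body, indent=2))
```

Validating the jobId locally is a design choice of this sketch: it surfaces a malformed ID before any request is sent.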

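Limiting the amount of content inspected

When scanning BigQuery tables or Cloud Storage buckets, Cloud DLP can scan a small subset of the dataset. This lets you sample the scan results without incurring the cost of scanning the entire dataset.

The following sections describe how to limit the size of BigQuery scans and Cloud Storage scans.

Limiting BigQuery scans

To enable sampling in BigQuery by limiting the amount of data scanned, specify the following optional fields within BigQueryOptions:

  • rowsLimit: The maximum number of rows to scan. If the table has more rows than this value, the remaining rows are omitted. If this field is not set, or is set to 0, all rows are scanned.
  • sampleMethod: How to sample rows when not all rows are scanned. If not specified, scanning starts from the top. You can set this field to one of two values:
    • TOP: Start scanning from the top.
    • RANDOM_START: Start scanning from a randomly selected row.

The following JSON example uses the Cloud DLP API to scan a 1000-row subset of a BigQuery table. The scan starts from a random row.

JSON input: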

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"bigquery-public-data",
          "datasetId":"usa_names",
          "tableId":"usa_1910_current"
        },
        "rowsLimit":"1000",
        "sampleMethod":"RANDOM_START",
        "identifyingFields":[
          {
            "name":"name"
          }
        ]
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"FIRST_NAME"
        }
      ],
      "includeQuote":true
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"testingdlp",
              "tableId":"bqsample3"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}

After you send the JSON input in a POST request to the specified URL, a DLP job is created and the API returns the following JSON response.

JSON output:

{
  "name":"projects/[PROJECT_ID]/dlpJobs/[JOB_ID]",
  "type":"INSPECT_JOB",
  "state":"PENDING",
  "inspectDetails":{
    "requestedOptions":{
      "snapshotInspectTemplate":{

      },
      "jobConfig":{
        "storageConfig":{
          "bigQueryOptions":{
            "tableReference":{
              "projectId":"bigquery-public-data",
              "datasetId":"usa_names",
              "tableId":"usa_1910_current"
            },
            "rowsLimit":"1000",
            "sampleMethod":"RANDOM_START"
          }
        },
        "inspectConfig":{
          "infoTypes":[
            {
              "name":"FIRST_NAME"
            }
          ],
          "minLikelihood":"POSSIBLE",
          "limits":{

          },
          "includeQuote":true
        },
        "actions":[
          {
            "saveFindings":{
              "outputConfig":{
                "table":{
                  "projectId":"[PROJECT_ID]",
                  "datasetId":"testingdlp",
                  "tableId":"bqsample3"
                },
                "outputSchema":"BASIC_COLUMNS"
              }
            }
          }
        ]
      }
    }
  },
  "createTime":"2018-05-25T21:02:50.655Z"
}

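After the inspection job finishes running and its results have been processed by BigQuery, the scan results are available in the specified BigQuery table. For more information about retrieving inspection results, see Retrieving inspection results later in this topic.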
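
The BigQuery sampling fields can also be assembled programmatically before the request is sent. The following Python sketch, using only the standard library, builds a bigQueryOptions dictionary and rejects values that fall outside what the list above describes. The helper name big_query_options is hypothetical, and the checks reflect this section's description rather than the API's own validation.

```python
import json

# The two sampleMethod values described in this section.
VALID_SAMPLE_METHODS = {"TOP", "RANDOM_START"}


def big_query_options(project_id, dataset_id, table_id,
                      rows_limit=0, sample_method=None):
    """Build a bigQueryOptions dictionary with optional sampling (hypothetical helper)."""
    if rows_limit < 0:
        raise ValueError("rowsLimit must not be negative")
    if sample_method is not None and sample_method not in VALID_SAMPLE_METHODS:
        raise ValueError("sampleMethod must be TOP or RANDOM_START")

    options = {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        }
    }
    # A rowsLimit of 0 (or unset) means every row is scanned, so it is omitted.
    if rows_limit:
        # The JSON example in this section writes the row limit as a string.
        options["rowsLimit"] = str(rows_limit)
    if sample_method is not None:
        options["sampleMethod"] = sample_method
    return options


sampled = big_query_options("bigquery-public-data", "usa_names",
                            "usa_1910_current", rows_limit=1000,
                            sample_method="RANDOM_START")
print(json.dumps({"storageConfig": {"bigQueryOptions": sampled}}, indent=2))
```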

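Limiting Cloud Storage scans

You can enable sampling in Cloud Storage by limiting the amount of data scanned. You can instruct the Cloud DLP API to scan only up to a certain number of bytes per file, only certain file types, and only a certain percentage of the total number of files in the input file set. To do so, specify the following optional fields within CloudStorageOptions:

  • bytesLimitPerFile: Sets the maximum number of bytes to scan from a file. If a scanned file is larger than this value, the remaining bytes are omitted.
  • fileTypes[]: Lists the file type groups to include in the scan. You can set one or more of the following FileType enumerated values:
    • FILE_TYPE_UNSPECIFIED: All files.
    • BINARY_FILE: All file extensions not covered by TEXT_FILE.
    • TEXT_FILE: A subset of text file formats. For the current list, see FileType.
  • filesLimitPercent: Limits the number of files scanned to the specified percentage of the input FileSet. Specifying 0 or 100 means there is no limit.
  • sampleMethod: How to sample bytes when not all bytes are scanned. Specifying this value is meaningful only when it is used together with bytesLimitPerFile. If not specified, scanning starts from the top. You can set this field to one of two values:
    • TOP: Start scanning from the top.
    • RANDOM_START: For each file larger than the size specified in bytesLimitPerFile, randomly pick the offset at which to start scanning. The scanned bytes are contiguous.

The following JSON example uses the Cloud DLP API to scan a 90% subset of a Cloud Storage bucket for person names. Only text files are included, at most 200 bytes of each file are scanned, and for files larger than 200 bytes the scan starts at a random offset.

JSON input: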

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "cloudStorageOptions":{
        "fileSet":{
          "url":"gs://[BUCKET_NAME]/*"
        },
        "bytesLimitPerFile":"200",
        "fileTypes":[
          "TEXT_FILE"
        ],
        "filesLimitPercent":90,
        "sampleMethod":"RANDOM_START"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PERSON_NAME"
        }
      ],
      "excludeInfoTypes":true,
      "includeQuote":true,
      "minLikelihood":"POSSIBLE"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"[DATASET_ID]",
              "tableId":"[TABLE_ID]"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}

After you send the JSON input in a POST request to the specified URL, a DLP job is created and the API returns the following JSON response.

JSON output:

{
  "name":"projects/[PROJECT_ID]/dlpJobs/[JOB_ID]",
  "type":"INSPECT_JOB",
  "state":"PENDING",
  "inspectDetails":{
    "requestedOptions":{
      "snapshotInspectTemplate":{

      },
      "jobConfig":{
        "storageConfig":{
          "cloudStorageOptions":{
            "fileSet":{
              "url":"gs://[BUCKET_NAME]/*"
            },
            "bytesLimitPerFile":"200",
            "fileTypes":[
              "TEXT_FILE"
            ],
            "sampleMethod":"RANDOM_START",
            "filesLimitPercent":90
          }
        },
        "inspectConfig":{
          "infoTypes":[
            {
              "name":"PERSON_NAME"
            }
          ],
          "minLikelihood":"POSSIBLE",
          "limits":{

          },
          "includeQuote":true,
          "excludeInfoTypes":true
        },
        "actions":[
          {
            "saveFindings":{
              "outputConfig":{
                "table":{
                  "projectId":"[PROJECT_ID]",
                  "datasetId":"[DATASET_ID]",
                  "tableId":"[TABLE_ID]"
                },
                "outputSchema":"BASIC_COLUMNS"
              }
            }
          }
        ]
      }
    }
  },
  "createTime":"2018-05-30T22:22:08.279Z"
}
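The name field in a response like the one above is what later calls need in order to refer to the job. The following is a small sketch of pulling the project and job identifiers out of it. It assumes the name has exactly the projects/[PROJECT_ID]/dlpJobs/[JOB_ID] shape shown in this topic. The helper names are hypothetical.

```python
import re

# Matches the job name shape shown in the sample response.
_JOB_NAME = re.compile(r"^projects/(?P<project>[^/]+)/dlpJobs/(?P<job>[^/]+)$")


def parse_job_name(name):
    """Split a DlpJob name into (project_id, job_id)."""
    match = _JOB_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected job name format: {name!r}")
    return match.group("project"), match.group("job")


def summarize_created_job(response):
    """Pick out the fields of a create response that later calls usually need."""
    project_id, job_id = parse_job_name(response["name"])
    return {
        "project_id": project_id,
        "job_id": job_id,
        "type": response.get("type"),
        "state": response.get("state"),
    }
```

Passing the decoded JSON response (for example, the result of json.loads on the response body) to summarize_created_job returns a flat dict with the identifiers and the initial state.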

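Retrieving inspection results

You can retrieve a summary of a DlpJob by using the projects.dlpJobs.get method. The returned DlpJob includes an InspectDataSourceDetails object, which contains both a summary of the job's configuration (RequestedOptions) and a summary of the job's outcome (Result). The result summary includes:

  • processedBytes: The total size, in bytes, of the data processed.
  • totalEstimatedBytes: An estimate of the number of bytes remaining to process.
  • InfoTypeStatistics object: Statistics on the number of instances of each infoType found during the inspection job.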
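The result summary can be reduced to a few numbers once the DlpJob JSON has been decoded. The sketch below assumes the summary sits under inspectDetails.result and that the per-infoType statistics are a list named infoTypeStats whose entries carry infoType.name and count. Those key names are assumptions about the REST JSON shape and should be checked against the API reference. The function name is hypothetical.

```python
def summarize_result(dlp_job):
    """Reduce a decoded DlpJob dict to byte counts and per-infoType totals."""
    # Assumed location of the result summary inside the job JSON.
    result = dlp_job.get("inspectDetails", {}).get("result", {})

    counts = {}
    for stat in result.get("infoTypeStats", []):
        # 64-bit counters may arrive as strings in JSON, so convert explicitly.
        counts[stat["infoType"]["name"]] = int(stat.get("count", 0))

    return {
        "processed_bytes": int(result.get("processedBytes", 0)),
        "total_estimated_bytes": int(result.get("totalEstimatedBytes", 0)),
        "info_type_counts": counts,
    }
```

A job that has not produced a result yet simply yields zeros and an empty dict, so the same function can be called while the job is still pending.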

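For complete inspection job results, you have two options. Depending on the Action you've chosen, inspection job results are:

  • Saved to BigQuery (the SaveFindings object) in the specified table. Before viewing or analyzing the results, make sure the job has completed by using the projects.dlpJobs.get method, described below. Note that you can specify a schema for storing findings by using the OutputSchema object.
  • Published to a Cloud Pub/Sub topic (the PublishToPubSub object). The topic must grant publish access to the Cloud DLP service account that runs the DlpJob sending the notifications.

To sift through the large amounts of data that Cloud DLP can generate, you can use built-in BigQuery tools to run rich SQL analytics, or tools such as Google Data Studio to generate reports. For more information, see Analyzing and reporting on Cloud DLP findings. For some sample queries, see Querying findings in BigQuery.

When you send a storage repository inspection request to Cloud DLP, it creates and runs a DlpJob instance and returns it in the response. Depending on the size of your data and the configuration you specified, the job can take seconds, minutes, or hours to run. If you choose to publish to a Cloud Pub/Sub topic by specifying PublishToPubSub as an action, notifications are sent automatically to the named topic whenever the job's status changes. Specify the Cloud Pub/Sub topic name in the form projects/[PROJECT_ID]/topics/[PUBSUB-TOPIC-NAME].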
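The topic name format above is easy to get wrong by hand, so it can help to build it in one place. The following sketch formats the name and wraps it in an action entry for the actions array of a request. The pubSub and topic keys are assumptions about how the PublishToPubSub object is spelled in the REST JSON, and both helper names are hypothetical.

```python
def pubsub_topic_name(project_id, topic_id):
    """Format a topic name as projects/[PROJECT_ID]/topics/[PUBSUB-TOPIC-NAME]."""
    for label, value in (("project_id", project_id), ("topic_id", topic_id)):
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty ID without slashes")
    return f"projects/{project_id}/topics/{topic_id}"


def pubsub_action(project_id, topic_id):
    """Build one entry for the "actions" array (key names are assumed)."""
    return {"pubSub": {"topic": pubsub_topic_name(project_id, topic_id)}}
```

The returned dict can be appended to the actions list next to a saveFindings entry when both outputs are wanted.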

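You have full control over the jobs you create. The following management methods are available:

  • projects.dlpJobs.cancel method: Stops a job that is currently in progress. The server makes a best effort to cancel the job, but success is not guaranteed. The job and its configuration remain until you delete them.
  • projects.dlpJobs.delete method: Deletes a job and its configuration.
  • projects.dlpJobs.get method: Retrieves a single job and returns its status, its configuration, and, if the job is done, a summary of its results.
  • projects.dlpJobs.list method: Retrieves a list of all jobs, with the ability to filter results.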
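Because a job can run for a long time, a common pattern is to call projects.dlpJobs.get repeatedly until the job reaches a final state. The sketch below shows that loop with the actual API call injected as a function, so it makes no assumption about which HTTP client or client library is used. The state names DONE, CANCELED, and FAILED are assumed final states (only PENDING appears in the samples in this topic), and wait_for_job is a hypothetical helper.

```python
import time

# Assumed final job states; verify against the DlpJob reference.
TERMINAL_STATES = {"DONE", "CANCELED", "FAILED"}


def wait_for_job(get_job, job_name, poll_seconds=5.0, max_polls=120,
                 sleep=time.sleep):
    """Poll get_job(job_name) until the job reaches a terminal state.

    get_job is any callable that returns the decoded DlpJob JSON as a dict,
    for example a thin wrapper around projects.dlpJobs.get.
    """
    for attempt in range(max_polls):
        job = get_job(job_name)
        if job.get("state") in TERMINAL_STATES:
            return job
        # Skip the final sleep so a timeout is reported promptly.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")
```

Injecting get_job and sleep keeps the loop easy to test with fakes. When a Cloud Pub/Sub action is configured, notifications can replace polling altogether.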