Inspecciona el almacenamiento y las bases de datos en busca de datos sensibles

La administración de forma adecuada de datos sensibles almacenados en un repositorio de almacenamiento comienza con la clasificación de almacenamiento: identifica dónde están los datos sensibles en el repositorio, de qué tipo de datos sensibles se trata y cómo se usan. Esta información te ayuda a establecer de manera correcta el control de acceso y los permisos para compartir y puede ser parte de un plan de supervisión constante.

Cloud Data Loss Prevention (DLP) puede detectar y clasificar datos sensibles almacenados en una ubicación de Cloud Storage del tipo de Cloud Datastore o en tablas de BigQuery. En la página de referencia de la API para FileType se encuentra disponible una lista de extensiones de archivos de los tipos de archivos que Cloud DLP puede analizar. Los tipos de archivos que no se reconocen se analizan como archivos binarios.

En lugar de transmitir los datos textuales directo a la API, especifica la ubicación en tu solicitud, junto con información sobre la configuración. Cloud DLP inicia un trabajo que inspecciona los datos de una ubicación dada y, después, pone a disposición los detalles sobre los Infotipos que se encuentran en el contenido, los valores likelihood y demás.

La API de Cloud DLP es RESTful. También puedes interactuar con ella de manera programática mediante una biblioteca cliente de Cloud DLP en uno de varios lenguajes.

En este tema se incluye lo siguiente:

  • JSON de ejemplo por cada tipo de repositorio de almacenamiento en Google Cloud Platform (Cloud Storage, Cloud Datastore y BigQuery) y muestras de códigos en varios lenguajes de programación
  • Una descripción detallada de las opciones de configuración para los trabajos de análisis
  • Instrucciones sobre cómo recuperar los resultados de los análisis y cómo administrar los trabajos de análisis que se crean en cada solicitud exitosa

Inspecciona una ubicación de Cloud Storage

Puedes configurar una inspección de una ubicación de Cloud Storage con Cloud DLP a través de solicitudes REST o de manera programática en varios lenguajes con una biblioteca cliente.

Ejemplos de código

A continuación, se muestra un ejemplo de JSON y de código en varios lenguajes que demuestran cómo usar Cloud DLP para inspeccionar ubicaciones de Cloud Storage. Si quieres obtener información sobre los parámetros incluidos con la solicitud, consulta Configura la inspección de almacenamiento más adelante en este tema.

Protocolo

A continuación, hay una muestra JSON que se puede enviar en una solicitud POST al extremo REST especificado de Cloud DLP. En este JSON de ejemplo, se muestra cómo usar la API de Cloud DLP para inspeccionar los depósitos de Cloud Storage. Si quieres obtener información sobre los parámetros incluidos con la solicitud, consulta Configura la inspección de almacenamiento más adelante en este tema.

Para intentar realizar esto con rapidez, puedes usar el Explorador de API en la página de referencia del método projects.dlpJobs.create. Ten en cuenta que una solicitud exitosa, incluso en el Explorador de API, creará un trabajo de análisis nuevo. Si quieres obtener más información sobre cómo controlar los trabajos de análisis, consulta Recupera los resultados de inspección más adelante en este tema. Si quieres obtener información general sobre el uso de JSON para enviar solicitudes a la API de Cloud DLP, consulta la Guía de inicio rápido de JSON.

Entrada de JSON:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "cloudStorageOptions":{
        "fileSet":{
          "url":"gs://[GCS_BUCKET_NAME]/*"
        },
        "bytesLimitPerFile":"1073741824"
      },
      "timespanConfig":{
        "startTime":"2017-11-13T12:34:29.965633345Z",
        "endTime":"2018-01-05T04:45:04.240912125Z"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"[DATASET_ID]"
            }
          }
        }
      }
    ]
  }
}

Resultado de JSON:

{
  "name":"projects/[PROJECT_ID]/dlpJobs/i-2304647377058311040",
  "type":"INSPECT_JOB",
  "state":"PENDING",
  "inspectDetails":{
    "requestedOptions":{
      "snapshotInspectTemplate":{

      },
      "jobConfig":{
        "storageConfig":{
          "cloudStorageOptions":{
            "fileSet":{
              "url":"gs://[GCS_BUCKET_NAME]/*"
            },
            "bytesLimitPerFile":"1073741824"
          },
          "timespanConfig":{
            "startTime":"2017-11-13T12:34:29.965633345Z",
            "endTime":"2018-01-05T04:45:04.240912125Z"
          }
        },
        "inspectConfig":{
          "infoTypes":[
            {
              "name":"PHONE_NUMBER"
            }
          ],
          "minLikelihood":"LIKELY",
          "limits":{

          },
          "includeQuote":true
        },
        "actions":[
          {
            "saveFindings":{
              "outputConfig":{
                "table":{
                  "projectId":"[PROJECT_ID]",
                  "datasetId":"[DATASET_ID]",
                  "tableId":"[NEW_TABLE_ID]"
                }
              }
            }
          }
        ]
      }
    }
  },
  "createTime":"2018-11-07T18:01:14.225Z"
}

Java

/**
 * Inspect GCS file for Info types and wait on job completion using Google Cloud Pub/Sub
 * notification
 *
 * @param bucketName The name of the bucket where the file resides.
 * @param fileName The path to the file within the bucket to inspect (can include wildcards, eg.
 *     my-image.*)
 * @param minLikelihood The minimum likelihood required before returning a match
 * @param infoTypes The infoTypes of information to match
 * @param maxFindings The maximum number of findings to report (0 = server maximum)
 * @param topicId Google Cloud Pub/Sub topic Id to notify of job status
 * @param subscriptionId Google Cloud Subscription to above topic to listen for job status updates
 * @param projectId Google Cloud project ID
 */
private static void inspectGcsFile(
    String bucketName,
    String fileName,
    Likelihood minLikelihood,
    List<InfoType> infoTypes,
    List<CustomInfoType> customInfoTypes,
    int maxFindings,
    String topicId,
    String subscriptionId,
    String projectId)
    throws Exception {
  // Instantiates a client
  try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {

    CloudStorageOptions cloudStorageOptions =
        CloudStorageOptions.newBuilder()
            .setFileSet(
                CloudStorageOptions.FileSet.newBuilder()
                    .setUrl("gs://" + bucketName + "/" + fileName))
            .build();

    StorageConfig storageConfig =
        StorageConfig.newBuilder().setCloudStorageOptions(cloudStorageOptions).build();

    FindingLimits findingLimits =
        FindingLimits.newBuilder().setMaxFindingsPerRequest(maxFindings).build();

    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addAllInfoTypes(infoTypes)
            .addAllCustomInfoTypes(customInfoTypes)
            .setMinLikelihood(minLikelihood)
            .setLimits(findingLimits)
            .build();

    String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);
    Action.PublishToPubSub publishToPubSub =
        Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();

    Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

    InspectJobConfig inspectJobConfig =
        InspectJobConfig.newBuilder()
            .setStorageConfig(storageConfig)
            .setInspectConfig(inspectConfig)
            .addActions(action)
            .build();

    // Semi-synchronously submit an inspect job, and wait on results
    CreateDlpJobRequest createDlpJobRequest =
        CreateDlpJobRequest.newBuilder()
            .setParent(ProjectName.of(projectId).toString())
            .setInspectJob(inspectJobConfig)
            .build();

    DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

    System.out.println("Job created with ID:" + dlpJob.getName());

    final SettableApiFuture<Boolean> done = SettableApiFuture.create();

    // Set up a Pub/Sub subscriber to listen on the job completion status
    Subscriber subscriber =
        Subscriber.newBuilder(
                ProjectSubscriptionName.of(projectId, subscriptionId),
          (pubsubMessage, ackReplyConsumer) -> {
            if (pubsubMessage.getAttributesCount() > 0
                && pubsubMessage
                    .getAttributesMap()
                    .get("DlpJobName")
                    .equals(dlpJob.getName())) {
              // notify job completion
              done.set(true);
              ackReplyConsumer.ack();
            }
          })
            .build();
    subscriber.startAsync();

    // Wait for job completion semi-synchronously
    // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
    try {
      done.get(1, TimeUnit.MINUTES);
      Thread.sleep(500); // Wait for the job to become available
    } catch (Exception e) {
      System.out.println("Unable to verify job completion.");
    }

    DlpJob completedJob =
        dlpServiceClient.getDlpJob(
            GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build());

    System.out.println("Job status: " + completedJob.getState());
    InspectDataSourceDetails inspectDataSourceDetails = completedJob.getInspectDetails();
    InspectDataSourceDetails.Result result = inspectDataSourceDetails.getResult();
    if (result.getInfoTypeStatsCount() > 0) {
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    } else {
      System.out.println("No findings.");
    }
  }
}

Node.js

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The name of the bucket where the file resides.
// const bucketName = 'YOUR-BUCKET';

// The path to the file within the bucket to inspect.
// Can contain wildcards, e.g. "my-image.*"
// const fileName = 'my-image.png';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ name: 'DICT_TYPE', dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { name: 'REGEX_TYPE', regex: '\\(\\d{3}\\) \\d{3}-\\d{4}'}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// Get reference to the file to be inspected
const storageItem = {
  cloudStorageOptions: {
    fileSet: {url: `gs://${bucketName}/${fileName}`},
  },
};

// Construct request for creating an inspect job
const request = {
  parent: dlp.projectPath(callingProjectId),
  inspectJob: {
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customInfoTypes,
      minLikelihood: minLikelihood,
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
    },
    storageConfig: storageItem,
    actions: [
      {
        pubSub: {
          topic: `projects/${callingProjectId}/topics/${topicId}`,
        },
      },
    ],
  },
};

try {
  // Create a GCS File inspection job and wait for it to complete
  const [topicResponse] = await pubsub.topic(topicId).get();
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const subscription = await topicResponse.subscription(subscriptionId);
  const [jobsResponse] = await dlp.createDlpJob(request);
  // Get the job's ID
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });

  setTimeout(() => {
    console.log(`Waiting for DLP job to fully complete`);
  }, 500);
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${
          infoTypeStat.infoType.name
        }.`
      );
    });
  } else {
    console.log(`No findings.`);
  }
} catch (err) {
  console.log(`Error in inspectGCSFile: ${err.message || err}`);
}

Python

def inspect_gcs_file(project, bucket, filename, topic_id, subscription_id,
                     info_types, custom_dictionaries=None,
                     custom_regexes=None, min_likelihood=None,
                     max_findings=None, timeout=300):
    """Uses the Data Loss Prevention API to analyze a file on GCS.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        bucket: The name of the GCS bucket containing the file, as a string.
        filename: The name of the file in the bucket, including the path, as a
            string; e.g. 'images/myfile.png'.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # This sample additionally uses Cloud Pub/Sub to receive results from
    # potentially long-running operations.
    import google.cloud.pubsub

    # This sample also uses threading.Event() to wait for the job to finish.
    import threading

    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ['FIRST_NAME', 'LAST_NAME', 'EMAIL_ADDRESS']
    info_types = [{'name': info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [{
        'info_type': {'name': 'CUSTOM_DICTIONARY_{}'.format(i)},
        'dictionary': {
            'word_list': {'words': custom_dict.split(',')}
        }
    } for i, custom_dict in enumerate(custom_dictionaries)]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [{
        'info_type': {'name': 'CUSTOM_REGEX_{}'.format(i)},
        'regex': {'pattern': custom_regex}
    } for i, custom_regex in enumerate(custom_regexes)]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'info_types': info_types,
        'custom_info_types': custom_info_types,
        'min_likelihood': min_likelihood,
        'limits': {'max_findings_per_request': max_findings},
    }

    # Construct a storage_config containing the file's URL.
    url = 'gs://{}/{}'.format(bucket, filename)
    storage_config = {
        'cloud_storage_options': {
            'file_set': {'url': url}
            }
        }

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Tell the API where to send a notification when the job is complete.
    actions = [{
        'pub_sub': {'topic': '{}/topics/{}'.format(parent, topic_id)}
    }]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        'inspect_config': inspect_config,
        'storage_config': storage_config,
        'actions': actions,
    }

    operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_id)
    subscription = subscriber.subscribe(subscription_path)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message):
        try:
            if (message.attributes['DlpJobName'] == operation.name):
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(operation.name)
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print('Info type: {}; Count: {}'.format(
                            finding.info_type.name, finding.count))
                else:
                    print('No findings.')

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscription.open(callback)
    finished = job_done.wait(timeout=timeout)
    if not finished:
        print('No event received before the timeout. Please verify that the '
              'subscription provided is subscribed to the topic provided.')

Go

// inspectGCSFile searches for the given info types in the given file.
func inspectGCSFile(w io.Writer, client *dlp.Client, project string, minLikelihood dlppb.Likelihood, maxFindings int32, includeQuote bool, infoTypes []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, bucketName, fileName string) {
	// Convert the info type strings to a list of InfoTypes.
	var i []*dlppb.InfoType
	for _, it := range infoTypes {
		i = append(i, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	ctx := context.Background()

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pClient, err := pubsub.NewClient(ctx, project)
	if err != nil {
		log.Fatalf("Error creating PubSub client: %v", err)
	}
	defer pClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	s, err := setupPubSub(ctx, pClient, project, pubSubTopic, pubSubSub)
	if err != nil {
		log.Fatalf("Error setting up PubSub: %v\n", err)
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + project + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: "projects/" + project,
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_CloudStorageOptions{
						CloudStorageOptions: &dlppb.CloudStorageOptions{
							FileSet: &dlppb.CloudStorageOptions_FileSet{
								Url: "gs://" + bucketName + "/" + fileName,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       i,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   minLikelihood,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: maxFindings,
					},
					IncludeQuote: includeQuote,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	ctx, cancel := context.WithCancel(ctx)
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()
		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			log.Fatalf("Error getting completed job: %v\n", err)
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results")
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
		// Stop listening for more messages.
		cancel()
	})
	if err != nil {
		log.Fatalf("Error receiving from PubSub: %v\n", err)
	}
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStorageOptions;
use Google\Cloud\Dlp\V2\CloudStorageOptions\FileSet;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect a file stored on Google Cloud Storage , using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId The project ID to run the API call under
 * @param string $bucketId The name of the bucket where the file resides
 * @param string $file The path to the file within the bucket to inspect. Can contain wildcards
 *        e.g. "my-image.*"
 * @param string $topicId The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId The name of the Pub/Sub subscription to use when listening for job
 * @param int $maxFindings (Optional) The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_gcs(
    $callingProjectId,
    $bucketId,
    $file,
    $topicId,
    $subscriptionId,
    $maxFindings = 0
) {
    // Instantiate a client.
    $dlp = new DlpServiceClient([
        'projectId' => $callingProjectId,
    ]);
    $pubsub = new PubSubClient([
        'projectId' => $callingProjectId,
    ]);
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $creditCardNumberInfoType = (new InfoType())
        ->setName('CREDIT_CARD_NUMBER');
    $infoTypes = [$personNameInfoType, $creditCardNumberInfoType];

    // The minimum likelihood required before returning a match
    $minLikelihood = likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected
    $fileSet = (new FileSet())
        ->setUrl('gs://' . $bucketId . '/' . $file);

    $cloudStorageOptions = (new CloudStorageOptions())
        ->setFileSet($fileSet);

    $storageConfig = (new StorageConfig())
        ->setCloudStorageOptions($cloudStorageOptions);

    // Construct the inspect config object
    $inspectConfig = (new InspectConfig())
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits)
        ->setInfoTypes($infoTypes);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = $dlp->projectName($callingProjectId);
    $job = $dlp->createDlpJob($parent, [
        'inspectJob' => $inspectJob
    ]);

    // Poll via Pub/Sub until job finishes
    while (true) {
        foreach ($subscription->pull() as $message) {
            if (isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()) {
                $subscription->acknowledge($message);
                break 2;
            }
        }
    }

    // Sleep for half a second to avoid race condition with the job's status.
    usleep(500000);

    // Get the updated job
    $job = $dlp->getDlpJob($job->getName());

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), $job->getState());
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf('  Found %s instance(s) of infoType %s' . PHP_EOL, $infoTypeStat->getCount(), $infoTypeStat->getInfoType()->getName());
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        default:
            print('Unknown job state. Most likely, the job is either running or has not yet started.');
    }
}

C#

public static object InspectGCS(
    string projectId,
    string minLikelihood,
    int maxFindings,
    bool includeQuote,
    IEnumerable<InfoType> infoTypes,
    IEnumerable<CustomInfoType> customInfoTypes,
    string bucketName,
    string topicId,
    string subscriptionId)
{
    var inspectJob = new InspectJobConfig
    {
        StorageConfig = new StorageConfig
        {
            CloudStorageOptions = new CloudStorageOptions
            {
                FileSet = new CloudStorageOptions.Types.FileSet { Url = $"gs://{bucketName}/*.txt" },
                BytesLimitPerFile = 1073741824
            },
        },
        InspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoTypes },
            ExcludeInfoTypes = false,
            IncludeQuote = includeQuote,
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings
            },
            MinLikelihood = (Likelihood)System.Enum.Parse(typeof(Likelihood), minLikelihood)
        },
        Actions =
        {
            new Google.Cloud.Dlp.V2.Action
            {
                // Send results to Pub/Sub topic
                PubSub = new Google.Cloud.Dlp.V2.Action.Types.PublishToPubSub
                {
                    Topic = topicId,
                }
            }
        }
    };

    // Issue Create Dlp Job Request
    DlpServiceClient client = DlpServiceClient.Create();
    var request = new CreateDlpJobRequest
    {
        InspectJob = inspectJob,
        ParentAsProjectName = new ProjectName(projectId),
    };

    // We need created job name
    var dlpJob = client.CreateDlpJob(request);

    // Get a pub/sub subscription and listen for DLP results
    var fireEvent = new ManualResetEventSlim();

    var subscriptionName = new SubscriptionName(projectId, subscriptionId);
    var subscriber = SubscriberClient.CreateAsync(subscriptionName).Result;
    subscriber.StartAsync(
        (pubSubMessage, cancellationToken) =>
        {
            // Given a message that we receive on this subscription, we should either acknowledge or decline it
            if (pubSubMessage.Attributes["DlpJobName"] == dlpJob.Name)
            {
                fireEvent.Set();
                return Task.FromResult(SubscriberClient.Reply.Ack);
            }

            return Task.FromResult(SubscriberClient.Reply.Nack);
        });

    // We block here until receiving a signal from a separate thread that is waiting on a message indicating receiving a result of Dlp job
    if (fireEvent.Wait(TimeSpan.FromMinutes(1)))
    {
        // Stop the thread that is listening to messages as a result of StartAsync call earlier
        subscriber.StopAsync(CancellationToken.None).Wait();

        // Now we can inspect full job results
        var job = client.GetDlpJob(new GetDlpJobRequest { DlpJobName = new DlpJobName(projectId, dlpJob.Name) });

        // Inspect Job details
        Console.WriteLine($"Processed bytes: {job.InspectDetails.Result.ProcessedBytes}");
        Console.WriteLine($"Total estimated bytes: {job.InspectDetails.Result.TotalEstimatedBytes}");
        var stats = job.InspectDetails.Result.InfoTypeStats;
        Console.WriteLine("Found stats:");
        foreach (var stat in stats)
        {
            Console.WriteLine($"{stat.InfoType.Name}");
        }
    }
    else
    {
        Console.WriteLine("Error: The wait failed on timeout");
    }

    return 0;
}

Inspecciona un tipo de Cloud Datastore

Puedes configurar una inspección de un tipo de Cloud Datastore con la API de Cloud DLP a través de solicitudes REST o de manera programática en varios lenguajes con una biblioteca cliente.

Ejemplos de código

A continuación, se muestra un ejemplo de JSON y de código en varios lenguajes que muestran cómo usar Cloud DLP para inspeccionar tipos de Cloud Datastore. Si quieres obtener información sobre los parámetros incluidos con la solicitud, consulta Configura la inspección de almacenamiento más adelante en este tema.

Protocolo

A continuación, hay un JSON de muestra que se puede enviar en una solicitud POST al extremo de REST especificado de la API de Cloud DLP. En este JSON de ejemplo, se muestra cómo usar la API de Cloud DLP para inspeccionar los tipos de Cloud Datastore. Si quieres obtener información sobre los parámetros incluidos con la solicitud, consulta Configura la inspección de almacenamiento más adelante en este tema.

Para intentar realizar esto con rapidez, puedes usar el Explorador de API en la página de referencia del método projects.dlpJobs.create. Ten en cuenta que una solicitud exitosa, incluso en el Explorador de API, creará un trabajo de análisis nuevo. Si quieres obtener más información sobre cómo controlar los trabajos de análisis, consulta Recupera los resultados de inspección más adelante en este tema. Si quieres obtener información general sobre el uso de JSON para enviar solicitudes a la API de Cloud DLP, consulta la Guía de inicio rápido de JSON.

Entrada de JSON:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "datastoreOptions":{
        "kind":{
          "name":"Example-Kind"
        },
        "partitionId":{
          "namespaceId":"[NAMESPACE_ID]",
          "projectId":"[PROJECT_ID]"
        }
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"[BIGQUERY-DATASET-NAME]",
              "tableId":"[BIGQUERY-TABLE-NAME]"
            }
          }
        }
      }
    ]
  }
}

Java

/**
 * Inspect a Datastore kind
 *
 * @param projectId The project ID containing the target Datastore
 * @param namespaceId The ID namespace of the Datastore document to inspect
 * @param kind The kind of the Datastore entity to inspect
 * @param minLikelihood The minimum likelihood required before returning a match
 * @param infoTypes The infoTypes of information to match
 * @param maxFindings max number of findings
 * @param topicId Google Cloud Pub/Sub topic to notify job status updates
 * @param subscriptionId Google Cloud Pub/Sub subscription to above topic to receive status
 *     updates
 */
private static void inspectDatastore(
    String projectId,
    String namespaceId,
    String kind,
    Likelihood minLikelihood,
    List<InfoType> infoTypes,
    List<CustomInfoType> customInfoTypes,
    int maxFindings,
    String topicId,
    String subscriptionId) {
  // Instantiates a client
  try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {

    // Reference to the Datastore namespace
    PartitionId partitionId =
        PartitionId.newBuilder().setProjectId(projectId).setNamespaceId(namespaceId).build();

    // Reference to the Datastore kind
    KindExpression kindExpression = KindExpression.newBuilder().setName(kind).build();
    DatastoreOptions datastoreOptions =
        DatastoreOptions.newBuilder().setKind(kindExpression).setPartitionId(partitionId).build();

    // Construct Datastore configuration to be inspected
    StorageConfig storageConfig =
        StorageConfig.newBuilder().setDatastoreOptions(datastoreOptions).build();

    FindingLimits findingLimits =
        FindingLimits.newBuilder().setMaxFindingsPerRequest(maxFindings).build();

    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addAllInfoTypes(infoTypes)
            .addAllCustomInfoTypes(customInfoTypes)
            .setMinLikelihood(minLikelihood)
            .setLimits(findingLimits)
            .build();

    String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);
    Action.PublishToPubSub publishToPubSub =
        Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();

    Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

    InspectJobConfig inspectJobConfig =
        InspectJobConfig.newBuilder()
            .setStorageConfig(storageConfig)
            .setInspectConfig(inspectConfig)
            .addActions(action)
            .build();

    // Asynchronously submit an inspect job, and wait on results
    CreateDlpJobRequest createDlpJobRequest =
        CreateDlpJobRequest.newBuilder()
            .setParent(ProjectName.of(projectId).toString())
            .setInspectJob(inspectJobConfig)
            .build();

    DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

    System.out.println("Job created with ID:" + dlpJob.getName());

    final SettableApiFuture<Boolean> done = SettableApiFuture.create();

    // Set up a Pub/Sub subscriber to listen on the job completion status
    Subscriber subscriber =
        Subscriber.newBuilder(
                ProjectSubscriptionName.of(projectId, subscriptionId),
          (pubsubMessage, ackReplyConsumer) -> {
            if (pubsubMessage.getAttributesCount() > 0
                && pubsubMessage
                    .getAttributesMap()
                    .get("DlpJobName")
                    .equals(dlpJob.getName())) {
              // notify job completion
              done.set(true);
              ackReplyConsumer.ack();
            }
          })
            .build();
    subscriber.startAsync();

    // Wait for job completion semi-synchronously
    // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
    try {
      done.get(1, TimeUnit.MINUTES);
      Thread.sleep(500); // Wait for the job to become available
    } catch (Exception e) {
      System.out.println("Unable to verify job completion.");
    }

    DlpJob completedJob =
        dlpServiceClient.getDlpJob(
            GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build());

    System.out.println("Job status: " + completedJob.getState());
    InspectDataSourceDetails inspectDataSourceDetails = completedJob.getInspectDetails();
    InspectDataSourceDetails.Result result = inspectDataSourceDetails.getResult();
    if (result.getInfoTypeStatsCount() > 0) {
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    } else {
      System.out.println("No findings.");
    }
  } catch (Exception e) {
    System.out.println("inspectDatastore Problems: " + e.getMessage());
  }
}

Node.js

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The project ID the target Datastore is stored under
// This may or may not equal the calling project ID
// const dataProjectId = process.env.GCLOUD_PROJECT;

// (Optional) The ID namespace of the Datastore document to inspect.
// To ignore Datastore namespaces, set this to an empty string ('')
// const namespaceId = '';

// The kind of the Datastore entity to inspect.
// const kind = 'Person';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ name: 'DICT_TYPE', dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { name: 'REGEX_TYPE', regex: '\\(\\d{3}\\) \\d{3}-\\d{4}'}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// Construct items to be inspected
const storageItems = {
  datastoreOptions: {
    partitionId: {
      projectId: dataProjectId,
      namespaceId: namespaceId,
    },
    kind: {
      name: kind,
    },
  },
};

// Construct request for creating an inspect job
const request = {
  parent: dlp.projectPath(callingProjectId),
  inspectJob: {
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customInfoTypes,
      minLikelihood: minLikelihood,
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
    },
    storageConfig: storageItems,
    actions: [
      {
        pubSub: {
          topic: `projects/${callingProjectId}/topics/${topicId}`,
        },
      },
    ],
  },
};
try {
  // Run inspect-job creation request
  const [topicResponse] = await pubsub.topic(topicId).get();
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const subscription = await topicResponse.subscription(subscriptionId);
  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
  // Wait for DLP job to fully complete
  setTimeout(() => {
    console.log(`Waiting for DLP job to fully complete`);
  }, 500);
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${
          infoTypeStat.infoType.name
        }.`
      );
    });
  } else {
    console.log(`No findings.`);
  }
} catch (err) {
  console.log(`Error in inspectDatastore: ${err.message || err}`);
}

Python

def inspect_datastore(project, datastore_project, kind,
                      topic_id, subscription_id, info_types,
                      custom_dictionaries=None, custom_regexes=None,
                      namespace_id=None, min_likelihood=None,
                      max_findings=None, timeout=300):
    """Uses the Data Loss Prevention API to analyze Datastore data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        datastore_project: The Google Cloud project id of the target Datastore.
        kind: The kind of the Datastore entity to inspect, e.g. 'Person'.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        namespace_id: The namespace of the Datastore document, if applicable.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # This sample additionally uses Cloud Pub/Sub to receive results from
    # potentially long-running operations.
    import google.cloud.pubsub

    # This sample also uses threading.Event() to wait for the job to finish.
    import threading

    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ['FIRST_NAME', 'LAST_NAME', 'EMAIL_ADDRESS']
    info_types = [{'name': info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [{
        'info_type': {'name': 'CUSTOM_DICTIONARY_{}'.format(i)},
        'dictionary': {
            'word_list': {'words': custom_dict.split(',')}
        }
    } for i, custom_dict in enumerate(custom_dictionaries)]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [{
        'info_type': {'name': 'CUSTOM_REGEX_{}'.format(i)},
        'regex': {'pattern': custom_regex}
    } for i, custom_regex in enumerate(custom_regexes)]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'info_types': info_types,
        'custom_info_types': custom_info_types,
        'min_likelihood': min_likelihood,
        'limits': {'max_findings_per_request': max_findings},
    }

    # Construct a storage_config containing the target Datastore info.
    storage_config = {
        'datastore_options': {
            'partition_id': {
                'project_id': datastore_project,
                'namespace_id': namespace_id,
            },
            'kind': {
                'name': kind
            },
        }
    }

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Tell the API where to send a notification when the job is complete.
    actions = [{
        'pub_sub': {'topic': '{}/topics/{}'.format(parent, topic_id)}
    }]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        'inspect_config': inspect_config,
        'storage_config': storage_config,
        'actions': actions,
    }

    operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_id)
    subscription = subscriber.subscribe(subscription_path)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message):
        try:
            if (message.attributes['DlpJobName'] == operation.name):
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(operation.name)
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print('Info type: {}; Count: {}'.format(
                            finding.info_type.name, finding.count))
                else:
                    print('No findings.')

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscription.open(callback)
    finished = job_done.wait(timeout=timeout)
    if not finished:
        print('No event received before the timeout. Please verify that the '
              'subscription provided is subscribed to the topic provided.')

Go

// inspectDatastore searches for the given info types in the given dataset kind.
func inspectDatastore(w io.Writer, client *dlp.Client, project string, minLikelihood dlppb.Likelihood, maxFindings int32, includeQuote bool, infoTypes []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, dataProject, namespaceID, kind string) {
	// Convert the info type strings to a list of InfoTypes.
	var i []*dlppb.InfoType
	for _, it := range infoTypes {
		i = append(i, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	ctx := context.Background()

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pClient, err := pubsub.NewClient(ctx, project)
	if err != nil {
		log.Fatalf("Error creating PubSub client: %v", err)
	}
	defer pClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	s, err := setupPubSub(ctx, pClient, project, pubSubTopic, pubSubSub)
	if err != nil {
		log.Fatalf("Error setting up PubSub: %v\n", err)
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + project + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: "projects/" + project,
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_DatastoreOptions{
						DatastoreOptions: &dlppb.DatastoreOptions{
							PartitionId: &dlppb.PartitionId{
								ProjectId:   dataProject,
								NamespaceId: namespaceID,
							},
							Kind: &dlppb.KindExpression{
								Name: kind,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       i,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   minLikelihood,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: maxFindings,
					},
					IncludeQuote: includeQuote,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	ctx, cancel := context.WithCancel(ctx)
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()
		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			log.Fatalf("Error getting completed job: %v\n", err)
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results")
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
		// Stop listening for more messages.
		cancel()
	})
	if err != nil {
		log.Fatalf("Error receiving from PubSub: %v\n", err)
	}
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\DatastoreOptions;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\Dlp\V2\KindExpression;
use Google\Cloud\Dlp\V2\PartitionId;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect Datastore, using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId The project ID to run the API call under
 * @param string $dataProjectId The project ID containing the target Datastore
 *        (This may or may not be equal to $callingProjectId)
 * @param string $topicId The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId The name of the Pub/Sub subscription to use when listening for job
 * @param string $kind The datastore kind to inspect
 * @param string $namespaceId The ID namespace of the Datastore document to inspect
 * @param int $maxFindings (Optional) The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_datastore(
    $callingProjectId,
    $dataProjectId,
    $topicId,
    $subscriptionId,
    $kind,
    $namespaceId,
    $maxFindings = 0
) {
    // Instantiate clients
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $phoneNumberInfoType = (new InfoType())
        ->setName('PHONE_NUMBER');
    $infoTypes = [$personNameInfoType, $phoneNumberInfoType];

    // The minimum likelihood required before returning a match
    $minLikelihood = likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected
    $partitionId = (new PartitionId())
        ->setProjectId($dataProjectId)
        ->setNamespaceId($namespaceId);

    $kindExpression = (new KindExpression())
        ->setName($kind);

    $datastoreOptions = (new DatastoreOptions())
        ->setPartitionId($partitionId)
        ->setKind($kindExpression);

    // Construct the inspect config object
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes($infoTypes)
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits);

    // Construct the storage config object
    $storageConfig = (new StorageConfig())
        ->setDatastoreOptions($datastoreOptions);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = $dlp->projectName($callingProjectId);
    $job = $dlp->createDlpJob($parent, [
        'inspectJob' => $inspectJob
    ]);

    // Poll via Pub/Sub until job finishes
    while (true) {
        foreach ($subscription->pull() as $message) {
            if (isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()) {
                $subscription->acknowledge($message);
                break 2;
            }
        }
    }

    // Sleep for half a second to avoid race condition with the job's status.
    usleep(500000);

    // Get the updated job
    $job = $dlp->getDlpJob($job->getName());

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), $job->getState());
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf('  Found %s instance(s) of infoType %s' . PHP_EOL, $infoTypeStat->getCount(), $infoTypeStat->getInfoType()->getName());
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        default:
            print('Unknown job state. Most likely, the job is either running or has not yet started.');
    }
}

C#

public static object InspectCloudDataStore(
    string projectId,
    string minLikelihood,
    int maxFindings,
    bool includeQuote,
    string kindName,
    string namespaceId,
    IEnumerable<InfoType> infoTypes,
    IEnumerable<CustomInfoType> customInfoTypes,
    string datasetId,
    string tableId)
{
    var inspectJob = new InspectJobConfig
    {
        StorageConfig = new StorageConfig
        {
            DatastoreOptions = new DatastoreOptions
            {
                Kind = new KindExpression { Name = kindName },
                PartitionId = new PartitionId
                {
                    NamespaceId = namespaceId,
                    ProjectId = projectId,
                }
            },
            TimespanConfig = new StorageConfig.Types.TimespanConfig
            {
                StartTime = Timestamp.FromDateTime(System.DateTime.UtcNow.AddYears(-1)),
                EndTime = Timestamp.FromDateTime(System.DateTime.UtcNow)
            }
        },

        InspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoTypes },
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings
            },
            ExcludeInfoTypes = false,
            IncludeQuote = includeQuote,
            MinLikelihood = (Likelihood)System.Enum.Parse(typeof(Likelihood), minLikelihood)
        },
        Actions =
        {
            new Google.Cloud.Dlp.V2.Action
            {
                // Save results in BigQuery Table
                SaveFindings = new Google.Cloud.Dlp.V2.Action.Types.SaveFindings
                {
                    OutputConfig = new OutputStorageConfig
                    {
                        Table = new Google.Cloud.Dlp.V2.BigQueryTable
                        {
                            ProjectId = projectId,
                            DatasetId = datasetId,
                            TableId = tableId
                        }
                    }
                },
            }
        }
    };

    // Issue Create Dlp Job Request
    DlpServiceClient client = DlpServiceClient.Create();
    var request = new CreateDlpJobRequest
    {
        InspectJob = inspectJob,
        ParentAsProjectName = new ProjectName(projectId),
    };

    // We need created job name
    var dlpJob = client.CreateDlpJob(request);
    var jobName = dlpJob.Name;

    // Make sure the job finishes before inspecting the results.
    // Alternatively, we can inspect results opportunistically, but
    // for testing purposes, we want consistent outcome
    bool jobFinished = EnsureJobFinishes(projectId, jobName);
    if (jobFinished)
    {
        var bigQueryClient = BigQueryClient.Create(projectId);
        var table = bigQueryClient.GetTable(datasetId, tableId);

        // Return only first page of 10 rows
        Console.WriteLine("DLP v2 Results:");
        var firstPage = table.ListRows(new ListRowsOptions { StartIndex = 0, PageSize = 10 });
        foreach (var item in firstPage)
        {
            Console.WriteLine($"\t {item[""]}");
        }
    }

    return 0;
}

Inspecciona una tabla de BigQuery

Puedes configurar una inspección de una tabla de BigQuery con Cloud DLP a través de solicitudes REST o de manera programática en varios lenguajes con una biblioteca cliente.

Ejemplos de código

A continuación, se muestra un ejemplo de JSON y de código en varios lenguajes que muestran cómo usar la API de Cloud DLP para inspeccionar tablas de BigQuery. Si quieres obtener información sobre los parámetros incluidos con la solicitud, consulta Configura la inspección de almacenamiento más adelante en este tema.

Protocolo

A continuación, hay un JSON de muestra que se puede enviar en una solicitud POST al extremo de REST especificado de la API de Cloud DLP. En este JSON de ejemplo, se muestra cómo usar la API de Cloud DLP para inspeccionar tablas de BigQuery. Si quieres obtener información sobre los parámetros incluidos con la solicitud, consulta Configura la inspección de almacenamiento más adelante en este tema.

Para intentar realizar esto con rapidez, puedes usar el Explorador de API en la página de referencia del método projects.dlpJobs.create. Ten en cuenta que una solicitud exitosa, incluso en el Explorador de API, creará un trabajo de análisis nuevo. Si quieres obtener más información sobre cómo controlar los trabajos de análisis, consulta Recupera los resultados de inspección más adelante en este tema. Si quieres obtener información general sobre el uso de JSON para enviar solicitudes a la API de Cloud DLP, consulta la Guía de inicio rápido de JSON.

Entrada de JSON:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"[PROJECT_ID]",
          "datasetId":"[BIGQUERY-DATASET-NAME]",
          "tableId":"[BIGQUERY-TABLE-NAME]"
        },
        "identifyingFields":[
          {
            "name":"person.contactinfo"
          }
        ]
      },
      "timespanConfig":{
        "startTime":"2017-11-13T12:34:29.965633345Z ",
        "endTime":"2018-01-05T04:45:04.240912125Z "
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"[BIGQUERY-DATASET-NAME]",
              "tableId":"[BIGQUERY-TABLE-NAME]"
            }
          },
          "outputSchema": "BASIC_COLUMNS"
        }
      }
    ]
  }
}

Java

/**
 * Inspect a BigQuery table
 *
 * @param projectId The project ID to run the API call under
 * @param datasetId The ID of the dataset to inspect, e.g. 'my_dataset'
 * @param tableId The ID of the table to inspect, e.g. 'my_table'
 * @param minLikelihood The minimum likelihood required before returning a match
 * @param infoTypes The infoTypes of information to match
 * @param maxFindings The maximum number of findings to report (0 = server maximum)
 * @param topicId Topic ID for pubsub.
 * @param subscriptionId Subscription ID for pubsub.
 */
private static void inspectBigquery(
    String projectId,
    String datasetId,
    String tableId,
    Likelihood minLikelihood,
    List<InfoType> infoTypes,
    List<CustomInfoType> customInfoTypes,
    int maxFindings,
    String topicId,
    String subscriptionId) {
  // Instantiates a client
  try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {
    // Reference to the BigQuery table
    BigQueryTable tableReference =
        BigQueryTable.newBuilder()
            .setProjectId(projectId)
            .setDatasetId(datasetId)
            .setTableId(tableId)
            .build();
    BigQueryOptions bigQueryOptions =
        BigQueryOptions.newBuilder().setTableReference(tableReference).build();

    // Construct BigQuery configuration to be inspected
    StorageConfig storageConfig =
        StorageConfig.newBuilder().setBigQueryOptions(bigQueryOptions).build();

    FindingLimits findingLimits =
        FindingLimits.newBuilder().setMaxFindingsPerRequest(maxFindings).build();

    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addAllInfoTypes(infoTypes)
            .addAllCustomInfoTypes(customInfoTypes)
            .setMinLikelihood(minLikelihood)
            .setLimits(findingLimits)
            .build();

    ProjectTopicName topic = ProjectTopicName.of(projectId, topicId);
    Action.PublishToPubSub publishToPubSub =
        Action.PublishToPubSub.newBuilder().setTopic(topic.toString()).build();

    Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

    InspectJobConfig inspectJobConfig =
        InspectJobConfig.newBuilder()
            .setStorageConfig(storageConfig)
            .setInspectConfig(inspectConfig)
            .addActions(action)
            .build();

    // Asynchronously submit an inspect job, and wait on results
    CreateDlpJobRequest createDlpJobRequest =
        CreateDlpJobRequest.newBuilder()
            .setParent(ProjectName.of(projectId).toString())
            .setInspectJob(inspectJobConfig)
            .build();

    DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

    System.out.println("Job created with ID:" + dlpJob.getName());

    // Wait for job completion semi-synchronously
    // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
    final SettableApiFuture<Boolean> done = SettableApiFuture.create();

    // Set up a Pub/Sub subscriber to listen on the job completion status
    Subscriber subscriber =
        Subscriber.newBuilder(
                ProjectSubscriptionName.of(projectId, subscriptionId),
          (pubsubMessage, ackReplyConsumer) -> {
            if (pubsubMessage.getAttributesCount() > 0
                && pubsubMessage
                    .getAttributesMap()
                    .get("DlpJobName")
                    .equals(dlpJob.getName())) {
              // notify job completion
              done.set(true);
              ackReplyConsumer.ack();
            }
          })
            .build();
    subscriber.startAsync();

    try {
      done.get(1, TimeUnit.MINUTES);
      Thread.sleep(500); // Wait for the job to become available
    } catch (Exception e) {
      System.out.println("Unable to verify job completion.");
    }

    DlpJob completedJob =
        dlpServiceClient.getDlpJob(
            GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build());

    System.out.println("Job status: " + completedJob.getState());
    InspectDataSourceDetails inspectDataSourceDetails = completedJob.getInspectDetails();
    InspectDataSourceDetails.Result result = inspectDataSourceDetails.getResult();
    if (result.getInfoTypeStatsCount() > 0) {
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    } else {
      System.out.println("No findings.");
    }
  } catch (Exception e) {
    System.out.println("inspectBigquery Problems: " + e.getMessage());
  }
}

Node.js

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The project ID the table is stored under
// This may or (for public datasets) may not equal the calling project ID
// const dataProjectId = process.env.GCLOUD_PROJECT;

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ name: 'DICT_TYPE', dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { name: 'REGEX_TYPE', regex: '\\(\\d{3}\\) \\d{3}-\\d{4}'}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// Construct item to be inspected
const storageItem = {
  bigQueryOptions: {
    tableReference: {
      projectId: dataProjectId,
      datasetId: datasetId,
      tableId: tableId,
    },
  },
};

// Construct request for creating an inspect job
const request = {
  parent: dlp.projectPath(callingProjectId),
  inspectJob: {
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customInfoTypes,
      minLikelihood: minLikelihood,
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
    },
    storageConfig: storageItem,
    actions: [
      {
        pubSub: {
          topic: `projects/${callingProjectId}/topics/${topicId}`,
        },
      },
    ],
  },
};

try {
  // Run inspect-job creation request
  const [topicResponse] = await pubsub.topic(topicId).get();
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const subscription = await topicResponse.subscription(subscriptionId);
  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
  // Wait for DLP job to fully complete
  setTimeout(() => {
    console.log(`Waiting for DLP job to fully complete`);
  }, 500);
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${
          infoTypeStat.infoType.name
        }.`
      );
    });
  } else {
    console.log(`No findings.`);
  }
} catch (err) {
  console.log(`Error in inspectBigquery: ${err.message || err}`);
}

Python

def inspect_bigquery(project, bigquery_project, dataset_id, table_id,
                     topic_id, subscription_id, info_types,
                     custom_dictionaries=None, custom_regexes=None,
                     min_likelihood=None, max_findings=None, timeout=300):
    """Uses the Data Loss Prevention API to analyze BigQuery data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        bigquery_project: The Google Cloud project id of the target table.
        dataset_id: The id of the target BigQuery dataset.
        table_id: The id of the target BigQuery table.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        namespace_id: The namespace of the Datastore document, if applicable.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # This sample additionally uses Cloud Pub/Sub to receive results from
    # potentially long-running operations.
    import google.cloud.pubsub

    # This sample also uses threading.Event() to wait for the job to finish.
    import threading

    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ['FIRST_NAME', 'LAST_NAME', 'EMAIL_ADDRESS']
    info_types = [{'name': info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [{
        'info_type': {'name': 'CUSTOM_DICTIONARY_{}'.format(i)},
        'dictionary': {
            'word_list': {'words': custom_dict.split(',')}
        }
    } for i, custom_dict in enumerate(custom_dictionaries)]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [{
        'info_type': {'name': 'CUSTOM_REGEX_{}'.format(i)},
        'regex': {'pattern': custom_regex}
    } for i, custom_regex in enumerate(custom_regexes)]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'info_types': info_types,
        'custom_info_types': custom_info_types,
        'min_likelihood': min_likelihood,
        'limits': {'max_findings_per_request': max_findings},
    }

    # Construct a storage_config containing the target Bigquery info.
    storage_config = {
        'big_query_options': {
            'table_reference': {
                'project_id': bigquery_project,
                'dataset_id': dataset_id,
                'table_id': table_id,
            }
        }
    }

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Tell the API where to send a notification when the job is complete.
    actions = [{
        'pub_sub': {'topic': '{}/topics/{}'.format(parent, topic_id)}
    }]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        'inspect_config': inspect_config,
        'storage_config': storage_config,
        'actions': actions,
    }

    operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_id)
    subscription = subscriber.subscribe(subscription_path)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message):
        try:
            if (message.attributes['DlpJobName'] == operation.name):
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(operation.name)
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print('Info type: {}; Count: {}'.format(
                            finding.info_type.name, finding.count))
                else:
                    print('No findings.')

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscription.open(callback)
    finished = job_done.wait(timeout=timeout)
    if not finished:
        print('No event received before the timeout. Please verify that the '
              'subscription provided is subscribed to the topic provided.')

Go

// inspectBigquery searches for the given info types in the given Bigquery dataset table.
func inspectBigquery(w io.Writer, client *dlp.Client, project string, minLikelihood dlppb.Likelihood, maxFindings int32, includeQuote bool, infoTypes []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, dataProject, datasetID, tableID string) {
	// Convert the info type strings to a list of InfoTypes.
	var i []*dlppb.InfoType
	for _, it := range infoTypes {
		i = append(i, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	ctx := context.Background()

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pClient, err := pubsub.NewClient(ctx, project)
	if err != nil {
		log.Fatalf("Error creating PubSub client: %v", err)
	}
	defer pClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	s, err := setupPubSub(ctx, pClient, project, pubSubTopic, pubSubSub)
	if err != nil {
		log.Fatalf("Error setting up PubSub: %v\n", err)
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + project + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: "projects/" + project,
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_BigQueryOptions{
						BigQueryOptions: &dlppb.BigQueryOptions{
							TableReference: &dlppb.BigQueryTable{
								ProjectId: dataProject,
								DatasetId: datasetID,
								TableId:   tableID,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       i,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   minLikelihood,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: maxFindings,
					},
					IncludeQuote: includeQuote,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	ctx, cancel := context.WithCancel(ctx)
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()
		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			log.Fatalf("Error getting completed job: %v\n", err)
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results")
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
		// Stop listening for more messages.
		cancel()
	})
	if err != nil {
		log.Fatalf("Error receiving from PubSub: %v\n", err)
	}
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\BigQueryOptions;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect a BigQuery table , using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId The project ID to run the API call under
 * @param string $dataProjectId The project ID containing the target Datastore
 * @param string $topicId The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId The name of the Pub/Sub subscription to use when listening for job
 * @param string $datasetId The ID of the dataset to inspect
 * @param string $tableId The ID of the table to inspect
 * @param int $maxFindings The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_bigquery(
  $callingProjectId,
  $dataProjectId,
  $topicId,
  $subscriptionId,
  $datasetId,
  $tableId,
  $maxFindings = 0
) {
    // Instantiate a client.
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $creditCardNumberInfoType = (new InfoType())
        ->setName('CREDIT_CARD_NUMBER');
    $infoTypes = [$personNameInfoType, $creditCardNumberInfoType];

    // The minimum likelihood required before returning a match
    $minLikelihood = likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected
    $bigqueryTable = (new BigQueryTable())
        ->setProjectId($dataProjectId)
        ->setDatasetId($datasetId)
        ->setTableId($tableId);

    $bigQueryOptions = (new BigQueryOptions())
        ->setTableReference($bigqueryTable);

    $storageConfig = (new StorageConfig())
        ->setBigQueryOptions($bigQueryOptions);

    // Construct the inspect config object
    $inspectConfig = (new InspectConfig())
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits)
        ->setInfoTypes($infoTypes);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = $dlp->projectName($callingProjectId);
    $job = $dlp->createDlpJob($parent, [
        'inspectJob' => $inspectJob
    ]);

    // Poll via Pub/Sub until job finishes
    while (true) {
        foreach ($subscription->pull() as $message) {
            if (isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()) {
                $subscription->acknowledge($message);
                break 2;
            }
        }
    }

    // Sleep for half a second to avoid race condition with the job's status.
    usleep(500000);

    // Get the updated job
    $job = $dlp->getDlpJob($job->getName());

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), $job->getState());
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf(
                        '  Found %s instance(s) of infoType %s' . PHP_EOL,
                        $infoTypeStat->getCount(),
                        $infoTypeStat->getInfoType()->getName()
                    );
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        default:
            printf('Unknown job state. Most likely, the job is either running or has not yet started.');
    }
}

C#

public static object InspectBigQuery(
    string projectId,
    string minLikelihood,
    int maxFindings,
    bool includeQuote,
    IEnumerable<FieldId> identifyingFields,
    IEnumerable<InfoType> infoTypes,
    IEnumerable<CustomInfoType> customInfoTypes,
    string datasetId,
    string tableId)
{
    var inspectJob = new InspectJobConfig
    {
        StorageConfig = new StorageConfig
        {
            BigQueryOptions = new BigQueryOptions
            {
                TableReference = new Google.Cloud.Dlp.V2.BigQueryTable
                {
                    ProjectId = projectId,
                    DatasetId = datasetId,
                    TableId = tableId,
                },
                IdentifyingFields =
                {
                    identifyingFields
                }
            },

            TimespanConfig = new StorageConfig.Types.TimespanConfig
            {
                StartTime = Timestamp.FromDateTime(System.DateTime.UtcNow.AddYears(-1)),
                EndTime = Timestamp.FromDateTime(System.DateTime.UtcNow)
            }
        },

        InspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoTypes },
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings
            },
            ExcludeInfoTypes = false,
            IncludeQuote = includeQuote,
            MinLikelihood = (Likelihood)System.Enum.Parse(typeof(Likelihood), minLikelihood)
        },
        Actions =
        {
            new Google.Cloud.Dlp.V2.Action
            {
                // Save results in BigQuery Table
                SaveFindings = new Google.Cloud.Dlp.V2.Action.Types.SaveFindings
                {
                    OutputConfig = new OutputStorageConfig
                    {
                        Table = new Google.Cloud.Dlp.V2.BigQueryTable
                        {
                            ProjectId = projectId,
                            DatasetId = datasetId,
                            TableId = tableId
                        }
                    }
                },
            }
        }
    };

    // Issue Create Dlp Job Request
    DlpServiceClient client = DlpServiceClient.Create();
    var request = new CreateDlpJobRequest
    {
        InspectJob = inspectJob,
        ParentAsProjectName = new ProjectName(projectId),
    };

    // We need created job name
    var dlpJob = client.CreateDlpJob(request);
    string jobName = dlpJob.Name;

    // Make sure the job finishes before inspecting the results.
    // Alternatively, we can inspect results opportunistically, but
    // for testing purposes, we want consistent outcome
    bool jobFinished = EnsureJobFinishes(projectId, jobName);
    if (jobFinished)
    {
        var bigQueryClient = BigQueryClient.Create(projectId);
        var table = bigQueryClient.GetTable(datasetId, tableId);

        // Return only first page of 10 rows
        Console.WriteLine("DLP v2 Results:");
        var firstPage = table.ListRows(new ListRowsOptions { StartIndex = 0, PageSize = 10 });
        foreach (var item in firstPage)
        {
            Console.WriteLine($"\t {item[""]}");
        }
    }

    return 0;
}

Configura la inspección de almacenamiento

Para inspeccionar una ubicación de Cloud Storage, un tipo de Cloud Datastore o una tabla de BigQuery, envía una solicitud al método projects.dlpJobs.create de la API de Cloud DLP que contenga, al menos, la ubicación de los datos que se deben analizar y cuáles. Más allá de esos parámetros requeridos, también puedes especificar dónde se escribirán los resultados del análisis, los límites de tamaño y probabilidades y mucho más. Una solicitud exitosa da como resultado la creación de un objeto de instancia DlpJob que se analiza en Recupera resultados de inspección.

Las opciones de configuración disponibles se resumen de esta forma:

  • Objeto InspectJobConfig: contiene la información sobre la configuración para el trabajo de inspección. Ten en cuenta que el objeto JobTriggers también usa el objeto InspectJobConfig para programar la creación de DlpJobs. En este objeto, se incluye lo siguiente:

    • Objeto StorageConfig: requerido. Contiene detalles sobre el repositorio de almacenamiento para analizar:

      • Se debe incluir uno de los siguientes en el objeto StorageConfig, según el tipo de repositorio de almacenamiento que se analiza:

        • Objeto CloudStorageOptions: contiene información sobre el depósito de Cloud Storage que se debe analizar.
        • Objeto DatastoreOptions: contiene información sobre el conjunto de datos de Cloud Datastore que se debe analizar.
        • Objeto BigQueryOptions: contiene información sobre la tabla de BigQuery (y, de forma opcional, campos de identificación) que se deben analizar. Este objeto también habilita el muestreo de resultados. Para obtener más información, consulta Habilita el muestreo de resultados debajo.
      • Objeto TimespanConfig: opcional. Especifica el período de los elementos que se deben incluir en el análisis.

    • Objeto InspectConfig: requerido. Especifica qué analizar, como Infotipos y valores likelihood.

      • Objetos InfoType: requeridos. Uno o más valores de Infotipos que se deben analizar.
      • Enumeración Likelihood: opcional. Cuando se establece, Cloud DLP solo mostrará resultados iguales o superiores a este límite de likelihood. Si se omite esta numeración, el valor predeterminado será POSSIBLE.
      • Objeto FindingLimits: opcional. Cuando se establece, este objeto te habilita a especificar un límite para la cantidad de resultados que se muestran.
      • Parámetro includeQuote: opcional. La configuración predeterminada es false. Cuando se establece en true, cada resultado incluirá una cita contextual de los datos que lo activaron.
      • Parámetro excludeInfoTypes: opcional. La configuración predeterminada es false. Cuando se establece en true, los resultados del análisis excluirán el tipo de información de los resultados.
      • Objetos CustomInfoType: Uno o más Infotipos personalizados y creados por el usuario. Para obtener más información sobre la creación de Infotipos personalizados, consulta Crea detectores de Infotipos personalizados.
    • String inspectTemplateName: opcional. Especifica una plantilla que se usa para propagar valores predeterminados en el objeto InspectConfig. Si ya especificaste InspectConfig, los valores de la plantilla se combinarán.

    • Objetos Action: opcionales. Una o más acciones para ejecutar cuando finaliza el trabajo. Cada acción se ejecuta en el orden en que se enumera. Aquí debes especificar dónde escribir los resultados o si quieres publicar una notificación en un tema Cloud Pub/Sub.

  • jobId: opcional. Un identificador del trabajo que muestra Cloud DLP. Si se omite jobId o queda vacío, el sistema crea un ID para el trabajo. Si se especifica, al trabajo se le asigna este valor de ID. El ID de trabajo debe ser único y puede contener letras mayúsculas y minúsculas, números y guiones. Esto significa que debe coincidir con la expresión regular siguiente: [a-zA-Z\\d-]+.

Limita la cantidad de contenido inspeccionado

Si analizas tablas de BigQuery o depósitos de Cloud Storage, Cloud DLP incluye una manera de analizar un pequeño subconjunto del conjunto de datos. Lo que se genera con esto es proporcionar una muestra de resultados de análisis sin incurrir en los costos potenciales de analizar un conjunto de datos completo.

En las secciones siguientes hay información sobre el límite de tamaño tanto de los análisis de BigQuery como los análisis de Cloud Storage.

Limita los análisis de BigQuery

Para habilitar el muestreo en BigQuery mediante la limitación de la cantidad de datos que se analizan, especifica los campos opcionales siguientes dentro de BigQueryOptions:

  • rowsLimit: el número máximo de filas que se deben analizar. Si la tabla tiene más filas que este valor, el resto de las filas se omiten. Si no se establece, o si se establece en 0, se analizarán todas las filas.
  • sampleMethod: cómo generar muestras de las filas en caso de que no se analicen todas. Si no se especifica, el análisis comienza desde arriba. Este campo se puede establecer en uno de dos valores:
    • TOP: el análisis comienza desde arriba.
    • RANDOM_START: el análisis comienza desde una fila seleccionada al azar.

En el ejemplo JSON siguiente, se muestra el uso de la API de Cloud DLP para analizar un subconjunto de 1,000 filas de una tabla de BigQuery. El análisis comienza desde una fila aleatoria.

Entrada de JSON:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"bigquery-public-data",
          "datasetId":"usa_names",
          "tableId":"usa_1910_current"
        },
        "rowsLimit":"1000",
        "sampleMethod":"RANDOM_START",
        "identifyingFields":[
          {
            "name":"name"
          }
        ]
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"FIRST_NAME"
        }
      ],
      "includeQuote":true
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"testingdlp",
              "tableId":"bqsample3"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}

Después de enviar la entrada JSON en una solicitud POST a la URL especificada, se crea un trabajo DLP y recibimos la respuesta JSON siguiente.

Resultado de JSON:

{
  "name":"projects/[PROJECT_ID]/dlpJobs/[JOB_ID]",
  "type":"INSPECT_JOB",
  "state":"PENDING",
  "inspectDetails":{
    "requestedOptions":{
      "snapshotInspectTemplate":{

      },
      "jobConfig":{
        "storageConfig":{
          "bigQueryOptions":{
            "tableReference":{
              "projectId":"bigquery-public-data",
              "datasetId":"usa_names",
              "tableId":"usa_1910_current"
            },
            "rowsLimit":"1000",
            "sampleMethod":"RANDOM_START"
          }
        },
        "inspectConfig":{
          "infoTypes":[
            {
              "name":"FIRST_NAME"
            }
          ],
          "minLikelihood":"POSSIBLE",
          "limits":{

          },
          "includeQuote":true
        },
        "actions":[
          {
            "saveFindings":{
              "outputConfig":{
                "table":{
                  "projectId":"[PROJECT_ID]",
                  "datasetId":"testingdlp",
                  "tableId":"bqsample3"
                },
                "outputSchema":"BASIC_COLUMNS"
              }
            }
          }
        ]
      }
    }
  },
  "createTime":"2018-05-25T21:02:50.655Z"
}

Cuando el trabajo de inspección termina de ejecutarse y BigQuery procesa los resultados, los resultados del análisis estarán disponibles en la tabla de BigQuery especificada. Para obtener más información sobre la recuperación de los resultados de inspección, consulta la sección siguiente.

Limita los análisis de Cloud Storage

Puedes habilitar el muestreo en Cloud Storage si limitas la cantidad de datos que se analizan. Puedes indicar a la API de Cloud DLP que analice solo los archivos de cierto tamaño, solo ciertos tipos de archivos y solo un cierto porcentaje de la cantidad total de archivos en el conjunto de archivos de entrada. Para hacer eso, especifica los campos opcionales siguientes dentro de CloudStorageOptions:

  • bytesLimitPerFile: establece el número máximo de bytes que se deben analizar en un archivo. Si el tamaño de un archivo analizado es mayor que este valor, el resto de los bytes se omiten.
  • fileTypes[]: enumera los grupos de tipos de archivos que se deben incluir en el análisis. Se puede establecer en uno o más de los siguientes tipos de FileType enumerados:
    • FILE_TYPE_UNSPECIFIED: todos los archivos
    • BINARY_FILE: todas las extensiones de archivo no incluidas en TEXT_FILE
    • TEXT_FILE: varios formatos de archivos de texto. Para obtener la lista más actualizada, consulta FileType
  • filesLimitPercent: limita el número de archivos que se analizarán al porcentaje especificado de la entrada FileSet. Especificar 0 o 100 aquí indica que no hay límite.
  • sampleMethod: cómo generar muestras de bytes si no se analizan todos. Especificar este valor es importante solo cuando se usa en combinación con bytesLimitPerFile. Si no se especifica, el análisis comienza desde arriba. Este campo se puede establecer en uno de dos valores:
    • TOP: el análisis comienza desde arriba.
    • RANDOM_START: para cada archivo mayor que el tamaño especificado en bytesLimitPerFile, elige al azar el desplazamiento a fin de comenzar a analizar. Los bytes analizados son contiguos.

En el ejemplo JSON siguiente, se muestra el uso de la API de Cloud DLP a fin de analizar el 90% de un subconjunto de un depósito de Cloud Storage para nombres de personas. El análisis comienza desde una ubicación aleatoria del conjunto de datos y solo incluye archivos de texto por debajo de los 200 bytes.

Entrada de JSON:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_NAME]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "cloudStorageOptions":{
        "fileSet":{
          "url":"gs://[BUCKET-NAME]/*"
        },
        "bytesLimitPerFile":"200",
        "fileTypes":[
          "TEXT_FILE"
        ],
        "filesLimitPercent":90,
        "sampleMethod":"RANDOM_START"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PERSON_NAME"
        }
      ],
      "excludeInfoTypes":true,
      "includeQuote":true,
      "minLikelihood":"POSSIBLE"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT_ID]",
              "datasetId":"testingdlp"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}

Después de enviar la entrada JSON en una solicitud POST a la URL especificada, se crea un trabajo DLP y recibimos la respuesta JSON siguiente.

Resultado de JSON:

{
  "name":"projects/[PROJECT_ID]/dlpJobs/[JOB_ID]",
  "type":"INSPECT_JOB",
  "state":"PENDING",
  "inspectDetails":{
    "requestedOptions":{
      "snapshotInspectTemplate":{

      },
      "jobConfig":{
        "storageConfig":{
          "cloudStorageOptions":{
            "fileSet":{
              "url":"gs://[BUCKET_NAME]/*"
            },
            "bytesLimitPerFile":"200",
            "fileTypes":[
              "TEXT_FILE"
            ],
            "sampleMethod":"TOP",
            "filesLimitPercent":90
          }
        },
        "inspectConfig":{
          "infoTypes":[
            {
              "name":"PERSON_NAME"
            }
          ],
          "minLikelihood":"POSSIBLE",
          "limits":{

          },
          "includeQuote":true,
          "excludeInfoTypes":true
        },
        "actions":[
          {
            "saveFindings":{
              "outputConfig":{
                "table":{
                  "projectId":"[PROJECT_ID]",
                  "datasetId":"[DATASET_ID]",
                  "tableId":"[TABLE_ID]"
                },
                "outputSchema":"BASIC_COLUMNS"
              }
            }
          }
        ]
      }
    }
  },
  "createTime":"2018-05-30T22:22:08.279Z"
}

Recupera los resultados de la inspección

Puedes recuperar un resumen de un DlpJob con el método projects.dlpJobs.get. El DlpJob que se muestra incluye su objeto InspectDataSourceDetails, el cual contiene un resumen de la configuración del trabajo (RequestedOptions) y un resumen del resultado del trabajo (Result). En el resumen de los resultados se incluye lo siguiente:

  • processedBytes: el tamaño total en bytes de lo que se procesó.
  • totalEstimatedBytes: estimación del número de bytes que faltan procesar.
  • Objeto InfoTypeStatistics: estadísticas sobre cuántas instancias de cada Infotipo se encontraron durante el trabajo de inspección.

Para los resultados de los trabajos de inspección completos, tienes dos opciones. Según la Action elegida, puede suceder lo siguiente con los trabajos de inspección:

  • Se guardan en BigQuery (el objeto SaveFindings) en la tabla especificada. Antes de ver o analizar los resultados, primero asegúrate de que el trabajo se completó mediante el método projects.dlpJobs.get que se describe debajo. Ten en cuenta que puedes especificar un esquema para almacenar resultados con el objeto OutputSchema.
  • Se publican en un tema de Cloud Pub/Sub (el objeto PublishToPubSub). El tema debe haberle dado derechos de acceso para publicar a la cuenta de servicio de Cloud DLP que ejecuta el DlpJob que envía notificaciones.

A modo de ayudar a examinar grandes cantidades de datos que genera Cloud DLP, puedes usar las herramientas incorporadas de BigQuery para ejecutar estadísticas de SQL enriquecidas o herramientas como Google Data Studio si quieres generar informes. Para obtener más información, consulta Analiza y también informa sobre los resultados de Cloud DLP.

El envío de una solicitud de inspección del repositorio de almacenamiento a Cloud DLP crea y ejecuta una instancia de objeto DlpJob en respuesta. Estas tareas pueden tardar segundos, minutos o incluso horas en ejecutarse, según el tamaño de tus datos y la configuración que especificaste. Si eliges publicar en un tema de Cloud Pub/Sub (mediante la especificación de PublishToPubSub en Action), se envían notificaciones de forma automática al tema con el nombre especificado cuando el estado del trabajo cambia. El nombre del tema de Cloud Pub/Sub se especifica de la forma projects/[PROJECT_ID]/topics/[PUBSUB-TOPIC-NAME].

Tendrás control total sobre los trabajos que crees, incluidos los métodos de administración siguientes:

  • Método projects.dlpJobs.cancel: detiene un trabajo que se encuentra en progreso. El servidor realiza su mejor esfuerzo para cancelar la operación, pero no se garantiza que lo logre. El trabajo y su configuración permanecerán hasta que los borres.
  • Método projects.dlpJobs.delete: borra un trabajo y su configuración.
  • Método projects.dlpJobs.get: recupera un solo trabajo y muestra su estado, configuración y, en caso de que el trabajo está completo, los resultados resumidos.
  • Método projects.dlpJobs.list: recupera una lista de todos los trabajos y también incluye la capacidad para filtrar resultados.
¿Te sirvió esta página? Envíanos tu opinión:

Enviar comentarios sobre…

Cloud Data Loss Prevention