Automating the classification of data uploaded to Cloud Storage

This tutorial shows how to implement an automated data quarantine and classification system by using Cloud Storage and other Google Cloud products. The tutorial assumes that you are familiar with Google Cloud and basic shell programming.

In every organization, data protection officers like you face an ever-increasing amount of data that must be protected and treated appropriately. Quarantining and classifying that data can be complicated and time-consuming, especially when hundreds or thousands of files arrive every day.

What if you could take each file, upload it to a quarantine location, and have it automatically classified and moved to the appropriate location based on the classification result? This tutorial shows you how to implement such a system by using Cloud Functions, Cloud Storage, and Cloud Data Loss Prevention (Cloud DLP).

Objectives

  • Create Cloud Storage buckets to be used as part of the quarantine and classification pipeline.
  • Create a Pub/Sub topic and subscription to notify you when file processing is complete.
  • Create a simple Cloud Function that invokes the DLP API when files are uploaded.
  • Upload some sample files to the quarantine bucket to invoke the Cloud Function. The function uses the DLP API to inspect and classify the files and move them to the appropriate bucket.

Costs

This tutorial uses billable components of Google Cloud, including the following:

  • Cloud Storage
  • Cloud Functions
  • Cloud Data Loss Prevention

You can use the pricing calculator to generate a cost estimate based on your projected usage.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have a Google Account, sign up for a new account.

  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. Enable the Cloud Functions, Cloud Storage, and Cloud Data Loss Prevention APIs.

    Enable the APIs

Granting permissions to service accounts

Your first step is to grant permissions to two service accounts: the Cloud Functions service account and the Cloud DLP service account.

Grant permissions to the App Engine default service account

  1. In the Cloud Console, open the IAM & Admin page and select the project you created:

    Go to the IAM & Admin page

  2. Locate the App Engine service account. This account has the format [PROJECT_ID]@appspot.gserviceaccount.com. Replace [PROJECT_ID] with your project ID.

  3. Select the edit icon next to the service account.

  4. Add the following roles:

    • Project > Owner
    • Cloud DLP > DLP Administrator
    • Service Management > DLP API Service Agent
  5. Click Save.
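
If you prefer to script this step, you can grant the same three roles from Cloud Shell. The following sketch assumes that roles/owner, roles/dlp.admin, and roles/dlp.serviceAgent are the IAM role IDs behind the console names listed above; confirm the mapping on the IAM Roles page before running it:

    gcloud projects add-iam-policy-binding [PROJECT_ID] \
        --member="serviceAccount:[PROJECT_ID]@appspot.gserviceaccount.com" \
        --role="roles/owner"
    gcloud projects add-iam-policy-binding [PROJECT_ID] \
        --member="serviceAccount:[PROJECT_ID]@appspot.gserviceaccount.com" \
        --role="roles/dlp.admin"
    gcloud projects add-iam-policy-binding [PROJECT_ID] \
        --member="serviceAccount:[PROJECT_ID]@appspot.gserviceaccount.com" \
        --role="roles/dlp.serviceAgent"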

Grant permissions to the DLP service account

  1. In the Cloud Console, open the IAM & Admin page and select the project you created:

    Go to the IAM & Admin page

  2. Locate the Cloud DLP Service Agent service account. This account has the format service-[PROJECT_NUMBER]@dlp-api.iam.gserviceaccount.com. Replace [PROJECT_NUMBER] with your project number.

  3. Select the edit icon next to the service account.

  4. Add the role Project > Viewer, and then click Save.
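
As with the previous account, you can grant this role from Cloud Shell instead; this sketch assumes roles/viewer is the role ID behind the Project > Viewer console name:

    gcloud projects add-iam-policy-binding [PROJECT_ID] \
        --member="serviceAccount:service-[PROJECT_NUMBER]@dlp-api.iam.gserviceaccount.com" \
        --role="roles/viewer"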

Building the quarantine and classification pipeline

In this section, you build the quarantine and classification pipeline shown in the following diagram.

Quarantine and classification workflow

The numbers in this pipeline correspond to the following steps:

  1. You upload files to Cloud Storage.
  2. You invoke a Cloud Function.
  3. Cloud DLP inspects and classifies the data.
  4. The file is moved to the appropriate bucket.

Creating Cloud Storage buckets

Following the guidance outlined in the bucket naming guidelines, create three uniquely named buckets that you will use throughout this tutorial:

  • Bucket 1: Replace [YOUR_QUARANTINE_BUCKET] with a unique name.
  • Bucket 2: Replace [YOUR_SENSITIVE_DATA_BUCKET] with a unique name.
  • Bucket 3: Replace [YOUR_NON_SENSITIVE_DATA_BUCKET] with a unique name.

Console

  1. Open the Cloud Storage browser in the Cloud Console:

    Go to the Cloud Storage browser

  2. Click Create bucket.

  3. In the Bucket name text box, enter the name you selected for [YOUR_QUARANTINE_BUCKET], and then click Create.

  4. Repeat these steps for the [YOUR_SENSITIVE_DATA_BUCKET] and [YOUR_NON_SENSITIVE_DATA_BUCKET] buckets.

gcloud

  1. Open Cloud Shell:

    Go to Cloud Shell

  2. Create three buckets using the following commands:

    gsutil mb gs://[YOUR_QUARANTINE_BUCKET]
    gsutil mb gs://[YOUR_SENSITIVE_DATA_BUCKET]
    gsutil mb gs://[YOUR_NON_SENSITIVE_DATA_BUCKET]
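
    To verify that all three buckets were created before continuing, you can optionally list the buckets in your project:

    gsutil ls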
    

Creating the Pub/Sub topic and subscription

Console

  1. Open the Pub/Sub Topics page:

    Go to Pub/Sub topics

  2. Click Create topic.

  3. In the text box that has an entry with the format PROJECTS/[YOUR_PROJECT_NAME]/TOPICS/, append the topic name to the end of the entry, like this:

    PROJECTS/[YOUR_PROJECT_NAME]/TOPICS/[PUB/SUB_TOPIC]
  4. Click Create.

  5. Select the newly created topic, click the three dots (...) after the topic name, and then select New subscription.

  6. In the text box that has an entry with the format PROJECTS/[YOUR_PROJECT_NAME]/SUBSCRIPTIONS/, append the subscription name to the end of the entry, like this:

    PROJECTS/[YOUR_PROJECT_NAME]/SUBSCRIPTIONS/[PUB/SUB_SUBSCRIPTION]
  7. Click Create.

gcloud

  1. Open Cloud Shell:

    Go to Cloud Shell

  2. Create a topic, replacing [PUB/SUB_TOPIC] with a name of your choosing:

    gcloud pubsub topics create [PUB/SUB_TOPIC]
  3. Create a subscription, replacing [PUB/SUB_SUBSCRIPTION] with a name of your choosing:

    gcloud pubsub subscriptions create [PUB/SUB_SUBSCRIPTION] --topic [PUB/SUB_TOPIC]
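
To confirm that the subscription is attached to the topic, you can describe the new resources; both are standard gcloud Pub/Sub commands:

    gcloud pubsub topics describe [PUB/SUB_TOPIC]
    gcloud pubsub subscriptions describe [PUB/SUB_SUBSCRIPTION]

The subscription's output should show your topic in its topic field.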

Creating the Cloud Functions

This section walks through deploying a Python script that contains the following two Cloud Functions:

  • A function that is invoked when an object is uploaded to Cloud Storage.
  • A function that is invoked when a message is received in the Pub/Sub queue.

Create the first function

Console

  1. Open the Cloud Functions Overview page:

    Go to the Cloud Functions Overview page

  2. Select the project for which you enabled Cloud Functions.

  3. Click Create function.

  4. In the Name box, replace the default name with create_DLP_job.

  5. In the Trigger field, select Cloud Storage.

  6. In the Bucket field, click Browse, select your quarantine bucket by highlighting it in the drop-down list, and then click Select.

  7. Under Runtime, select Python 3.7.

  8. Under Source code, select Inline editor.

  9. Paste the following code into the main.py box, replacing the existing text:

    """ Copyright 2018, Google, Inc.
    
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    
      http://www.apache.org/licenses/LICENSE-2.0
    
    Unless  required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    
    Authors: Yuhan Guo, Zhaoyuan Sun, Fengyi Huang, Weimu Song.
    Date:    October 2018
    
    """
    
    from google.cloud import dlp
    from google.cloud import storage
    from google.cloud import pubsub
    import os
    
    # ----------------------------
    #  User-configurable Constants
    
    PROJECT_ID = '[PROJECT_ID_HOSTING_STAGING_BUCKET]'
    """The bucket the to-be-scanned files are uploaded to."""
    STAGING_BUCKET = '[YOUR_QUARANTINE_BUCKET]'
    """The bucket to move "sensitive" files to."""
    SENSITIVE_BUCKET = '[YOUR_SENSITIVE_DATA_BUCKET]'
    """The bucket to move "non sensitive" files to."""
    NONSENSITIVE_BUCKET = '[YOUR_NON_SENSITIVE_DATA_BUCKET]'
    """ Pub/Sub topic to notify once the  DLP job completes."""
    PUB_SUB_TOPIC = '[PUB/SUB_TOPIC]'
    """The minimum_likelihood (Enum) required before returning a match"""
    """For more info visit: https://cloud.google.com/dlp/docs/likelihood"""
    MIN_LIKELIHOOD = 'POSSIBLE'
    """The maximum number of findings to report (0 = server maximum)"""
    MAX_FINDINGS = 0
    """The infoTypes of information to match"""
    """For more info visit: https://cloud.google.com/dlp/docs/concepts-infotypes"""
    INFO_TYPES = [
        'FIRST_NAME', 'PHONE_NUMBER', 'EMAIL_ADDRESS', 'US_SOCIAL_SECURITY_NUMBER'
    ]
    
    # End of User-configurable Constants
    # ----------------------------------
    
    # Initialize the Google Cloud client libraries
    dlp = dlp.DlpServiceClient()
    storage_client = storage.Client()
    publisher = pubsub.PublisherClient()
    subscriber = pubsub.SubscriberClient()
    
    def create_DLP_job(data, done):
      """This function is triggered by new files uploaded to the designated Cloud Storage quarantine/staging bucket.

      It creates a DLP job for the uploaded file.

      Args:
          data: The Cloud Storage event.
          done: Unused; included to match the background function signature.

      Returns:
          None. Debug information is printed to the log.
      """
      # Get the targeted file in the quarantine bucket
      file_name = data['name']
      print('Function triggered for file [{}]'.format(file_name))
    
      # Prepare info_types by converting the list of strings (INFO_TYPES) into a list of dictionaries
      info_types = [{'name': info_type} for info_type in INFO_TYPES]
    
      # Convert the project id into a full resource id.
      parent = dlp.project_path(PROJECT_ID)
    
      # Construct the configuration dictionary.
      inspect_job = {
          'inspect_config': {
              'info_types': info_types,
              'min_likelihood': MIN_LIKELIHOOD,
              'limits': {
                  'max_findings_per_request': MAX_FINDINGS
              },
          },
          'storage_config': {
              'cloud_storage_options': {
                  'file_set': {
                      'url':
                          'gs://{bucket_name}/{file_name}'.format(
                              bucket_name=STAGING_BUCKET, file_name=file_name)
                  }
              }
          },
          'actions': [{
              'pub_sub': {
                  'topic':
                      'projects/{project_id}/topics/{topic_id}'.format(
                          project_id=PROJECT_ID, topic_id=PUB_SUB_TOPIC)
              }
          }]
      }
    
      # Create the DLP job and let the DLP API process it.
      try:
        dlp.create_dlp_job(parent, inspect_job)
        print('Job created by create_DLP_job')
      except Exception as e:
        print(e)
    
    def resolve_DLP(data, context):
      """This function listens for the Pub/Sub notification sent by create_DLP_job.

      As soon as it receives the Pub/Sub notification, it picks up the results
      of the DLP job and moves the file to the sensitive bucket or the
      non-sensitive bucket accordingly.

      Args:
          data: The Cloud Pub/Sub event.
          context: Metadata for the event; unused.

      Returns:
          None. Debug information is printed to the log.
      """
      # Get the targeted DLP job name that is created by the create_DLP_job function
      job_name = data['attributes']['DlpJobName']
      print('Received pub/sub notification from DLP job: {}'.format(job_name))
    
      # Get the DLP job details by the job_name
      job = dlp.get_dlp_job(job_name)
      print('Job Name:{name}\nStatus:{status}'.format(
          name=job.name, status=job.state))
    
      # Fetch the filename in Cloud Storage from the original DLP job config.
      # See the definition of "JSON Output" in "Limiting Cloud Storage Scans":
      # https://cloud.google.com/dlp/docs/inspecting-storage
    
      file_path = (
          job.inspect_details.requested_options.job_config.storage_config
          .cloud_storage_options.file_set.url)
      file_name = os.path.basename(file_path)
    
      info_type_stats = job.inspect_details.result.info_type_stats
      source_bucket = storage_client.get_bucket(STAGING_BUCKET)
      source_blob = source_bucket.blob(file_name)
      if len(info_type_stats) > 0:
        # Found at least one instance of sensitive data
        for stat in info_type_stats:
          print('Found {stat_cnt} instances of {stat_type_name}.'.format(
              stat_cnt=stat.count, stat_type_name=stat.info_type.name))
        print('Moving item to sensitive bucket')
        destination_bucket = storage_client.get_bucket(SENSITIVE_BUCKET)
        source_bucket.copy_blob(source_blob, destination_bucket,
                                file_name)  # copy the item to the sensitive bucket
        source_blob.delete()  # delete item from the quarantine bucket
    
      else:
        # No sensitive data found
        print('Moving item to non-sensitive bucket')
        destination_bucket = storage_client.get_bucket(NONSENSITIVE_BUCKET)
        source_bucket.copy_blob(
            source_blob, destination_bucket,
            file_name)  # copy the item to the non-sensitive bucket
        source_blob.delete()  # delete item from the quarantine bucket
      print('{} Finished'.format(file_name))
    
  10. In the code that you pasted into the main.py box, adjust the following lines, replacing the variables with your project ID, the corresponding buckets, and the Pub/Sub topic name that you created earlier.

    [YOUR_QUARANTINE_BUCKET]
    [YOUR_SENSITIVE_DATA_BUCKET]
    [YOUR_NON_SENSITIVE_DATA_BUCKET]
    [PROJECT_ID_HOSTING_STAGING_BUCKET]
    [PUB/SUB_TOPIC]
    
  11. In the Function to execute text box, replace hello_gcs with create_DLP_job.

  12. Paste the following code into the requirements.txt text box, replacing the existing text:

    google-cloud-dlp
    google-cloud-pubsub
    google-cloud-storage
    
    
  13. Click Save.

    A green check mark beside the function indicates a successful deployment.

    Successful deployment

gcloud

  1. Open a Cloud Shell session and clone the GitHub repository that contains the code and some sample data files:

    Open in Cloud Shell

  2. Change directories to the folder of the cloned repository:

    cd gcs-dlp-classification-python/
  3. In the main.py file, adjust the following lines, replacing the bucket variables with the names of the corresponding buckets that you created earlier. Also replace the project ID and Pub/Sub topic variables with the values you chose.

    [YOUR_QUARANTINE_BUCKET]
    [YOUR_SENSITIVE_DATA_BUCKET]
    [YOUR_NON_SENSITIVE_DATA_BUCKET]
    [PROJECT_ID_HOSTING_STAGING_BUCKET]
    [PUB/SUB_TOPIC]
    
  4. Deploy the function, replacing [YOUR_QUARANTINE_BUCKET] with your bucket name:

    gcloud functions deploy create_DLP_job --runtime python37 \
        --trigger-resource [YOUR_QUARANTINE_BUCKET] \
        --trigger-event google.storage.object.finalize
    
  5. Verify that the function successfully deployed:

    gcloud functions describe create_DLP_job

    A successful deployment is indicated by a ready status similar to the following:

    status:  READY
    timeout:  60s
    

When the Cloud Function has successfully deployed, you can optionally check its log output as shown below, and then continue to the next section to create the second Cloud Function.
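
The logs become most interesting later, when files flow through the pipeline; the command below is a standard way to read Cloud Functions logs (add --region if you did not deploy to the default region):

    gcloud functions logs read create_DLP_job --limit 50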

Create the second function

Console

  1. Open the Cloud Functions Overview page:

    Go to the Cloud Functions Overview page

  2. Select the project for which you enabled Cloud Functions.

  3. Click Create function.

  4. In the Name box, replace the default name with resolve_DLP.

  5. In the Trigger field, select Pub/Sub.

  6. In the Topic field, enter [PUB/SUB_TOPIC].

  7. Under Source code, select Inline editor.

  8. Under Runtime, select Python 3.7.

  9. Paste the following code into the main.py box, replacing the existing text:

    """ Copyright 2018, Google, Inc.
    
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    
      http://www.apache.org/licenses/LICENSE-2.0
    
    Unless  required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    
    Authors: Yuhan Guo, Zhaoyuan Sun, Fengyi Huang, Weimu Song.
    Date:    October 2018
    
    """
    
    from google.cloud import dlp
    from google.cloud import storage
    from google.cloud import pubsub
    import os
    
    # ----------------------------
    #  User-configurable Constants
    
    PROJECT_ID = '[PROJECT_ID_HOSTING_STAGING_BUCKET]'
    """The bucket the to-be-scanned files are uploaded to."""
    STAGING_BUCKET = '[YOUR_QUARANTINE_BUCKET]'
    """The bucket to move "sensitive" files to."""
    SENSITIVE_BUCKET = '[YOUR_SENSITIVE_DATA_BUCKET]'
    """The bucket to move "non sensitive" files to."""
    NONSENSITIVE_BUCKET = '[YOUR_NON_SENSITIVE_DATA_BUCKET]'
    """ Pub/Sub topic to notify once the  DLP job completes."""
    PUB_SUB_TOPIC = '[PUB/SUB_TOPIC]'
    """The minimum_likelihood (Enum) required before returning a match"""
    """For more info visit: https://cloud.google.com/dlp/docs/likelihood"""
    MIN_LIKELIHOOD = 'POSSIBLE'
    """The maximum number of findings to report (0 = server maximum)"""
    MAX_FINDINGS = 0
    """The infoTypes of information to match"""
    """For more info visit: https://cloud.google.com/dlp/docs/concepts-infotypes"""
    INFO_TYPES = [
        'FIRST_NAME', 'PHONE_NUMBER', 'EMAIL_ADDRESS', 'US_SOCIAL_SECURITY_NUMBER'
    ]
    
    # End of User-configurable Constants
    # ----------------------------------
    
    # Initialize the Google Cloud client libraries
    dlp = dlp.DlpServiceClient()
    storage_client = storage.Client()
    publisher = pubsub.PublisherClient()
    subscriber = pubsub.SubscriberClient()
    
    def create_DLP_job(data, done):
      """This function is triggered by new files uploaded to the designated Cloud Storage quarantine/staging bucket.

      It creates a DLP job for the uploaded file.

      Args:
          data: The Cloud Storage event.
          done: Unused; included to match the background function signature.

      Returns:
          None. Debug information is printed to the log.
      """
      # Get the targeted file in the quarantine bucket
      file_name = data['name']
      print('Function triggered for file [{}]'.format(file_name))
    
      # Prepare info_types by converting the list of strings (INFO_TYPES) into a list of dictionaries
      info_types = [{'name': info_type} for info_type in INFO_TYPES]
    
      # Convert the project id into a full resource id.
      parent = dlp.project_path(PROJECT_ID)
    
      # Construct the configuration dictionary.
      inspect_job = {
          'inspect_config': {
              'info_types': info_types,
              'min_likelihood': MIN_LIKELIHOOD,
              'limits': {
                  'max_findings_per_request': MAX_FINDINGS
              },
          },
          'storage_config': {
              'cloud_storage_options': {
                  'file_set': {
                      'url':
                          'gs://{bucket_name}/{file_name}'.format(
                              bucket_name=STAGING_BUCKET, file_name=file_name)
                  }
              }
          },
          'actions': [{
              'pub_sub': {
                  'topic':
                      'projects/{project_id}/topics/{topic_id}'.format(
                          project_id=PROJECT_ID, topic_id=PUB_SUB_TOPIC)
              }
          }]
      }
    
      # Create the DLP job and let the DLP API process it.
      try:
        dlp.create_dlp_job(parent, inspect_job)
        print('Job created by create_DLP_job')
      except Exception as e:
        print(e)
    
    def resolve_DLP(data, context):
      """This function listens for the Pub/Sub notification sent by create_DLP_job.

      As soon as it receives the Pub/Sub notification, it picks up the results
      of the DLP job and moves the file to the sensitive bucket or the
      non-sensitive bucket accordingly.

      Args:
          data: The Cloud Pub/Sub event.
          context: Metadata for the event; unused.

      Returns:
          None. Debug information is printed to the log.
      """
      # Get the targeted DLP job name that is created by the create_DLP_job function
      job_name = data['attributes']['DlpJobName']
      print('Received pub/sub notification from DLP job: {}'.format(job_name))
    
      # Get the DLP job details by the job_name
      job = dlp.get_dlp_job(job_name)
      print('Job Name:{name}\nStatus:{status}'.format(
          name=job.name, status=job.state))
    
      # Fetch the filename in Cloud Storage from the original DLP job config.
      # See the definition of "JSON Output" in "Limiting Cloud Storage Scans":
      # https://cloud.google.com/dlp/docs/inspecting-storage
    
      file_path = (
          job.inspect_details.requested_options.job_config.storage_config
          .cloud_storage_options.file_set.url)
      file_name = os.path.basename(file_path)
    
      info_type_stats = job.inspect_details.result.info_type_stats
      source_bucket = storage_client.get_bucket(STAGING_BUCKET)
      source_blob = source_bucket.blob(file_name)
      if len(info_type_stats) > 0:
        # Found at least one instance of sensitive data
        for stat in info_type_stats:
          print('Found {stat_cnt} instances of {stat_type_name}.'.format(
              stat_cnt=stat.count, stat_type_name=stat.info_type.name))
        print('Moving item to sensitive bucket')
        destination_bucket = storage_client.get_bucket(SENSITIVE_BUCKET)
        source_bucket.copy_blob(source_blob, destination_bucket,
                                file_name)  # copy the item to the sensitive bucket
        source_blob.delete()  # delete item from the quarantine bucket
    
      else:
        # No sensitive data found
        print('Moving item to non-sensitive bucket')
        destination_bucket = storage_client.get_bucket(NONSENSITIVE_BUCKET)
        source_bucket.copy_blob(
            source_blob, destination_bucket,
            file_name)  # copy the item to the non-sensitive bucket
        source_blob.delete()  # delete item from the quarantine bucket
      print('{} Finished'.format(file_name))
    
  10. In the code that you pasted into the main.py box, adjust the following lines, replacing the variables with your project ID, the corresponding buckets, and the Pub/Sub topic name that you created earlier.

    [YOUR_QUARANTINE_BUCKET]
    [YOUR_SENSITIVE_DATA_BUCKET]
    [YOUR_NON_SENSITIVE_DATA_BUCKET]
    [PROJECT_ID_HOSTING_STAGING_BUCKET]
    [PUB/SUB_TOPIC]
    
  11. In the Function to execute text box, replace helloPubSub with resolve_DLP.

  12. Paste the following code into the requirements.txt text box, replacing the existing text:

    google-cloud-dlp
    google-cloud-pubsub
    google-cloud-storage
    
    
  13. Click Save.

    A green check mark beside the function indicates a successful deployment.

    Successful deployment

gcloud

  1. Open (or reopen) a Cloud Shell session and clone the GitHub repository that contains the code and some sample data files:

    Open in Cloud Shell

  2. Change directories to the folder containing the Python code:

    cd gcs-dlp-classification-python
  3. In the main.py file, adjust the following lines, replacing the bucket variables with the names of the corresponding buckets that you created earlier. Also replace the project ID and Pub/Sub topic variables with the values you chose.

    [YOUR_QUARANTINE_BUCKET]
    [YOUR_SENSITIVE_DATA_BUCKET]
    [YOUR_NON_SENSITIVE_DATA_BUCKET]
    [PROJECT_ID_HOSTING_STAGING_BUCKET]
    [PUB/SUB_TOPIC]
    
  4. Deploy the function, replacing [PUB/SUB_TOPIC] with your Pub/Sub topic:

    gcloud functions deploy resolve_DLP --runtime python37 --trigger-topic [PUB/SUB_TOPIC]
  5. Verify that the function successfully deployed:

    gcloud functions describe resolve_DLP

    A successful deployment is indicated by a ready status similar to the following:

    status:  READY
    timeout:  60s
    

When the Cloud Function has successfully deployed, continue to the next section.

Uploading sample files to the quarantine bucket

The GitHub repository associated with this article includes sample data files. The folder contains some files that have sensitive data and other files that have nonsensitive data. Sensitive data is classified as containing one or more of the following INFO_TYPES values:

US_SOCIAL_SECURITY_NUMBER
EMAIL_ADDRESS
PERSON_NAME
LOCATION
PHONE_NUMBER

The data types used to classify the sample files are defined in the INFO_TYPES constant in the main.py file, which is initially set to ['FIRST_NAME', 'PHONE_NUMBER', 'EMAIL_ADDRESS', 'US_SOCIAL_SECURITY_NUMBER']. You can adjust this constant to match your own data, as in the sketch below.
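
For example, if you also want the pipeline to flag person names and locations in the sample files, you could extend the constant in main.py before deploying; PERSON_NAME and LOCATION are standard Cloud DLP infoType names, as the list above shows:

    INFO_TYPES = [
        'FIRST_NAME', 'PHONE_NUMBER', 'EMAIL_ADDRESS', 'US_SOCIAL_SECURITY_NUMBER',
        'PERSON_NAME', 'LOCATION'
    ]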

  1. If you have not already done so, open Cloud Shell and clone the GitHub repository that contains the code and some sample data files:

    Open in Cloud Shell

  2. Change folders to the folder that contains the sample data files:

    cd ~/dlp-cloud-functions-tutorials/sample_data/
  3. Copy the sample data files to the quarantine bucket by using the gsutil command, replacing [YOUR_QUARANTINE_BUCKET] with the name of your quarantine bucket:

    gsutil -m cp * gs://[YOUR_QUARANTINE_BUCKET]/

    Cloud DLP inspects and classifies each file uploaded to the quarantine bucket and moves it to the appropriate target bucket based on its classification.

  4. In the Cloud Console, open the Cloud Storage browser page:

    Go to the Cloud Storage browser

  5. Select one of the target buckets that you created earlier and review the uploaded files. Then review the other buckets that you created, or list their contents from the command line as shown below.
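
As an alternative to browsing in the console, you can list the contents of the destination buckets from Cloud Shell:

    gsutil ls gs://[YOUR_SENSITIVE_DATA_BUCKET]
    gsutil ls gs://[YOUR_NON_SENSITIVE_DATA_BUCKET]

Files containing any of the configured infoTypes should appear in the sensitive bucket, and the rest in the non-sensitive bucket.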

Cleaning up

After you've finished the tutorial, you can clean up the resources that you created on Google Cloud so they won't take up quota and you won't be billed for them in the future. The following sections describe how to delete or turn off these resources.

Delete the project

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next