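Create custom Data Catalog entries

You can call the Data Catalog API to create and manage entries for custom data resource types. In this document, an entry for a custom data resource type is called a "custom entry".

Create entry groups and custom entries

Custom entries must reside in a user-created entry group. You first create the entry group, then create custom entries inside it.

After creating entries, you can set IAM policies on the entry group to define who can access the entry group and the entries in it.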
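
As a rough illustration of the IAM step, the following Python sketch builds (but does not send) a request for the entry group's setIamPolicy REST method. The helper name, the member address, and the role roles/datacatalog.viewer are placeholders chosen for this example and do not come from this guide; check the IAM documentation for the roles that fit your case. Because setIamPolicy replaces the whole policy, you would normally read the current policy first and modify it.

```python
import json


def build_set_iam_policy_request(project_id, location, entry_group_id, member, role):
    """Return (url, body) for an entryGroups setIamPolicy call. Nothing is sent."""
    resource = (
        f"projects/{project_id}/locations/{location}/entryGroups/{entry_group_id}"
    )
    url = f"https://datacatalog.googleapis.com/v1/{resource}:setIamPolicy"
    # A policy is a list of bindings; each binding grants one role to its members.
    body = {"policy": {"bindings": [{"role": role, "members": [member]}]}}
    return url, body


url, body = build_set_iam_policy_request(
    "my-project",
    "us-central1",
    "my_entry_group",
    member="user:alice@example.com",  # hypothetical principal
    role="roles/datacatalog.viewer",  # assumed role, verify before use
)
print(url)
print(json.dumps(body, indent=2))
```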

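REST & Command line

See the following examples, and refer to the Data Catalog REST API entryGroups.create and entryGroups.entries.create documentation.

1. Create an entry group

Before using any of the request data below, make the following replacements:

  • project-id: your GCP project ID.
  • entryGroupId: the ID must begin with a letter or underscore, contain only English letters, numbers, and underscores, and be at most 64 characters long.
  • displayName: the textual name for the entry group.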
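
The ID rule above can be checked locally before a request is sent. The following is a minimal sketch in Python; the function name is_valid_id is invented for this example, and the pattern encodes only the rule as stated in this document (the API may apply further checks).

```python
import re

# First character: letter or underscore. Remaining characters: English letters,
# digits, or underscores. Total length: at most 64 characters.
_ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def is_valid_id(candidate: str) -> bool:
    """Return True if candidate satisfies the ID rule described above."""
    return _ID_PATTERN.fullmatch(candidate) is not None


for sample in ("my_entry_group", "_tmp", "1st_group", "my-group", "a" * 65):
    print(f"{sample[:20]!r}: {is_valid_id(sample)}")
```

The same rule is stated for entry IDs, custom type names, and source system names later on this page, so one helper of this kind could cover all four values.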

HTTP method and URL:

POST https://datacatalog.googleapis.com/v1/projects/project-id/locations/us-central1/entryGroups?entryGroupId=entryGroupId

Request JSON body:

{
  "displayName": "Entry Group display name"
}
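
For readers who script this call from Python, the sketch below assembles the same POST request with the standard library only. It builds the request object without sending it. The helper name and the optional access_token parameter are assumptions made for this example; how you obtain a token (for instance with gcloud auth print-access-token) depends on your environment.

```python
import json
import urllib.parse
import urllib.request


def build_create_entry_group_request(
    project_id, entry_group_id, display_name, location="us-central1", access_token=None
):
    """Build (but do not send) the entryGroups.create request shown above."""
    query = urllib.parse.urlencode({"entryGroupId": entry_group_id})
    url = (
        "https://datacatalog.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/entryGroups?{query}"
    )
    headers = {"Content-Type": "application/json; charset=utf-8"}
    if access_token:
        # Assumed: a bearer token obtained elsewhere.
        headers["Authorization"] = f"Bearer {access_token}"
    data = json.dumps({"displayName": display_name}).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = build_create_entry_group_request(
    "my-project", "my_entry_group", "Entry Group display name"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here because it needs valid credentials.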

You should receive a JSON response similar to the following:

{
  "name": "projects/my_projectid/locations/us-central1/entryGroups/my_entry_group",
  "displayName": "Entry Group display name",
  "dataCatalogTimestamps": {
    "createTime": "2019-10-19T16:35:50.135Z",
    "updateTime": "2019-10-19T16:35:50.135Z"
  }
}
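
A response of this shape can be unpacked as follows. This is only a sketch: the function name is made up for the example, and the timestamp format string assumes fractional seconds and a trailing Z, as in the sample response above.

```python
import json
from datetime import datetime, timezone

response_text = """
{
  "name": "projects/my_projectid/locations/us-central1/entryGroups/my_entry_group",
  "displayName": "Entry Group display name",
  "dataCatalogTimestamps": {
    "createTime": "2019-10-19T16:35:50.135Z",
    "updateTime": "2019-10-19T16:35:50.135Z"
  }
}
"""


def parse_entry_group(text):
    """Extract the resource-name parts and creation time from the response."""
    payload = json.loads(text)
    # Name layout: projects/{project}/locations/{location}/entryGroups/{id}
    parts = payload["name"].split("/")
    created = datetime.strptime(
        payload["dataCatalogTimestamps"]["createTime"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "project": parts[1],
        "location": parts[3],
        "entry_group_id": parts[5],
        "created": created,
    }


info = parse_entry_group(response_text)
print(info["project"], info["location"], info["entry_group_id"])
print(info["created"].isoformat())
```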

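2. Create a custom entry within the entry group

Before using any of the request data below, make the following replacements:

  • project_id: your GCP project ID.
  • entryGroupId: the ID of an existing entry group. The entry is created in this entry group.
  • entryId: the ID of the new entry. The ID must begin with a letter or underscore, contain only English letters, numbers, and underscores, and be at most 64 characters long.
  • description: optional entry description.
  • displayName: optional textual name for the entry.
  • userSpecifiedType: the custom type name. The type name must begin with a letter or underscore, contain only letters, numbers, and underscores, and be at most 64 characters long.
  • userSpecifiedSystem: the entry's non-GCP source system, which is not integrated with Data Catalog. The source system name must begin with a letter or underscore, contain only letters, numbers, and underscores, and be at most 64 characters long.
  • linkedResource: optional full name of the resource that the entry refers to.
  • schema: optional data schema.

    Example JSON schema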
    { ...
      "schema": {
        "columns": [
          {
            "column": "first_name",
            "description": "First name",
            "mode": "REQUIRED",
            "type": "STRING"
          },
          {
            "column": "last_name",
            "description": "Last name",
            "mode": "REQUIRED",
            "type": "STRING"
          },
          {
            "column": "address",
            "description": "Address",
            "mode": "REPEATED",
            "subcolumns": [
              {
                "column": "city",
                "description": "City",
                "mode": "NULLABLE",
                "type": "STRING"
              },
              {
                "column": "state",
                "description": "State",
                "mode": "NULLABLE",
                "type": "STRING"
              }
            ],
            "type": "RECORD"
          }
        ]
      }
    ...
    }
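
To make the nesting in the example schema concrete, the Python sketch below walks the columns and their subcolumns and lists each one as a dotted path. The function name column_paths and the dotted notation are choices made for this illustration, not part of the API; descriptions are left out of the sample data for brevity.

```python
def column_paths(columns, prefix=""):
    """Return every column in the schema as a dotted path, parents first."""
    paths = []
    for col in columns:
        path = prefix + col["column"]
        paths.append(path)
        # RECORD columns carry their children under "subcolumns".
        paths.extend(column_paths(col.get("subcolumns", []), prefix=path + "."))
    return paths


schema = {
    "columns": [
        {"column": "first_name", "mode": "REQUIRED", "type": "STRING"},
        {"column": "last_name", "mode": "REQUIRED", "type": "STRING"},
        {
            "column": "address",
            "mode": "REPEATED",
            "type": "RECORD",
            "subcolumns": [
                {"column": "city", "mode": "NULLABLE", "type": "STRING"},
                {"column": "state", "mode": "NULLABLE", "type": "STRING"},
            ],
        },
    ]
}

print(column_paths(schema["columns"]))
```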
    

HTTP method and URL:

POST https://datacatalog.googleapis.com/v1/projects/project_id/locations/us-central1/entryGroups/entryGroupId/entries?entryId=entryId

Request JSON body:

{
  "description": "Description",
  "displayName": "Display name",
  "userSpecifiedType": "my_type",
  "userSpecifiedSystem": "my_system",
  "linkedResource": "abc.com/def",
  "schema": { schema }
}
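
Since several of the fields are optional, a small helper can assemble the body and leave out whatever was not supplied. The following Python sketch is one way to do that; the helper name build_entry_body is invented for this example, and the keys follow the field names listed in the replacements above.

```python
import json


def build_entry_body(
    user_specified_type,
    user_specified_system,
    display_name=None,
    description=None,
    linked_resource=None,
    schema=None,
):
    """Assemble the request body for a custom entry, omitting unset optional fields."""
    body = {
        "userSpecifiedType": user_specified_type,
        "userSpecifiedSystem": user_specified_system,
    }
    optional = {
        "displayName": display_name,
        "description": description,
        "linkedResource": linked_resource,
        "schema": schema,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


entry_body = build_entry_body(
    "my_type",
    "my_system",
    display_name="Display name",
    linked_resource="abc.com/def",
)
print(json.dumps(entry_body, indent=2))
```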

You should receive a JSON response similar to the following:

{
  "name": "projects/my_project_id/locations/us-central1/entryGroups/my_entryGroup_id/entries/my_entry_id",
  "userSpecifiedType": "my_type",
  "userSpecifiedSystem": "my_system",
  "displayName": "On-prem entry",
  "description": "My entry description.",
  "schema": {
    "columns": [
      {
        "type": "STRING",
        "description": "First name",
        "mode": "REQUIRED",
        "column": "first_name"
      },
      {
        "type": "STRING",
        "description": "Last name",
        "mode": "REQUIRED",
        "column": "last_name"
      },
      {
        "type": "RECORD",
        "description": "Address",
        "mode": "REPEATED",
        "column": "address",
        "subcolumns": [
          {
            "type": "STRING",
            "description": "City",
            "mode": "NULLABLE",
            "column": "city"
          },
          {
            "type": "STRING",
            "description": "State",
            "mode": "NULLABLE",
            "column": "state"
          }
        ]
      }
    ]
  },
  "sourceSystemTimestamps": {
    "createTime": "2019-10-23T23:11:26.326Z",
    "updateTime": "2019-10-23T23:11:26.326Z"
  },
  "linkedResource": "abc.com/def"
}

Python

  1. Install the client library
  2. Set up Application Default Credentials
  3. Run the code
    """
    This application demonstrates how to perform core operations with the
    Data Catalog API.
    
    For more information, see the README.md and the official documentation at
    https://cloud.google.com/data-catalog/docs.
    """
    
    # -------------------------------
    # Import required modules.
    # -------------------------------
    from google.api_core.exceptions import NotFound, PermissionDenied
    from google.cloud import datacatalog_v1
    
    # -------------------------------
    # Currently, Data Catalog stores metadata in the
    # us-central1 region.
    # -------------------------------
    location = 'us-central1'
    
    # -------------------------------
    # TODO: Set these values before running the sample.
    # -------------------------------
    project_id = 'my-project'
    entry_group_id = 'onprem_entry_group'
    entry_id = 'onprem_entry_id'
    tag_template_id = 'onprem_tag_template'
    
    # -------------------------------
    # Use Application Default Credentials to create a new
    # Data Catalog client. GOOGLE_APPLICATION_CREDENTIALS
    # environment variable must be set with the location
    # of a service account key file.
    # -------------------------------
    datacatalog = datacatalog_v1.DataCatalogClient()
    
    # -------------------------------
    # 1. Environment cleanup: delete pre-existing data.
    # -------------------------------
    # Delete any pre-existing Entry with the same name
    # that will be used in step 3.
    expected_entry_name = datacatalog_v1.DataCatalogClient \
        .entry_path(project_id, location, entry_group_id, entry_id)
    
    try:
        datacatalog.delete_entry(name=expected_entry_name)
    except (NotFound, PermissionDenied):
        pass
    
    # Delete any pre-existing Entry Group with the same name
    # that will be used in step 2.
    expected_entry_group_name = datacatalog_v1.DataCatalogClient \
        .entry_group_path(project_id, location, entry_group_id)
    
    try:
        datacatalog.delete_entry_group(name=expected_entry_group_name)
    except (NotFound, PermissionDenied):
        pass
    
    # Delete any pre-existing Template with the same name
    # that will be used in step 4.
    expected_template_name = datacatalog_v1.DataCatalogClient \
        .tag_template_path(project_id, location, tag_template_id)
    
    try:
        datacatalog.delete_tag_template(name=expected_template_name, force=True)
    except (NotFound, PermissionDenied):
        pass
    
    # -------------------------------
    # 2. Create an Entry Group.
    # -------------------------------
    entry_group_obj = datacatalog_v1.types.EntryGroup()
    entry_group_obj.display_name = 'My awesome Entry Group'
    entry_group_obj.description = 'This Entry Group represents an external system'
    
    entry_group = datacatalog.create_entry_group(
        parent=datacatalog_v1.DataCatalogClient.location_path(project_id, location),
        entry_group_id=entry_group_id,
        entry_group=entry_group_obj)
    print('Created entry group: {}'.format(entry_group.name))
    
    # -------------------------------
    # 3. Create an Entry.
    # -------------------------------
    entry = datacatalog_v1.types.Entry()
    entry.user_specified_system = 'onprem_data_system'
    entry.user_specified_type = 'onprem_data_asset'
    entry.display_name = 'My awesome data asset'
    entry.description = 'This data asset is managed by an external system.'
    entry.linked_resource = '//my-onprem-server.com/dataAssets/my-awesome-data-asset'
    
    # Create the schema. This step is optional.
    columns = []
    columns.append(datacatalog_v1.types.ColumnSchema(
        column='first_column',
        type='STRING',
        description='This column consists of ....',
        mode=None))
    
    columns.append(datacatalog_v1.types.ColumnSchema(
        column='second_column',
        type='DOUBLE',
        description='This column consists of ....',
        mode=None))
    
    entry.schema.columns.extend(columns)
    
    entry = datacatalog.create_entry(
        parent=entry_group.name,
        entry_id=entry_id,
        entry=entry)
    print('Created entry: {}'.format(entry.name))
    
    # -------------------------------
    # 4. Create a Tag Template.
    # For more field types, including ENUM, please refer to
    # https://cloud.google.com/data-catalog/docs/quickstarts/quickstart-search-tag#data-catalog-quickstart-python.
    # -------------------------------
    tag_template = datacatalog_v1.types.TagTemplate()
    tag_template.display_name = 'On-premises Tag Template'
    tag_template.fields['source'].display_name = 'Source of the data asset'
    tag_template.fields['source'].type.primitive_type = \
        datacatalog_v1.enums.FieldType.PrimitiveType.STRING.value
    
    tag_template = datacatalog.create_tag_template(
        parent=datacatalog_v1.DataCatalogClient.location_path(project_id, location),
        tag_template_id=tag_template_id,
        tag_template=tag_template)
    print('Created template: {}'.format(tag_template.name))
    
    # -------------------------------
    # 5. Attach a Tag to the custom Entry.
    # -------------------------------
    tag = datacatalog_v1.types.Tag()
    tag.template = tag_template.name
    tag.fields['source'].string_value = 'On-premises system name'
    
    tag = datacatalog.create_tag(parent=entry.name, tag=tag)
    print('Created tag: {}'.format(tag.name))
    
    

Java

  1. Install the client library
  2. Set up Application Default Credentials
  3. Run the code
    /*
    This application demonstrates how to perform core operations with the
    Data Catalog API.
    
    For more information, see the README.md and the official documentation at
    https://cloud.google.com/data-catalog/docs.
    */
    
    package com.example.datacatalog;
    
    import com.google.api.gax.rpc.AlreadyExistsException;
    import com.google.api.gax.rpc.NotFoundException;
    import com.google.api.gax.rpc.PermissionDeniedException;
    import com.google.cloud.datacatalog.v1.ColumnSchema;
    import com.google.cloud.datacatalog.v1.CreateEntryGroupRequest;
    import com.google.cloud.datacatalog.v1.CreateEntryRequest;
    import com.google.cloud.datacatalog.v1.CreateTagRequest;
    import com.google.cloud.datacatalog.v1.CreateTagTemplateRequest;
    import com.google.cloud.datacatalog.v1.DataCatalogClient;
    import com.google.cloud.datacatalog.v1.DeleteTagTemplateRequest;
    import com.google.cloud.datacatalog.v1.Entry;
    import com.google.cloud.datacatalog.v1.EntryGroup;
    import com.google.cloud.datacatalog.v1.EntryGroupName;
    import com.google.cloud.datacatalog.v1.EntryName;
    import com.google.cloud.datacatalog.v1.FieldType;
    import com.google.cloud.datacatalog.v1.LocationName;
    import com.google.cloud.datacatalog.v1.Schema;
    import com.google.cloud.datacatalog.v1.Tag;
    import com.google.cloud.datacatalog.v1.TagField;
    import com.google.cloud.datacatalog.v1.TagTemplate;
    import com.google.cloud.datacatalog.v1.TagTemplateField;
    import com.google.cloud.datacatalog.v1.TagTemplateName;
    import java.io.IOException;
    
    public class CreateCustomType {
    
      public static void createCustomType() {
        // TODO(developer): Replace these variables before running the sample.
        String projectId = "my-project";
        String entryGroupId = "onprem_entry_group";
        String entryId = "onprem_entry_id";
        String tagTemplateId = "onprem_tag_template";
        createCustomType(projectId, entryGroupId, entryId, tagTemplateId);
      }
    
      public static void createCustomType(String projectId, String entryGroupId, String entryId,
          String tagTemplateId) {
        // Currently, Data Catalog stores metadata in the us-central1 region.
        String location = "us-central1";
    
        // Initialize client that will be used to send requests. This client only needs to be created
        // once, and can be reused for multiple requests. After completing all of your requests, call
        // the "close" method on the client to safely clean up any remaining background resources.
        try (DataCatalogClient dataCatalogClient = DataCatalogClient.create()) {
    
          // 1. Environment cleanup: delete pre-existing data.
          // Delete any pre-existing Entry with the same name
          // that will be used in step 3.
          try {
            String entryName = EntryName.of(projectId, location, entryGroupId, entryId).toString();
            dataCatalogClient.deleteEntry(entryName);
            System.out.printf("\nDeleted Entry: %s", entryName);
          } catch (PermissionDeniedException | NotFoundException e) {
            // PermissionDeniedException or NotFoundException are thrown if
            // Entry does not exist.
            System.out.println("Entry does not exist.");
          }
    
          // Delete any pre-existing Entry Group with the same name
          // that will be used in step 2.
          try {
            String entryGroupName = EntryGroupName.of(projectId, location, entryGroupId).toString();
            dataCatalogClient.deleteEntryGroup(entryGroupName);
            System.out.printf("\nDeleted Entry Group: %s", entryGroupName);
          } catch (PermissionDeniedException | NotFoundException e) {
            // PermissionDeniedException or NotFoundException are thrown if
            // Entry Group does not exist.
            System.out.println("Entry Group does not exist.");
          }
    
          String tagTemplateName =
              TagTemplateName.newBuilder()
                  .setProject(projectId)
                  .setLocation(location)
                  .setTagTemplate(tagTemplateId)
                  .build()
                  .toString();
    
          // Delete any pre-existing Template with the same name
          // that will be used in step 4.
          try {
            dataCatalogClient.deleteTagTemplate(
                DeleteTagTemplateRequest.newBuilder()
                    .setName(tagTemplateName)
                    .setForce(true)
                    .build());
            System.out.printf("\nDeleted template: %s", tagTemplateName);
          } catch (Exception e) {
            System.out.printf("\nCannot delete template: %s", tagTemplateName);
          }
    
          // 2. Create an Entry Group.
          // Construct the EntryGroup for the EntryGroup request.
          EntryGroup entryGroup =
              EntryGroup.newBuilder()
                  .setDisplayName("My awesome Entry Group")
                  .setDescription("This Entry Group represents an external system")
                  .build();
    
          // Construct the EntryGroup request to be sent by the client.
          CreateEntryGroupRequest entryGroupRequest =
              CreateEntryGroupRequest.newBuilder()
                  .setParent(LocationName.of(projectId, location).toString())
                  .setEntryGroupId(entryGroupId)
                  .setEntryGroup(entryGroup)
                  .build();
    
          // Use the client to send the API request.
          EntryGroup createdEntryGroup = dataCatalogClient.createEntryGroup(entryGroupRequest);
    
          System.out.printf("\nEntry Group created with name: %s", createdEntryGroup.getName());
    
          // 3. Create an Entry.
          // Construct the Entry for the Entry request.
          Entry entry =
              Entry.newBuilder()
                  .setUserSpecifiedSystem("onprem_data_system")
                  .setUserSpecifiedType("onprem_data_asset")
                  .setDisplayName("My awesome data asset")
                  .setDescription("This data asset is managed by an external system.")
                  .setLinkedResource("//my-onprem-server.com/dataAssets/my-awesome-data-asset")
                  .setSchema(
                      Schema.newBuilder()
                          .addColumns(
                              ColumnSchema.newBuilder()
                                  .setColumn("first_column")
                                  .setDescription("This column consists of ....")
                                  .setMode("NULLABLE")
                                  .setType("DOUBLE")
                                  .build())
                          .addColumns(
                              ColumnSchema.newBuilder()
                                  .setColumn("second_column")
                                  .setDescription("This column consists of ....")
                                  .setMode("REQUIRED")
                                  .setType("STRING")
                                  .build())
                          .build())
                  .build();
    
          // Construct the Entry request to be sent by the client.
          CreateEntryRequest entryRequest =
              CreateEntryRequest.newBuilder()
                  .setParent(createdEntryGroup.getName())
                  .setEntryId(entryId)
                  .setEntry(entry)
                  .build();
    
          // Use the client to send the API request.
          Entry createdEntry = dataCatalogClient.createEntry(entryRequest);
          System.out.printf("\nEntry created with name: %s", createdEntry.getName());
    
          // 4. Create a Tag Template.
          // For more field types, including ENUM, please refer to
          // https://cloud.google.com/data-catalog/docs/quickstarts/quickstart-search-tag#data-catalog-quickstart-java.
          TagTemplateField sourceField =
              TagTemplateField.newBuilder()
                  .setDisplayName("Source of data asset")
                  .setType(FieldType.newBuilder().setPrimitiveType(
                      FieldType.PrimitiveType.STRING).build())
                  .build();
    
          TagTemplate tagTemplate =
              TagTemplate.newBuilder()
                  .setDisplayName("Demo Tag Template")
                  .putFields("source", sourceField)
                  .build();
    
          CreateTagTemplateRequest createTagTemplateRequest =
              CreateTagTemplateRequest.newBuilder()
                  .setParent(
                      LocationName.newBuilder()
                          .setProject(projectId)
                          .setLocation(location)
                          .build()
                          .toString())
                  .setTagTemplateId(tagTemplateId)
                  .setTagTemplate(tagTemplate)
                  .build();
    
          TagTemplate createdTagTemplate = dataCatalogClient
              .createTagTemplate(createTagTemplateRequest);
          System.out.printf("\nTemplate created with name: %s", createdTagTemplate.getName());
    
          // 5. Attach a Tag to the custom Entry.
          TagField sourceValue =
              TagField.newBuilder().setStringValue("On-premises system name").build();
    
          Tag tag =
              Tag.newBuilder()
                  .setTemplate(createdTagTemplate.getName())
                  .putFields("source", sourceValue)
                  .build();
    
          CreateTagRequest createTagRequest =
              CreateTagRequest.newBuilder().setParent(createdEntry.getName()).setTag(tag).build();
    
          Tag createdTag = dataCatalogClient.createTag(createTagRequest);
          System.out.printf("\nCreated tag: %s", createdTag.getName());
    
        } catch (AlreadyExistsException | IOException e) {
          // AlreadyExistsException is thrown if the EntryGroup or Entry already exists.
          // IOException is thrown when unable to create the DataCatalogClient,
          // for example an invalid Service Account path.
          System.out.println("Error creating entry:\n" + e.toString());
        }
      }
    }
    

Node.js

  1. Install the client library
  2. Set up Application Default Credentials
  3. Run the code
    /**
     * This application demonstrates how to perform core operations with the
     * Data Catalog API.
     *
     * For more information, see the README.md and the official documentation at
     * https://cloud.google.com/data-catalog/docs.
     */
    
    const main = async (
      projectId = process.env.GCLOUD_PROJECT,
      entryGroupId,
      entryId,
      tagTemplateId
    ) => {
    
      // -------------------------------
      // Import required modules.
      // -------------------------------
      const { DataCatalogClient } = require('@google-cloud/datacatalog').v1;
      const datacatalog = new DataCatalogClient();
    
      // -------------------------------
      // Currently, Data Catalog stores metadata in the
      // us-central1 region.
      // -------------------------------
      const location = "us-central1";
    
      // -------------------------------
      // 1. Environment cleanup: delete pre-existing data.
      // -------------------------------
      // Delete any pre-existing Entry with the same name
      // that will be used in step 3.
      try {
        const entryName = datacatalog.entryPath(projectId, location, entryGroupId, entryId);
        await datacatalog.deleteEntry({ name: entryName });
        console.log(`Deleted Entry: ${entryName}`);
      } catch (err) {
        console.log('Entry does not exist.');
      }
    
      // Delete any pre-existing Entry Group with the same name
      // that will be used in step 2.
      try {
        const entryGroupName = datacatalog.entryGroupPath(projectId, location, entryGroupId);
        await datacatalog.deleteEntryGroup({ name: entryGroupName });
        console.log(`Deleted Entry Group: ${entryGroupName}`);
      } catch (err) {
        console.log('Entry Group does not exist.');
      }
    
      // Delete any pre-existing Template with the same name
      // that will be used in step 4.
      const tagTemplateName = datacatalog.tagTemplatePath(
        projectId,
        location,
        tagTemplateId,
      );
    
      try {
        const tagTemplateRequest = {
          name: tagTemplateName,
          force: true,
        };
        await datacatalog.deleteTagTemplate(tagTemplateRequest);
        console.log(`Deleted template: ${tagTemplateName}`);
      } catch (error) {
        console.log(`Cannot delete template: ${tagTemplateName}`);
      }
    
      // -------------------------------
      // 2. Create an Entry Group.
      // -------------------------------
      // Construct the EntryGroup for the EntryGroup request.
      const entryGroup = {
        displayName: 'My awesome Entry Group',
        description: 'This Entry Group represents an external system',
      };
    
      // Construct the EntryGroup request to be sent by the client.
      const entryGroupRequest = {
        parent: datacatalog.locationPath(projectId, location),
        entryGroupId: entryGroupId,
        entryGroup: entryGroup,
      };
    
      // Use the client to send the API request.
      const [createdEntryGroup] = await datacatalog.createEntryGroup(entryGroupRequest);
      console.log(`Created entry group: ${createdEntryGroup.name}`);
    
      // -------------------------------
      // 3. Create an Entry.
      // -------------------------------
      // Construct the Entry for the Entry request.
      const entry = {
        userSpecifiedSystem: 'onprem_data_system',
        userSpecifiedType: 'onprem_data_asset',
        displayName: 'My awesome data asset',
        description: 'This data asset is managed by an external system.',
        linkedResource: '//my-onprem-server.com/dataAssets/my-awesome-data-asset',
        schema: {
          columns: [
            {
              column: 'first_column',
              description: 'This column consists of ....',
              mode: 'NULLABLE',
              type: 'STRING',
            },
            {
              column: 'second_column',
              description: 'This column consists of ....',
              mode: 'NULLABLE',
              type: 'DOUBLE',
            }
          ],
        },
      };
    
      // Construct the Entry request to be sent by the client.
      const entryRequest = {
        parent: datacatalog.entryGroupPath(projectId, location, entryGroupId),
        entryId: entryId,
        entry: entry,
      };
    
      // Use the client to send the API request.
      const [createdEntry] = await datacatalog.createEntry(entryRequest);
      console.log(`Created entry: ${createdEntry.name}`);
    
      // -------------------------------
      // 4. Create a Tag Template.
      // For more field types, including ENUM, please refer to
      // https://cloud.google.com/data-catalog/docs/quickstarts/quickstart-search-tag#data-catalog-quickstart-nodejs.
      // -------------------------------
      const fieldSource = {
        displayName: 'Source of data asset',
        type: {
          primitiveType: 'STRING',
        },
      };
    
      const tagTemplate = {
        displayName: 'Demo Tag Template',
        fields: {
          source: fieldSource,
        },
      };
    
      const tagTemplateRequest = {
        parent: datacatalog.locationPath(projectId, location),
        tagTemplateId: tagTemplateId,
        tagTemplate: tagTemplate,
      };
    
      // Use the client to send the API request.
      const [createdTagTemplate] = await datacatalog.createTagTemplate(tagTemplateRequest);
      console.log(`Created template: ${createdTagTemplate.name}`);
    
      // -------------------------------
      // 5. Attach a Tag to the custom Entry.
      // -------------------------------
      const tag = {
        template: createdTagTemplate.name,
        fields: {
          source: {
            stringValue: 'On-premises system name',
          },
        },
      };
    
      const tagRequest = {
        parent: createdEntry.name,
        tag: tag,
      };
    
      // Use the client to send the API request.
      const [createdTag] = await datacatalog.createTag(tagRequest);
      console.log(`Created tag: ${createdTag.name}`);
    
    };
    
    // TODO: Change these values before running the sample
    // node createCustomType.js my-project onprem_entry_group onprem_entry_id onprem_tag_template
    main(...process.argv.slice(2));