Creating custom Data Catalog entries

You can call Data Catalog APIs to create and manage entries for custom data resource types. In this document, an entry for a custom data resource type is called a "custom entry".

Creating entry groups and custom entries

Custom entries must be placed in a user-created entry group. You first create the entry group, then create the custom entry inside it.

After creating an entry, you can set IAM policies on the entry group to define who has access to the entry group and to the entries in it.
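To illustrate the access-control step, the following minimal sketch builds the JSON body that a `setIamPolicy` call on an entry group would carry. The role `roles/datacatalog.viewer`, the member address, and the helper name `build_iam_policy_body` are assumptions chosen for the example. The sketch only constructs the payload and sends nothing.

```python
import json


def build_iam_policy_body(role, members):
    """Return a setIamPolicy request body granting `role` to `members`.

    Hypothetical helper: it only assembles the JSON structure.
    """
    if not members:
        raise ValueError("at least one member is required")
    # De-duplicate and sort members so the payload is deterministic.
    binding = {"role": role, "members": sorted(set(members))}
    return {"policy": {"bindings": [binding]}}


body = build_iam_policy_body(
    "roles/datacatalog.viewer",    # assumed role, for illustration only
    ["user:analyst@example.com"],  # hypothetical member
)
print(json.dumps(body, indent=2))
```

Because the policy is attached to the entry group, the same binding would apply to every entry created in that group.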

REST & CMD LINE

See the following examples and refer to the Data Catalog REST API entryGroups.create and entryGroups.entries.create documentation.

1. Create an entry group

Before using any of the request data below, make the following replacements:

  • project-id: your GCP project ID.
  • entryGroupId: the ID must begin with a letter or underscore, contain only English letters, numbers, and underscores, and be at most 64 characters long.
  • displayName: the textual name for the entry group.
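The ID rule above can be checked locally before a request is sent. The following is a minimal sketch, assuming that "English letters" means ASCII `A-Z` and `a-z`; the helper name `is_valid_id` is hypothetical and not part of any Data Catalog library.

```python
import re

# First character: letter or underscore. Up to 63 more characters:
# letters, digits, or underscores. Total length is therefore 1 to 64.
_ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def is_valid_id(candidate):
    """Return True if `candidate` satisfies the ID rule described above."""
    return _ID_PATTERN.fullmatch(candidate) is not None


for sample in ("onprem_entry_group", "_tmp", "1st_group", "my-group"):
    print(sample, is_valid_id(sample))
```

`fullmatch` is used rather than `match` so that trailing characters, including a trailing newline, cause the check to fail.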

HTTP method and URL:

POST https://datacatalog.googleapis.com/v1/projects/project-id/locations/us-central1/entryGroups?entryGroupId=entryGroupId

Request JSON body:

{
  "displayName": "Entry Group display name"
}
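The request above can be assembled with the Python standard library. This is a minimal sketch: the helper name `build_create_entry_group_request` is hypothetical, `ACCESS_TOKEN` is a placeholder for an OAuth 2.0 access token obtained separately, and the request object is built but never sent.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://datacatalog.googleapis.com/v1"


def build_create_entry_group_request(project_id, entry_group_id, display_name,
                                     access_token, location="us-central1"):
    """Build (but do not send) the POST request that creates an entry group."""
    query = urllib.parse.urlencode({"entryGroupId": entry_group_id})
    url = (f"{API_ROOT}/projects/{project_id}/locations/{location}"
           f"/entryGroups?{query}")
    body = json.dumps({"displayName": display_name}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {access_token}",  # placeholder token
        "Content-Type": "application/json; charset=utf-8",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_create_entry_group_request(
    "my-project", "my_entry_group", "Entry Group display name", "ACCESS_TOKEN")
print(request.get_method(), request.full_url)
```

Passing the object to `urllib.request.urlopen` would send it; that step is omitted here because it requires valid credentials.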

You should receive a JSON response similar to the following:

{
  "name": "projects/my_projectid/locations/us-central1/entryGroups/my_entry_group",
  "displayName": "Entry Group display name",
  "dataCatalogTimestamps": {
    "createTime": "2019-10-19T16:35:50.135Z",
    "updateTime": "2019-10-19T16:35:50.135Z"
  }
}

2. Create a custom entry in the entry group

Before using any of the request data below, make the following replacements:

  • project_id: your GCP project ID.
  • entryGroupId: ID of an existing entry group. The entry is created in this entry group.
  • entryId: ID of the new entry. The ID must begin with a letter or underscore, contain only English letters, numbers, and underscores, and be at most 64 characters long.
  • description: optional description of the entry.
  • displayName: optional textual name for the entry.
  • userSpecifiedType: name of the custom type. The type name must begin with a letter or underscore, contain only letters, numbers, and underscores, and be at most 64 characters long.
  • userSpecifiedSystem: the entry's source system, which is not a GCP system and is not integrated with Data Catalog. The source system name must begin with a letter or underscore, contain only letters, numbers, and underscores, and be at most 64 characters long.
  • linkedResource: optional full name of the resource the entry refers to.
  • schema: optional data schema.

    Example JSON schema:
    { ...
      "schema": {
        "columns": [
          {
            "column": "first_name",
            "description": "First name",
            "mode": "REQUIRED",
            "type": "STRING"
          },
          {
            "column": "last_name",
            "description": "Last name",
            "mode": "REQUIRED",
            "type": "STRING"
          },
          {
            "column": "address",
            "description": "Address",
            "mode": "REPEATED",
            "subcolumns": [
              {
                "column": "city",
                "description": "City",
                "mode": "NULLABLE",
                "type": "STRING"
              },
              {
                "column": "state",
                "description": "State",
                "mode": "NULLABLE",
                "type": "STRING"
              }
            ],
            "type": "RECORD"
          }
        ]
      }
    ...
    }
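The example schema nests `city` and `state` under the repeated `address` record. As a rough sketch of how that nesting can be read, the hypothetical helper `flatten_columns` below walks a schema dictionary shaped like the example and returns one dotted path per column. It operates on plain dictionaries and makes no API calls.

```python
def flatten_columns(columns, prefix=""):
    """Return (path, type, mode) tuples for every column, depth first."""
    rows = []
    for col in columns:
        path = f"{prefix}{col['column']}"
        rows.append((path, col["type"], col.get("mode", "")))
        # RECORD columns carry their children under "subcolumns".
        rows.extend(flatten_columns(col.get("subcolumns", []), prefix=path + "."))
    return rows


schema = {
    "columns": [
        {"column": "first_name", "mode": "REQUIRED", "type": "STRING"},
        {"column": "last_name", "mode": "REQUIRED", "type": "STRING"},
        {
            "column": "address",
            "mode": "REPEATED",
            "type": "RECORD",
            "subcolumns": [
                {"column": "city", "mode": "NULLABLE", "type": "STRING"},
                {"column": "state", "mode": "NULLABLE", "type": "STRING"},
            ],
        },
    ]
}

flat = flatten_columns(schema["columns"])
for path, col_type, mode in flat:
    print(f"{path:15} {col_type:7} {mode}")
```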
    

HTTP method and URL:

POST https://datacatalog.googleapis.com/v1/projects/project_id/locations/us-central1/entryGroups/entryGroupId/entries?entryId=entryId

Request JSON body:

{
  "description": "Description",
  "displayName": "Display name",
  "userSpecifiedType": "my_type",
  "userSpecifiedSystem": "my_system",
  "linkedResource": "abc.com/def",
  "schema": { schema }
}
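The body above can also be assembled programmatically so that optional fields are left out when they are not supplied. This is a minimal sketch using the field names from the replacement list (`userSpecifiedType`, `userSpecifiedSystem`, `linkedResource`, and so on); the helper name `build_custom_entry_body` is hypothetical.

```python
import json


def build_custom_entry_body(user_specified_type, user_specified_system,
                            display_name=None, description=None,
                            linked_resource=None, schema=None):
    """Return the JSON-serializable body for a custom entry request."""
    body = {
        "userSpecifiedType": user_specified_type,
        "userSpecifiedSystem": user_specified_system,
    }
    optional = {
        "displayName": display_name,
        "description": description,
        "linkedResource": linked_resource,
        "schema": schema,
    }
    # Only include optional fields that were actually provided.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


entry_body = build_custom_entry_body(
    "my_type",
    "my_system",
    display_name="Display name",
    linked_resource="abc.com/def",
)
print(json.dumps(entry_body, indent=2))
```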

You should receive a JSON response similar to the following:

{
  "name": "projects/my_project_id/locations/us-central1/entryGroups/my_entryGroup_id/entries/my_entry_id",
  "userSpecifiedType": "my_type",
  "userSpecifiedSystem": "my_system",
  "displayName": "On-prem entry",
  "description": "My entry description.",
  "schema": {
    "columns": [
      {
        "type": "STRING",
        "description": "First name",
        "mode": "REQUIRED",
        "column": "first_name"
      },
      {
        "type": "STRING",
        "description": "Last name",
        "mode": "REQUIRED",
        "column": "last_name"
      },
      {
        "type": "RECORD",
        "description": "Address",
        "mode": "REPEATED",
        "column": "address",
        "subcolumns": [
          {
            "type": "STRING",
            "description": "City",
            "mode": "NULLABLE",
            "column": "city"
          },
          {
            "type": "STRING",
            "description": "State",
            "mode": "NULLABLE",
            "column": "state"
          }
        ]
      }
    ]
  },
  "sourceSystemTimestamps": {
    "createTime": "2019-10-23T23:11:26.326Z",
    "updateTime": "2019-10-23T23:11:26.326Z"
  },
  "linkedResource": "abc.com/def"
}
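The `name` field in the response encodes the project, location, entry group, and entry IDs as alternating collection and ID segments. The hypothetical helper `parse_resource_name` below is a small sketch of how to split such a name into its parts. It relies only on the path layout visible in the sample response.

```python
def parse_resource_name(name):
    """Split 'collection/id/collection/id/...' into a {collection: id} dict."""
    parts = name.split("/")
    if len(parts) % 2 != 0 or not all(parts):
        raise ValueError(f"unexpected resource name: {name!r}")
    # Even positions are collection names, odd positions are the IDs.
    return dict(zip(parts[0::2], parts[1::2]))


parsed = parse_resource_name(
    "projects/my_project_id/locations/us-central1"
    "/entryGroups/my_entryGroup_id/entries/my_entry_id"
)
print(parsed["entryGroups"], parsed["entries"])
```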

Python

  1. Install the client library
  2. Set up Application Default Credentials
  3. Run the code
    """
    This application demonstrates how to perform core operations with the
    Data Catalog API.
    
    For more information, see the README.md and the official documentation at
    https://cloud.google.com/data-catalog/docs.
    """
    
    # -------------------------------
    # Import required modules.
    # -------------------------------
    from google.api_core.exceptions import NotFound, PermissionDenied
    from google.cloud import datacatalog_v1
    
    # -------------------------------
    # Currently, Data Catalog stores metadata in the
    # us-central1 region.
    # -------------------------------
    location = 'us-central1'
    
    # -------------------------------
    # TODO: Set these values before running the sample.
    # -------------------------------
    project_id = 'my-project'
    entry_group_id = 'onprem_entry_group'
    entry_id = 'onprem_entry_id'
    tag_template_id = 'onprem_tag_template'
    
    # -------------------------------
    # Use Application Default Credentials to create a new
    # Data Catalog client. GOOGLE_APPLICATION_CREDENTIALS
    # environment variable must be set with the location
    # of a service account key file.
    # -------------------------------
    datacatalog = datacatalog_v1.DataCatalogClient()
    
    # -------------------------------
    # 1. Environment cleanup: delete pre-existing data.
    # -------------------------------
    # Delete any pre-existing Entry with the same name
    # that will be used in step 3.
    expected_entry_name = datacatalog_v1.DataCatalogClient \
        .entry_path(project_id, location, entry_group_id, entry_id)
    
    try:
        datacatalog.delete_entry(name=expected_entry_name)
    except (NotFound, PermissionDenied):
        pass
    
    # Delete any pre-existing Entry Group with the same name
    # that will be used in step 2.
    expected_entry_group_name = datacatalog_v1.DataCatalogClient \
        .entry_group_path(project_id, location, entry_group_id)
    
    try:
        datacatalog.delete_entry_group(name=expected_entry_group_name)
    except (NotFound, PermissionDenied):
        pass
    
    # Delete any pre-existing Template with the same name
    # that will be used in step 4.
    expected_template_name = datacatalog_v1.DataCatalogClient \
        .tag_template_path(project_id, location, tag_template_id)
    
    try:
        datacatalog.delete_tag_template(name=expected_template_name, force=True)
    except (NotFound, PermissionDenied):
        pass
    
    # -------------------------------
    # 2. Create an Entry Group.
    # -------------------------------
    entry_group_obj = datacatalog_v1.types.EntryGroup()
    entry_group_obj.display_name = 'My awesome Entry Group'
    entry_group_obj.description = 'This Entry Group represents an external system'
    
    entry_group = datacatalog.create_entry_group(
        parent=datacatalog_v1.DataCatalogClient.location_path(project_id, location),
        entry_group_id=entry_group_id,
        entry_group=entry_group_obj)
    print('Created entry group: {}'.format(entry_group.name))
    
    # -------------------------------
    # 3. Create an Entry.
    # -------------------------------
    entry = datacatalog_v1.types.Entry()
    entry.user_specified_system = 'onprem_data_system'
    entry.user_specified_type = 'onprem_data_asset'
    entry.display_name = 'My awesome data asset'
    entry.description = 'This data asset is managed by an external system.'
    entry.linked_resource = '//my-onprem-server.com/dataAssets/my-awesome-data-asset'
    
    # Create the Schema, this is optional.
    columns = []
    columns.append(datacatalog_v1.types.ColumnSchema(
        column='first_column',
        type='STRING',
        description='This column consists of ....',
        mode=None))
    
    columns.append(datacatalog_v1.types.ColumnSchema(
        column='second_column',
        type='DOUBLE',
        description='This column consists of ....',
        mode=None))
    
    entry.schema.columns.extend(columns)
    
    entry = datacatalog.create_entry(
        parent=entry_group.name,
        entry_id=entry_id,
        entry=entry)
    print('Created entry: {}'.format(entry.name))
    
    # -------------------------------
    # 4. Create a Tag Template.
    # For more field types, including ENUM, please refer to
    # https://cloud.google.com/data-catalog/docs/quickstarts/quickstart-search-tag#data-catalog-quickstart-python.
    # -------------------------------
    tag_template = datacatalog_v1.types.TagTemplate()
    tag_template.display_name = 'On-premises Tag Template'
    tag_template.fields['source'].display_name = 'Source of the data asset'
    tag_template.fields['source'].type.primitive_type = \
        datacatalog_v1.enums.FieldType.PrimitiveType.STRING.value
    
    tag_template = datacatalog.create_tag_template(
        parent=datacatalog_v1.DataCatalogClient.location_path(project_id, location),
        tag_template_id=tag_template_id,
        tag_template=tag_template)
    print('Created template: {}'.format(tag_template.name))
    
    # -------------------------------
    # 5. Attach a Tag to the custom Entry.
    # -------------------------------
    tag = datacatalog_v1.types.Tag()
    tag.template = tag_template.name
    tag.fields['source'].string_value = 'On-premises system name'
    
    tag = datacatalog.create_tag(parent=entry.name, tag=tag)
    print('Created tag: {}'.format(tag.name))
    
    

Java

  1. Install the client library
  2. Set up Application Default Credentials
  3. Run the code
    /*
    This application demonstrates how to perform core operations with the
    Data Catalog API.
    
    For more information, see the README.md and the official documentation at
    https://cloud.google.com/data-catalog/docs.
    */
    
    package com.example.datacatalog;
    
    import com.google.api.gax.rpc.AlreadyExistsException;
    import com.google.api.gax.rpc.NotFoundException;
    import com.google.api.gax.rpc.PermissionDeniedException;
    import com.google.cloud.datacatalog.v1.ColumnSchema;
    import com.google.cloud.datacatalog.v1.CreateEntryGroupRequest;
    import com.google.cloud.datacatalog.v1.CreateEntryRequest;
    import com.google.cloud.datacatalog.v1.CreateTagRequest;
    import com.google.cloud.datacatalog.v1.CreateTagTemplateRequest;
    import com.google.cloud.datacatalog.v1.DataCatalogClient;
    import com.google.cloud.datacatalog.v1.DeleteTagTemplateRequest;
    import com.google.cloud.datacatalog.v1.Entry;
    import com.google.cloud.datacatalog.v1.EntryGroup;
    import com.google.cloud.datacatalog.v1.EntryGroupName;
    import com.google.cloud.datacatalog.v1.EntryName;
    import com.google.cloud.datacatalog.v1.FieldType;
    import com.google.cloud.datacatalog.v1.LocationName;
    import com.google.cloud.datacatalog.v1.Schema;
    import com.google.cloud.datacatalog.v1.Tag;
    import com.google.cloud.datacatalog.v1.TagField;
    import com.google.cloud.datacatalog.v1.TagTemplate;
    import com.google.cloud.datacatalog.v1.TagTemplateField;
    import com.google.cloud.datacatalog.v1.TagTemplateName;
    import java.io.IOException;
    
    public class CreateCustomType {
    
      public static void createCustomType() {
        // TODO(developer): Replace these variables before running the sample.
        String projectId = "my-project";
        String entryGroupId = "onprem_entry_group";
        String entryId = "onprem_entry_id";
        String tagTemplateId = "onprem_tag_template";
        createCustomType(projectId, entryGroupId, entryId, tagTemplateId);
      }
    
      public static void createCustomType(String projectId, String entryGroupId, String entryId,
          String tagTemplateId) {
        // Currently, Data Catalog stores metadata in the us-central1 region.
        String location = "us-central1";
    
        // Initialize client that will be used to send requests. This client only needs to be created
        // once, and can be reused for multiple requests. After completing all of your requests, call
        // the "close" method on the client to safely clean up any remaining background resources.
        try (DataCatalogClient dataCatalogClient = DataCatalogClient.create()) {
    
          // 1. Environment cleanup: delete pre-existing data.
          // Delete any pre-existing Entry with the same name
          // that will be used in step 3.
          try {
            String entryName = EntryName.of(projectId, location, entryGroupId, entryId).toString();
            dataCatalogClient.deleteEntry(entryName);
            System.out.printf("\nDeleted Entry: %s", entryName);
          } catch (PermissionDeniedException | NotFoundException e) {
            // PermissionDeniedException or NotFoundException are thrown if
            // Entry does not exist.
            System.out.println("Entry does not exist.");
          }
    
          // Delete any pre-existing Entry Group with the same name
          // that will be used in step 2.
          try {
            String entryGroupName = EntryGroupName.of(projectId, location, entryGroupId).toString();
            dataCatalogClient.deleteEntryGroup(entryGroupName);
            System.out.printf("\nDeleted Entry Group: %s", entryGroupName);
          } catch (PermissionDeniedException | NotFoundException e) {
            // PermissionDeniedException or NotFoundException are thrown if
            // Entry Group does not exist.
            System.out.println("Entry Group does not exist.");
          }
    
          String tagTemplateName =
              TagTemplateName.newBuilder()
                  .setProject(projectId)
                  .setLocation(location)
                  .setTagTemplate(tagTemplateId)
                  .build()
                  .toString();
    
          // Delete any pre-existing Template with the same name
          // that will be used in step 4.
          try {
            dataCatalogClient.deleteTagTemplate(
                DeleteTagTemplateRequest.newBuilder()
                    .setName(tagTemplateName)
                    .setForce(true)
                    .build());
            System.out.printf("\nDeleted template: %s", tagTemplateName);
          } catch (Exception e) {
            System.out.printf("\nCannot delete template: %s", tagTemplateName);
          }
    
          // 2. Create an Entry Group.
          // Construct the EntryGroup for the EntryGroup request.
          EntryGroup entryGroup =
              EntryGroup.newBuilder()
                  .setDisplayName("My awesome Entry Group")
                  .setDescription("This Entry Group represents an external system")
                  .build();
    
          // Construct the EntryGroup request to be sent by the client.
          CreateEntryGroupRequest entryGroupRequest =
              CreateEntryGroupRequest.newBuilder()
                  .setParent(LocationName.of(projectId, location).toString())
                  .setEntryGroupId(entryGroupId)
                  .setEntryGroup(entryGroup)
                  .build();
    
          // Use the client to send the API request.
          EntryGroup createdEntryGroup = dataCatalogClient.createEntryGroup(entryGroupRequest);
    
          System.out.printf("\nEntry Group created with name: %s", createdEntryGroup.getName());
    
          // 3. Create an Entry.
          // Construct the Entry for the Entry request.
          Entry entry =
              Entry.newBuilder()
                  .setUserSpecifiedSystem("onprem_data_system")
                  .setUserSpecifiedType("onprem_data_asset")
                  .setDisplayName("My awesome data asset")
                  .setDescription("This data asset is managed by an external system.")
                  .setLinkedResource("//my-onprem-server.com/dataAssets/my-awesome-data-asset")
                  .setSchema(
                      Schema.newBuilder()
                          .addColumns(
                              ColumnSchema.newBuilder()
                                  .setColumn("first_column")
                                  .setDescription("This column consists of ....")
                                  .setMode("NULLABLE")
                                  .setType("DOUBLE")
                                  .build())
                          .addColumns(
                              ColumnSchema.newBuilder()
                                  .setColumn("second_column")
                                  .setDescription("This column consists of ....")
                                  .setMode("REQUIRED")
                                  .setType("STRING")
                                  .build())
                          .build())
                  .build();
    
          // Construct the Entry request to be sent by the client.
          CreateEntryRequest entryRequest =
              CreateEntryRequest.newBuilder()
                  .setParent(createdEntryGroup.getName())
                  .setEntryId(entryId)
                  .setEntry(entry)
                  .build();
    
          // Use the client to send the API request.
          Entry createdEntry = dataCatalogClient.createEntry(entryRequest);
          System.out.printf("\nEntry created with name: %s", createdEntry.getName());
    
          // 4. Create a Tag Template.
          // For more field types, including ENUM, please refer to
          // https://cloud.google.com/data-catalog/docs/quickstarts/quickstart-search-tag#data-catalog-quickstart-java.
          TagTemplateField sourceField =
              TagTemplateField.newBuilder()
                  .setDisplayName("Source of data asset")
                  .setType(FieldType.newBuilder().setPrimitiveType(
                      FieldType.PrimitiveType.STRING).build())
                  .build();
    
          TagTemplate tagTemplate =
              TagTemplate.newBuilder()
                  .setDisplayName("Demo Tag Template")
                  .putFields("source", sourceField)
                  .build();
    
          CreateTagTemplateRequest createTagTemplateRequest =
              CreateTagTemplateRequest.newBuilder()
                  .setParent(
                      LocationName.newBuilder()
                          .setProject(projectId)
                          .setLocation(location)
                          .build()
                          .toString())
                  .setTagTemplateId(tagTemplateId)
                  .setTagTemplate(tagTemplate)
                  .build();
    
          TagTemplate createdTagTemplate = dataCatalogClient
              .createTagTemplate(createTagTemplateRequest);
          System.out.printf("\nTemplate created with name: %s", createdTagTemplate.getName());
    
          TagField sourceValue =
              TagField.newBuilder().setStringValue("On-premises system name").build();
    
          Tag tag =
              Tag.newBuilder()
                  .setTemplate(createdTagTemplate.getName())
                  .putFields("source", sourceValue)
                  .build();
    
          CreateTagRequest createTagRequest =
              CreateTagRequest.newBuilder().setParent(createdEntry.getName()).setTag(tag).build();
    
          Tag createdTag = dataCatalogClient.createTag(createTagRequest);
          System.out.printf("\nCreated tag: %s", createdTag.getName());
    
        } catch (AlreadyExistsException | IOException e) {
          // AlreadyExistsException is thrown if the EntryGroup or Entry already exists.
          // IOException is thrown when unable to create the DataCatalogClient,
          // for example an invalid Service Account path.
          System.out.println("Error creating entry:\n" + e.toString());
        }
      }
    }
    

Node.js

  1. Install the client library
  2. Set up Application Default Credentials
  3. Run the code
    /**
     * This application demonstrates how to perform core operations with the
     * Data Catalog API.
     *
     * For more information, see the README.md and the official documentation at
     * https://cloud.google.com/data-catalog/docs.
     */
    
    const main = async (
      projectId = process.env.GCLOUD_PROJECT,
      entryGroupId,
      entryId,
      tagTemplateId
    ) => {
    
      // -------------------------------
      // Import required modules.
      // -------------------------------
      const { DataCatalogClient } = require('@google-cloud/datacatalog').v1;
      const datacatalog = new DataCatalogClient();
    
      // -------------------------------
      // Currently, Data Catalog stores metadata in the
      // us-central1 region.
      // -------------------------------
      const location = "us-central1";
    
      // -------------------------------
      // 1. Environment cleanup: delete pre-existing data.
      // -------------------------------
      // Delete any pre-existing Entry with the same name
      // that will be used in step 3.
      try {
        const entryName = datacatalog.entryPath(projectId, location, entryGroupId, entryId);
        await datacatalog.deleteEntry({ name: entryName });
        console.log(`Deleted Entry: ${entryName}`);
      } catch (err) {
        console.log('Entry does not exist.');
      }
    
      // Delete any pre-existing Entry Group with the same name
      // that will be used in step 2.
      try {
        const entryGroupName = datacatalog.entryGroupPath(projectId, location, entryGroupId);
        await datacatalog.deleteEntryGroup({ name: entryGroupName });
        console.log(`Deleted Entry Group: ${entryGroupName}`);
      } catch (err) {
        console.log('Entry Group does not exist.');
      }
    
      // Delete any pre-existing Template with the same name
      // that will be used in step 4.
      const tagTemplateName = datacatalog.tagTemplatePath(
        projectId,
        location,
        tagTemplateId,
      );
    
      try {
        const tagTemplateRequest = {
          name: tagTemplateName,
          force: true,
        };
        await datacatalog.deleteTagTemplate(tagTemplateRequest);
        console.log(`Deleted template: ${tagTemplateName}`);
      } catch (error) {
        console.log(`Cannot delete template: ${tagTemplateName}`);
      }
    
      // -------------------------------
      // 2. Create an Entry Group.
      // -------------------------------
      // Construct the EntryGroup for the EntryGroup request.
      const entryGroup = {
        displayName: 'My awesome Entry Group',
        description: 'This Entry Group represents an external system',
      };
    
      // Construct the EntryGroup request to be sent by the client.
      const entryGroupRequest = {
        parent: datacatalog.locationPath(projectId, location),
        entryGroupId: entryGroupId,
        entryGroup: entryGroup,
      };
    
      // Use the client to send the API request.
      const [createdEntryGroup] = await datacatalog.createEntryGroup(entryGroupRequest);
      console.log(`Created entry group: ${createdEntryGroup.name}`);
    
      // -------------------------------
      // 3. Create an Entry.
      // -------------------------------
      // Construct the Entry for the Entry request.
      const entry = {
        userSpecifiedSystem: 'onprem_data_system',
        userSpecifiedType: 'onprem_data_asset',
        displayName: 'My awesome data asset',
        description: 'This data asset is managed by an external system.',
        linkedResource: '//my-onprem-server.com/dataAssets/my-awesome-data-asset',
        schema: {
          columns: [
            {
              column: 'first_column',
              description: 'This column consists of ....',
              mode: 'NULLABLE',
              type: 'STRING',
            },
            {
              column: 'second_column',
              description: 'This column consists of ....',
              mode: 'NULLABLE',
              type: 'DOUBLE',
            }
          ],
        },
      };
    
      // Construct the Entry request to be sent by the client.
      const entryRequest = {
        parent: datacatalog.entryGroupPath(projectId, location, entryGroupId),
        entryId: entryId,
        entry: entry,
      };
    
      // Use the client to send the API request.
      const [createdEntry] = await datacatalog.createEntry(entryRequest);
      console.log(`Created entry: ${createdEntry.name}`);
    
      // -------------------------------
      // 4. Create a Tag Template.
      // For more field types, including ENUM, please refer to
      // https://cloud.google.com/data-catalog/docs/quickstarts/quickstart-search-tag#data-catalog-quickstart-nodejs.
      // -------------------------------
      const fieldSource = {
        displayName: 'Source of data asset',
        type: {
          primitiveType: 'STRING',
        },
      };
    
      const tagTemplate = {
        displayName: 'Demo Tag Template',
        fields: {
          source: fieldSource,
        },
      };
    
      const tagTemplateRequest = {
        parent: datacatalog.locationPath(projectId, location),
        tagTemplateId: tagTemplateId,
        tagTemplate: tagTemplate,
      };
    
      // Use the client to send the API request.
      const [createdTagTemplate] = await datacatalog.createTagTemplate(tagTemplateRequest);
      console.log(`Created template: ${createdTagTemplate.name}`);
    
      // -------------------------------
      // 5. Attach a Tag to the custom Entry.
      // -------------------------------
      const tag = {
        template: createdTagTemplate.name,
        fields: {
          source: {
            stringValue: 'On-premises system name',
          },
        },
      };
    
      const tagRequest = {
        parent: createdEntry.name,
        tag: tag,
      };
    
      // Use the client to send the API request.
      const [createdTag] = await datacatalog.createTag(tagRequest);
      console.log(`Created tag: ${createdTag.name}`);
    
    };
    
    // TODO: Change these values before running the sample
    // node createCustomType.js my-project onprem_entry_group onprem_entry_id onprem_tag_template
    main(...process.argv.slice(2));