Creating custom Data Catalog entries

You can call Data Catalog APIs to create and manage entries for custom data resource types. In this document, an entry for a custom data resource type is referred to as a "custom entry".

Creating entry groups and custom entries

Custom entries must be placed within a user-created entry group. You create the entry group, then create the custom entry within the entry group.

After creating an entry, you can set IAM policies on the entry group to define who has access to the entry group and the entries inside.
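IAM policies on an entry group can also be managed over REST. The following is a minimal Python sketch of the read-modify-write pattern. It assumes the standard IAM JSON shape used by the Data Catalog `entryGroups.getIamPolicy` and `entryGroups.setIamPolicy` methods: a list of `bindings`, each with a `role` and `members`, plus an `etag`. The helper names, the example role `roles/datacatalog.viewer`, the member, and the etag value are illustrative only. The sketch builds the URL and request body but does not call the API. Start from the policy that `getIamPolicy` returns, because `setIamPolicy` replaces the whole policy.

```python
import copy

API_ROOT = "https://datacatalog.googleapis.com/v1"


def set_iam_policy_url(entry_group_name: str) -> str:
    """URL for the entryGroups.setIamPolicy REST method (hypothetical helper)."""
    return f"{API_ROOT}/{entry_group_name}:setIamPolicy"


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` that also grants `role` to `member`.

    `policy` is the JSON object returned by getIamPolicy. Its `etag` is
    carried over unchanged so the server can reject the write if the
    policy was modified in the meantime.
    """
    new_policy = copy.deepcopy(policy)
    bindings = new_policy.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return new_policy
    # No binding for this role yet: add one.
    bindings.append({"role": role, "members": [member]})
    return new_policy


# Example: grant a (hypothetical) user read access to one entry group.
current = {"etag": "BwWWja0YfJA=", "bindings": []}
body = {"policy": add_binding(current, "roles/datacatalog.viewer",
                              "user:alice@example.com")}
url = set_iam_policy_url(
    "projects/my-project/locations/us-central1/entryGroups/my_entry_group")
```

To apply the change, POST `body` to `url` with authorized credentials. Check the Data Catalog IAM roles to choose the role you actually need.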

REST & CMD LINE

See the following examples and refer to the Data Catalog REST API entryGroups.create and entryGroups.entries.create documentation.

1. Create an entry group

Before using any of the request data below, make the following replacements:

  • project-id: Your GCP project ID.
  • entryGroupId: The ID of the new entry group. The ID must begin with a letter or underscore, contain only English letters, numbers, and underscores, and be at most 64 characters long.
  • displayName: The textual name for the entry group.
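The same ID rule also applies to `entryId` in step 2. The following is a minimal sketch of a client-side check. It assumes that "English letters" means ASCII `A`-`Z` and `a`-`z`, and the function name is hypothetical. The server remains the final authority on which IDs it accepts.

```python
import re

# First character: letter or underscore; remaining (up to 63): letters,
# digits, or underscores. Explicit ASCII classes are used instead of \w,
# which would also match non-English letters.
_ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def is_valid_catalog_id(value: str) -> bool:
    """Check an entryGroupId or entryId against the documented naming rule."""
    return _ID_PATTERN.fullmatch(value) is not None
```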

HTTP method and URL:

POST https://datacatalog.googleapis.com/v1/projects/project-id/locations/us-central1/entryGroups?entryGroupId=entryGroupId

Request JSON body:

{
  "displayName": "Entry Group display name"
}
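You can also send the request above from Python. The following is a minimal sketch that assumes the `google-auth` and `requests` packages are installed and that Application Default Credentials are configured. The helper names are hypothetical. The URL and body builder has no third-party dependencies, so you can check it without calling the API.

```python
from urllib.parse import urlencode

API_ROOT = "https://datacatalog.googleapis.com/v1"


def build_create_entry_group_request(project_id, entry_group_id, display_name,
                                     location="us-central1"):
    """Return (url, body) for entryGroups.create, mirroring the request above."""
    parent = f"projects/{project_id}/locations/{location}"
    query = urlencode({"entryGroupId": entry_group_id})
    return f"{API_ROOT}/{parent}/entryGroups?{query}", {"displayName": display_name}


def send_create_entry_group(project_id, entry_group_id, display_name):
    """POST the request using Application Default Credentials.

    google-auth is imported here so that the builder above stays
    dependency-free.
    """
    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"])
    session = AuthorizedSession(credentials)
    url, body = build_create_entry_group_request(
        project_id, entry_group_id, display_name)
    response = session.post(url, json=body)
    response.raise_for_status()  # Surface HTTP errors such as 409 ALREADY_EXISTS.
    return response.json()
```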

You should receive a JSON response similar to the following:

{
  "name": "projects/my_projectid/locations/us-central1/entryGroups/my_entry_group",
  "displayName": "Entry Group display name",
  "dataCatalogTimestamps": {
    "createTime": "2019-10-19T16:35:50.135Z",
    "updateTime": "2019-10-19T16:35:50.135Z"
  }
}
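The `name` in the response is the full resource name of the new entry group. It is also the parent path for the entries.create call in step 2. The following small sketch, with a hypothetical helper name, derives that URL from the response instead of rebuilding it by hand.

```python
from urllib.parse import urlencode

API_ROOT = "https://datacatalog.googleapis.com/v1"


def create_entry_url(entry_group_name: str, entry_id: str) -> str:
    """Build the entries.create URL under an existing entry group."""
    parts = entry_group_name.split("/")
    # Expected shape: projects/{project}/locations/{location}/entryGroups/{id}
    if len(parts) != 6 or parts[0::2] != ["projects", "locations", "entryGroups"]:
        raise ValueError(f"not an entry group name: {entry_group_name!r}")
    query = urlencode({"entryId": entry_id})
    return f"{API_ROOT}/{entry_group_name}/entries?{query}"


# The response shown above, trimmed to the fields used here.
response = {
    "name": "projects/my_projectid/locations/us-central1/entryGroups/my_entry_group",
    "displayName": "Entry Group display name",
}
entries_url = create_entry_url(response["name"], "my_entry")
```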

2. Create a custom entry within the entry group

Before using any of the request data below, make the following replacements:

  • project_id: Your GCP project ID.
  • entryGroupId: ID of the existing entry group in which the entry is created.
  • entryId: ID of the new entry. The ID must begin with a letter or underscore, contain only English letters, numbers, and underscores, and be at most 64 characters long.
  • description: Optional entry description.
  • displayName: Optional textual name for the entry.
  • userSpecifiedType: Custom type name. The type name must begin with a letter or underscore, must only contain letters, numbers, and underscores, and must be at most 64 characters long.
  • userSpecifiedSystem: The entry's non-GCP source system, which is not integrated with Data Catalog. The source system name must begin with a letter or underscore, must only contain letters, numbers, and underscores, and must be at most 64 characters long.
  • linkedResource: Optional full name of the resource that the entry refers to.
  • schema: Optional data schema.

    Example JSON schema:
    { ...
      "schema": {
        "columns": [
          {
            "column": "first_name",
            "description": "First name",
            "mode": "REQUIRED",
            "type": "STRING"
          },
          {
            "column": "last_name",
            "description": "Last name",
            "mode": "REQUIRED",
            "type": "STRING"
          },
          {
            "column": "address",
            "description": "Address",
            "mode": "REPEATED",
            "subcolumns": [
              {
                "column": "city",
                "description": "City",
                "mode": "NULLABLE",
                "type": "STRING"
              },
              {
                "column": "state",
                "description": "State",
                "mode": "NULLABLE",
                "type": "STRING"
              }
            ],
            "type": "RECORD"
          }
        ]
      }
    ...
    }
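Nested `RECORD` columns such as `address` keep their children in `subcolumns`. The following sketch walks a schema like the one above and lists every column as a dotted path. It rejects columns that have no `type` or whose mode is not one of the three used in the example. The function name is hypothetical, and treating a missing `mode` as `NULLABLE` is an assumption of this sketch.

```python
VALID_MODES = {"NULLABLE", "REQUIRED", "REPEATED"}


def column_paths(columns, prefix=""):
    """Yield a dotted path for every column and subcolumn in a schema.

    Raises ValueError for a column without a type or with an unexpected mode.
    """
    for col in columns:
        name = col["column"]
        if "type" not in col:
            raise ValueError(f"column {name!r} has no type")
        mode = col.get("mode", "NULLABLE")  # Assumed default.
        if mode not in VALID_MODES:
            raise ValueError(f"column {name!r} has unsupported mode {mode!r}")
        path = f"{prefix}{name}"
        yield path
        # Recurse into RECORD columns; other columns have no subcolumns.
        yield from column_paths(col.get("subcolumns", []), prefix=path + ".")
```

For example, `list(column_paths(schema["columns"]))` over the example schema lists `address.city` and `address.state` after `address`.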
    

HTTP method and URL:

POST https://datacatalog.googleapis.com/v1/projects/project_id/locations/us-central1/entryGroups/entryGroupId/entries?entryId=entryId

Request JSON body:

{
  "description": "Description",
  "displayName": "Display name",
  "userSpecifiedType": "my_type",
  "userSpecifiedSystem": "my_system",
  "linkedResource": "abc.com/def",
  "schema": { schema }
}
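The following is a minimal sketch for assembling this body in Python. It omits optional fields that are not set and checks `userSpecifiedType` and `userSpecifiedSystem` against the naming rule listed above. It interprets "letters" as ASCII letters, which is an assumption, and the helper name is hypothetical.

```python
import re

# Naming rule for userSpecifiedType and userSpecifiedSystem (ASCII assumed).
_NAME_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def build_entry_body(user_specified_type, user_specified_system, *,
                     display_name=None, description=None,
                     linked_resource=None, schema=None):
    """Return a JSON-ready entries.create body, omitting unset optional fields."""
    for label, value in (("userSpecifiedType", user_specified_type),
                         ("userSpecifiedSystem", user_specified_system)):
        if not _NAME_RULE.fullmatch(value):
            raise ValueError(f"{label} {value!r} breaks the naming rule")
    body = {
        "userSpecifiedType": user_specified_type,
        "userSpecifiedSystem": user_specified_system,
    }
    optional = {
        "displayName": display_name,
        "description": description,
        "linkedResource": linked_resource,
        "schema": schema,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```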

You should receive a JSON response similar to the following:

{
  "name": "projects/my_project_id/locations/us-central1/entryGroups/my_entryGroup_id/entries/my_entry_id",
  "userSpecifiedType": "my_type",
  "userSpecifiedSystem": "my_system",
  "displayName": "On-prem entry",
  "description": "My entry description.",
  "schema": {
    "columns": [
      {
        "type": "STRING",
        "description": "First name",
        "mode": "REQUIRED",
        "column": "first_name"
      },
      {
        "type": "STRING",
        "description": "Last name",
        "mode": "REQUIRED",
        "column": "last_name"
      },
      {
        "type": "RECORD",
        "description": "Address",
        "mode": "REPEATED",
        "column": "address",
        "subcolumns": [
          {
            "type": "STRING",
            "description": "City",
            "mode": "NULLABLE",
            "column": "city"
          },
          {
            "type": "STRING",
            "description": "State",
            "mode": "NULLABLE",
            "column": "state"
          }
        ]
      }
    ]
  },
  "sourceSystemTimestamps": {
    "createTime": "2019-10-23T23:11:26.326Z",
    "updateTime": "2019-10-23T23:11:26.326Z"
  },
  "linkedResource": "abc.com/def"
}
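Later calls use the returned `name` to identify the entry. For example, the Python sample below passes `entry.name` as the `parent` when it attaches a tag. The following sketch, with a hypothetical helper name, splits such a name into its components.

```python
import re

_ENTRY_NAME = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/entryGroups/(?P<entry_group>[^/]+)/entries/(?P<entry>[^/]+)"
)


def parse_entry_name(name: str) -> dict:
    """Split an entry resource name into project, location, entry group, and entry."""
    match = _ENTRY_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an entry resource name: {name!r}")
    return match.groupdict()
```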

Python

  1. Install the client library
  2. Set up application default credentials
  3. Run the code
    """
    This application demonstrates how to perform core operations with the
    Data Catalog API.
    
    For more information, see the README.md and the official documentation at
    https://cloud.google.com/data-catalog/docs.
    """
    
    # -------------------------------
    # Import required modules.
    # -------------------------------
    from google.api_core.exceptions import NotFound, PermissionDenied
    from google.cloud import datacatalog_v1
    
    # -------------------------------
    # Currently, Data Catalog stores metadata in the
    # us-central1 region.
    # -------------------------------
    location = 'us-central1'
    
    # -------------------------------
    # TODO: Set these values before running the sample.
    # -------------------------------
    project_id = 'my-project'
    entry_group_id = 'onprem_entry_group'
    entry_id = 'onprem_entry_id'
    tag_template_id = 'onprem_tag_template'
    
    # -------------------------------
    # Use Application Default Credentials (ADC) to create a new
    # Data Catalog client. If the GOOGLE_APPLICATION_CREDENTIALS
    # environment variable is set, ADC uses the service account key
    # file it points to; otherwise ADC checks other sources, such as
    # credentials set up with the gcloud CLI.
    # -------------------------------
    datacatalog = datacatalog_v1.DataCatalogClient()
    
    # -------------------------------
    # 1. Environment cleanup: delete pre-existing data.
    # -------------------------------
    # Delete any pre-existing Entry with the same name
    # that will be used in step 3.
    expected_entry_name = datacatalog_v1.DataCatalogClient \
        .entry_path(project_id, location, entry_group_id, entry_id)
    
    try:
        datacatalog.delete_entry(name=expected_entry_name)
    except (NotFound, PermissionDenied):
        pass
    
    # Delete any pre-existing Entry Group with the same name
    # that will be used in step 2.
    expected_entry_group_name = datacatalog_v1.DataCatalogClient \
        .entry_group_path(project_id, location, entry_group_id)
    
    try:
        datacatalog.delete_entry_group(name=expected_entry_group_name)
    except (NotFound, PermissionDenied):
        pass
    
    # Delete any pre-existing Template with the same name
    # that will be used in step 4.
    expected_template_name = datacatalog_v1.DataCatalogClient \
        .tag_template_path(project_id, location, tag_template_id)
    
    try:
        datacatalog.delete_tag_template(name=expected_template_name, force=True)
    except (NotFound, PermissionDenied):
        pass
    
    # -------------------------------
    # 2. Create an Entry Group.
    # -------------------------------
    entry_group_obj = datacatalog_v1.types.EntryGroup()
    entry_group_obj.display_name = 'My awesome Entry Group'
    entry_group_obj.description = 'This Entry Group represents an external system'
    
    entry_group = datacatalog.create_entry_group(
        parent=datacatalog_v1.DataCatalogClient.common_location_path(project_id, location),
        entry_group_id=entry_group_id,
        entry_group=entry_group_obj)
    print('Created entry group: {}'.format(entry_group.name))
    
    # -------------------------------
    # 3. Create an Entry.
    # -------------------------------
    entry = datacatalog_v1.types.Entry()
    entry.user_specified_system = 'onprem_data_system'
    entry.user_specified_type = 'onprem_data_asset'
    entry.display_name = 'My awesome data asset'
    entry.description = 'This data asset is managed by an external system.'
    entry.linked_resource = '//my-onprem-server.com/dataAssets/my-awesome-data-asset'
    
    # Create the schema (optional). The client library exposes the
    # ColumnSchema `type` field as `type_` to avoid shadowing the
    # Python built-in `type`.
    columns = []
    columns.append(datacatalog_v1.types.ColumnSchema(
        column='first_column',
        type_='STRING',
        description='This column consists of ....',
        mode='NULLABLE'))
    
    columns.append(datacatalog_v1.types.ColumnSchema(
        column='second_column',
        type_='DOUBLE',
        description='This column consists of ....',
        mode='NULLABLE'))
    
    entry.schema.columns.extend(columns)
    
    entry = datacatalog.create_entry(
        parent=entry_group.name,
        entry_id=entry_id,
        entry=entry)
    print('Created entry: {}'.format(entry.name))
    
    # -------------------------------
    # 4. Create a Tag Template.
    # For more field types, including ENUM, refer to
    # https://cloud.google.com/data-catalog/docs/quickstarts/quickstart-search-tag#data-catalog-quickstart-python
    # -------------------------------
    tag_template = datacatalog_v1.types.TagTemplate()
    tag_template.display_name = 'On-premises Tag Template'
    # Add the map entry before setting its attributes.
    tag_template.fields['source'] = datacatalog_v1.types.TagTemplateField()
    tag_template.fields['source'].display_name = 'Source of the data asset'
    tag_template.fields['source'].type_.primitive_type = \
        datacatalog_v1.types.FieldType.PrimitiveType.STRING
    
    tag_template = datacatalog.create_tag_template(
        parent=datacatalog_v1.DataCatalogClient.common_location_path(project_id, location),
        tag_template_id=tag_template_id,
        tag_template=tag_template)
    print('Created template: {}'.format(tag_template.name))
    
    # -------------------------------
    # 5. Attach a Tag to the custom Entry.
    # -------------------------------
    tag = datacatalog_v1.types.Tag()
    tag.template = tag_template.name
    tag.fields['source'] = datacatalog_v1.types.TagField()
    tag.fields['source'].string_value = 'On-premises system name'
    
    tag = datacatalog.create_tag(parent=entry.name, tag=tag)
    print('Created tag: {}'.format(tag.name))
    
    

Java

  1. Install the client library
  2. Set up application default credentials
  3. Run the code
    /*
    This application demonstrates how to perform core operations with the
    Data Catalog API.
    
    For more information, see the README.md and the official documentation at
    https://cloud.google.com/data-catalog/docs.
    */
    
    package com.example.datacatalog;
    
    
    import com.google.api.gax.rpc.AlreadyExistsException;
    import com.google.api.gax.rpc.NotFoundException;
    import com.google.api.gax.rpc.PermissionDeniedException;
    import com.google.cloud.datacatalog.v1.ColumnSchema;
    import com.google.cloud.datacatalog.v1.CreateEntryGroupRequest;
    import com.google.cloud.datacatalog.v1.CreateEntryRequest;
    import com.google.cloud.datacatalog.v1.CreateTagRequest;
    import com.google.cloud.datacatalog.v1.CreateTagTemplateRequest;
    import com.google.cloud.datacatalog.v1.DataCatalogClient;
    import com.google.cloud.datacatalog.v1.DeleteTagTemplateRequest;
    import com.google.cloud.datacatalog.v1.Entry;
    import com.google.cloud.datacatalog.v1.EntryGroup;
    import com.google.cloud.datacatalog.v1.EntryGroupName;
    import com.google.cloud.datacatalog.v1.EntryName;
    import com.google.cloud.datacatalog.v1.FieldType;
    import com.google.cloud.datacatalog.v1.LocationName;
    import com.google.cloud.datacatalog.v1.Schema;
    import com.google.cloud.datacatalog.v1.Tag;
    import com.google.cloud.datacatalog.v1.TagField;
    import com.google.cloud.datacatalog.v1.TagTemplate;
    import com.google.cloud.datacatalog.v1.TagTemplateField;
    import com.google.cloud.datacatalog.v1.TagTemplateName;
    import java.io.IOException;
    
    public class CreateCustomType {
    
      public static void createCustomType() {
        // TODO(developer): Replace these variables before running the sample.
        String projectId = "my-project";
        String entryGroupId = "onprem_entry_group";
        String entryId = "onprem_entry_id";
        String tagTemplateId = "onprem_tag_template";
        createCustomType(projectId, entryGroupId, entryId, tagTemplateId);
      }
    
      public static void createCustomType(String projectId, String entryGroupId, String entryId,
          String tagTemplateId) {
        // Currently, Data Catalog stores metadata in the us-central1 region.
        String location = "us-central1";
    
        // Initialize client that will be used to send requests. This client only needs to be created
        // once, and can be reused for multiple requests. After completing all of your requests, call
        // the "close" method on the client to safely clean up any remaining background resources.
        try (DataCatalogClient dataCatalogClient = DataCatalogClient.create()) {
    
          // 1. Environment cleanup: delete pre-existing data.
          // Delete any pre-existing Entry with the same name
          // that will be used in step 3.
          try {
            String entryName = EntryName.of(projectId, location, entryGroupId, entryId).toString();
            dataCatalogClient.deleteEntry(entryName);
            System.out.printf("\nDeleted Entry: %s", entryName);
          } catch (PermissionDeniedException | NotFoundException e) {
            // PermissionDeniedException or NotFoundException are thrown if
            // Entry does not exist.
            System.out.println("Entry does not exist.");
          }
    
          // Delete any pre-existing Entry Group with the same name
          // that will be used in step 2.
          try {
            String entryGroupName = EntryGroupName.of(projectId, location, entryGroupId).toString();
            dataCatalogClient.deleteEntryGroup(entryGroupName);
            System.out.printf("\nDeleted Entry Group: %s", entryGroupName);
          } catch (PermissionDeniedException | NotFoundException e) {
            // PermissionDeniedException or NotFoundException are thrown if
            // Entry Group does not exist.
            System.out.println("Entry Group does not exist.");
          }
    
          String tagTemplateName =
              TagTemplateName.newBuilder()
                  .setProject(projectId)
                  .setLocation(location)
                  .setTagTemplate(tagTemplateId)
                  .build()
                  .toString();
    
          // Delete any pre-existing Template with the same name
          // that will be used in step 4.
          try {
            dataCatalogClient.deleteTagTemplate(
                DeleteTagTemplateRequest.newBuilder()
                    .setName(tagTemplateName)
                    .setForce(true)
                    .build());
            System.out.printf("\nDeleted template: %s", tagTemplateName);
          } catch (Exception e) {
            System.out.printf("\nCannot delete template: %s", tagTemplateName);
          }
    
          // 2. Create an Entry Group.
          // Construct the EntryGroup for the EntryGroup request.
          EntryGroup entryGroup =
              EntryGroup.newBuilder()
                  .setDisplayName("My awesome Entry Group")
                  .setDescription("This Entry Group represents an external system")
                  .build();
    
          // Construct the EntryGroup request to be sent by the client.
          CreateEntryGroupRequest entryGroupRequest =
              CreateEntryGroupRequest.newBuilder()
                  .setParent(LocationName.of(projectId, location).toString())
                  .setEntryGroupId(entryGroupId)
                  .setEntryGroup(entryGroup)
                  .build();
    
          // Use the client to send the API request.
          EntryGroup createdEntryGroup = dataCatalogClient.createEntryGroup(entryGroupRequest);
    
          System.out.printf("\nEntry Group created with name: %s", createdEntryGroup.getName());
    
          // 3. Create an Entry.
          // Construct the Entry for the Entry request.
          Entry entry =
              Entry.newBuilder()
                  .setUserSpecifiedSystem("onprem_data_system")
                  .setUserSpecifiedType("onprem_data_asset")
                  .setDisplayName("My awesome data asset")
                  .setDescription("This data asset is managed by an external system.")
                  .setLinkedResource("//my-onprem-server.com/dataAssets/my-awesome-data-asset")
                  .setSchema(
                      Schema.newBuilder()
                          .addColumns(
                              ColumnSchema.newBuilder()
                                  .setColumn("first_column")
                                  .setDescription("This column consists of ....")
                                  .setMode("NULLABLE")
                                  .setType("DOUBLE")
                                  .build())
                          .addColumns(
                              ColumnSchema.newBuilder()
                                  .setColumn("second_column")
                                  .setDescription("This column consists of ....")
                                  .setMode("REQUIRED")
                                  .setType("STRING")
                                  .build())
                          .build())
                  .build();
    
          // Construct the Entry request to be sent by the client.
          CreateEntryRequest entryRequest =
              CreateEntryRequest.newBuilder()
                  .setParent(createdEntryGroup.getName())
                  .setEntryId(entryId)
                  .setEntry(entry)
                  .build();
    
          // Use the client to send the API request.
          Entry createdEntry = dataCatalogClient.createEntry(entryRequest);
          System.out.printf("\nEntry created with name: %s", createdEntry.getName());
    
          // 4. Create a Tag Template.
          // For more field types, including ENUM, please refer to
          // https://cloud.google.com/data-catalog/docs/quickstarts/quickstart-search-tag#data-catalog-quickstart-java.
          TagTemplateField sourceField =
              TagTemplateField.newBuilder()
                  .setDisplayName("Source of data asset")
                  .setType(FieldType.newBuilder().setPrimitiveType(
                      FieldType.PrimitiveType.STRING).build())
                  .build();
    
          TagTemplate tagTemplate =
              TagTemplate.newBuilder()
                  .setDisplayName("Demo Tag Template")
                  .putFields("source", sourceField)
                  .build();
    
          CreateTagTemplateRequest createTagTemplateRequest =
              CreateTagTemplateRequest.newBuilder()
                  .setParent(
                      LocationName.newBuilder()
                          .setProject(projectId)
                          .setLocation(location)
                          .build()
                          .toString())
                  .setTagTemplateId(tagTemplateId)
                  .setTagTemplate(tagTemplate)
                  .build();
    
          TagTemplate createdTagTemplate = dataCatalogClient
              .createTagTemplate(createTagTemplateRequest);
          System.out.printf("\nTemplate created with name: %s", createdTagTemplate.getName());
    
          // 5. Attach a Tag to the custom Entry.
          TagField sourceValue =
              TagField.newBuilder().setStringValue("On-premises system name").build();
    
          Tag tag =
              Tag.newBuilder()
                  .setTemplate(createdTagTemplate.getName())
                  .putFields("source", sourceValue)
                  .build();
    
          CreateTagRequest createTagRequest =
              CreateTagRequest.newBuilder().setParent(createdEntry.getName()).setTag(tag).build();
    
          Tag createdTag = dataCatalogClient.createTag(createTagRequest);
          System.out.printf("\nCreated tag: %s", createdTag.getName());
    
        } catch (AlreadyExistsException | IOException e) {
          // AlreadyExistsException is thrown if the EntryGroup or Entry already exists.
          // IOException is thrown when unable to create the DataCatalogClient,
          // for example an invalid Service Account path.
          System.out.println("Error creating entry:\n" + e.toString());
        }
      }
    }
    

Node.js

  1. Install the client library
  2. Set up application default credentials
  3. Run the code
    /**
    * This application demonstrates how to perform core operations with the
    * Data Catalog API.
    *
    * For more information, see the README.md and the official documentation at
    * https://cloud.google.com/data-catalog/docs.
    */
    
    const main = async (
      projectId = process.env.GCLOUD_PROJECT,
      entryGroupId,
      entryId,
      tagTemplateId
    ) => {
    
      // -------------------------------
      // Import required modules.
      // -------------------------------
      const { DataCatalogClient } = require('@google-cloud/datacatalog').v1;
      const datacatalog = new DataCatalogClient();
    
      // -------------------------------
      // Currently, Data Catalog stores metadata in the
      // us-central1 region.
      // -------------------------------
      const location = "us-central1";
    
      // -------------------------------
      // 1. Environment cleanup: delete pre-existing data.
      // -------------------------------
      // Delete any pre-existing Entry with the same name
      // that will be used in step 3.
      try {
        const entryName = datacatalog.entryPath(projectId, location, entryGroupId, entryId);
        await datacatalog.deleteEntry({ name: entryName });
        console.log(`Deleted Entry: ${entryName}`);
      } catch (err) {
        console.log('Entry does not exist.');
      }
    
      // Delete any pre-existing Entry Group with the same name
      // that will be used in step 2.
      try {
        const entryGroupName = datacatalog.entryGroupPath(projectId, location, entryGroupId);
        await datacatalog.deleteEntryGroup({ name: entryGroupName });
        console.log(`Deleted Entry Group: ${entryGroupName}`);
      } catch (err) {
        console.log('Entry Group does not exist.');
      }
    
      // Delete any pre-existing Template with the same name
      // that will be used in step 4.
      const tagTemplateName = datacatalog.tagTemplatePath(
        projectId,
        location,
        tagTemplateId,
      );
    
      try {
        const tagTemplateRequest = {
          name: tagTemplateName,
          force: true,
        };
        await datacatalog.deleteTagTemplate(tagTemplateRequest);
        console.log(`Deleted template: ${tagTemplateName}`);
      } catch (error) {
        console.log(`Cannot delete template: ${tagTemplateName}`);
      }
    
      // -------------------------------
      // 2. Create an Entry Group.
      // -------------------------------
      // Construct the EntryGroup for the EntryGroup request.
      const entryGroup = {
        displayName: 'My awesome Entry Group',
        description: 'This Entry Group represents an external system',
      };
    
      // Construct the EntryGroup request to be sent by the client.
      const entryGroupRequest = {
        parent: datacatalog.locationPath(projectId, location),
        entryGroupId: entryGroupId,
        entryGroup: entryGroup,
      };
    
      // Use the client to send the API request.
      const [createdEntryGroup] = await datacatalog.createEntryGroup(entryGroupRequest);
      console.log(`Created entry group: ${createdEntryGroup.name}`);
    
      // -------------------------------
      // 3. Create an Entry.
      // -------------------------------
      // Construct the Entry for the Entry request.
      const entry = {
        userSpecifiedSystem: 'onprem_data_system',
        userSpecifiedType: 'onprem_data_asset',
        displayName: 'My awesome data asset',
        description: 'This data asset is managed by an external system.',
        linkedResource: '//my-onprem-server.com/dataAssets/my-awesome-data-asset',
        schema: {
          columns: [
            {
              column: 'first_column',
              description: 'This column consists of ....',
              mode: 'NULLABLE',
              type: 'STRING',
            },
            {
              column: 'second_column',
              description: 'This column consists of ....',
              mode: 'NULLABLE',
              type: 'DOUBLE',
            }
          ],
        },
      };
    
      // Construct the Entry request to be sent by the client.
      const entryRequest = {
        parent: datacatalog.entryGroupPath(projectId, location, entryGroupId),
        entryId: entryId,
        entry: entry,
      };
    
      // Use the client to send the API request.
      const [createdEntry] = await datacatalog.createEntry(entryRequest);
      console.log(`Created entry: ${createdEntry.name}`);
    
      // -------------------------------
      // 4. Create a Tag Template.
      // For more field types, including ENUM, please refer to
      // https://cloud.google.com/data-catalog/docs/quickstarts/quickstart-search-tag#data-catalog-quickstart-nodejs.
      // -------------------------------
      const fieldSource = {
        displayName: 'Source of data asset',
        type: {
          primitiveType: 'STRING',
        },
      };
    
      const tagTemplate = {
        displayName: 'Demo Tag Template',
        fields: {
          source: fieldSource,
        },
      };
    
      const tagTemplateRequest = {
        parent: datacatalog.locationPath(projectId, location),
        tagTemplateId: tagTemplateId,
        tagTemplate: tagTemplate,
      };
    
      // Use the client to send the API request.
      const [createdTagTemplate] = await datacatalog.createTagTemplate(tagTemplateRequest);
      console.log(`Created template: ${createdTagTemplate.name}`);
    
      // -------------------------------
      // 5. Attach a Tag to the custom Entry.
      // -------------------------------
      const tag = {
        template: createdTagTemplate.name,
        fields: {
          source: {
            stringValue: 'On-premises system name',
          },
        },
      };
    
      const tagRequest = {
        parent: createdEntry.name,
        tag: tag,
      };
    
      // Use the client to send the API request.
      const [createdTag] = await datacatalog.createTag(tagRequest);
      console.log(`Created tag: ${createdTag.name}`);
    
    };
    
    // TODO: Change these values before running the sample
    // node createCustomType.js my-project onprem_entry_group onprem_entry_id onprem_tag_template
    main(...process.argv.slice(2)).catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });