Use platform logs to troubleshoot Cloud Storage import topics

This guide explains how to use Google Cloud platform logs to troubleshoot issues when you are using Cloud Storage import topics to ingest data.

About ingestion failure in Cloud Storage import topics

Cloud Storage import topics can encounter issues that prevent data from being ingested successfully. For example, you might be unable to ingest a Cloud Storage object, or only part of an object might be ingested.

The following list describes reasons for ingestion failure in Cloud Storage import topics that generate platform logs:

  • Message size

    • Individual messages can't be larger than 10 MB. If they are, the entire message is skipped.

    • If you're using the Avro or the Pub/Sub Avro format, message blocks can't be larger than 16 MB. Larger message blocks are skipped.

  • Message attributes

    • Messages can have a maximum of 100 attributes. Any extra attribute is dropped when the message is ingested.

    • Attribute keys can't be larger than 256 bytes and values can't be larger than 1024 bytes. Larger keys or values are removed from the message when it is ingested.

      For more information about the guidelines for using message keys and attributes, see Use attributes to publish a message.

  • Avro formatting

    • Make sure your Avro objects are correctly formatted. Incorrect formatting prevents the message from being ingested.
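
The limits above can be checked before objects are uploaded for ingestion. The following is a minimal illustrative pre-flight helper, not part of any Google client library; the constants mirror the limits described in this guide:

```python
# Documented ingestion limits for Cloud Storage import topics.
MAX_MESSAGE_BYTES = 10 * 1000 * 1000  # messages larger than 10 MB are skipped
MAX_ATTRIBUTES = 100                  # extra attributes are dropped
MAX_ATTR_KEY_BYTES = 256              # larger keys are removed on ingestion
MAX_ATTR_VALUE_BYTES = 1024           # larger values are removed on ingestion

def check_message(data: bytes, attributes: dict[str, str]) -> list[str]:
    """Return human-readable warnings for every limit this message exceeds."""
    warnings = []
    if len(data) > MAX_MESSAGE_BYTES:
        warnings.append("message exceeds 10 MB and will be skipped")
    if len(attributes) > MAX_ATTRIBUTES:
        warnings.append("more than 100 attributes; extras will be dropped")
    for key, value in attributes.items():
        if len(key.encode("utf-8")) > MAX_ATTR_KEY_BYTES:
            warnings.append(f"attribute key {key[:20]!r} is larger than 256 bytes")
        if len(value.encode("utf-8")) > MAX_ATTR_VALUE_BYTES:
            warnings.append(f"value for attribute {key[:20]!r} is larger than 1024 bytes")
    return warnings
```

Running this check in a producer pipeline lets you catch oversized messages before they generate ingestion-failure platform logs.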

About platform logs

A supported Google Cloud service generates its own set of platform logs, capturing events and activities relevant to that service's operation. These platform logs contain detailed information about what's happening within a service, including successful operations, errors, warnings, and other noteworthy events.

Platform logs are a part of Cloud Logging and share the same features, including the following:

  • Logs are typically structured as JSON objects that allow for further querying and filtering.

  • You can view platform logs by using Logging in the console.

  • Platform logs can also be integrated with Cloud Monitoring and other monitoring tools to create dashboards, alerts, and other monitoring mechanisms.

  • Log storage incurs costs based on ingested volume and retention period.

For more information about platform logs, see Google Cloud platform logs.

Required roles and permissions to use platform logs

Before you begin, verify that you have access to Logging. You require the Logs Viewer (roles/logging.viewer) Identity and Access Management (IAM) role. For more information about Logging access, see Access control with IAM.

Enable platform logs

Platform logs are disabled by default for import topics. You can enable platform logs when you create or update a Cloud Storage import topic.

To disable platform logs, update the Cloud Storage import topic.

Enable platform logs while creating a Cloud Storage import topic

Ensure that you have completed the prerequisites for creating a Cloud Storage import topic.

To create a Cloud Storage import topic with platform logs enabled, follow these steps:

Console

  1. In the Google Cloud console, go to the Topics page.

    Go to Topics

  2. Click Create topic.

    The topic details page opens.

  3. In the Topic ID field, enter an ID for your Cloud Storage import topic.

    For more information about naming topics, see the naming guidelines.

  4. Select Add a default subscription.

  5. Select Enable ingestion.

  6. Specify the options for ingestion by following the instructions in Create a Cloud Storage import topic.
  7. Select Enable platform logs.
  8. Retain the other default settings.
  9. Click Create topic.

gcloud

  1. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  2. To enable platform logs, ensure the --ingestion-log-severity flag is set to WARNING or higher. Run the gcloud pubsub topics create command:

    gcloud pubsub topics create TOPIC_ID \
        --cloud-storage-ingestion-bucket=BUCKET_NAME \
        --cloud-storage-ingestion-input-format=INPUT_FORMAT \
        --ingestion-log-severity=WARNING
    

    Replace the following:

    • TOPIC_ID: The name or ID of your topic.

    • BUCKET_NAME: Specifies the name of an existing bucket. For example, prod_bucket. The bucket name must not include the project ID. To create a bucket, see Create buckets.

    • INPUT_FORMAT: Specifies the format of the objects that are ingested. This can be text, avro, or pubsub_avro. For more information about these options, see Input format.

If you run into issues, see Troubleshooting a Cloud Storage import topic.

Enable platform logs while updating a Cloud Storage import topic

Perform the following steps:

Console

  1. In the Google Cloud console, go to the Topics page.

    Go to Topics

  2. Click the Cloud Storage import topic.

  3. In the topic details page, click Edit.

  4. Select Enable platform logs.
  5. Click Update.

gcloud

  1. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  2. To avoid losing your settings for the import topic, make sure to include all of them every time you update the topic. If you leave something out, Pub/Sub resets the setting to its original default value.

    To enable platform logs, ensure that the --ingestion-log-severity flag is set to WARNING. Run the gcloud pubsub topics update command with all the flags mentioned in the following sample:

    gcloud pubsub topics update TOPIC_ID \
        --cloud-storage-ingestion-bucket=BUCKET_NAME \
        --cloud-storage-ingestion-input-format=INPUT_FORMAT \
        --cloud-storage-ingestion-text-delimiter=TEXT_DELIMITER \
        --cloud-storage-ingestion-minimum-object-create-time=MINIMUM_OBJECT_CREATE_TIME \
        --cloud-storage-ingestion-match-glob=MATCH_GLOB \
        --ingestion-log-severity=WARNING
    

    Replace the following:

    • TOPIC_ID: The topic ID or name. This field cannot be updated.

    • BUCKET_NAME: Specifies the name of an existing bucket. For example, prod_bucket. The bucket name must not include the project ID.

    • INPUT_FORMAT: Specifies the format of the objects that are ingested. This can be text, avro, or pubsub_avro. For more information about these options, see Input format.

    • TEXT_DELIMITER: Specifies the delimiter with which to split text objects into Pub/Sub messages. This should be a single character and should only be set when INPUT_FORMAT is text. It defaults to the newline character (\n).

      When using gcloud CLI to specify the delimiter, pay close attention to the handling of special characters like newline \n. Use the format '\n' to ensure the delimiter is correctly interpreted. Simply using \n without quotes or escaping results in a delimiter of "n".

    • MINIMUM_OBJECT_CREATE_TIME: Specifies the minimum time at which an object was created in order for it to be ingested. This should be in UTC in the format YYYY-MM-DDThh:mm:ssZ. For example, 2024-10-14T08:30:30Z.

      Any date, past or future, from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z inclusive, is valid.

    • MATCH_GLOB: Specifies the glob pattern to match in order for an object to be ingested. When you are using gcloud CLI, a match glob with * characters must have the * character formatted as escaped in the form \*\*.txt or the whole match glob must be in quotes "**.txt" or '**.txt'. For information about supported syntax for glob patterns, see the Cloud Storage documentation.
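
The timestamp and delimiter values above are easy to get wrong on the command line. A small illustrative check (not part of gcloud) can validate the values before you run the update; the function names here are hypothetical:

```python
from datetime import datetime

def is_valid_create_time(value: str) -> bool:
    """True if value matches the expected UTC format YYYY-MM-DDThh:mm:ssZ."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False

def is_valid_delimiter(delimiter: str) -> bool:
    """True if the text delimiter is a single character, as required."""
    return len(delimiter) == 1
```

Note that `is_valid_delimiter("\\n")` (a literal backslash followed by `n`, two characters) fails, which mirrors the shell-quoting pitfall described for TEXT_DELIMITER.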

Disable platform logs

Perform the following steps:

Console

  1. In the Google Cloud console, go to the Topics page.

    Go to Topics

  2. Click the Cloud Storage import topic.

  3. In the topic details page, click Edit.

  4. Clear Enable platform logs.
  5. Click Update.

gcloud

  1. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  2. To avoid losing your settings for the import topic, make sure to include all of them every time you update the topic. If you leave something out, Pub/Sub resets the setting to its original default value.

    To disable platform logs, ensure that the --ingestion-log-severity flag is set to DISABLED. Run the gcloud pubsub topics update command with all the flags mentioned in the following sample:

    gcloud pubsub topics update TOPIC_ID \
        --cloud-storage-ingestion-bucket=BUCKET_NAME \
        --cloud-storage-ingestion-input-format=INPUT_FORMAT \
        --cloud-storage-ingestion-text-delimiter=TEXT_DELIMITER \
        --cloud-storage-ingestion-minimum-object-create-time=MINIMUM_OBJECT_CREATE_TIME \
        --cloud-storage-ingestion-match-glob=MATCH_GLOB \
        --ingestion-log-severity=DISABLED
    

    Replace the following:

    • TOPIC_ID: The topic ID or name. This field cannot be updated.

    • BUCKET_NAME: Specifies the name of an existing bucket. For example, prod_bucket. The bucket name must not include the project ID.

    • INPUT_FORMAT: Specifies the format of the objects that are ingested. This can be text, avro, or pubsub_avro. For more information about these options, see Input format.

    • TEXT_DELIMITER: Specifies the delimiter with which to split text objects into Pub/Sub messages. This should be a single character and should only be set when INPUT_FORMAT is text. It defaults to the newline character (\n).

      When using gcloud CLI to specify the delimiter, pay close attention to the handling of special characters like newline \n. Use the format '\n' to ensure the delimiter is correctly interpreted. Simply using \n without quotes or escaping results in a delimiter of "n".

    • MINIMUM_OBJECT_CREATE_TIME: Specifies the minimum time at which an object was created in order for it to be ingested. This should be in UTC in the format YYYY-MM-DDThh:mm:ssZ. For example, 2024-10-14T08:30:30Z.

      Any date, past or future, from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z inclusive, is valid.

    • MATCH_GLOB: Specifies the glob pattern to match in order for an object to be ingested. When you are using gcloud CLI, a match glob with * characters must have the * character formatted as escaped in the form \*\*.txt or the whole match glob must be in quotes "**.txt" or '**.txt'. For information about supported syntax for glob patterns, see the Cloud Storage documentation.
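
Because an update resets any omitted setting to its default, it can help to keep all ingestion settings in one place and generate the complete command from them. The following is a hypothetical sketch, not part of gcloud:

```python
def build_update_command(topic_id: str, settings: dict[str, str]) -> list[str]:
    """Assemble the full `gcloud pubsub topics update` invocation so that
    no ingestion flag is accidentally omitted (and silently reset)."""
    cmd = ["gcloud", "pubsub", "topics", "update", topic_id]
    for flag, value in settings.items():
        cmd.append(f"--{flag}={value}")
    return cmd

# Example: disable platform logs while restating every other setting.
cmd = build_update_command("my-import-topic", {
    "cloud-storage-ingestion-bucket": "prod_bucket",
    "cloud-storage-ingestion-input-format": "text",
    "cloud-storage-ingestion-text-delimiter": "\\n",
    "cloud-storage-ingestion-minimum-object-create-time": "2024-10-14T08:30:30Z",
    "cloud-storage-ingestion-match-glob": "**.txt",
    "ingestion-log-severity": "DISABLED",
})
```

The topic name `my-import-topic` and the settings values are placeholders; the point is that changing one field in the dictionary still emits every flag.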

View platform logs

To view platform logs for a Cloud Storage import topic, do the following:

Google Cloud console

  1. In the Google Cloud console, go to Logs Explorer.

    Go to Logs Explorer

  2. Select a Google Cloud project.

  3. If required, from the Upgrade menu, switch from Legacy Logs Viewer to Logs Explorer.

  4. To filter your logs to show only entries for Cloud Storage import topics, type resource.type="pubsub_topic" AND severity=WARNING into the query field and click Run query.

  5. In the Query results pane, click Edit time to change the time period for which to return results.

For more information about using the Logs Explorer, see Using the Logs Explorer.

gcloud CLI

To use the gcloud CLI to search for platform logs for Cloud Storage import topics, use the gcloud logging read command.

Specify a filter to limit your results to platform logs for Cloud Storage import topics.

gcloud logging read "resource.type=pubsub_topic AND severity=WARNING"

Cloud Logging API

Use the entries.list Cloud Logging API method.

To filter your results to include only platform logs for Cloud Storage import topics, use the filter field. The following is a sample JSON request object.

{
  "resourceNames": [
    "projects/my-project-name"
  ],
  "orderBy": "timestamp desc",
  "filter": "resource.type=\"pubsub_topic\" AND severity=WARNING"
}

View and understand platform log format

The following section includes sample platform logs and describes the fields for platform logs.

All platform log specific fields are contained within a jsonPayload object.

Avro failure

{
  "insertId": "1xnzx8md4768",
  "jsonPayload": {
    "@type": "type.googleapis.com/google.pubsub.v1.IngestionFailureEvent",
    "cloudStorageFailure": {
      "objectGeneration": "1661148924738910",
      "bucket": "bucket_in_avro_format",
      "objectName": "counts/taxi-2022-08-15T06:10:00.000Z-2022-08-15T06:15:00.000Z-pane-0-last-00-of-01",
      "avroFailureReason": {}
    },
    "topic": "projects/interpod-p2-management/topics/avro_bucket_topic",
    "errorMessage": "Unable to parse the header of the object. The object won't be ingested."
  },
  "resource": {
    "type": "pubsub_topic",
    "labels": {
      "project_id": "interpod-p2-management",
      "topic_id": "avro_bucket_topic"
    }
  },
  "timestamp": "2024-10-07T18:55:45.650103193Z",
  "severity": "WARNING",
  "logName": "projects/interpod-p2-management/logs/pubsub.googleapis.com%2Fingestion_failures",
  "receiveTimestamp": "2024-10-07T18:55:46.678221398Z"
}
The following list describes the fields in the log entry:

  • insertId: A unique identifier for the log entry.
  • jsonPayload.@type: Identifies the event type. Always type.googleapis.com/google.pubsub.v1.IngestionFailureEvent.
  • jsonPayload.cloudStorageFailure.objectGeneration: The generation number of the Cloud Storage object.
  • jsonPayload.cloudStorageFailure.bucket: The Cloud Storage bucket containing the object.
  • jsonPayload.cloudStorageFailure.objectName: The name of the Cloud Storage object.
  • jsonPayload.cloudStorageFailure.avroFailureReason: Reserved for more specific Avro parsing error details. This field is left empty.
  • jsonPayload.topic: The Pub/Sub topic for which the message was intended.
  • jsonPayload.errorMessage: A human-readable error message.
  • resource.type: The resource type. Always pubsub_topic.
  • resource.labels.project_id: The Google Cloud project ID.
  • resource.labels.topic_id: The Pub/Sub topic ID.
  • timestamp: The time at which the log entry was generated.
  • severity: The severity level. Always WARNING.
  • logName: The name of the log.
  • receiveTimestamp: The time at which the log entry was received by Logging.
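
When triaging many failures, it can help to reduce each log entry to the fields you act on: the bucket, the object, and the error message. The following is an illustrative helper operating on a parsed entry dictionary like the sample above; the function name is hypothetical:

```python
def summarize_ingestion_failure(entry: dict) -> str:
    """Condense an ingestion-failure log entry into a one-line summary."""
    payload = entry["jsonPayload"]
    failure = payload.get("cloudStorageFailure", {})
    return (f"{failure.get('bucket')}/{failure.get('objectName')} "
            f"(generation {failure.get('objectGeneration')}): "
            f"{payload.get('errorMessage')}")

# A trimmed-down entry in the shape shown in the sample log.
entry = {
    "jsonPayload": {
        "cloudStorageFailure": {
            "bucket": "bucket_in_avro_format",
            "objectName": "counts/example-object",
            "objectGeneration": "1661148924738910",
        },
        "errorMessage": "Unable to parse the header of the object.",
    },
}
print(summarize_ingestion_failure(entry))
```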

Text failure

{
  "insertId": "1kc4puoag",
  "jsonPayload": {
    "@type": "type.googleapis.com/google.pubsub.v1.IngestionFailureEvent",
    "cloudStorageFailure": {
      "bucket": "bucket_in_text_format",
      "apiViolationReason": {},
      "objectName": "counts/taxi-2022-08-15T06:10:00.000Z-2022-08-15T06:15:00.000Z-pane-0-last-00-of-01",
      "objectGeneration": "1727990048026758"
    },
    "topic": "projects/interpod-p2-management/topics/large_text_bucket_topic",
    "errorMessage": "The message has exceeded the maximum allowed size of 10000000 bytes. The message won't be published."
  },
  "resource": {
    "type": "pubsub_topic",
    "labels": {
      "topic_id": "large_text_bucket_topic",
      "project_id": "interpod-p2-management"
    }
  },
  "timestamp": "2024-10-09T14:09:07.760488386Z",
  "severity": "WARNING",
  "logName": "projects/interpod-p2-management/logs/pubsub.googleapis.com%2Fingestion_failures",
  "receiveTimestamp": "2024-10-09T14:09:08.483589656Z"
}
The following list describes the fields in the log entry:

  • insertId: A unique identifier for the log entry.
  • jsonPayload.@type: Identifies the event type. Always type.googleapis.com/google.pubsub.v1.IngestionFailureEvent.
  • jsonPayload.cloudStorageFailure.objectGeneration: The generation number of the Cloud Storage object.
  • jsonPayload.cloudStorageFailure.bucket: The Cloud Storage bucket containing the object.
  • jsonPayload.cloudStorageFailure.objectName: The name of the Cloud Storage object.
  • jsonPayload.cloudStorageFailure.apiViolationReason: Contains details about the API violation. This field is left empty.
  • jsonPayload.topic: The Pub/Sub topic for which the message was intended.
  • jsonPayload.errorMessage: A human-readable error message.
  • resource.type: The resource type. Always pubsub_topic.
  • resource.labels.project_id: The Google Cloud project ID.
  • resource.labels.topic_id: The Pub/Sub topic ID.
  • timestamp: The time at which the log entry was generated.
  • severity: The severity level. Always WARNING.
  • logName: The name of the log.
  • receiveTimestamp: The time at which the log entry was received by Logging.