Scenarios for exporting Cloud Logging: Elasticsearch

This scenario shows how to export selected logs from Logging to an Elasticsearch cluster. Elasticsearch is an open source document database that ingests, indexes, and analyzes unstructured data such as logs, metrics, and other telemetry. Integrating logs collected by Logging with logs that you might have in Elasticsearch gives you a unified log analysis solution.

This scenario is part of the series Design patterns for exporting Logging.

Introduction

The Elastic Stack has multiple solutions for ingesting data into an Elasticsearch cluster. Logstash and Beats are the core products used for collecting, transforming, and ingesting data. Choosing between Logstash and Beats depends on your data volume, ingest rates, and latency requirements. This solution focuses on the Logstash component of the Elastic Stack because Logstash is the most flexible option for working with logs exported from Logging.

You can use Logstash to collect, transform, and export logs from multiple sources to multiple destinations. It's best to use Logstash when you have a high log volume. You can use the Pub/Sub input plugin and the Cloud Storage input plugin to ingest logs exported by Logging into Logstash, which then forwards them to Elasticsearch.

Beats is an open source platform for lightweight shippers of data into Elasticsearch. You can use the pubsubbeat shipper to ingest Pub/Sub messages into Elasticsearch. Although there is a gcsbeat shipper, it's designed for lightweight shipping of files at the root level of a bucket, so it's not the best solution for importing logs exported by Logging.

Use Beats in the following circumstances:

  • Your log volume is low (can be handled by a single machine).
  • You don't want the overhead of running Logstash.
  • Your log enrichment requirements can be met by Elastic ingest pipelines.

Use Logstash in the following circumstances:

  • Your log volume is high.
  • You need to enrich your logs beyond the capabilities of Elastic ingest pipelines.
  • You need to be able to export logs to systems other than Elasticsearch.

The following table summarizes which integration to use:

                | High volume or high touch            | Low volume and low touch
  Real-time     | logstash-input-google_pubsub         | pubsubbeat
  Eventual      | logstash-input-google_cloud_storage  | gcsbeat

The following diagram shows the architecture for batch and real-time export to Elasticsearch using Logstash.

Architecture for batch and real-time export to Elasticsearch using Logstash.

Setting up the real-time logging export

In this section, you create the pipeline for real-time log exporting from Logging to Logstash by using Pub/Sub.

Set up a Pub/Sub topic
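
The export destination used throughout this scenario is a Pub/Sub topic named logs-export-topic. A minimal sketch of creating it with gcloud, assuming the topic lives in your current project:

# Create the topic that the Logging sink will publish to.
gcloud pubsub topics create logs-export-topic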

Turn on audit logging for all services

Data Access audit logs, except for BigQuery, are disabled by default. To enable all audit logs, follow the instructions to update the IAM policy with the configuration listed in the audit policy documentation. The steps include the following:

  • Downloading the current IAM policy as a file.
  • Adding the audit log policy JSON or YAML object to the current policy file.
  • Updating the Google Cloud project with the changed policy file.

The following is an example JSON object that enables all audit logs for all services.

"auditConfigs": [
    {
        "service": "allServices",
        "auditLogConfigs": [
            { "logType": "ADMIN_READ" },
            { "logType": "DATA_READ"  },
            { "logType": "DATA_WRITE" }
        ]
    }
],
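
A minimal command-line sketch of those steps, assuming a project-level policy and an illustrative file name policy.json:

# Download the current IAM policy to a local file (file name is illustrative).
gcloud projects get-iam-policy $(gcloud config get-value project) \
    --format=json > policy.json

# Edit policy.json to add the auditConfigs object shown above, then
# apply the updated policy back to the project.
gcloud projects set-iam-policy $(gcloud config get-value project) policy.json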

Configure the logging export

After you set up an aggregated export or a logs export, you need to refine the logging filters to export audit logs, virtual machine–related logs, storage logs, and database logs. The following logging filter includes the Admin Activity and Data Access audit logs and the logs for specific resource types.

logName:"/logs/cloudaudit.googleapis.com" OR
resource.type:gce OR
resource.type=gcs_bucket OR
resource.type=bigquery_resource
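
As an optional sanity check before you create the sink, you can run the same filter through gcloud logging read in the current project; the entries returned depend on which services are active:

gcloud logging read \
    'logName:"/logs/cloudaudit.googleapis.com" OR resource.type:"gce" OR resource.type="gcs_bucket" OR resource.type="bigquery_resource"' \
    --limit=5 --format=json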

From the gcloud command-line tool, use the gcloud logging sinks create command or the organizations.sinks.create API call to create a sink with the appropriate filters. The following example gcloud command creates a sink called gcp_logging_sink_pubsub for the organization. The sink includes all child projects and specifies filtering to select specific audit logs.

ORG_ID=your-org-id
PROJECT_ID=$(gcloud config get-value project)
gcloud logging sinks create gcp_logging_sink_pubsub \
    pubsub.googleapis.com/projects/${PROJECT_ID}/topics/logs-export-topic \
    --log-filter='logName:"/logs/cloudaudit.googleapis.com" OR resource.type:"gce" OR resource.type="gcs_bucket" OR resource.type="bigquery_resource"' \
    --include-children \
    --organization=${ORG_ID}

The command output is similar to the following:

Created [https://logging.googleapis.com/v2/organizations/your-org-id/sinks/gcp_logging_sink_pubsub].
Please remember to grant `serviceAccount:gcp-logging-export-pubsub-si@logging-oyour-org-id.iam.gserviceaccount.com` Pub/Sub Publisher role to the topic.
More information about sinks can be found at /logging/docs/export/configure_export

The command output includes the service account gcp-logging-export-pubsub-si@logging-oyour-org-id.iam.gserviceaccount.com. This identity represents a Google Cloud service account that was created for the export. Until you grant this identity publish access to the destination topic, log entry exports from this sink will fail. For more information, see the next section or the documentation for Granting access for a resource.

Set IAM policy permissions for the Pub/Sub topic

Grant the service account gcp-logging-export-pubsub-si@logging-oyour-org-id.iam.gserviceaccount.com the Pub/Sub Publisher role on the pubsub.googleapis.com/projects/${PROJECT_ID}/topics/logs-export-topic topic so that the service account can publish to the topic. Until you grant this permission, the sink export fails.

To add the permissions to the service account, follow these steps:

  1. In the Cloud Console, open the Pub/Sub Topics page.

  2. Select the topic name.

  3. Click Show info panel, and then add the export service account as a member with the Pub/Sub Publisher role.
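
Alternatively, you can grant the role from the command line. The following sketch assumes the topic lives in the project referenced by ${PROJECT_ID} and uses the service account name reported by your own sink-creation output:

# Grant the export service account permission to publish to the topic.
gcloud pubsub topics add-iam-policy-binding logs-export-topic \
    --project=${PROJECT_ID} \
    --member="serviceAccount:gcp-logging-export-pubsub-si@logging-oyour-org-id.iam.gserviceaccount.com" \
    --role="roles/pubsub.publisher"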

After you create the logging export by using this filter, log entries begin to populate the Pub/Sub topic in the configured project. You can confirm that the topic is receiving messages by using the Metrics Explorer in Cloud Monitoring. Using the following resource type and metric, observe the number of message-send operations over a brief period. If you have configured the export properly, you will see activity above 0 on the graph, as in the following screenshot.

  • Resource type: pubsub_topic
  • Metric: pubsub.googleapis.com/topic/send_message_operation_count
  • Filter: topic_id="logs-export-topic"

Activity in Metrics Explorer graph.
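
If you prefer a command-line check, one option is to attach a temporary subscription to the topic and pull a few messages; the subscription name logs-export-check is illustrative, and only messages published after the subscription exists are delivered:

# Create a temporary subscription, pull a few exported log entries, then clean up.
gcloud pubsub subscriptions create logs-export-check --topic=logs-export-topic
gcloud pubsub subscriptions pull logs-export-check --limit=5 --auto-ack
gcloud pubsub subscriptions delete logs-export-check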

Setting up the batch logging export

Because Logging has finite retention periods for logs, creating a Logging export to Cloud Storage is a best practice. Using Logstash to ingest exported logs lets you search logs outside of the default retention periods and also helps you repopulate the Elasticsearch database when needed.
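
The Cloud Storage sink is created the same way as the Pub/Sub sink. The following sketch assumes a bucket named my-cloud-logs (matching the Logstash configuration later in this document) and reuses the earlier filter; the sink name gcp_logging_sink_gcs is illustrative. As with the Pub/Sub sink, you must grant the sink's writer identity permission to write to the bucket (typically the Storage Object Creator role), as reported in the command output.

ORG_ID=your-org-id
# Sink name and bucket name are illustrative; the bucket must already exist.
gcloud logging sinks create gcp_logging_sink_gcs \
    storage.googleapis.com/my-cloud-logs \
    --log-filter='logName:"/logs/cloudaudit.googleapis.com" OR resource.type:"gce" OR resource.type="gcs_bucket" OR resource.type="bigquery_resource"' \
    --include-children \
    --organization=${ORG_ID}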

Configure Logstash

  1. Download the latest Logstash release.
  2. Create an IAM service account and a JSON account key for Logstash with the Cloud Storage Admin and Pub/Sub Subscriber roles. (A command-line sketch follows this procedure.)
  3. Install the Cloud Storage and Pub/Sub input plugins for Logstash by running the following command from your Logstash installation directory:

    ./bin/logstash-plugin install \
        logstash-input-google_cloud_storage \
        logstash-input-exec \
        logstash-input-google_pubsub
    
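The service account from step 2 can be created with gcloud. In the following sketch, the account name logstash-sa and the key file name logstash-sa.json are illustrative; the key file name must match the json_key_file value used in the pipeline configurations below.

PROJECT_ID=$(gcloud config get-value project)

# Create the service account (the name is illustrative).
gcloud iam service-accounts create logstash-sa --display-name="Logstash ingest"

# Grant the roles listed in step 2.
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:logstash-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="roles/storage.admin"
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:logstash-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="roles/pubsub.subscriber"

# Download a JSON key for the Logstash input plugins to use.
gcloud iam service-accounts keys create logstash-sa.json \
    --iam-account="logstash-sa@${PROJECT_ID}.iam.gserviceaccount.com"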

Create the batch ingest pipeline

  • Create a Logstash configuration file for Cloud Storage like the following, replacing my-cloud-logs with the ID of your bucket:

    input {
      google_cloud_storage {
        type => "gcs"
        interval => 60
        bucket_id => "my-cloud-logs"
        json_key_file => "logstash-sa.json"
        file_matches => ".*\.json"
        codec => "json"
      }
    }
    
    output {
      elasticsearch {
        document_id => "%{[@metadata][gcs][line_id]}"
      }
    }
    

    For a list of all options, see the google_cloud_storage plugin configuration documentation.

Objects in Cloud Storage that have been processed by Logstash are marked using the metadata item x-goog-meta-ls-gcs-input:processed. The input plugin skips any objects with this metadata item. If you plan on reprocessing data, you must remove the metadata label manually.
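
One way to clear that marker is with gsutil setmeta, which removes a custom metadata field when you supply the header with an empty value; the object path below is illustrative:

# Remove the processed marker so the object is picked up again on the next run.
gsutil setmeta -h "x-goog-meta-ls-gcs-input:" gs://my-cloud-logs/path/to/exported-log.json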

Create the real-time pipeline

  • Create a Logstash configuration file for the Pub/Sub source in the following form:

    input {
      google_pubsub {
          type => "pubsub"
          project_id => "my-project-id"
          topic => "logs-export-topic"
          subscription => "logstash"
          json_key_file => "logstash-sa.json"
      }
    }
    
    output {
      elasticsearch {
        document_id => "%{[@metadata][pubsub_message][messageId]}"
      }
    }
    

For a full list of configuration items, see the plugin documentation.
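
The subscription named logstash that this configuration references must already exist. You can create it against the export topic with gcloud; the name must match the subscription value above:

# Create the subscription that the google_pubsub input plugin will consume from.
gcloud pubsub subscriptions create logstash --topic=logs-export-topic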

Idempotent inserts

Elasticsearch uses the _id field of a document as a unique identifier. In Logstash, you can use the [@metadata] items and other message fields to create a unique document ID based on the types of log messages from Logging.

The google_cloud_storage plugin metadata documentation lists the available Logstash metadata fields. The [@metadata][gcs][line_id] field is inserted specifically to support creating idempotent document IDs.

The google_pubsub plugin also provides metadata fields; [@metadata][pubsub_message][messageId] is commonly used for deduplication.

Using the exported logs

After the exported logs have been ingested into Elasticsearch and the real-time Logstash pipeline is running as a service, you can use Elasticsearch and the products in the Elastic Stack to do the following tasks:

  • Search the logs.
  • Correlate complex events.
  • Create automated alerts.
  • Explore relationships in your data as a graph.
  • Visualize results by using Kibana dashboards.

What's next