This scenario shows how to export selected logs from Logging to an Elasticsearch cluster. Elasticsearch is an open source document database that ingests, indexes, and analyzes unstructured data such as logs, metrics, and other telemetry. Integrating logs collected by Logging with logs that you might have in Elasticsearch gives you a unified log analysis solution.
This scenario is part of the series Design patterns for exporting Logging.
Introduction
The Elastic Stack has multiple solutions for ingesting data into an Elasticsearch cluster. Logstash and Beats are the core products used for collecting, transforming, and ingesting data. Choosing between Logstash and Beats depends on your data volume, ingest rates, and latency requirements. This solution focuses on the Logstash component of the Elastic Stack because Logstash is the most flexible option for working with logs exported from Logging.
You can use Logstash to collect, transform, and export logs from multiple sources to multiple destinations. It's best to use Logstash when you have high log volume. You can use the Pub/Sub input plugin and the Cloud Storage input plugin to ingest logs exported by Logging into Logstash, which then forwards them to Elasticsearch.
Beats is an open source platform for lightweight shippers of data into Elasticsearch. You can use the pubsubbeat shipper to ingest Pub/Sub messages into Elasticsearch. Although there is a gcsbeat shipper, it's designed for lightweight shipping of files at the root level of a bucket, so it's not the best solution for importing logs exported by Logging.
Use Beats in the following circumstances:
- Your log volume is low (can be handled by a single machine).
- You don't want the overhead of running Logstash.
- Your log enrichment requirements can be met by Elastic ingest pipelines.
Use Logstash in the following circumstances:
- Your log volume is high.
- You need to enrich your logs beyond the capabilities of Elastic ingest pipelines.
- You need to be able to export logs to systems other than Elasticsearch.
| | High volume or high touch | Low volume and low touch |
|---|---|---|
| Real-time | logstash-input-google_pubsub | pubsubbeat |
| Eventual | logstash-input-google_cloud_storage | gcsbeat |
The following diagram shows the architecture for batch and real-time export to Elasticsearch using Logstash.
Setting up the real-time logging export
In this section, you create the pipeline for real-time log exporting from Logging to Logstash by using Pub/Sub.
Set up a Pub/Sub topic
- Follow the instructions to set up a Pub/Sub topic that will receive your exported logs.
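For example, assuming you name the topic logs-export-topic (the topic name used throughout this document) and want a pull subscription named logstash for the Logstash pipeline described later, you can create both with gcloud:

gcloud pubsub topics create logs-export-topic
gcloud pubsub subscriptions create logstash --topic=logs-export-topic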
Turn on audit logging for all services
Data access audit logs—except for BigQuery—are disabled by default. In order to enable all audit logs, follow the instructions to update the IAM policy with the configuration listed in the audit policy documentation. The steps include the following:
- Downloading the current IAM policy as a file.
- Adding the audit log policy JSON or YAML object to the current policy file.
- Updating the Google Cloud project with the changed policy file.
The following is an example JSON object that enables all audit logs for all services.
"auditConfigs": [
{
"service": "allServices",
"auditLogConfigs": [
{ "logType": "ADMIN_READ" },
{ "logType": "DATA_READ" },
{ "logType": "DATA_WRITE" }
]
}
],
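As a sketch of those steps at the project level, you can download the policy with gcloud, add the auditConfigs block shown above, and re-apply it. The file name policy.json is illustrative.

PROJECT_ID=$(gcloud config get-value project)
gcloud projects get-iam-policy ${PROJECT_ID} --format=json > policy.json
# Edit policy.json to add the auditConfigs block shown above, then re-apply it:
gcloud projects set-iam-policy ${PROJECT_ID} policy.json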
Configure the logging export
After you set up aggregated exports or logs export, you need to refine the logging filters to export audit logs, virtual machine–related logs, storage logs, and database logs. The following logging filter includes the Admin Activity and Data Access audit logs and the logs for specific resource types.
logName:"logs/cloudaudit.googleapis.com" OR
resource.type:gce OR
resource.type=gcs_bucket OR
resource.type=bigquery_resource
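Before creating the sink, you can preview which entries the filter matches with the gcloud logging read command. This is a sketch; it assumes you replace your-org-id with your organization ID and that you have permission to read logs at the organization level.

gcloud logging read \
  'logName:"logs/cloudaudit.googleapis.com" OR resource.type:gce OR resource.type="gcs_bucket" OR resource.type="bigquery_resource"' \
  --organization=your-org-id \
  --limit=5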
From the gcloud command-line tool, use the gcloud logging sinks create command or the organizations.sinks.create API call to create a sink with the appropriate filters. The following example gcloud command creates a sink called gcp_logging_sink_pubsub for the organization. The sink includes all child projects and specifies filtering to select specific audit logs.
ORG_ID=your-org-id
PROJECT_ID=$(gcloud config get-value project)
gcloud logging sinks create gcp_logging_sink_pubsub \
pubsub.googleapis.com/projects/${PROJECT_ID}/topics/logs-export-topic \
--log-filter='logName="logs/cloudaudit.googleapis.com" OR resource.type:"gce" OR resource.type="gcs_bucket" OR resource.type="bigquery_resource"' \
--include-children \
--organization=${ORG_ID}
The command output is similar to the following:
Created [https://logging.googleapis.com/v2/organizations/your-org-id/sinks/gcp_logging_sink_pubsub]. Please remember to grant `serviceAccount:gcp-logging-export-pubsub-si@logging-oyour-org-id.iam.gserviceaccount.com` the Pub/Sub Publisher role to the topic. More information about sinks can be found at https://cloud.google.com/logging/docs/export/configure_export
The serviceAccount entry returned from the API call includes the identity gcp-logging-export-pubsub-si@logging-oyour-org-id.iam.gserviceaccount.com. This identity represents a Google Cloud service account that has been created for the export. Until you grant this identity publish access to the destination topic, log entry exports from this sink will fail. For more information, see the next section or the documentation for Granting access for a resource.
Set IAM policy permissions for the Pub/Sub topic
By adding the service account
gcp-logging-export-pubsub-si@logging-oyour-org-id.iam.gserviceaccount.com
to
the pubsub.googleapis.com/projects/${PROJECT_ID}/topics/logs-export-topic
topic with the Pub/Sub Publisher permissions, you grant the service account
permission to publish to the topic. Until you add these permissions, the sink
export will fail.
To add the permissions to the service account, follow these steps:
- In the Cloud Console, open the Pub/Sub Topics page.
- Select the topic name.
- Click Show info panel, and then configure permissions. Make sure that the Pub/Sub Publisher permission is selected.
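Alternatively, you can grant the same role from the gcloud command-line tool. The following command is a sketch; it assumes the topic logs-export-topic and the service account identity returned by your own sink-creation command.

gcloud pubsub topics add-iam-policy-binding logs-export-topic \
  --member="serviceAccount:gcp-logging-export-pubsub-si@logging-oyour-org-id.iam.gserviceaccount.com" \
  --role="roles/pubsub.publisher"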
After you create the logging export by using this filter, log files begin to populate in the Pub/Sub topic in the configured project. You can confirm that the topic is receiving messages by using the Metrics Explorer in Cloud Monitoring. Using the following resource type and metric, observe the number of message-send operations over a brief period. If you have configured the export properly, you will see activity above 0 on the graph, as in the following screenshot.
- Resource type:
pubsub_topic
- Metric:
pubsub.googleapis.com/topic/send_message_operation_count
- Filter:
topic_id="logs-export-topic"
Setting up the logging export
Because Logging has finite retention periods for logs, creating a Logging export to Cloud Storage is a best practice. Using Logstash to ingest exported logs lets you search logs outside of the default retention periods and also helps you repopulate the Elasticsearch database when needed.
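If you don't already have a Cloud Storage export, you can create one the same way you created the Pub/Sub sink. The following gcloud sketch uses the hypothetical sink name gcp_logging_sink_gcs, assumes the bucket my-cloud-logs (the bucket ID used in the Logstash configuration later in this document) already exists, and assumes ORG_ID is set as in the earlier command. As with the Pub/Sub sink, grant the service account returned in the command output write access to the bucket (for example, the Storage Object Creator role).

gcloud logging sinks create gcp_logging_sink_gcs \
  storage.googleapis.com/my-cloud-logs \
  --log-filter='logName="logs/cloudaudit.googleapis.com" OR resource.type:"gce" OR resource.type="gcs_bucket" OR resource.type="bigquery_resource"' \
  --include-children \
  --organization=${ORG_ID}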
Configure Logstash
- Download the latest Logstash release.
- Create an IAM service account and JSON account key for Logstash with the Cloud Storage Admin and Pub/Sub Subscriber roles.
- Install the Cloud Storage and Pub/Sub input plugins for Logstash by running the following command from your Logstash installation directory:
./bin/logstash-plugin install \
  logstash-input-google_cloud_storage \
  logstash-input-exec \
  logstash-input-google_pubsub
Create the batch ingest pipeline
Create a Logstash configuration file for Cloud Storage like the following, replacing my-cloud-logs with the ID of your bucket:

input {
  google_cloud_storage {
    type => "gcs"
    interval => 60
    bucket_id => "my-cloud-logs"
    json_key_file => "logstash-sa.json"
    file_matches => ".*\.json"
    codec => "json"
  }
}
output {
  elasticsearch {
    document_id => "%{[@metadata][gcs][line_id]}"
  }
}
For a list of all options, see the google_cloud_storage plugin configuration documentation.
Objects in Cloud Storage that have been processed by Logstash are marked using the metadata item x-goog-meta-ls-gcs-input:processed. The input plugin skips any objects with this metadata item. If you plan on reprocessing data, you must remove the metadata label manually.
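If you need to reprocess objects, one possible approach is to clear that marker with gsutil. The following sketch assumes your exported logs are .json objects in the my-cloud-logs bucket and that gsutil setmeta with a value-less header removes the metadata entry.

# Clear the processed marker so the input plugin picks the objects up again.
gsutil -m setmeta -h "x-goog-meta-ls-gcs-input" gs://my-cloud-logs/**.json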
Create the real-time pipeline
Create a Logstash config file for the Pub/Sub source in the following form:

input {
  google_pubsub {
    type => "pubsub"
    project_id => "my-project-id"
    topic => "logs-export-topic"
    subscription => "logstash"
    json_key_file => "logstash-sa.json"
  }
}
output {
  elasticsearch {
    document_id => "%{[@metadata][pubsub_message][messageId]}"
  }
}
For a full list of configuration items, see the plugin documentation.
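To run the batch and real-time pipelines side by side, one option is to register both configuration files in Logstash's pipelines.yml. The pipeline IDs and file paths below are hypothetical; point path.config at wherever you saved the two configurations.

# pipelines.yml (pipeline IDs and paths are placeholders)
- pipeline.id: gcs-batch
  path.config: "/etc/logstash/conf.d/gcs-batch.conf"
- pipeline.id: pubsub-realtime
  path.config: "/etc/logstash/conf.d/pubsub-realtime.conf"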
Idempotent inserts
Elasticsearch uses the _id field of a document as a unique identifier. In Logstash, you can use the [@metadata] items and other message fields to create a unique document ID based on the types of log messages from Logging.

The google_cloud_storage plugin metadata documentation has a list of available Logstash metadata fields. The [@metadata][gcs][line_id] field is inserted explicitly for the use case of creating idempotent document IDs.

The google_pubsub plugin also has a list of metadata fields; the [@metadata][pubsub_message][messageId] field is commonly used for de-duplication.
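If a particular log source doesn't carry a usable metadata ID, one option is the Logstash fingerprint filter, which derives a stable document ID from fields in the event itself. The following sketch assumes the Cloud Logging insertId field is present in the parsed JSON payload; the target field name is illustrative.

filter {
  # Hash the Cloud Logging insertId into a stable ID for idempotent indexing.
  fingerprint {
    source => ["insertId"]
    target => "[@metadata][doc_id]"
    method => "SHA256"
  }
}
output {
  elasticsearch {
    document_id => "%{[@metadata][doc_id]}"
  }
}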
Using the exported logs
After the exported logs have been ingested into Elasticsearch and the real-time Logstash pipeline is running as a service, you can use Elasticsearch and the products in the Elastic Stack to do the following tasks:
- Search the logs.
- Correlate complex events.
- Create automated alerts.
- Explore relationships in your data as a graph.
- Visualize results by using Kibana dashboards.
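For example, a simple search for Cloud Storage deletion audit events from the Kibana Dev Tools console might look like the following sketch; it assumes the default logstash-* index naming and the standard Cloud Audit Logs field protoPayload.methodName.

GET logstash-*/_search
{
  "query": {
    "match": { "protoPayload.methodName": "storage.objects.delete" }
  }
}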
What's next
- Exporting logs to Elastic Cloud
- Look at the other export scenarios.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.