This reference architecture describes how you can import logs that were previously exported to Cloud Storage back to Cloud Logging.
This reference architecture is intended for engineers and developers, including DevOps engineers, site reliability engineers (SREs), and security investigators, who want to configure and run the log importing job. This document assumes that you're familiar with running Cloud Run jobs and with using Cloud Storage and Cloud Logging.
Architecture
The following diagram shows how Google Cloud services are used in this reference architecture:
This workflow includes the following components:
- Cloud Storage bucket: Contains the previously exported logs you want to import back to Cloud Logging. Because these logs were previously exported, they're organized in the expected export format.
- Cloud Run job: Runs the import logs process:
  - Reads the objects that store log entries from Cloud Storage.
  - Finds exported logs for the specified log ID, in the requested time range, based on the organization of the exported logs in the Cloud Storage bucket.
  - Converts the objects into Cloud Logging API LogEntry structures. Multiple LogEntry structures are aggregated into batches to reduce Cloud Logging API quota consumption. The architecture handles quota errors when necessary, as illustrated in the sketch after this list.
  - Writes the converted log entries to Cloud Logging. If you re-run the same job multiple times, duplicate entries can result. For more information, see Run the import job.
- Cloud Logging: Ingests and stores the converted log entries. The log entries are processed as described in the Routing and storage overview.
  - The Logging quotas and limits apply, including the Cloud Logging API quotas and limits and a 30-day retention period. This reference architecture is designed to work with the default write quotas and uses a basic retry mechanism. If your write quota is lower than the default, the implementation might fail.
  - The imported logs aren't included in log-based metrics, because their timestamps are in the past. However, if you use the design alternative that stores the original timestamp in a label, the entries are ingested with the import time and are included in the metric data.
- BigQuery: Uses SQL to run analytical queries on imported logs (optional). To import audit logs from Cloud Storage, this architecture modifies the log IDs; you must account for this renaming when you query the imported logs.
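The following minimal sketch illustrates the read, convert, batch, and write flow that the Cloud Run job performs. It isn't the reference implementation: it assumes that each exported object is a newline-delimited JSON file of LogEntry records, and the batch size, retry settings, and payload handling are illustrative placeholders.

```python
import json
from datetime import datetime

from google.api_core import exceptions, retry
from google.cloud import logging as cloud_logging
from google.cloud import storage

BATCH_SIZE = 200  # Illustrative batch size; tune it to stay within write quotas.

storage_client = storage.Client()
logging_client = cloud_logging.Client()
# The log ID that this architecture uses for imported entries.
logger = logging_client.logger("imported_logs")

# Retry transient quota errors (HTTP 429) with exponential backoff.
quota_retry = retry.Retry(
    predicate=retry.if_exception_type(exceptions.ResourceExhausted),
    initial=1.0,
    maximum=60.0,
    multiplier=2.0,
)


def import_object(bucket_name: str, object_name: str) -> None:
    """Reads one exported object and writes its entries to Cloud Logging in batches."""
    blob = storage_client.bucket(bucket_name).blob(object_name)
    lines = blob.download_as_text().splitlines()

    batch = logger.batch()
    pending = 0
    for line in lines:
        entry = json.loads(line)
        timestamp = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
        # Simplification: real exported entries can carry textPayload or
        # protoPayload instead of jsonPayload.
        batch.log_struct(
            entry.get("jsonPayload", entry),
            timestamp=timestamp,
            labels={"original_logName": entry.get("logName", "")},
        )
        pending += 1
        if pending >= BATCH_SIZE:
            quota_retry(batch.commit)()  # One Cloud Logging API write per batch.
            batch = logger.batch()
            pending = 0
    if pending:
        quota_retry(batch.commit)()
```

Committing entries in batches keeps the number of Cloud Logging API write requests low, and retrying on ResourceExhausted errors covers transient write-quota exhaustion, which mirrors the batching and quota handling described in the list above.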
Use case
You might choose to deploy this architecture if your organization requires additional log analysis for incident investigations or other audits of past events. For example, you might want to analyze connections to your databases for the first quarter of the last year, as a part of a database access audit.
Design alternatives
This section describes alternatives to the default design shown in this reference architecture document.
Retention period and imported logs
Cloud Logging requires incoming log entries to have timestamps that don't exceed a 30-day retention period. Imported log entries with timestamps older than 30 days from the import time are not stored.
This architecture validates the date range set in the Cloud Run job to avoid importing logs that are older than 29 days, leaving a one-day safety margin.
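The following minimal sketch shows the kind of date-range check that such a job can perform before importing; the helper name and error messages are illustrative, not the reference code.

```python
from datetime import date, timedelta

MAX_IMPORT_AGE_DAYS = 29  # 30-day retention period minus a one-day safety margin.


def validate_date_range(start_date: date, end_date: date) -> None:
    """Rejects ranges that include days Cloud Logging would refuse to store."""
    oldest_allowed = date.today() - timedelta(days=MAX_IMPORT_AGE_DAYS)
    if start_date < oldest_allowed:
        raise ValueError(
            f"Start date {start_date} is more than {MAX_IMPORT_AGE_DAYS} days in "
            "the past; those entries would exceed the retention period."
        )
    if end_date < start_date:
        raise ValueError("End date must not be earlier than the start date.")
```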
To import logs older than 29 days, you need to make the following changes to the implementation code, and then build a new container image to use in the Cloud Run job configuration.
- Remove the 30-day validation of the date range
- Add the original timestamp as a user label to the log entry
- Reset the timestamp field of the log entry so that it's ingested with the current timestamp
When you use this modification, you must use the labels field instead of the timestamp field in your Log Analytics queries. For more information about Log Analytics queries and samples, see Sample SQL queries.
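The following minimal sketch shows what the modified write could look like, assuming the google-cloud-logging Python client; the original_timestamp label key and the function name are illustrative and not part of the reference implementation.

```python
from google.cloud import logging as cloud_logging

logging_client = cloud_logging.Client()
logger = logging_client.logger("imported_logs")


def write_old_entry(payload: dict, original_timestamp: str, original_log_name: str) -> None:
    """Writes an entry without a timestamp so that it's ingested with the current time."""
    logger.log_struct(
        payload,
        # Omitting the timestamp lets Cloud Logging assign the ingestion time.
        labels={
            # Assumed label key; choose any key and use it consistently in queries.
            "original_timestamp": original_timestamp,
            "original_logName": original_log_name,
        },
    )
```

Because the entries are ingested with the current time, your Log Analytics queries then filter and sort on the labels field rather than on the timestamp field.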
Design considerations
The following guidelines can help you to develop an architecture that meets your organization's requirements.
Cost optimization
The cost of importing logs by using this reference architecture depends on multiple contributing factors.
You use the following billable components of Google Cloud:
- Cloud Logging (logs retention period costs apply)
- Cloud Run
- Cloud Storage API
Consider the following factors that might increase costs:
- Log duplication: To avoid additional log storage costs, don't run the import job with the same configuration multiple times.
- Storage in additional destinations: To avoid additional log storage costs, disable routing policies in the destination project so that the imported logs aren't stored in additional locations or forwarded to other destinations such as Pub/Sub or BigQuery.
- Additional CPU and memory: If your import job times out, you might need to increase the import job CPU and memory in your import job configuration. Increasing these values might increase incurred Cloud Run costs.
- Additional tasks: If the expected number of logs to be imported each day within the time range is high, you might need to increase the number of tasks in the import job configuration. The job splits the time range equally among the tasks, so each task processes a similar number of days from the range concurrently. Increasing the number of tasks might increase incurred Cloud Run costs.
- Storage class: If your Cloud Storage bucket's storage class is other than Standard, such as Nearline, Durable Reduced Availability (DRA), or Coldline, you might incur additional charges.
- Data traffic between different locations: Configure the import job to run in the same location as the Cloud Storage bucket from which you import the logs. Otherwise, network egress costs might be incurred.
To generate a cost estimate based on your projected usage, including Cloud Run jobs, use the pricing calculator.
Operational efficiency
This section describes considerations for managing analytical queries after the solution is deployed.
Log names and queries
Logs are stored in the project that's defined in the logName field of the log entry. To import the logs to the selected project, this architecture modifies the logName field of each imported log. The imported logs are stored in the selected project's default log bucket with the log ID imported_logs (unless the project has a log routing policy that changes the storage destination). The original value of the logName field is preserved in the labels field with the key original_logName.
You must account for the location of the original logName value when you query the imported logs. For more information about Log Analytics queries and samples, see Sample SQL queries.
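Log Analytics queries use SQL, but the same renaming applies if you read the imported entries through the Cloud Logging API. The following minimal sketch filters on the imported_logs log ID and on the original_logName label; the project ID and label value are placeholders.

```python
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="your-project-id")  # Placeholder project ID.

log_filter = (
    'logName="projects/your-project-id/logs/imported_logs" '
    'AND labels.original_logName:"cloudaudit.googleapis.com"'  # Placeholder original log.
)

for entry in client.list_entries(filter_=log_filter, order_by=cloud_logging.DESCENDING):
    print(entry.timestamp, entry.labels.get("original_logName"), entry.payload)
```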
Performance optimization
If the volume of logs that you're importing exceeds Cloud Run capacity
limits, the job might time out before the import is complete. To prevent an incomplete
data import, consider increasing the tasks
value in the
import job. Increasing CPU
and memory resources can also help
improve task performance when you increase the number of tasks.
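Each Cloud Run job task can determine its share of the time range from the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables that Cloud Run sets for every task. The following minimal sketch shows one way to split the range; it isn't the reference implementation.

```python
import os
from datetime import date, timedelta


def task_date_range(start_date: date, end_date: date) -> tuple[date, date]:
    """Returns the sub-range of days that this Cloud Run job task should process."""
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))

    total_days = (end_date - start_date).days + 1
    days_per_task = -(-total_days // task_count)  # Ceiling division.

    task_start = start_date + timedelta(days=task_index * days_per_task)
    task_end = min(task_start + timedelta(days=days_per_task - 1), end_date)
    # If task_start is later than task_end, this task has no days to process.
    return task_start, task_end
```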
Deployment
To deploy this architecture, see Deploy a job to import logs from Cloud Storage to Cloud Logging.
What's next
- Review the implementation code in the GitHub repository.
- Learn how to analyze imported logs by using Log Analytics and SQL.
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Author: Leonid Yankulin | Developer Relations Engineer
Other contributors:
- Summit Tuladhar | Senior Staff Software Engineer
- Wilton Wong | Enterprise Architect
- Xiang Shen | Solutions Architect