Deploy a job to import logs from Cloud Storage to Cloud Logging

Last reviewed 2024-01-02 UTC

This document describes how you deploy the reference architecture described in Import logs from Cloud Storage to Cloud Logging.

These instructions are intended for engineers and developers, including DevOps, site reliability engineers (SREs), and security investigators, who want to configure and run the log importing job. This document also assumes you are familiar with running Cloud Run import jobs, and how to use Cloud Storage and Cloud Logging.

Architecture

The following diagram shows how Google Cloud services are used in this reference architecture:

Workflow diagram of log import from Cloud Storage to Cloud Logging.

For details, see Import logs from Cloud Storage to Cloud Logging.

Objectives

  • Create and configure a Cloud Run import job
  • Create a service account to run the job

Costs

In this document, you use the following billable components of Google Cloud:

  • Cloud Run
  • Cloud Storage
  • Cloud Logging

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Ensure that the logs you intend to import were previously exported to Cloud Storage, which means that they're already organized in the expected export format (see the example listing after these steps).

  2. In the Google Cloud console, activate Cloud Shell.


  3. Create or select a Google Cloud project.

    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  4. In the commands in this document, replace PROJECT_ID with the ID of the destination project (the project into which you import the logs).

  5. Make sure that billing is enabled for your Google Cloud project.

  6. Enable the Cloud Run and Identity and Access Management (IAM) APIs:

    gcloud services enable run.googleapis.com iam.googleapis.com
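
As a quick check for step 1, you can list the contents of the storage bucket to confirm that the exported logs follow the Cloud Storage export layout: a folder per log, followed by year, month, and day folders that contain the batched JSON files. The bucket name, log folder, and file name in the following sketch are hypothetical examples:

  # Hypothetical example: inspect the layout of previously exported logs
  gcloud storage ls --recursive gs://my-exported-logs/cloudaudit.googleapis.com/ | head

  # Expected layout (hypothetical values):
  # gs://my-exported-logs/cloudaudit.googleapis.com/activity/2024/01/15/00:00:00_00:59:59_S0.json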

Required roles

To get the permissions that you need to deploy this solution, ask your administrator to grant you the following IAM roles:

  • To grant the Logs Writer role on the log bucket: Project IAM Admin (roles/resourcemanager.projectIamAdmin) on the destination project
  • To grant the Storage Object Viewer role on the storage bucket: Storage Admin (roles/storage.admin) on the project where the storage bucket is hosted
  • To create a service account: Create Service Accounts (roles/iam.serviceAccountCreator) on the destination project
  • To enable services on the project: Service Usage Admin (roles/serviceusage.serviceUsageAdmin) on the destination project
  • To upgrade the log bucket and delete imported logs: Logging Admin (roles/logging.admin) on the destination project
  • To create, run, and modify the import job: Cloud Run Developer (roles/run.developer) on the destination project

For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.
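
For reference, an administrator can grant any of these roles with the gcloud CLI. The following minimal sketch grants the Logging Admin role to a hypothetical user; substitute the role, user, and project as needed:

  # Hypothetical example: grant the Logging Admin role on the destination project
  gcloud projects add-iam-policy-binding PROJECT_ID \
      --member=user:investigator@example.com \
      --role=roles/logging.admin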

Upgrade the log bucket to use Log Analytics

We recommend that you use the default log bucket and upgrade it to use Log Analytics. However, in a production environment, you can use your own log bucket if the default bucket doesn't meet your requirements. If you use your own bucket, you must route the logs that are ingested into the destination project to that log bucket. For more information, see Configure log buckets and Create a sink.

When you upgrade the bucket, you can use SQL to query and analyze your logs. There's no additional cost to upgrade the bucket or use Log Analytics.

To upgrade the default log bucket in the destination project, do the following:

  • Upgrade the default log bucket to use Log Analytics:

    gcloud logging buckets update BUCKET_ID --location=LOCATION --enable-analytics
    

    Replace the following:

    • BUCKET_ID: the name of the log bucket (for example, _Default)
    • LOCATION: the location of the log bucket (for example, global)
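
To confirm that the upgrade succeeded, you can describe the bucket; when Log Analytics is enabled, the output typically includes an analyticsEnabled: true field. The values in this sketch match the examples above:

  # Verify that Log Analytics is enabled on the default log bucket
  gcloud logging buckets describe _Default --location=global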

Create the Cloud Run import job

When you create the job, you can use the prebuilt container image that is provided for this reference architecture. If you need to modify the implementation to change the 30-day retention period or if you have other requirements, you can build your own custom image.

  • In Cloud Shell, create the job with the configurations and environment variables:

    gcloud run jobs create JOB_NAME \
    --image=IMAGE_URL \
    --region=REGION \
    --tasks=TASKS \
    --max-retries=0 \
    --task-timeout=60m \
    --cpu=CPU \
    --memory=MEMORY \
    --set-env-vars=END_DATE=END_DATE,LOG_ID=LOG_ID,\
    START_DATE=START_DATE,STORAGE_BUCKET_NAME=STORAGE_BUCKET_NAME,\
    PROJECT_ID=PROJECT_ID
    

    Replace the following:

    • JOB_NAME: the name of your job.
    • IMAGE_URL: the reference to the container image; use us-docker.pkg.dev/cloud-devrel-public-resources/samples/import-logs-solution or the URL of the custom image, if you built one by using the instructions in GitHub.
    • REGION: the region where you want your job to run. To avoid additional costs, we recommend using the same region as the Cloud Storage bucket, or a region within the same multi-region. For example, if your bucket is in the US multi-region, you can use us-central1. For details, see Cost optimization.
    • TASKS: the number of tasks that the job must run. The default value is 1. You can increase the number of tasks if timeouts occur.
    • CPU: the CPU limit, which can be 1, 2, 4, 6, or 8 CPUs. The default value is 2. You can increase the number if timeouts occur; for details, see Configure CPU limits.
    • MEMORY: the memory limit. The default value is 2Gi. You can increase the number if timeouts occur; for details, see Configure memory limits.
    • END_DATE: the end of the date range in the format MM/DD/YYYY. Logs with timestamps earlier than or equal to this date are imported.
    • LOG_ID: the log identifier of the logs you want to import. Log ID is a part of the logName field of the log entry. For example, cloudaudit.googleapis.com.
    • START_DATE: the start of the date range in the format MM/DD/YYYY. Logs with timestamps later than or equal to this date are imported.
    • STORAGE_BUCKET_NAME: the name of the Cloud Storage bucket where logs are stored (without the gs:// prefix).
    • PROJECT_ID: the destination project ID.

    The max-retries option is set to zero to prevent retries of failed tasks, because retries can create duplicate log entries.

    If the Cloud Run job fails due to a timeout, an incomplete import can result. To prevent incomplete imports due to timeouts, increase the tasks value, as well as the CPU and memory resources.

Increasing these values might increase costs. For details about costs, see Cost optimization.
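
As an illustration only, a fully substituted command might look like the following sketch. All values (job name, region, dates, log ID, bucket, and project) are hypothetical; the image URL is the prebuilt image mentioned earlier:

  # Hypothetical example: create the import job with concrete values
  gcloud run jobs create import-logs-job \
  --image=us-docker.pkg.dev/cloud-devrel-public-resources/samples/import-logs-solution \
  --region=us-central1 \
  --tasks=1 \
  --max-retries=0 \
  --task-timeout=60m \
  --cpu=2 \
  --memory=2Gi \
  --set-env-vars=END_DATE=01/31/2024,LOG_ID=cloudaudit.googleapis.com,START_DATE=01/01/2024,STORAGE_BUCKET_NAME=my-exported-logs,PROJECT_ID=my-destination-project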

Create a service account to run your Cloud Run job

  1. In Cloud Shell, create the user-managed service account:

    gcloud iam service-accounts create SA_NAME
    

    Replace SA_NAME with the name of the service account.

  2. Grant the Storage Object Viewer role on the storage bucket:

    gcloud storage buckets add-iam-policy-binding gs://STORAGE_BUCKET_NAME \
    --member=serviceAccount:SA_NAME@PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/storage.objectViewer
    

    Replace the following:

    • STORAGE_BUCKET_NAME: the name of the storage bucket that you used in the import job configuration. For example, my-bucket.
    • PROJECT_ID: the destination project ID.
  3. Grant the Logs Writer role on the log bucket:

    gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:SA_NAME@PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/logging.logWriter
    
  4. Set the service account for the Cloud Run job:

    gcloud run jobs update JOB_NAME \
    --region=REGION \
    --service-account SA_NAME@PROJECT_ID.iam.gserviceaccount.com
    

    Replace REGION with the same region where you deployed the Cloud Run import job.
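
To verify the configuration, you can describe the job; the output includes the assigned service account along with the job's environment variables. The job name and region in this sketch are the hypothetical examples used earlier:

  # Confirm that the job is configured with the user-managed service account
  gcloud run jobs describe import-logs-job --region=us-central1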

Run the import job

  • In Cloud Shell, execute the created job:

    gcloud run jobs execute JOB_NAME \
    --region=REGION
    

For more information, see Execute jobs and Manage job executions.
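
If you want the command to block until the execution finishes, you can add the --wait flag, and you can list past executions to check their status. The job name and region in this sketch are hypothetical:

  # Run the job and wait for the execution to complete
  gcloud run jobs execute import-logs-job --region=us-central1 --wait

  # List the job's executions and their status
  gcloud run jobs executions list --job=import-logs-job --region=us-central1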

If you need to rerun the job, delete the previously imported logs to avoid creating duplicates. For details, see Delete imported logs later in this document.

When you query the imported logs, duplicates don't appear in the query results. Cloud Logging removes duplicates (log entries from the same project, with the same insertion ID and timestamp) from query results. For more information, see the insert_id field in the Logging API reference.

Verify results

To validate that the job has completed successfully, in Cloud Shell, you can query import results:

  gcloud logging read 'log_id("imported_logs") AND timestamp<=END_DATE'

The output shows the imported logs. If this project was used to run more than one import job within the specified timeframe, the output shows imported logs from those jobs as well.

For more options and details about querying log entries, see gcloud logging read.
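
For example, with a concrete end date and a limit on the number of returned entries, the query might look like the following sketch; the timestamp comparison uses RFC 3339 format, and the date and project are hypothetical:

  # Hypothetical example: read up to 10 imported log entries
  gcloud logging read 'log_id("imported_logs") AND timestamp<="2024-01-31T23:59:59Z"' \
      --project=my-destination-project \
      --limit=10 \
      --format=json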

Delete imported logs

If you need to run the same job more than one time, delete the previously imported logs to avoid duplicated entries and increased costs.

  • To delete imported logs, in Cloud Shell, run the logs delete command:

    gcloud logging logs delete imported_logs
    

Be aware that deleting imported logs purges all log entries that were imported to the destination project and not only the results of the last import job execution.
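
Before you delete, you can confirm that the imported logs are present in the destination project by listing log names; the project ID in this sketch is hypothetical:

  # List log names in the destination project and look for imported_logs
  gcloud logging logs list --project=my-destination-project | grep imported_logs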

What's next

Contributors

Author: Leonid Yankulin | Developer Relations Engineer

Other contributors: