Export Firebase Crashlytics BigQuery logs to Datadog using Dataflow

Author(s): @anantdamle, Published: 2021-04-14

Anant Damle | Solutions Architect | Google

Contributed by Google employees.

This tutorial shows you how to export Firebase Crashlytics logs from BigQuery tables to Datadog.

Firebase Crashlytics is a lightweight, real-time crash reporter that helps you track, prioritize, and fix stability issues that decrease your app quality. Crashlytics saves you troubleshooting time by intelligently grouping crashes and highlighting the circumstances that cause them.

You can export Firebase Crashlytics data to BigQuery to enable further analysis in BigQuery. You can combine this data with your data exported to BigQuery from Cloud Logging and your own first-party data and use Data Studio to visualize the data.

Datadog is a log monitoring platform integrated with Google Cloud that provides application and infrastructure monitoring services.

This document is intended for a technical audience whose responsibilities include log management or data analytics. This document assumes that you're familiar with Dataflow and have some familiarity with using shell scripts and basic knowledge of Google Cloud.

Architecture

Architecture diagram

The batch Dataflow pipeline to process the Crashlytics logs in BigQuery is as follows:

  1. Read the BigQuery table or partition.
  2. Transform each BigQuery TableRow into a JSON string in the Datadog log entry format.
  3. Send the log entries to Datadog using the Datadog logs API.
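
The read and transform stages map naturally onto the Apache Beam Java SDK that the pipeline is built with. The sketch below is a minimal illustration of those stages under that assumption, not the tutorial's actual pipeline code; the class name, table reference, and toDatadogLogEntry helper are hypothetical, and Gson is assumed to be on the classpath.

import com.google.api.services.bigquery.model.TableRow;
import com.google.gson.Gson;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

// Minimal sketch only: reads the Crashlytics export and converts rows to JSON strings.
public class CrashlyticsToDatadogSketch {

  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    PCollection<String> logEntries =
        pipeline
            // Step 1: read the Crashlytics export table (hypothetical table reference;
            // BigQuery export reads also need a --tempLocation when run on Dataflow).
            .apply("ReadCrashlyticsRows",
                BigQueryIO.readTableRows().from("my-project:firebase_crashlytics.my_table"))
            // Step 2: convert each TableRow into a Datadog log entry JSON string.
            .apply("ToDatadogLogEntry",
                MapElements.into(TypeDescriptors.strings())
                    .via((TableRow row) -> toDatadogLogEntry(row)));

    // Step 3 (batching, gzip compression, and the HTTP send) is sketched after the
    // optimizations list below.
    pipeline.run();
  }

  // Hypothetical helper: serializes the row as JSON; the Datadog-specific fields are
  // shown in a later sketch.
  static String toDatadogLogEntry(TableRow row) {
    return new Gson().toJson(row);
  }
}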

The pipeline uses two optimizations:

  • Bundle log messages into batches of up to 5 MB or 1,000 entries to reduce the number of API calls.
  • Compress requests with gzip to reduce the size of requests sent across the network.
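
Continuing the sketch above, the two optimizations might look roughly as follows in Beam: entries are spread across shard keys, grouped into batches, and each batch is gzip-compressed before being posted to the Datadog intake endpoint. This fragment is illustrative rather than the pipeline's actual code; the DD-API-KEY header and payload shape are assumptions to verify against the Datadog logs API documentation, and the 5 MB size cap is omitted.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;
import java.util.zip.GZIPOutputStream;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

// Illustrative continuation of the logEntries collection from the previous sketch.
logEntries
    // Spread entries across shard keys (10 here, matching the default shardCount).
    .apply("AssignShard",
        WithKeys.of((String entry) -> ThreadLocalRandom.current().nextInt(10))
            .withKeyType(TypeDescriptors.integers()))
    // Bundle up to 1,000 entries per request to reduce the number of API calls.
    // (Newer Beam releases also offer GroupIntoBatches.ofByteSize for a size-based cap.)
    .apply("BatchEntries", GroupIntoBatches.ofSize(1000))
    .apply("SendToDatadog", ParDo.of(new DoFn<KV<Integer, Iterable<String>>, Void>() {
      @ProcessElement
      public void send(ProcessContext context) throws Exception {
        // Join the batched entries into a single JSON array payload.
        StringBuilder payload = new StringBuilder("[");
        for (String entry : context.element().getValue()) {
          if (payload.length() > 1) {
            payload.append(',');
          }
          payload.append(entry);
        }
        payload.append(']');

        // Gzip-compress the request body and post it to the Datadog logs intake endpoint.
        // Passing the API key in the DD-API-KEY header is an assumption; confirm it
        // against the Datadog logs API documentation.
        HttpURLConnection connection = (HttpURLConnection)
            new URL("https://http-intake.logs.datadoghq.com/v1/input").openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type", "application/json");
        connection.setRequestProperty("Content-Encoding", "gzip");
        connection.setRequestProperty("DD-API-KEY", System.getenv("DATADOG_API_KEY"));
        try (OutputStream body = new GZIPOutputStream(connection.getOutputStream())) {
          body.write(payload.toString().getBytes(StandardCharsets.UTF_8));
        }
        connection.getResponseCode(); // Forces the request; a real pipeline would check and retry.
      }
    }));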

Objectives

  • Create a service account with limited access.
  • Create a Dataflow Flex Template pipeline that sends Crashlytics logs to Datadog through the Datadog logs API.
  • Verify that Datadog imported the Crashlytics logs.

Costs

This tutorial uses billable components of Google Cloud, including Dataflow, Compute Engine, Cloud Storage, BigQuery, Container Registry, and Cloud Build.

Use the pricing calculator to generate a cost estimate based on your projected usage.

Before you begin

For this tutorial, you need a Google Cloud project. To make cleanup easiest at the end of the tutorial, we recommend that you create a new project for this tutorial.

  1. Create a Google Cloud project.
  2. Make sure that billing is enabled for your Google Cloud project.
  3. Open Cloud Shell.

    At the bottom of the Cloud Console, a Cloud Shell session opens and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.

  4. Enable APIs for Compute Engine, Cloud Storage, Dataflow, BigQuery, and Cloud Build services:

    gcloud services enable \
      compute.googleapis.com \
      storage.googleapis.com \
      dataflow.googleapis.com \
      bigquery.googleapis.com \
      cloudbuild.googleapis.com
    

Setting up your environment

  1. In Cloud Shell, clone the source repository and go to the directory for this tutorial:

    git clone https://github.com/GoogleCloudPlatform/crashlytics-logs-to-datadog.git
    cd crashlytics-logs-to-datadog/
    
  2. Use a text editor to modify the set_environment.sh file to set the required environment variables:

    # The Google Cloud project to use for this tutorial
    export PROJECT_ID="[YOUR_PROJECT_ID]"
    
    # The Compute Engine region to use for running Dataflow jobs
    export REGION_ID="[COMPUTE_ENGINE_REGION]"
    
    # The Cloud Storage bucket to use for Dataflow templates and temporary files
    export GCS_BUCKET="[NAME_OF_CLOUD_STORAGE_BUCKET]"
    
    # Name of the service account to use (not the email address)
    export PIPELINE_SERVICE_ACCOUNT_NAME="[SERVICE_ACCOUNT_NAME_FOR_RUNNER]"
    
    # The API key created in Datadog for making API calls
    # https://app.datadoghq.com/account/settings#api
    export DATADOG_API_KEY="[YOUR_DATADOG_API_KEY]"
    
  3. Run the script to set the environment variables:

    source set_environment.sh
    

Creating resources

The tutorial uses the following resources:

  • A service account to run Dataflow pipelines, enabling fine-grained access control
  • A Cloud Storage bucket for temporary data storage and test data

Create service accounts

We recommend that you run pipelines with fine-grained access control to improve access partitioning, provisioning each service account with only the permissions that it requires.

If your project doesn't have a user-created service account, create one using the following instructions.

  1. Create a service account to use as the user-managed controller service account for Dataflow:

    gcloud iam service-accounts create  "${PIPELINE_SERVICE_ACCOUNT_NAME}" \
      --project="${PROJECT_ID}" \
      --description="Service Account for Datadog export pipelines." \
      --display-name="Datadog logs exporter"
    
  2. Create a custom role with the permissions required, as specified in the datadog_sender_permissions.yaml file:

    export DATADOG_SENDER_ROLE_NAME="datadog_sender"
    
    gcloud iam roles create "${DATADOG_SENDER_ROLE_NAME}" \
      --project="${PROJECT_ID}" \
      --file=datadog_sender_permissions.yaml
    
  3. Apply the custom role to the service account:

    gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
      --member="serviceAccount:${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
      --role="projects/${PROJECT_ID}/roles/${DATADOG_SENDER_ROLE_NAME}"
    
  4. Assign the dataflow.worker role to allow a Dataflow worker to run with the service account credentials:

    gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
      --member="serviceAccount:${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
      --role="roles/dataflow.worker"
    

Create the Cloud Storage bucket

Create a Cloud Storage bucket to store test data and to serve as the Dataflow staging location:

gsutil mb -p "${PROJECT_ID}" -l "${REGION_ID}" "gs://${GCS_BUCKET}"

Build and launch the Dataflow pipeline

  1. Build the pipeline code:

    ./gradlew clean build shadowJar
    
  2. Define the fully qualified BigQuery table ID for the Crashlytics data:

    export CRASHLYTICS_BIGQUERY_TABLE="[YOUR_PROJECT_ID]:[YOUR_DATASET_ID].[YOUR_TABLE_ID]"
    

    Make sure that the service account has access to this BigQuery table.

  3. Run the pipeline:

    bq_2_datadog_pipeline \
      --project="${PROJECT_ID}" \
      --region="${REGION_ID}" \
      --runner="DataflowRunner" \
      --serviceAccount="${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
      --gcpTempLocation="gs://${GCS_BUCKET}/temp" \
      --stagingLocation="gs://${GCS_BUCKET}/staging" \
      --tempLocation="gs://${GCS_BUCKET}/bqtemp" \
      --datadogApiKey="${DATADOG_API_KEY}" \
      --sourceBigQueryTableId="${CRASHLYTICS_BIGQUERY_TABLE}"
    

    You can run the pipeline with the following options:

    Parameter             | Default value                                   | Description
    ----------------------|-------------------------------------------------|------------
    sourceBigQueryTableId |                                                 | Fully qualified BigQuery table ID
    bigQuerySqlQuery      |                                                 | BigQuery SQL query whose results are sent to Datadog
    shardCount            | 10                                              | Number of parallel processes sending to Datadog (too high a number can overload the Datadog API)
    preserveNulls         | false                                           | Allow null values from the BigQuery source to be serialized
    datadogApiKey         |                                                 | API key from the Datadog console
    datadogEndpoint       | https://http-intake.logs.datadoghq.com/v1/input | See Datadog logging endpoints
    datadogSource         | crashlytics-bigquery                            |
    datadogTags           | user:crashlytics-pipeline                       |
    datadogLogHostname    | crashlytics                                     |

    For information about datadogSource, datadogTags, and datadogLogHostname, see Datadog log entry structure. You can customize these parameters to suit your needs; a sketch of how they might map onto log entry fields follows these steps.

    Use either sourceBigQueryTableId or bigQuerySqlQuery, not both.

  4. Monitor the Dataflow job in the Cloud Console.

    The following diagram shows the pipeline DAG:

    Pipeline DAG
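
The following sketch shows one plausible way these three parameters could map onto Datadog's reserved log attributes (ddsource, ddtags, hostname) when a Crashlytics row is wrapped into a log entry. It is illustrative only and assumes Gson on the classpath; the real pipeline's field mapping may differ, so treat the Datadog log entry documentation as authoritative.

import com.google.api.services.bigquery.model.TableRow;
import com.google.gson.Gson;
import com.google.gson.JsonObject;

// Illustrative mapping of the pipeline parameters onto Datadog's reserved log attributes.
public class DatadogLogEntryExample {

  static String toDatadogLogEntry(TableRow row, String source, String tags, String hostname) {
    JsonObject entry = new JsonObject();
    entry.addProperty("ddsource", source);    // datadogSource, default "crashlytics-bigquery"
    entry.addProperty("ddtags", tags);        // datadogTags, default "user:crashlytics-pipeline"
    entry.addProperty("hostname", hostname);  // datadogLogHostname, default "crashlytics"
    entry.addProperty("message", new Gson().toJson(row)); // the Crashlytics row serialized as JSON
    return entry.toString();
  }
}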

Create a Dataflow Flex Template

Dataflow templates allow you to use the Cloud Console, the gcloud command-line tool, or REST API calls to set up your pipelines on Google Cloud and run them. Classic templates are staged as execution graphs on Cloud Storage, whereas Flex Templates package the pipeline as a container image in your project's Container Registry. This lets you decouple building and running pipelines, and integrate with orchestration systems for scheduled execution. For more information, see Evaluating which template type to use in the Dataflow documentation.

  1. Define the location to store the template spec file containing all of the necessary information to run the job:

    export TEMPLATE_PATH="gs://${GCS_BUCKET}/dataflow/templates/bigquery-to-datadog.json"
    export TEMPLATE_IMAGE="us.gcr.io/${PROJECT_ID}/dataflow/bigquery-to-datadog:latest"
    
  2. Build the Dataflow Flex template:

    gcloud dataflow flex-template build "${TEMPLATE_PATH}" \
      --image-gcr-path="${TEMPLATE_IMAGE}" \
      --sdk-language="JAVA" \
      --flex-template-base-image=JAVA11 \
      --metadata-file="bigquery-to-datadog-pipeline-metadata.json" \
      --service-account-email="${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
      --jar="build/libs/crashlytics-logs-to-datadog-all.jar" \
      --env="FLEX_TEMPLATE_JAVA_MAIN_CLASS=\"com.google.cloud.solutions.bqtodatadog.BigQueryToDatadogPipeline\""    
    

Run the pipeline using the Flex Template

Run the pipeline using the Flex Template that you created in the previous step:

gcloud dataflow flex-template run "bigquery-to-datadog-`date +%Y%m%d-%H%M%S`" \
  --region "${REGION_ID}" \
  --template-file-gcs-location "${TEMPLATE_PATH}" \
  --service-account-email "${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
  --parameters sourceBigQueryTableId="${CRASHLYTICS_BIGQUERY_TABLE}" \
  --parameters datadogApiKey="${DATADOG_API_KEY}"

Verify logs in the Datadog console

Visit the Datadog log viewer to verify that the logs are available in Datadog.

Datadog screenshot

Cleaning up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, you can delete the project.

  1. In the Cloud Console, go to the Projects page.
  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.
