This document describes how you deploy a Cloud Logging log sink and a Dataflow pipeline to stream logs from Google Cloud to Datadog. It assumes that you're familiar with the reference architecture in Stream logs from Google Cloud to Datadog.
These instructions are intended for IT professionals who want to stream logs from Google Cloud to Datadog. Although it's not required, having experience with the following Google products is useful for deploying this architecture:
- Dataflow pipelines
- Pub/Sub
- Cloud Logging
- Identity and Access Management (IAM)
- Cloud Storage
You must have a Datadog account to complete this deployment. However, you don't need any familiarity with Datadog Log Management.
Architecture
The following diagram shows the architecture that's described in this document. This diagram demonstrates how log files that are generated by Google Cloud are ingested by Datadog and shown to Datadog users.
As shown in the preceding diagram, the following events occur:
- Cloud Logging collects log files from a Google Cloud project into a designated Cloud Logging log sink and then forwards them to a Pub/Sub topic.
- A Dataflow pipeline pulls the logs from the
Pub/Sub topic, batches them, compresses them into a payload,
and then delivers them to Datadog.
- If there's a delivery failure, a secondary Dataflow pipeline sends messages from a dead-letter topic back to the primary log-forwarding topic to be redelivered.
- The logs arrive in Datadog for further analysis and monitoring.
For more information, see the Architecture section of the reference architecture.
Objectives
- Create the secure networking infrastructure.
- Create the logging and Pub/Sub infrastructure.
- Create the credentials and storage infrastructure.
- Create the Dataflow infrastructure.
- Validate that Datadog Log Explorer received logs.
- Manage delivery errors.
Costs
In this document, you use billable components of Google Cloud, including Cloud Logging, Pub/Sub, Dataflow, and Cloud Storage. To generate a cost estimate based on your projected usage, use the pricing calculator.
You also use billable components of Datadog, such as Datadog Log Management.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Cloud Monitoring, Secret Manager, Compute Engine, Pub/Sub, Logging, and Dataflow APIs.
IAM role requirements
Make sure that you have the following roles on the project:
- Compute > Compute Network Admin
- Compute > Compute Security Admin
- Dataflow > Dataflow Admin
- Dataflow > Dataflow Worker
- IAM > Project IAM Admin
- IAM > Service Account Admin
- IAM > Service Account User
- Logging > Logs Configuration Writer
- Logging > Logs Viewer
- Pub/Sub > Pub/Sub Admin
- Secret Manager > Secret Manager Admin
- Storage > Storage Admin
Check for the roles
In the Google Cloud console, go to the IAM page.
Select the project.
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
In the Google Cloud console, go to the IAM page.
Select the project.
Click Grant access.
In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
In the Select a role list, select a role.
To grant additional roles, click Add another role and add each additional role.
Click Save.
Create network infrastructure
This section describes how to create your network infrastructure to support the deployment of a Cloud Logging log sink and a Dataflow pipeline to stream logs from Google Cloud to Datadog.
Create a Virtual Private Cloud (VPC) network and subnet
To host the Dataflow pipeline worker VMs, create a Virtual Private Cloud (VPC) network and subnet:
In the Google Cloud console, go to the VPC networks page.
Click Create VPC network.
In the Name field, provide a name for the network.
In the Subnets section, provide a name, region, and IP address range for the subnetwork. The size of the IP address range might vary based on your environment. A subnet mask of length `/24` is sufficient for most use cases.
In the Private Google Access section, select On.
Click Done, and then click Create.
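If you prefer to script this step, the following gcloud sketch creates an equivalent network and subnet. The network name, subnet name, region, and IP range are example values (not part of the original procedure); substitute your own.

```bash
# Create a custom-mode VPC network for the Dataflow worker VMs.
gcloud compute networks create datadog-export-network --subnet-mode=custom

# Create a subnet with Private Google Access enabled.
# The /24 range and us-central1 region are example values.
gcloud compute networks subnets create datadog-export-subnet \
    --network=datadog-export-network \
    --region=us-central1 \
    --range=10.10.0.0/24 \
    --enable-private-ip-google-access
```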
Create a VPC firewall rule
To restrict traffic to the Dataflow VMs, create a VPC firewall rule:
In the Google Cloud console, go to the Create a firewall rule page.
In the Name field, provide a name for the rule.
In the Description field, explain what the rule does.
In the Network list, select the network for your Dataflow VMs.
In the Priority field, specify the order in which this rule is applied. Set the Priority to `0`. Rules with lower numbers get prioritized first. The default value for this field is `1000`.
In the Direction of traffic section, select Ingress.
In the Action on match section, select Allow.
Configure targets, source tags, protocols, and ports
On the Create a firewall rule page, in the Targets list, select Specified target tags.
In the Target tags field, enter `dataflow`.
In the Source filter list, select Source tags.
In the Source tags field, enter `dataflow`.
In the Protocols and ports section, complete the following tasks:
- Select Specified protocols and ports.
- Select the TCP checkbox.
- In the Ports field, enter `12345-12346`.
Click Create.
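As an alternative to the console steps, the following command sketches the same firewall rule from the command line. The rule name and the network name are assumed example values from the earlier sketch.

```bash
# Allow internal TCP traffic on ports 12345-12346 between Dataflow workers
# that carry the "dataflow" network tag.
gcloud compute firewall-rules create allow-dataflow-shuffle \
    --network=datadog-export-network \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:12345-12346 \
    --source-tags=dataflow \
    --target-tags=dataflow \
    --priority=0
```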
Create a Cloud NAT gateway
To help enable secure outbound connections between Google Cloud and Datadog, create a Cloud NAT gateway.
In the Google Cloud console, go to the Cloud NAT page.
On the Cloud NAT page, click Create Cloud NAT gateway.
In the Gateway name field, provide a name for the gateway.
In the NAT type section, select Public.
In the Select Cloud Router section, in the Network list, select your network from the list of available networks.
In the Region list, select the region that contains your Cloud Router.
In the Cloud Router list, select or create a new router in the same network and region.
In the Cloud NAT mapping section, in the Cloud NAT IP addresses list, select Automatic.
Click Create.
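The following gcloud sketch shows one way to create the Cloud Router and Cloud NAT gateway from the command line; the names and region are example values.

```bash
# Create a Cloud Router in the same network and region as the subnet.
gcloud compute routers create datadog-export-router \
    --network=datadog-export-network \
    --region=us-central1

# Create a public Cloud NAT gateway with automatically allocated IP addresses.
gcloud compute routers nats create datadog-export-nat \
    --router=datadog-export-router \
    --region=us-central1 \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
```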
Create logging and Pub/Sub infrastructure
Create Pub/Sub topics and subscriptions to receive and forward your logs, and to handle any delivery failures.
In the Google Cloud console, go to the Create a Pub/Sub topic page.
In the Topic ID field, provide a name for the topic.
Leave the Add a default subscription checkbox selected.
Click Create.
To handle any log messages that are rejected by the Datadog API, create an additional topic and default subscription by repeating the steps in this procedure. The additional topic is used within the Datadog Dataflow template as part of the path configuration for the `outputDeadletterTopic` template parameter.
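If you're scripting the setup, the following sketch creates the input and dead-letter topics with matching subscriptions. The topic and subscription names are example values that the later sketches also assume.

```bash
# Input topic and subscription for the log sink and the Dataflow pipeline.
gcloud pubsub topics create datadog-export-topic
gcloud pubsub subscriptions create datadog-export-sub --topic=datadog-export-topic

# Dead-letter topic and subscription for messages rejected by the Datadog API.
gcloud pubsub topics create datadog-export-deadletter
gcloud pubsub subscriptions create datadog-export-deadletter-sub \
    --topic=datadog-export-deadletter
```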
Route the logs to Pub/Sub
This deployment describes how to create a project-level Cloud Logging log sink. However, you can also create an organization-level aggregated sink that combines logs from multiple projects by setting the `includeChildren` parameter on the organization-level sink.
To create the log sink, follow these steps:
In the Google Cloud console, go to the Create logs routing sink page.
In the Sink details section, in the Sink name field, enter a name.
Optional: In the Sink description field, explain the purpose of the log sink.
Click Next.
In the Sink destination section, in the Select sink service list, select Cloud Pub/Sub topic.
In the Select a Cloud Pub/Sub topic list, select the input topic that you just created.
Click Next.
Optional: In the Choose logs to include in sink section, in the Build inclusion filter field, specify which logs to include in the sink by entering your logging queries. For example, to include only 10% of the logs with a severity level of `INFO`, create an inclusion filter of `severity=INFO AND sample(insertId, 0.1)`. For more information, see Logging query language.
Click Next.
Optional: In the Choose logs to filter out of sink (optional) section, create logging queries to specify which logs to exclude from the sink:
- To build an exclusion filter, click Add exclusion.
- In the Exclusion filter name field, enter a name.
- In the Build an exclusion filter field, enter a filter expression that matches the log entries that you want to exclude. You can also use the `sample` function to select a portion of the log entries to exclude.
- To create the sink with your new exclusion filter turned off, click Disable after you enter the expression. You can update the sink later to enable the filter.
Click Create sink.
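An equivalent gcloud sketch for a project-level sink follows, using the example topic name from earlier and the sample inclusion filter from this section; `PROJECT_ID` is a placeholder. For an organization-level aggregated sink, you would instead add the `--organization` and `--include-children` flags.

```bash
# Create a project-level log sink that routes matching logs to the input topic.
gcloud logging sinks create datadog-export-sink \
    pubsub.googleapis.com/projects/PROJECT_ID/topics/datadog-export-topic \
    --log-filter='severity=INFO AND sample(insertId, 0.1)'
```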
Identify writer-identity values
In the Google Cloud console, go to the Log Router page.
In the Log Router Sinks section, find your log sink, and then click More actions.
Click View sink details.
In the Writer identity row, next to `serviceAccount`, copy the service account ID. You use the copied service account ID value in the next section.
Add a principal value
Go to the Pub/Sub Topics page.
Select your input topic.
Click Show info panel.
On the Info Panel, in the Permissions tab, click Add principal.
In the Add principals section, in the New principals field, paste the Writer identity service account ID that you copied in the previous section.
In the Assign roles section, in the Select a role list, point to Pub/Sub and click Pub/Sub Publisher.
Click Save.
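From the command line, you can read the sink's writer identity and grant it the Pub/Sub Publisher role in one pass, as in this sketch. The sink and topic names are the example values used earlier.

```bash
# Look up the writer identity of the sink (returned with a serviceAccount: prefix).
WRITER_IDENTITY=$(gcloud logging sinks describe datadog-export-sink \
    --format='value(writerIdentity)')

# Allow the writer identity to publish to the input topic.
gcloud pubsub topics add-iam-policy-binding datadog-export-topic \
    --member="${WRITER_IDENTITY}" \
    --role='roles/pubsub.publisher'
```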
Create credentials and storage infrastructure
To store your Datadog API key value, create a secret in Secret Manager. This API key is used by the Dataflow pipeline to forward logs to Datadog.
In the Google Cloud console, go to the Create secret page.
In the Name field, provide a name for your secret, for example `my_secret`. A secret name can contain uppercase and lowercase letters, numerals, hyphens, and underscores. The maximum allowed length for a name is 255 characters.
In the Secret value section, in the Secret value field, paste your Datadog API key value. You can find the Datadog API key value on the Datadog Organization Settings page.
Click Create secret.
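A command-line sketch of the same step, assuming an example secret named `datadog-api-key`:

```bash
# Create the secret and add your Datadog API key as its first version.
gcloud secrets create datadog-api-key --replication-policy=automatic
echo -n "YOUR_DATADOG_API_KEY" | \
    gcloud secrets versions add datadog-api-key --data-file=-
```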
Create storage infrastructure
To stage temporary files for the Dataflow pipeline, create a Cloud Storage bucket with Uniform bucket-level access enabled:
In the Google Cloud console, go to the Create a bucket page.
In the Get Started section, enter a globally unique, permanent name for the bucket.
Click Continue.
In the Choose where to store your data section, select Region, select a region for your bucket, and then click Continue.
In the Choose a storage class for your data section, select Standard, and then click Continue.
In the Choose how to control access to objects section, find the Access control section, select Uniform, and then click Continue.
Optional: In the Choose how to protect object data section, configure additional security settings.
Click Create. If prompted, leave the Enforce public access prevention on this bucket item selected.
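The following sketch creates the same bucket configuration from the command line; the bucket name and region are example values.

```bash
# Create a regional Standard-class bucket with uniform bucket-level access
# and public access prevention enforced.
gcloud storage buckets create gs://my-datadog-export-bucket \
    --location=us-central1 \
    --default-storage-class=STANDARD \
    --uniform-bucket-level-access \
    --public-access-prevention
```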
Create Dataflow infrastructure
In this section, you create a custom Dataflow worker service account that follows the principle of least privilege.
The default behavior for Dataflow pipeline workers is to use your project's Compute Engine default service account, which grants permissions to all resources in the project. If you are forwarding logs from a production environment, create a custom worker service account with only the necessary roles and permissions. Assign this service account to your Dataflow pipeline workers.
The following IAM roles are required for the Dataflow worker service account that you create in this section. The service account uses these IAM roles to interact with your Google Cloud resources and to forward your logs to Datadog through Dataflow.
| Role | Effect |
|---|---|
| Dataflow Admin, Dataflow Worker | Allows creating, running, and examining Dataflow jobs. For more information, see Roles in the Dataflow access control documentation. |
| Pub/Sub Publisher, Pub/Sub Subscriber, Pub/Sub Viewer | Allows viewing subscriptions and topics, consuming messages from a subscription, and publishing messages to a topic. For more information, see Roles in the Pub/Sub access control documentation. |
| Secret Manager Secret Accessor | Allows accessing the payload of secrets. For more information, see Access control with IAM. |
| Storage Object Admin | Allows listing, creating, viewing, and deleting objects. For more information, see IAM roles for Cloud Storage. |
Create a Dataflow worker service account
In the Google Cloud console, go to the Service Accounts page.
In the Select a recent project section, select your project.
On the Service Accounts page, click Create service account.
In the Service account details section, in the Service account name field, enter a name.
Click Create and continue.
In the Grant this service account access to project section, add the following project-level roles to the service account:
- Dataflow Admin
- Dataflow Worker
Click Done. The Service Accounts page appears.
On the Service Accounts page, click your service account.
In the Service account details section, copy the Email value. You use this value in the next section. The system uses the value to configure access to your Google Cloud resources, so that the service account can interact with them.
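To script the service account creation and the project-level role grants, you can use commands like the following sketch; the account name is an example and `PROJECT_ID` is a placeholder.

```bash
# Create the custom Dataflow worker service account.
gcloud iam service-accounts create datadog-dataflow-worker \
    --display-name="Dataflow worker for Datadog log export"

SA_EMAIL="datadog-dataflow-worker@PROJECT_ID.iam.gserviceaccount.com"

# Grant the project-level Dataflow roles.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:${SA_EMAIL}" --role="roles/dataflow.admin"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:${SA_EMAIL}" --role="roles/dataflow.worker"
```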
Provide access to the Dataflow worker service account
To view and consume messages from the Pub/Sub input subscription, provide access to the Dataflow worker service account:
In the Google Cloud console, go to the Pub/Sub Subscriptions page.
Select the checkbox next to your input subscription.
Click Show info panel.
In the Permissions tab, click Add principal.
In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
In the Assign roles section, assign the following resource-level roles to the service account:
- Pub/Sub Subscriber
- Pub/Sub Viewer
Click Save.
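The same grants can be made from the command line, as in this sketch; the subscription name and service account email are the example values from the earlier sketches.

```bash
SA_EMAIL="datadog-dataflow-worker@PROJECT_ID.iam.gserviceaccount.com"

# Allow the worker service account to consume and view the input subscription.
gcloud pubsub subscriptions add-iam-policy-binding datadog-export-sub \
    --member="serviceAccount:${SA_EMAIL}" --role="roles/pubsub.subscriber"
gcloud pubsub subscriptions add-iam-policy-binding datadog-export-sub \
    --member="serviceAccount:${SA_EMAIL}" --role="roles/pubsub.viewer"
```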
Handle failed messages
To handle failed messages, you configure the Dataflow worker service account to send any failed messages to a dead-letter topic. To send the messages back to the primary input topic after any issues are resolved, the service account needs to view and consume messages from the dead-letter subscription.
Grant access to the input topic
In the Google Cloud console, go to the Pub/Sub Topics page.
Select the checkbox next to your input topic.
Click Show info panel.
In the Permissions tab, click Add principal.
In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
In the Assign roles section, assign the following resource-level role to the service account:
- Pub/Sub Publisher
Click Save.
Grant access to the dead-letter topic
In the Google Cloud console, go to the Pub/Sub Topics page.
Select the checkbox next to your dead-letter topic.
Click Show info panel.
In the Permissions tab, click Add principal.
In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
In the Assign roles section, assign the following resource-level role to the service account:
- Pub/Sub Publisher
Click Save.
Grant access to the dead-letter subscription
In the Google Cloud console, go to the Pub/Sub Subscriptions page.
Select the checkbox next to your dead-letter subscription.
Click Show info panel.
In the Permissions tab, click Add principal.
In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
In the Assign roles section, assign the following resource-level roles to the service account:
- Pub/Sub Subscriber
- Pub/Sub Viewer
Click Save.
Grant access to the Datadog API key secret
To let the Dataflow worker service account read the Datadog API key secret in Secret Manager, grant it access to the secret:
In the Google Cloud console, go to the Secret Manager page.
Select the checkbox next to your secret.
Click Show info panel.
In the Permissions tab, click Add principal.
In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
In the Assign roles section, assign the following resource-level role to the service account:
- Secret Manager Secret Accessor
Click Save.
Stage files to the Cloud Storage bucket
Give the Dataflow worker service account access to read and write the Dataflow job's staging files to the Cloud Storage bucket:
In the Google Cloud console, go to the Buckets page.
Select the checkbox next to your bucket.
Click Permissions.
In the Add principals section, in the New principals field, paste the email of the service account that you created earlier.
In the Assign roles section, assign the following role to the service account:
- Storage Object Admin
Click Save.
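If you're scripting these grants, the following sketch covers the input topic, the dead-letter topic and subscription, the secret, and the bucket, using the example resource names from the earlier sketches.

```bash
SA_EMAIL="datadog-dataflow-worker@PROJECT_ID.iam.gserviceaccount.com"

# Publish redelivered messages to the input topic and failed messages
# to the dead-letter topic.
gcloud pubsub topics add-iam-policy-binding datadog-export-topic \
    --member="serviceAccount:${SA_EMAIL}" --role="roles/pubsub.publisher"
gcloud pubsub topics add-iam-policy-binding datadog-export-deadletter \
    --member="serviceAccount:${SA_EMAIL}" --role="roles/pubsub.publisher"

# Consume and view the dead-letter subscription.
gcloud pubsub subscriptions add-iam-policy-binding datadog-export-deadletter-sub \
    --member="serviceAccount:${SA_EMAIL}" --role="roles/pubsub.subscriber"
gcloud pubsub subscriptions add-iam-policy-binding datadog-export-deadletter-sub \
    --member="serviceAccount:${SA_EMAIL}" --role="roles/pubsub.viewer"

# Access the Datadog API key secret.
gcloud secrets add-iam-policy-binding datadog-api-key \
    --member="serviceAccount:${SA_EMAIL}" --role="roles/secretmanager.secretAccessor"

# Read and write staging objects in the Cloud Storage bucket.
gcloud storage buckets add-iam-policy-binding gs://my-datadog-export-bucket \
    --member="serviceAccount:${SA_EMAIL}" --role="roles/storage.objectAdmin"
```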
Export logs with the Pub/Sub-to-Datadog pipeline
The following steps provide a baseline configuration for running the Pub/Sub to Datadog pipeline in a secure network with a custom Dataflow worker service account. If you expect to stream a high volume of logs, you can also configure the following parameters and features:
- `batchCount`: The number of messages in each batched request to Datadog (from 10 to 1,000 messages, with a default value of `100`). To ensure a timely and consistent flow of logs, a batch is sent at least every two seconds.
- `parallelism`: The number of requests that are sent to Datadog in parallel, with a default value of `1` (no parallelism).
- Horizontal Autoscaling: Enabled by default for streaming jobs that use Streaming Engine. For more information, see Streaming autoscaling.
- User-defined functions: Optional JavaScript functions that you configure to act as extensions to the template (not enabled by default).
For the Dataflow job's URL
parameter, ensure that you
select the Datadog logs API URL that corresponds to your
Datadog site:
| Site | Logs API URL |
|---|---|
| US1 | `https://http-intake.logs.datadoghq.com` |
| US3 | `https://http-intake.logs.us3.datadoghq.com` |
| US5 | `https://http-intake.logs.us5.datadoghq.com` |
| EU | `https://http-intake.logs.datadoghq.eu` |
| AP1 | `https://http-intake.logs.ap1.datadoghq.com` |
| US1-FED | `https://http-intake.logs.ddog-gov.com` |
Create your Dataflow job
In the Google Cloud console, go to the Create job from template page.
In the Job name field, provide a name for the job.
From the Regional endpoint list, select a Dataflow endpoint.
In the Dataflow template list, select Pub/Sub to Datadog. The Required Parameters section appears.
Configure the Required Parameters section:
- In the Pub/Sub input subscription list, select the input subscription.
- In the Datadog Logs API URL field, enter the URL that corresponds to your Datadog site.
- In the Output deadletter Pub/Sub topic list, select the topic that you created to receive message failures.
Configure the Streaming Engine section:
- In the Temporary location field, specify a path for temporary files in the storage bucket that you created for that purpose.
Configure the Optional Parameters section:
- In the Google Cloud Secret Manager ID field, enter the resource name of the secret that you configured with your Datadog API key value.
Configure your credentials, service account, and networking parameters
- In the Source of the API key passed field, select SECRET_MANAGER.
- In the Worker region list, select the region where you created your custom VPC and subnet.
- In the Service account email list, select the custom Dataflow worker service account that you created for that purpose.
- In the Worker IP Address Configuration list, select Private.
In the Subnetwork field, specify the private subnetwork that you created for the Dataflow worker VMs.
For more information, see Guidelines for specifying a subnetwork parameter for Shared VPC.
Optional: Customize other settings.
Click Run job. The Dataflow service allocates resources to run the pipeline.
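You can also launch the equivalent job with gcloud. The sketch below uses the example resource names, region, and `PROJECT_ID` placeholder from the earlier sketches, the US1 logs API URL, and the template path and parameter names of the Pub/Sub to Datadog template as I understand them; verify both the template path and the parameter names against the current template reference before running.

```bash
SA_EMAIL="datadog-dataflow-worker@PROJECT_ID.iam.gserviceaccount.com"

# Run the Pub/Sub to Datadog template on private workers in the custom subnet.
gcloud dataflow jobs run datadog-log-export \
    --gcs-location=gs://dataflow-templates-us-central1/latest/Cloud_PubSub_to_Datadog \
    --region=us-central1 \
    --staging-location=gs://my-datadog-export-bucket/temp \
    --service-account-email="${SA_EMAIL}" \
    --subnetwork=regions/us-central1/subnetworks/datadog-export-subnet \
    --disable-public-ips \
    --parameters=\
inputSubscription=projects/PROJECT_ID/subscriptions/datadog-export-sub,\
url=https://http-intake.logs.datadoghq.com,\
apiKeySource=SECRET_MANAGER,\
apiKeySecretId=projects/PROJECT_ID/secrets/datadog-api-key/versions/latest,\
outputDeadletterTopic=projects/PROJECT_ID/topics/datadog-export-deadletter,\
batchCount=100,\
parallelism=1
```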
Validate that Datadog Log Explorer received logs
Open the Datadog Log Explorer and make sure that the timeframe is expanded to encompass the timestamp of the logs. To validate that Datadog Log Explorer received logs from Google Cloud, search for logs with the `gcp.dataflow.step` source attribute, or any other log attribute:
`Source:gcp.dataflow.step`
The output displays all of the Google Cloud log messages that the Dataflow pipeline forwarded to Datadog.
For more information, see Search logs in the Datadog documentation.
Manage delivery errors
Log file delivery from the Dataflow pipeline that streams Google Cloud logs to Datadog can fail occasionally. Delivery errors can be caused by the following:
- `4xx` errors from the Datadog logs endpoint (related to authentication or network issues).
- `5xx` errors caused by server issues at the destination.
Manage `401` and `403` errors
If you encounter a `401` error or a `403` error, you must replace the primary log-forwarding job with a replacement job that has a valid API key value. You must then clear the messages generated by those errors from the dead-letter topic. To clear the error messages, follow the steps in the Troubleshoot failed messages section.
For more information about replacing the primary log-forwarding job with a replacement job, see Launch a replacement job.
Manage other `4xx` errors
To resolve all other `4xx` errors, follow the steps in the Troubleshoot failed messages section.
Manage `5xx` errors
For `5xx` errors, delivery is automatically retried with exponential backoff, for a maximum of 15 minutes. This automatic process might not resolve all errors. To clear any remaining `5xx` errors, follow the steps in the Troubleshoot failed messages section.
Troubleshoot failed messages
When you see failed messages in the dead-letter topic, examine them. To resolve the errors, and to forward the messages from the dead-letter topic to the primary log-forwarding pipeline, complete all of the following subsections in order.
Review your dead-letter subscription
In the Google Cloud console, go to the Pub/Sub Subscriptions page.
Click the subscription ID of the dead-letter subscription that you created.
Click the Messages tab.
To view the messages, leave the Enable ack messages checkbox cleared and click Pull.
Inspect the failed messages and resolve any issues.
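You can also inspect the failed messages from the command line. Pulling without `--auto-ack` leaves the messages in the subscription; the subscription name is the example value used earlier.

```bash
# Peek at up to 10 failed messages without acknowledging them.
gcloud pubsub subscriptions pull datadog-export-deadletter-sub --limit=10
```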
Reprocess dead-letter messages
To reprocess dead-letter messages, first create a Dataflow job and then configure parameters.
Create your Dataflow job
In the Google Cloud console, go to the Create job from template page.
Give the job a name and specify the regional endpoint.
Configure your messaging and storage parameters
- In the Create job from template page, in the Dataflow template list, select the Pub/Sub to Pub/Sub template.
- In the Source section, in the Pub/Sub input subscription list, select your dead-letter subscription.
- In the Target section, in the Output Pub/Sub topic list, select the primary input topic.
- In the Streaming Engine section, in the Temporary location field, specify a path and filename prefix for temporary files in the storage bucket that you created for that purpose. For example, `gs://my-bucket/temp`.
Configure your networking and service account parameters
- In the Create job from template page, find the Worker region list and select the region where you created your custom VPC and subnet.
- In the Service Account email list, select the custom Dataflow worker service account email address that you created for that purpose.
- In the Worker IP Address Configuration list, select Private.
In the Subnetwork field, specify the private subnetwork that you created for the Dataflow worker VMs.
For more information, see Guidelines for specifying a subnetwork parameter for Shared VPC.
Optional: Customize other settings.
Click Run job.
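A command-line sketch of the replay job using the Pub/Sub to Pub/Sub template follows, with the example resource names from the earlier sketches; verify the template path and parameter names against the current template reference before running.

```bash
SA_EMAIL="datadog-dataflow-worker@PROJECT_ID.iam.gserviceaccount.com"

# Forward messages from the dead-letter subscription back to the input topic.
gcloud dataflow jobs run datadog-deadletter-replay \
    --gcs-location=gs://dataflow-templates-us-central1/latest/Cloud_PubSub_to_Cloud_PubSub \
    --region=us-central1 \
    --staging-location=gs://my-datadog-export-bucket/temp \
    --service-account-email="${SA_EMAIL}" \
    --subnetwork=regions/us-central1/subnetworks/datadog-export-subnet \
    --disable-public-ips \
    --parameters=\
inputSubscription=projects/PROJECT_ID/subscriptions/datadog-export-deadletter-sub,\
outputTopic=projects/PROJECT_ID/topics/datadog-export-topic
```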
Confirm the dead-letter subscription is empty
Confirming that the dead-letter subscription is empty helps ensure that you have forwarded all messages from that Pub/Sub subscription to the primary input topic.
In the Google Cloud console, go to the Pub/Sub Subscriptions page.
Click the subscription ID of the dead-letter subscription that you created.
Click the Messages tab.
Confirm that there are no more unacknowledged messages through the Pub/Sub subscription metrics.
For more information, see Monitor message backlog.
Drain the backup Dataflow job
After you have resolved the errors, and the messages in the dead-letter topic have returned to the log-forwarding pipeline, follow these steps to stop running the Pub/Sub to Pub/Sub template.
Draining the backup Dataflow job ensures that the Dataflow service finishes processing the buffered data while also blocking the ingestion of new data.
In the Google Cloud console, go to the Dataflow jobs page.
Select the job that you want to stop. The Stop Jobs window appears. To stop a job, the status of the job must be running.
Select Drain.
Click Stop job.
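From the command line, you can find the running replay job and drain it, as in this sketch; the region is an example value and `JOB_ID` is a placeholder that you copy from the list output.

```bash
# List active jobs to find the replay job's ID.
gcloud dataflow jobs list --region=us-central1 --status=active

# Drain the job so that buffered messages finish processing before it stops.
gcloud dataflow jobs drain JOB_ID --region=us-central1
```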
Clean up
If you don't plan to continue using the Google Cloud and Datadog resources deployed in this reference architecture, delete them to avoid incurring additional costs. There are no Datadog resources for you to delete.
Delete the project
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- To learn more about the benefits of the Pub/Sub to Datadog Dataflow template, read the Stream your Google Cloud logs to Datadog with Dataflow blog post.
- For more information about Cloud Logging, see Cloud Logging.
- To learn more about Datadog log management, see Best Practices for Log Management.
- For more information about Dataflow, see Dataflow.
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Authors:
- Ashraf Hanafy | Senior Software Engineer for Google Cloud Integrations, Datadog
- Daniel Trujillo | Engineering Manager, Google Cloud Integrations, Datadog
- Bryce Eadie | Technical Writer, Datadog
- Sriram Raman | Senior Product Manager, Google Cloud Integrations, Datadog
Other contributors:
- Maruti C | Global Partner Engineer
- Chirag Shankar | Data Engineer
- Kevin Winters | Key Enterprise Architect
- Leonid Yankulin | Developer Relations Engineer
- Mohamed Ali | Cloud Technical Solutions Developer