A serverless integration solution for Google Marketing Platform

This document introduces the main requirements and challenges of API integration and shows how to use Google Cloud to address them. Using Google Marketing Platform as an example, this guide helps data engineers and architects learn what's involved in building and deploying a serverless, general-purpose integration. It describes how to implement the solution through APIs or through other programmatic integration methods such as SFTP upload.

Many customers ask how to integrate data like customer transactions and product segmentation into Google Marketing Platform. Such data integration offers many benefits to advertisers, for example, in the following scenarios:

  • Telecom companies can identify people who want a new phone model based on browsing histories. Using segment (grouping) data, ad servers can target potential customers with relevant ads.
  • Supermarkets can use customers' purchase histories stored in their sales systems to deliver personalized web ads instead of general ones.
  • Airline companies can include detailed, up-to-the-minute flight pricing to make their ads more informative and attractive.
  • Automobile businesses can seek a lower cost per action (CPA) by incorporating data from offline transactions with customers' online activities.

Most use cases like these require some kind of API integration. This document describes an integration solution that involves the following:

  • Setting up a low-cost, serverless integration solution using the following Google Cloud services:
    • Cloud Functions for processing data in an integration
    • Pub/Sub or Cloud Storage for the data pipeline
  • Integrating data regardless of volume or frequency.
  • Completing the advertising cycle by sending offline conversion data or segment insights using APIs built into the solution.

For more information on extending this solution to integrate with APIs that are not built in, see the examples of API handlers.

This document assumes that you're familiar with the following technologies and concepts:

  • Cloud Functions
  • Pub/Sub
  • Cloud Storage
  • API integration and management

Terminology

Event-driven programming: A programming paradigm in which events, such as user activities or execution results from other programming threads, determine the flow of a program's execution.

Message-oriented middleware (MOM): The software or hardware infrastructure that supports the message-oriented pattern of application communication. That pattern involves passing self-contained units of information (messages) between applications over a communication channel, usually asynchronously.

Ad server: The web server used by ad serving platforms to deliver ad creatives to ad slots on a publisher's properties. Ad servers usually include features that help you select, count, and serve ad creatives.

Conversion: The completion of a meaningful (advertiser-specified) user action as the result of an ad. To an advertiser, a meaningful action could be a purchase, a sign-up, a page impression, or an interaction with an ad.

Integration: An action or a set of actions to perform in the target system, such as loading data. This document describes a solution for integrating data with target systems.

Target system and target API: The target system receives data as part of an integration. Examples of target systems include the following:

  • Google Analytics
  • Campaign Manager
  • Search Ads 360
  • Google Ads

A target system can provide several programmatic integration methods and processes to support different integration actions. For simplicity in this document, we call them target APIs. Following are two target systems and their APIs:

  • Google Analytics
    • Measurement Protocol: This protocol lets you send raw user interaction data directly to Google Analytics by using HTTP requests.
    • Management API (Data Import): Data Import lets you upload files of business data and join them with the data that Google Analytics collects.
  • Campaign Manager
    • DCM/DFA Reporting and Trafficking API: This API provides programmatic access to information from your Campaign Manager account. The API's conversion service lets you report the offline portion of your conversions directly to Campaign Manager (see the sketch after this list).
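
The following is a minimal sketch of an offline conversion upload through this API, assuming the google-api-python-client library and Application Default Credentials; the profile ID, Floodlight IDs, click ID, and other values are placeholders.

    # A minimal sketch of uploading offline conversions through the
    # DCM/DFA Reporting and Trafficking API. All IDs are placeholders.
    from googleapiclient.discovery import build

    # Uses Application Default Credentials when none are passed explicitly.
    service = build('dfareporting', 'v3.5')

    conversion = {
        'kind': 'dfareporting#conversion',
        'floodlightActivityId': 1234,        # placeholder
        'floodlightConfigurationId': 5678,   # placeholder
        'gclid': 'abc123',                   # placeholder click ID
        'ordinal': '1',
        'timestampMicros': 1612345678000000,
        'quantity': 1,
        'value': 99.0,
    }

    response = service.conversions().batchinsert(
        profileId='1111111',                 # placeholder user profile ID
        body={'conversions': [conversion]},
    ).execute()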

When to use this solution

Integrating with target systems presents several challenges:

  • Working with external APIs can be an unfamiliar task.
  • Target APIs differ in many ways, for example:
    • In type, such as REST or SOAP
    • In authentication methods
    • In quotas, such as request size or queries per second (QPS)
  • Integration can take longer when data volume increases.
  • Managing different data states (for example, sending, succeeded, and failed) can be complex, especially when redoing a task is not easy because of the large volume of data.

Typically, it takes an advertiser one to three months to implement an API integration. Using the solution described in this document, you can deploy an automated integration in minutes with features that let you do the following:

  • Automatically send data that meets the target system's requirements and quotas for the following supported APIs:
    • Google Analytics Measurement Protocol
    • Google Analytics Management API (Data Import)
    • DCM/DFA Reporting and Trafficking API to upload offline conversions
    • SFTP: Business data uploads to Search Ads 360
    • Sheets API: Scheduled uploads of Google Ads conversions
    • Search Ads 360 API to upload offline conversions
  • Manage big datasets automatically. No extra effort is required when data volume increases.
  • Extend this solution to other APIs by building on the functionality of the built-in APIs. Following are some examples of the built-in APIs' characteristics:
    • The DCM/DFA Reporting and Trafficking API is a RESTful API. It has a QPS limit of one, and each request can contain up to 1,000 conversions.
    • Data Import supports file uploads of up to 1 GB in size with a daily limit of 50 files.
    • The Measurement Protocol lets you send raw user interaction data to Google Analytics by using HTTP requests. Each request can contain up to 20 hits (see the batching sketch after this list).
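
For example, the following minimal sketch batches hits for the Measurement Protocol's batch endpoint, assuming the requests library and hit payloads that are already encoded as Measurement Protocol parameter strings.

    # A minimal sketch of sending hits to the Google Analytics
    # Measurement Protocol batch endpoint, at most 20 hits per request.
    import requests

    BATCH_URL = 'https://www.google-analytics.com/batch'
    MAX_HITS_PER_REQUEST = 20

    def send_hits(hits):
        """Sends already-encoded hit payloads in batches of up to 20.

        Each element of `hits` is assumed to be a payload string such as
        'v=1&tid=UA-XXXXX-Y&cid=555&t=event&ec=offline&ea=purchase'.
        """
        for start in range(0, len(hits), MAX_HITS_PER_REQUEST):
            batch = hits[start:start + MAX_HITS_PER_REQUEST]
            # The batch endpoint expects one hit payload per line.
            response = requests.post(BATCH_URL, data='\n'.join(batch))
            response.raise_for_status()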

Challenges and solutions

Traditionally, a system that can integrate data with different target APIs at scale, in both number of requests and data volume, requires the following:

  • Monitoring for any incoming files. Typical challenges include the following:
    • Keeping system downtime to a minimum while managing the monitoring process.
    • Encountering time delays between the creation of a file and the system check.
  • A way to send data to the target API at the right volume and pace. Typical challenges include the following:
    • Exceeding the payload size limits of the target API by sending datasets that are too large or sending datasets too frequently.
    • Exceeding the pacing quotas of the target API by sending datasets that are too small, which requires too many API calls.
    • Taking too long to complete jobs by sending datasets too infrequently (see the pacing sketch after this list).
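
A common way to address the pacing challenges is a simple throttle between requests. The following minimal sketch illustrates the idea; the send_request function is a hypothetical stand-in for one API call, and the one-QPS limit matches the quota example given later in this document.

    # A minimal sketch of pacing API calls to stay under a QPS quota.
    # `send_request` is a hypothetical function that sends one batch.
    import time

    QPS_LIMIT = 1  # for example, a quota of one query per second

    def send_paced(batches, send_request):
        min_interval = 1.0 / QPS_LIMIT
        for batch in batches:
            start = time.monotonic()
            send_request(batch)
            # Sleep off the remainder of the interval, if any.
            elapsed = time.monotonic() - start
            if elapsed < min_interval:
                time.sleep(min_interval - elapsed)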

To meet these requirements and address these challenges, this solution uses several Google Cloud products:

  • Cloud Functions is Google Cloud's serverless solution to build an event-driven application. Cloud Functions supports different triggers, for example, when new files are added to Cloud Storage. When you use event triggers, you automate the monitoring process and remove delays caused by time intervals between scheduled checks.
  • Pub/Sub is Google's enterprise MOM. Using Pub/Sub in this solution eliminates the need for a supervisor process. With Pub/Sub, you can design a solution as a set of independent microservices.
  • Cloud Storage serves as the data transfer backbone for those APIs that consume whole files instead of specifically defined formats, for example, Data Import or SFTP. Cloud Storage minimizes operations overhead because it's a fully managed and scalable service that hosts object files.
  • Firestore is a fully managed, serverless, cloud-native NoSQL document database that simplifies storing, syncing, and querying data for your mobile, web, and IoT apps at global scale.
  • Identity and Access Management (IAM) provides service accounts. This solution uses service account key files to authenticate to the target systems (see the authentication sketch after this list).
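
The following minimal sketch shows how a service account key file can authenticate API calls, assuming the google-auth library; the key file path is a placeholder, and the scope shown is the conversions scope for the DCM/DFA Reporting and Trafficking API.

    # A minimal sketch of authenticating with a service account key file.
    # The key file path is a placeholder.
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        '/path/to/service-account-key.json',
        scopes=['https://www.googleapis.com/auth/ddmconversions'],
    )

    # Pass the credentials to an API client library, for example:
    # googleapiclient.discovery.build('dfareporting', 'v3.5',
    #                                 credentials=credentials)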

Architectural overview

The following diagram shows how the components of the integration system outlined in the preceding section work together.

Diagram that shows the architecture for an integration system.

This architecture shows the following steps:

  1. New files come into Cloud Storage and trigger the Cloud Function named Initiator.
  2. The Initiator function loads the input files from Cloud Storage and sends them as multiple messages to the Pub/Sub topic named Stacked data. The Initiator function also gets the target API and configuration name from the input filenames and makes them attributes of the sent messages.
  3. After loading all the data, the Initiator function sends a notification message to a second Pub/Sub topic named Trigger. (A sketch of the Initiator function follows these steps.)
  4. The Cloud Function Transporter is invoked based on the notification message.
  5. The Transporter function creates a pull subscriber for the Pub/Sub topic named Stacked data. This subscriber pulls one data message from Stacked data.
  6. The Transporter function sends the data message to a third Pub/Sub topic named Data to send. After that, Transporter removes the subscriber created in step 5 and then quits.
  7. The Pub/Sub event-based Cloud Function API Requester is invoked with the incoming data message sent by Transporter.
  8. The API Requester function sends the data through a target API to the target system, in the format and at the frequency that the API requires, based on the API and configuration attributes of the message.
  9. After all the data in this message is sent, the API Requester function sends a notification message to the Pub/Sub topic named Trigger (as in step 3).
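
The following minimal sketch illustrates steps 1 through 3 with a Cloud Storage-triggered Python function in the style of the Initiator. The project ID, topic names, chunk size, and filename format are illustrative assumptions; the actual implementation is in the GitHub demonstration.

    # A minimal sketch of an Initiator-style function (steps 1-3).
    import re
    from google.cloud import pubsub_v1, storage

    PROJECT = 'my-project'  # placeholder project ID
    publisher = pubsub_v1.PublisherClient()
    STACKED_TOPIC = publisher.topic_path(PROJECT, 'stacked-data')
    TRIGGER_TOPIC = publisher.topic_path(PROJECT, 'trigger')

    def initiator(event, context):
        """Triggered by a new file in Cloud Storage (step 1)."""
        blob = storage.Client().bucket(event['bucket']).blob(event['name'])
        lines = blob.download_as_text().splitlines()

        # Hypothetical convention: read the target API and configuration
        # name from a filename such as '20210101_API[MP]_config[test].ndjson'.
        match = re.search(r'API\[(\w+)\]_config\[(\w+)\]', event['name'])
        api, config = match.group(1), match.group(2)

        # Step 2: publish the records as multiple messages, with the API
        # and configuration name as message attributes.
        chunk_size = 20  # illustrative; the right size depends on the API
        futures = []
        for i in range(0, len(lines), chunk_size):
            data = '\n'.join(lines[i:i + chunk_size]).encode('utf-8')
            futures.append(
                publisher.publish(STACKED_TOPIC, data, api=api, config=config))
        for future in futures:
            future.result()  # wait until every message is published

        # Step 3: notify the Trigger topic that all the data is loaded.
        publisher.publish(TRIGGER_TOPIC, b'start', api=api, config=config).result()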

This architecture offers the following benefits:

  • Flexibility. When you use the combination of the Initiator and Transporter functions, along with multiple Pub/Sub topics, you create a serverless system that can break big data into smaller pieces as needed. The functions are flexible enough to support other APIs besides the ones covered in this document.
  • Agility. The API Requester Cloud Function sends out a small piece of data to a target API. Because it's a single-purpose function, it minimizes the effort required to extend the solution to new APIs (see the sketch after this list).
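
The following minimal sketch shows an API Requester-style function for steps 7 and 8, assuming a Python background function and a hypothetical handler registry keyed by the api message attribute.

    # A minimal sketch of an API Requester-style function. The handler
    # registry and the handler itself are hypothetical.
    import base64

    def send_measurement_protocol(records, config):
        """Hypothetical handler; batches and sends hits as sketched earlier."""
        ...

    HANDLERS = {
        'MP': send_measurement_protocol,
        # Handlers for additional target APIs are registered here.
    }

    def api_requester(event, context):
        """Triggered by a message on the 'Data to send' topic (step 7)."""
        records = base64.b64decode(event['data']).decode('utf-8').splitlines()
        attributes = event.get('attributes', {})
        # Step 8: dispatch to the handler for the target API.
        handler = HANDLERS[attributes['api']]
        handler(records, attributes['config'])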

Where to start

An API integration involves three main parts, which are explained in this section:

  • The integration system.
  • The target system and target API, which you prepare in several ways.
  • The files to send, which contain the data that goes to the target system.

The integration system

To see this design implemented, see an open source demonstration on GitHub.

To learn how to install the solution and run your first test, see Deploying a serverless integration solution based on Cloud Functions and Pub/Sub.

The target system and target API

You need to know where the data comes from, where the data is going (the target system), and how the data gets there (the target API). Then you prepare the target API in several ways:

  • If necessary, creating accounts in the target systems that have permissions to perform the integration through the target APIs.
  • Setting up the target system, for example, creating a Data Import entry in Google Analytics or a Floodlight tag in Campaign Manager.
  • Saving these configurations to where the integration system can access them.
  • Preparing the API Requester functions if you're using target APIs that are not built into this solution.

The steps for setting up target systems vary from system to system. You can find more details in the GitHub demonstration, which comes with several built-in APIs that you can use without writing any code. For the list of supported APIs, see Deploying a serverless integration solution based on Cloud Functions and Pub/Sub. To extend this solution to other APIs, you need to write the code that sends the API requests; you can use the supported APIs as examples when you create new API requesters.
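
For example, a new requester for a hypothetical REST endpoint might look like the following minimal sketch; the endpoint URL, the payload shape, and the handler registry from the earlier sketch are all assumptions.

    # A minimal sketch of extending the solution with a new API requester.
    # The endpoint URL and JSON payload shape are hypothetical.
    import requests

    def send_to_custom_api(records, config):
        """Sends each record to a hypothetical REST endpoint."""
        endpoint = 'https://api.example.com/v1/upload'
        for record in records:
            response = requests.post(
                endpoint, json={'config': config, 'data': record})
            response.raise_for_status()

    # Register the new requester under its own API code, following the
    # handler-registry pattern sketched earlier in this document:
    # HANDLERS['CUSTOM'] = send_to_custom_api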

The files to send

To prepare the files to send, save them into the Cloud Storage folder that triggers the Initiator function.

This design offers flexibility by letting you integrate different target APIs with different configurations simultaneously. To automate the data integration process, we suggest adopting a filename convention that maps each file to its target API automatically.

We offer a filename convention in the GitHub demonstration. For details about this convention, see Deploying a serverless integration solution based on Cloud Functions and Pub/Sub.
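
As an illustration only (the demonstration's actual convention is documented in the deployment guide), a convention might embed an API code and a configuration name in each filename, which the Initiator function can then parse:

    # An illustrative filename convention, not necessarily the one used
    # in the GitHub demonstration, and how it might be parsed.
    import re

    FILENAME_PATTERN = re.compile(
        r'API\[(?P<api>\w+)\]_config\[(?P<config>\w+)\]')

    def parse_filename(name):
        """Extracts the target API code and configuration name.

        For example, '20210101_API[MP]_config[test].ndjson' maps the file
        to the Measurement Protocol with a configuration named 'test'.
        """
        match = FILENAME_PATTERN.search(name)
        if not match:
            raise ValueError(f'No API or config found in filename: {name}')
        return match.group('api'), match.group('config')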

What's next