A serverless integration solution for Google Marketing Platform

This document introduces the main requirements and challenges of API integration and shows how to use Google Cloud to address them. Using Google Marketing Platform as an example, this guide helps data engineers and architects learn what's involved in building and deploying a serverless, general-purpose integration. It describes how to implement the solution through APIs or through other programmatic integration methods such as SFTP upload.

Many customers ask how to integrate data like customer transactions and product segmentation into Google Marketing Platform. Such data integration offers many benefits to advertisers, for example, in the following scenarios:

  • Telecom companies can identify people who want a new phone model based on browsing histories. Using segment (grouping) data, ad servers can target potential customers with relevant ads.
  • Supermarkets can use customers' purchase histories stored in their sales systems to deliver personalized web ads instead of general ones.
  • Airline companies can include detailed, up-to-the-minute flight pricing to make their ads more informative and attractive.
  • Automobile businesses can seek a lower cost per action (CPA) by incorporating data from offline transactions with customers' online activities.

Most use cases like these require some kind of API integration. This document describes an integration solution that involves the following:

  • Setting up a low-cost, serverless integration solution using the following Google Cloud services:
    • Cloud Functions for processing data in an integration
    • Pub/Sub or Cloud Storage for the data pipeline
  • Integrating data regardless of volume or frequency.
  • Completing the advertising cycle by sending offline conversion data or segment insights using APIs built into the solution.

For more information on extending this solution to integrate with APIs that are not built in, see the examples of API handlers.

This document assumes that you're familiar with the following technologies and concepts:

  • Cloud Functions
  • Pub/Sub
  • Cloud Storage
  • API integration and management

Terminology

Event-driven programming: A programming paradigm in which events, such as user activities or execution results from other programming threads, determine the flow of a program's execution.

Message-oriented middleware (MOM): The software or hardware infrastructure that supports the message-oriented pattern of application communication. That pattern involves passing self-contained units of information (messages) between applications over a communication channel, usually asynchronously.

Ad server: The web server used by ad serving platforms to deliver ad creatives to ad slots on a publisher's properties. Ad servers usually include features that help you select, count, and serve ad creatives.

Conversion: The completion of a meaningful (advertiser-specified) user action as the result of an ad. To an advertiser, a meaningful action could be a purchase, a sign-up, a page impression, or an interaction with an ad.

Integration: An action or a set of actions to perform in the target system, such as loading data. This document describes a solution for integrating data with target systems.

Target system and target API: The target system receives data as part of an integration. Examples of target systems include the following:

  • Google Analytics
  • Campaign Manager
  • Search Ads 360
  • Google Ads

A target system can provide several programmatic integration methods and processes to support different integration actions. For simplicity in this document, we call them target APIs. Following are two target systems and their APIs:

  • Google Analytics
    • Measurement Protocol: This protocol lets you send raw user interaction data directly to Google Analytics by using HTTP requests.
    • Management API (Data Import): Data Import lets you upload files of business data and join them with the data that Google Analytics collects.
  • Campaign Manager
    • DCM/DFA Reporting and Trafficking API: This API provides programmatic access to information from your Campaign Manager account. The API's conversion service lets you report the offline portion of your conversions directly to Campaign Manager (see the sketch after this list).
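
The following is a minimal sketch of an offline conversion upload through this API, assuming the google-api-python-client library and Application Default Credentials; the profile ID, Floodlight IDs, click ID, and other values are placeholders.

    # A minimal sketch of uploading offline conversions through the
    # DCM/DFA Reporting and Trafficking API. All IDs are placeholders.
    from googleapiclient.discovery import build

    # Uses Application Default Credentials when none are passed explicitly.
    service = build('dfareporting', 'v3.5')

    conversion = {
        'kind': 'dfareporting#conversion',
        'floodlightActivityId': 1234,        # placeholder
        'floodlightConfigurationId': 5678,   # placeholder
        'gclid': 'abc123',                   # placeholder click ID
        'ordinal': '1',
        'timestampMicros': 1612345678000000,
        'quantity': 1,
        'value': 99.0,
    }

    response = service.conversions().batchinsert(
        profileId='1111111',                 # placeholder user profile ID
        body={'conversions': [conversion]},
    ).execute()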

When to use this solution

Integrating with target systems presents several challenges:

  • Working with external APIs can be an unfamiliar task.
  • Target APIs differ in many ways, for example:
    • In type, such as REST or SOAP
    • In authentication methods
    • In quotas, such as request size or queries per second (QPS)
  • Integration can take longer when data volume increases.
  • Managing different data states (for example, sending, succeeded, and failed) can be complex, especially when redoing a task is not easy because of the large volume of data.

Typically, it takes an advertiser one to three months to implement an API integration. Using the solution described in this document, you can deploy an automated integration in minutes with features that let you do the following:

  • Automatically send data that meets the target system's requirements and quotas for the following supported APIs:
    • Google Analytics Measurement Protocol
    • Google Analytics Management API (Data Import)
    • DCM/DFA Reporting and Trafficking API to upload offline conversions
    • SFTP: Business data uploads to Search Ads 360
    • Sheets API: Scheduled uploads of Google Ads conversions
    • Search Ads 360 API to upload offline conversions
  • Manage big datasets automatically. No extra effort is required when data volume increases.
  • Extend this solution to other APIs by building on the functionality of the built-in APIs. Following are some examples of the built-in APIs' characteristics:
    • The DCM/DFA Reporting and Trafficking API is a RESTful API. It has a QPS limit of one, and each request can contain up to 1,000 conversions.
    • Data Import supports file uploads of up to 1 GB in size with a daily limit of 50 files.
    • The Measurement Protocol lets you send raw user interaction data to Google Analytics by using HTTP requests. Each request can contain up to 20 hits (see the batching sketch after this list).
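
For example, the following minimal sketch batches hits for the Measurement Protocol's batch endpoint, assuming the requests library and hit payloads that are already encoded as Measurement Protocol parameter strings.

    # A minimal sketch of sending hits to the Google Analytics
    # Measurement Protocol batch endpoint, at most 20 hits per request.
    import requests

    BATCH_URL = 'https://www.google-analytics.com/batch'
    MAX_HITS_PER_REQUEST = 20

    def send_hits(hits):
        """Sends already-encoded hit payloads in batches of up to 20.

        Each element of `hits` is assumed to be a payload string such as
        'v=1&tid=UA-XXXXX-Y&cid=555&t=event&ec=offline&ea=purchase'.
        """
        for start in range(0, len(hits), MAX_HITS_PER_REQUEST):
            batch = hits[start:start + MAX_HITS_PER_REQUEST]
            # The batch endpoint expects one hit payload per line.
            response = requests.post(BATCH_URL, data='\n'.join(batch))
            response.raise_for_status()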

Challenges and solutions

Traditionally, a system that can integrate data with different target APIs at scale, in both number of requests and data volume, requires the following:

  • Monitoring for any incoming files. Typical challenges include the following:
    • Keeping system downtime to a minimum while managing the monitoring process.
    • Encountering time delays between the creation of a file and the system check.
  • A way to send data to the target API at the right volume and pace. Typical challenges include the following:
    • Exceeding the payload size limits of the target API by sending datasets that are too large or sending datasets too frequently.
    • Exceeding the pacing quotas of the target API by sending datasets that are too small, which requires too many API calls.
    • Taking too long to complete jobs by sending datasets too infrequently (see the pacing sketch after this list).
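
A common way to address the pacing challenges is a simple throttle between requests. The following minimal sketch illustrates the idea; the send_request function is a hypothetical stand-in for one API call, and the one-QPS limit matches the quota example given later in this document.

    # A minimal sketch of pacing API calls to stay under a QPS quota.
    # `send_request` is a hypothetical function that sends one batch.
    import time

    QPS_LIMIT = 1  # for example, a quota of one query per second

    def send_paced(batches, send_request):
        min_interval = 1.0 / QPS_LIMIT
        for batch in batches:
            start = time.monotonic()
            send_request(batch)
            # Sleep off the remainder of the interval, if any.
            elapsed = time.monotonic() - start
            if elapsed < min_interval:
                time.sleep(min_interval - elapsed)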

To meet these requirements and address these challenges, this solution uses several Google Cloud products:

  • Cloud Functions is Google Cloud's serverless solution to build an event-driven application. Cloud Functions supports different triggers, for example, when new files are added to Cloud Storage. When you use event triggers, you automate the monitoring process and remove delays caused by time intervals between scheduled checks.
  • Pub/Sub is Google's enterprise MOM. Using Pub/Sub in this solution eliminates the need for a supervisor process. With Pub/Sub, you can design a solution as a set of independent microservices.
  • Cloud Storage serves as the data transfer backbone for those APIs that consume whole files instead of specifically defined formats, for example, Data Import or SFTP. Cloud Storage minimizes operations overhead because it's a fully managed and scalable service that hosts object files.
  • Firestore is a fully managed, serverless, cloud-native NoSQL document database that simplifies storing, syncing, and querying data for your mobile, web, and IoT apps at global scale.
  • Identity and Access Management (IAM) provides service accounts. This solution uses service account key files to authenticate to the target systems (see the authentication sketch after this list).
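
The following minimal sketch shows how a service account key file can authenticate API calls, assuming the google-auth library; the key file path is a placeholder, and the scope shown is the conversions scope for the DCM/DFA Reporting and Trafficking API.

    # A minimal sketch of authenticating with a service account key file.
    # The key file path is a placeholder.
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        '/path/to/service-account-key.json',
        scopes=['https://www.googleapis.com/auth/ddmconversions'],
    )

    # Pass the credentials to an API client library, for example:
    # googleapiclient.discovery.build('dfareporting', 'v3.5',
    #                                 credentials=credentials)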

Architectural overview

The following diagram shows how the components of the integration system outlined in the preceding section work together.

Diagram that shows the architecture for an integration system.

This architecture shows the following steps:

  1. New files come into Cloud Storage and trigger the Cloud Function named Initiator.
  2. The Initiator function loads the input files from Cloud Storage and sends them as multiple messages to the Pub/Sub topic named Stacked data. The Initiator function also gets the target API and configuration name from the input filenames and makes them attributes of the sent messages.
  3. After loading all the data, the Initiator function sends a notification message to a second Pub/Sub topic named Trigger. (A sketch of the Initiator function follows these steps.)
  4. The Cloud Function Transporter is invoked based on the notification message.
  5. The Transporter function creates a pull subscriber for the Pub/Sub topic named Stacked data. This subscriber pulls one data message from Stacked data.
  6. The Transporter function sends the data message to a third Pub/Sub topic named Data to send. After that, Transporter removes the subscriber created in step 5 and then quits.
  7. The Pub/Sub event-based Cloud Function API Requester is invoked with the incoming data message sent by Transporter.
  8. The API Requester function sends the data through a target API to the target system, in the format and at the frequency that the API requires, based on the API and configuration attributes of the message.
  9. After all the data in this message is sent, the API Requester function sends a notification message to the Pub/Sub topic named Trigger (as in step 3).
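
The following minimal sketch illustrates steps 1 through 3 with a Cloud Storage-triggered Python function in the style of the Initiator. The project ID, topic names, chunk size, and filename format are illustrative assumptions; the actual implementation is in the GitHub demonstration.

    # A minimal sketch of an Initiator-style function (steps 1-3).
    import re
    from google.cloud import pubsub_v1, storage

    PROJECT = 'my-project'  # placeholder project ID
    publisher = pubsub_v1.PublisherClient()
    STACKED_TOPIC = publisher.topic_path(PROJECT, 'stacked-data')
    TRIGGER_TOPIC = publisher.topic_path(PROJECT, 'trigger')

    def initiator(event, context):
        """Triggered by a new file in Cloud Storage (step 1)."""
        blob = storage.Client().bucket(event['bucket']).blob(event['name'])
        lines = blob.download_as_text().splitlines()

        # Hypothetical convention: read the target API and configuration
        # name from a filename such as '20210101_API[MP]_config[test].ndjson'.
        match = re.search(r'API\[(\w+)\]_config\[(\w+)\]', event['name'])
        api, config = match.group(1), match.group(2)

        # Step 2: publish the records as multiple messages, with the API
        # and configuration name as message attributes.
        chunk_size = 20  # illustrative; the right size depends on the API
        futures = []
        for i in range(0, len(lines), chunk_size):
            data = '\n'.join(lines[i:i + chunk_size]).encode('utf-8')
            futures.append(
                publisher.publish(STACKED_TOPIC, data, api=api, config=config))
        for future in futures:
            future.result()  # wait until every message is published

        # Step 3: notify the Trigger topic that all the data is loaded.
        publisher.publish(TRIGGER_TOPIC, b'start', api=api, config=config).result()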

This architecture offers the following benefits:

  • Flexibility. When you use the combination of the Initiator and Transporter functions, along with multiple Pub/Sub topics, you create a serverless system that can break big data into smaller pieces as needed. The functions are flexible enough to support other APIs besides the ones covered in this document.
  • Agility. The API Requester Cloud Function sends out a small piece of data to a target API. Because it's a single-purpose function, it minimizes the effort required to extend the solution to new APIs (see the sketch after this list).
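
The following minimal sketch shows an API Requester-style function for steps 7 and 8, assuming a Python background function and a hypothetical handler registry keyed by the api message attribute.

    # A minimal sketch of an API Requester-style function. The handler
    # registry and the handler itself are hypothetical.
    import base64

    def send_measurement_protocol(records, config):
        """Hypothetical handler; batches and sends hits as sketched earlier."""
        ...

    HANDLERS = {
        'MP': send_measurement_protocol,
        # Handlers for additional target APIs are registered here.
    }

    def api_requester(event, context):
        """Triggered by a message on the 'Data to send' topic (step 7)."""
        records = base64.b64decode(event['data']).decode('utf-8').splitlines()
        attributes = event.get('attributes', {})
        # Step 8: dispatch to the handler for the target API.
        handler = HANDLERS[attributes['api']]
        handler(records, attributes['config'])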

Where to start

An API integration involves three main parts, which are explained in this section:

  • The integration system.
  • The target system and target API, which you prepare in several ways.
  • The files to send, which contain the data that goes to the target system.

The integration system

To see this design implemented, see an open source demonstration on GitHub.

To learn how to install the solution and run your first test, see Deploying a serverless integration solution based on Cloud Functions and Pub/Sub.

The target system and target API

You need to know where the data comes from, where the data is going (the target system), and how the data gets there (the target API). Then you prepare the target API in several ways:

  • If necessary, creating accounts in the target systems that have permissions to perform the integration through the target APIs.
  • Setting up the target system, for example, creating a Data Import entry in Google Analytics or a Floodlight tag in Campaign Manager.
  • Saving these configurations to where the integration system can access them.
  • Preparing the API Requester functions if you're using target APIs that are not built into this solution.

The steps for setting up target systems vary from system to system. You can find more details in the GitHub demonstration, which comes with several built-in APIs that you can use without writing any code. For the list of supported APIs, see Deploying a serverless integration solution based on Cloud Functions and Pub/Sub. To extend this solution to other APIs, you need to write the code that sends the API requests; you can use the supported APIs as examples when you create new API requesters.
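
For example, a new requester for a hypothetical REST endpoint might look like the following minimal sketch; the endpoint URL, the payload shape, and the handler registry from the earlier sketch are all assumptions.

    # A minimal sketch of extending the solution with a new API requester.
    # The endpoint URL and JSON payload shape are hypothetical.
    import requests

    def send_to_custom_api(records, config):
        """Sends each record to a hypothetical REST endpoint."""
        endpoint = 'https://api.example.com/v1/upload'
        for record in records:
            response = requests.post(
                endpoint, json={'config': config, 'data': record})
            response.raise_for_status()

    # Register the new requester under its own API code, following the
    # handler-registry pattern sketched earlier in this document:
    # HANDLERS['CUSTOM'] = send_to_custom_api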

The files to send

To prepare the files to send, save them into the Cloud Storage folder that triggers the Initiator function.

This design offers flexibility by letting you integrate different target APIs with different configurations simultaneously. To automate the data integration process, we suggest adopting a filename convention that maps each file to its target API automatically.

We offer a filename convention in the GitHub demonstration. For details about this convention, see Deploying a serverless integration solution based on Cloud Functions and Pub/Sub.
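
As an illustration only (the demonstration's actual convention is documented in the deployment guide), a convention might embed an API code and a configuration name in each filename, which the Initiator function can then parse:

    # An illustrative filename convention, not necessarily the one used
    # in the GitHub demonstration, and how it might be parsed.
    import re

    FILENAME_PATTERN = re.compile(
        r'API\[(?P<api>\w+)\]_config\[(?P<config>\w+)\]')

    def parse_filename(name):
        """Extracts the target API code and configuration name.

        For example, '20210101_API[MP]_config[test].ndjson' maps the file
        to the Measurement Protocol with a configuration named 'test'.
        """
        match = FILENAME_PATTERN.search(name)
        if not match:
            raise ValueError(f'No API or config found in filename: {name}')
        return match.group('api'), match.group('config')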

What's next