
How to build customer 360 profiles using MongoDB Atlas and Google Cloud for data-driven decisions

October 31, 2022

Venkatesh Shanbhag

Solutions Architect, MongoDB

Maruti C

Global Partner Architect, Google

One of the biggest challenges for any retailer is to track an individual customer’s journey across multiple channels (Online and In-Store), devices, purchases, and interactions.

This lack of a single view of the customer leads to a disjointed and inconsistent customer experience. Most retailers report obstacles to effective cross-channel marketing caused by inaccurate or incomplete customer data. Marketing efforts are also fragmented because the user profile data does not provide a 360˚ view of the customer’s experience. Insufficient information leads to a lack of visibility into customer sentiment, which further hinders customer engagement and loyalty.

Creating a single view of the customer across the enterprise:

Helps with customer engagement and loyalty by improving customer satisfaction and retention through personalization and targeted marketing communications.
Helps retailers achieve higher marketing ROI by aggregating customer interactions across all channels and identifying and winning valuable new customers, resulting in increased revenues.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_MongoDB_Atlas.max-700x700.jpg

360˚ is a relationship cycle that consists of many touch points where a customer meets the brand. The customer 360˚ solution provides an aggregated view of a customer. It collects all your customer data in one place, from a customer’s primary contact information to their purchasing history, interactions with customer service, and social media behavior.

Single view of customer data records and processes:

Behavior Data: Customer behavior data, including the customer's browsing and search behavior online through click-stream data and the customer’s location if the app is location-based.
Transactional Data: The transactional data includes online purchases, coupon utilization, in-store purchases, returns and refunds.
Personal Information: Personal information from online registration, in-store loyalty cards, and warranties will be collated into a single view.
User Profile Data: Data profiling will be used as a part of the matching and deduplication process and establish a Golden Record. Profile segments can be utilized to enable marketing automation.
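
The matching and deduplication step behind a Golden Record can be illustrated with a minimal sketch. It assumes records are matched on a normalized email address and that earlier sources take priority when fields conflict. The field names, source names, and matching rule are hypothetical and are not taken from the reference architecture.

```python
from collections import defaultdict


def normalize_email(email):
    """Lower-case and trim an email so trivially different spellings match."""
    return email.strip().lower()


def build_golden_records(records):
    """Group source records by normalized email and merge each group.

    Later records only fill fields that are still missing, so the order of
    `records` acts as a simple source-priority rule (an assumption).
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[normalize_email(record["email"])].append(record)

    golden = {}
    for email, group in grouped.items():
        merged = {"email": email, "sources": []}
        for record in group:
            merged["sources"].append(record["source"])
            for field, value in record.items():
                if field in ("email", "source"):
                    continue
                # Skip empty values and never overwrite a field already set.
                if value not in (None, "") and field not in merged:
                    merged[field] = value
        golden[email] = merged
    return golden


# Hypothetical records from three of the channels described above.
records = [
    {"source": "online_registration", "email": "Ana@Example.com ", "name": "Ana Diaz", "phone": None},
    {"source": "loyalty_card", "email": "ana@example.com", "name": "A. Diaz", "phone": "555-0100"},
    {"source": "warranty", "email": "bo@example.com", "name": "Bo Chen", "phone": ""},
]
golden = build_golden_records(records)
```

Production matching usually relies on more than one attribute (for example name, phone, and postal address with fuzzy comparison); exact email matching is used here only to keep the sketch short.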

An enhanced customer 360˚ solution with machine learning models can provide retailers with key capabilities for user-based personalization, such as generating insights and orchestrating experiences for each customer.

On October 1st 2022, we announced Dataflow templates that simplify the moving and processing of data between MongoDB Atlas and BigQuery.

Dataflow is a truly unified stream and batch data processing system that's serverless, fast, and cost-effective. Dataflow templates allow you to package a Dataflow pipeline for deployment. Templates have several advantages over directly deploying a pipeline to Dataflow. The Dataflow templates and the Dataflow page make it easier to define the source, target, transformations, and other logic to apply to the data. You can key in all the connection parameters through the Dataflow page, and with a click, the Dataflow job is triggered to move the data to BigQuery.
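
Besides the Dataflow page, a template job can also be launched from the command line. The sketch below only assembles the `gcloud dataflow flex-template run` arguments for the MongoDB to BigQuery template and does not execute anything. The template path and the parameter names (`mongoDbUri`, `database`, `collection`, `outputTableSpec`, `userOption`) reflect our understanding of the published template and should be checked against its documentation. The job name, project, dataset, and connection string are placeholders.

```python
import shlex


def build_template_command(job_name, region, mongo_uri, database, collection, output_table):
    """Return the gcloud argument list for launching the MongoDB to BigQuery template."""
    # Parameter names are assumptions; verify them in the template documentation.
    parameters = {
        "mongoDbUri": mongo_uri,
        "database": database,
        "collection": collection,
        "outputTableSpec": output_table,
        "userOption": "FLATTEN",
    }
    # gcloud expects parameters as one comma-separated key=value string,
    # so values containing commas would need gcloud's alternate delimiter syntax.
    joined = ",".join(f"{key}={value}" for key, value in parameters.items())
    return [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        "--region", region,
        "--template-file-gcs-location",
        f"gs://dataflow-templates-{region}/latest/flex/MongoDB_to_BigQuery",
        "--parameters", joined,
    ]


command = build_template_command(
    job_name="mongodb-orders-to-bq",              # placeholder
    region="us-central1",
    mongo_uri="mongodb+srv://USER:PASSWORD@CLUSTER_HOST/",  # placeholder
    database="retail",                             # placeholder
    collection="orders",
    output_table="my-project:customer360.orders",  # placeholder
)
print(shlex.join(command))
```

Building the argument list first makes it easy to review the exact command before passing it to `subprocess.run` or a CI step.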

BigQuery is a fully managed data warehouse that is designed for running analytical processing (OLAP) at any scale. BigQuery has built-in features like machine learning, geospatial analysis, data sharing, log analytics, and business intelligence.

This integration enables customers to move and transform data from MongoDB to BigQuery for aggregation and complex analytics. They can further take advantage of BigQuery’s built-in ML and AI integrations for predictive analytics, fraud detection, real-time personalization, and other advanced analytics use cases.

This blog post describes how retailers can use fully managed MongoDB Atlas and Google Cloud services to build customer 360 profiles, the architecture involved, and the reusable repository that customers can use to implement the reference architecture in their environments.

As part of this reference architecture, we have considered four key data sources - user’s browsing behavior, orders, user demographic information, and product catalog. The diagram below illustrates the data sources that are used for building a single view of the customer, and some key business outputs that can be driven from this data.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_MongoDB_Atlas.max-1500x1500.jpg

The technical architecture diagram below shows how MongoDB and Google Cloud can be leveraged to provide a comprehensive view of the customer journey.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_MongoDB_Atlas.max-1900x1900.jpg

The Reference Architecture consists of the following processes:

1. Data Ingestion

Disparate data sources are brought together in the data ingestion phase. Typically we integrate a wide array of data sources, such as Online Behavior, Purchases (Online and In-Store), Refunds, Returns and other enterprise data sources such as CRM and Loyalty platforms.

In this example, we have considered four representative data sources:

User profile data through User Profiles
Product Catalog
Transactional data through Orders
Behavioral data through Clickstream Events

User profile data, product catalog, and orders data are ingested from MongoDB, and click-stream events from web server log files are ingested from CSV files stored on Cloud Storage.

The data ingestion process should support an initial batch load of historical data and dynamic change processing in near real-time. Near real-time changes can be ingested using a combination of MongoDB Change Streams and Google Cloud Pub/Sub for a high-throughput, low-latency design.
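
As a rough illustration of the near real-time path, the sketch below forwards change events to a Pub/Sub topic. It assumes `collection` behaves like a `pymongo` collection (using `watch`) and `publisher` behaves like a `google.cloud.pubsub_v1.PublisherClient` (using `publish`). The message layout is hypothetical, and the code is not taken from the reference repository.

```python
import json


def change_event_to_message(event):
    """Serialize the parts of a change event that downstream consumers need."""
    payload = {
        "operation": event["operationType"],
        "collection": event["ns"]["coll"],
        "document_key": event["documentKey"],
        # Delete events carry no full document, so this may be None.
        "document": event.get("fullDocument"),
    }
    # default=str turns BSON-specific values such as ObjectId into strings.
    return json.dumps(payload, default=str).encode("utf-8")


def forward_changes(collection, publisher, topic_path):
    """Publish every change on `collection` to a Pub/Sub topic.

    `collection` is assumed to be pymongo-like and `publisher` to be a
    Pub/Sub publisher client; both are passed in so they can be swapped.
    """
    # updateLookup asks MongoDB to include the current document on updates.
    with collection.watch(full_document="updateLookup") as stream:
        for event in stream:
            future = publisher.publish(topic_path, change_event_to_message(event))
            future.result()  # wait until Pub/Sub has accepted the message
```

Waiting on each publish keeps the sketch simple; a real pipeline would more likely batch publishes and persist the change stream resume token so it can recover after a restart.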

2. Data Processing

The data is converted from the document format in MongoDB to the row-and-column format of BigQuery and loaded into BigQuery from MongoDB Atlas using the Google Cloud Dataflow templates. The Cloud Storage Text to BigQuery Dataflow template is used to move the CSV files to BigQuery.
Google Cloud Dataflow templates orchestrate the data processing, and the aggregated data can be used to train ML models and generate business insights. Key analytical insights like product recommendations are brought back to MongoDB to enrich the user data.
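
The idea of turning a nested document into a flat row can be sketched as follows. This is only an illustration of the concept, not the logic the Dataflow template actually applies; the underscore naming scheme and the sample order document are assumptions.

```python
def flatten_document(document, parent_key="", separator="_"):
    """Flatten nested dicts into a single level of column-like keys."""
    row = {}
    for key, value in document.items():
        column = f"{parent_key}{separator}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into sub-documents and prefix their keys.
            row.update(flatten_document(value, column, separator))
        else:
            # Scalars and arrays are kept as-is; arrays could map to
            # repeated fields in a warehouse table.
            row[column] = value
    return row


# Hypothetical order document as it might be stored in MongoDB.
order = {
    "_id": "o-1001",
    "user": {"id": "u-42", "address": {"city": "Austin"}},
    "items": ["sku-1", "sku-2"],
    "total": 59.9,
}
row = flatten_document(order)
```

Here `row` contains the keys `_id`, `user_id`, `user_address_city`, `items`, and `total`, which map naturally onto table columns.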

3. AI & ML

The reference architecture leverages the advanced capabilities of Google Cloud BigQuery ML and Vertex AI. Once the data is in BigQuery, BigQuery ML lets you create and execute multiple machine learning models. For this reference architecture, we focused on the following models:

K-means clustering to group data into clusters. In this case it is used to perform user segmentation.
Matrix Factorization to generate recommendations. In this case, it is used to create product affinity scores using historical customer behavior, transactions, and product ratings.
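
To make the two model types concrete, the sketch below builds BigQuery ML `CREATE MODEL` statements as strings. The model names, table names, and feature columns are hypothetical, and the statements are not taken from the reference repository. The option names (`model_type`, `num_clusters`, `feedback_type`, `user_col`, `item_col`, `rating_col`) follow BigQuery ML syntax as we understand it and should be checked against its documentation.

```python
def kmeans_model_sql(model, source_table, num_clusters):
    """Build a CREATE MODEL statement for k-means user segmentation."""
    return f"""
CREATE OR REPLACE MODEL `{model}`
OPTIONS (model_type = 'KMEANS', num_clusters = {int(num_clusters)}) AS
SELECT total_orders, total_spend, days_since_last_order  -- hypothetical features
FROM `{source_table}`
""".strip()


def matrix_factorization_model_sql(model, source_table):
    """Build a CREATE MODEL statement for product affinity scores."""
    return f"""
CREATE OR REPLACE MODEL `{model}`
OPTIONS (
  model_type = 'MATRIX_FACTORIZATION',
  feedback_type = 'EXPLICIT',
  user_col = 'user_id',
  item_col = 'product_id',
  rating_col = 'rating'
) AS
SELECT user_id, product_id, rating  -- hypothetical columns
FROM `{source_table}`
""".strip()


segmentation_sql = kmeans_model_sql(
    "customer360.user_segments", "customer360.user_features", 5
)
affinity_sql = matrix_factorization_model_sql(
    "customer360.product_affinity", "customer360.user_product_ratings"
)
```

The resulting strings could be submitted through the BigQuery console or a client library; keeping them as plain SQL makes them easy to review and version.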

The models are registered to Vertex AI Model Registry and deployed to an endpoint for real-time prediction.

4. Business Insights

Using the content provided in the GitHub repository, we showcase the analytics capabilities of Looker, which is seamlessly integrated with the aggregated data in BigQuery and MongoDB and provides advanced data visualizations that enable business users to slice and dice the data and look for emerging trends. The included dashboards contain insights from MongoDB, from BigQuery, and from combining the data from both sources.

The detailed implementation steps, sample datasets, and the GitHub repository for this reference architecture are available here.

There are many reasons to run MongoDB Atlas on Google Cloud, and one of the easiest ways to start is our self-service, pay-as-you-go listing on Google Cloud Marketplace. Please give it a try and let us know what you think. Also, check this blog to learn how Luckycart handles large volumes of data and carries out the complex computations it requires to deliver ultra-personalized activations for its customers using MongoDB and Google Cloud.

We thank the many Google Cloud and MongoDB team members who contributed to this collaboration. Thanks to the team at PeerIslands for their help with developing the reference architecture.
