Google Cloud Platform
The switch to self-service marketing analytics at zulily: best practices for using Tableau with BigQuery
Editor’s note: Today’s post comes from Sasha Bartashnik, zulily’s Marketing Analytics Manager. zulily is a $1.5 billion fast-growing e-commerce company that connects families with special finds like clothing, home decor, and toys.
Every day zulily launches more than 9,000 product styles and 100 new sales, converting thousands of customers and processing millions of user actions. As a company with a customer first mindset, we’re always seeking to better understand our customers and optimize their experience. With so much data streaming in from internal and external sources, quickly accessing the data necessary to make marketing decisions was getting complicated and time-consuming for the average business user. As a result, analysts were spending more time writing code to answer ad hoc data requests than providing true actionable insights to drive the business. We needed to find a way to reduce the time it took to deliver the right message, at the right time, in the right channel, and to every customer.
In order to do this, we created a self-service marketing analytics platform that makes use of the Google BigQuery integration with Tableau. For the first time, this platform gave us a single view of data across various marketing touchpoints, including Facebook Ads, Google Ads, and email, allowing us to blend that data with in-house sources such as clickstream, product catalog, and order transactions to make faster, better business decisions. This platform has made BigQuery and Tableau our marketing analytics team's bread and butter: our custom implementation of these two technologies allows us to quickly analyze complex data and make recommendations that drive strategic, real-time decision making. This post will give you guidance on how you can apply some of the same strategies in your organization.
In this post, I’ll walk you through how BigQuery and Tableau have helped us better understand our customer acquisition process, as well as share our team’s top ten tips for using BigQuery data to feed intuitive and actionable visualizations in Tableau.
Moving to a self-service data platform
In my job at zulily, I lead a team of analysts and data scientists to optimize our marketing strategy using data-driven insights. We work closely with a tech team made up of data and software engineers. This collaboration is one of the best aspects of working in Marketing Analytics at zulily: I get to work with a talented and creative tech team. Our tech team needs to focus on constantly building new features for an ultra-scalable consumer experience that handles millions of transactions a day across a variety of platforms, as well as on creating the infrastructure and tools used to manage our sophisticated vendor and marketing platforms. This means our tech team doesn't have time to get bogged down in the mundane operational IT requests commonly found in traditional enterprise IT departments. Instead, our tech team builds tools that enable and empower us to do our jobs without relying on tech time and prioritization. An example of this is zuFlow, a query workflow and scheduling tool for BigQuery that lets analysts across zulily generate our own complex extract-transform-load (ETL) workflows using a point-and-click web interface.
Our tech team also shares my team’s vision of using data to drive every decision, at every level in the organization. This vision allows our teams to push innovative tools and techniques within zulily that expand the ways our company uses data. So in 2014, when our tech team realized that our existing data platform—a combination of SQL server for structured data and a Hadoop cluster for unstructured data—would soon be too limiting, they had the foresight to build a new big data platform. This platform uses BigQuery as the business data warehouse to provide a highly scalable analytics service and Tableau for data access and visual analytics so we can quickly make decisions based on the output.
In our new data platform, the process is reduced to the following:
- Push all data—structured and unstructured, real-time and batch—into BigQuery.
- Marketing analysts and data scientists join multiple data sources using BigQuery’s SQL.
- Data scientists develop models on BigQuery business data marts using a variety of common data science tools (e.g., Anaconda, RStudio, Google Cloud Datalab), both on-premises and on Google Cloud.
- Tableau is zulily’s reporting and visual analytics platform. Marketing analysts and data scientists use Tableau for self-service analytics on data stored in BigQuery as well as to report insights from the models. This helps us make real-time decisions every day, and offers long-term strategic insights.
Enabling marketing decisions with self-service analytics
The integration of Tableau and BigQuery enables two key efficiency improvements through the self-service model:
- Allows the analytics team to move fast on acquiring, ingesting, and using data to build reports and models without needing to involve IT in everyday activities.
- Provides business users with real-time access to key data used to make decisions quickly, without needing to involve analysts to generate basic insights.
To achieve these efficiency improvements, the marketing analytics team needed to create a single, centralized view of the customer. First, we combine all the marketing data the tech team drops into BigQuery, including cost and click data for Facebook Ads, Google Ads, email sends, and all other customer marketing touchpoints. Then we tie it to clickstream, order transactions, and other metadata about customer actions at the customer and ad level using BigQuery SQL. Prior to BigQuery, combining such massive amounts of data was unreasonably time-intensive. Now, we get results within minutes or seconds, depending on the query. This speed enables the team to create a data model based on all aspects of customer behavior, including features like customer spend over time, frequency of engagement with email, and browsing habits on site. We can now tie this comprehensive customer data model back to the specific advertising and marketing campaigns that drive customer behavior.
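As a rough sketch of this kind of join (all table and column names here are hypothetical, not zulily's actual schema), combining ad-level cost data with order transactions in BigQuery SQL might look like:

```sql
-- Hypothetical table and column names for illustration only.
-- Rolls ad-level spend and attributed orders up to the campaign level.
SELECT
  c.campaign_id,
  c.channel,
  SUM(c.cost) AS total_cost,
  COUNT(DISTINCT o.customer_id) AS converting_customers,
  SUM(o.order_total) AS attributed_revenue
FROM marketing.ad_costs AS c
LEFT JOIN analytics.order_transactions AS o
  ON o.campaign_id = c.campaign_id
GROUP BY c.campaign_id, c.channel
```

Because BigQuery handles the join and aggregation server-side, a query like this returns in seconds even across billions of clickstream and transaction rows.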
Unlocking customer acquisition insights with Google BigQuery and Tableau
With this self-service approach and customer model in place, not only is our tech team freed to focus on higher-value items like building a robust system architecture for optimizing ads in real time, but marketing analytics also has the opportunity to go deeper in areas that are key to understanding our business. Since Tableau lets our business users ask and answer many of their own questions, or make their own data manipulations, our team is relieved of the daily back-and-forth of report updates and simple data change requests. Now, we can use the zulily data platform to develop advanced machine learning models that help the business drive customer acquisition and better understand the customer experience.
With all of zulily’s data now in BigQuery, we are able to build a robust machine learning model to predict customer lifetime value (LTV) using a variety of customer behaviors as inputs, and tie the results to specific marketing campaigns to measure long-term performance. To accomplish this, we assign a historical lifetime value to identify existing high-value customers. We then use gradient boosting to consider over 1,000 transactional and behavioral variables and test hundreds of models, eventually reducing to around 30 key features that are the main contributing factors for predicting the lifetime value of a customer.
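A minimal sketch of this modeling step, using synthetic data and scikit-learn's gradient boosting. All feature names, thresholds, and the label definition are hypothetical stand-ins; zulily's actual pipeline and feature set are not public.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000

# Synthetic stand-ins for transactional/behavioral features
X = np.column_stack([
    rng.gamma(2.0, 50.0, n),   # e.g., spend in the first 30 days
    rng.poisson(3.0, n),       # e.g., email engagement events
    rng.random(n),             # e.g., on-site browsing intensity
])

# Hypothetical "high LTV" label derived from a noisy spend signal
y = (X[:, 0] + 20 * X[:, 1] + rng.normal(0, 30, n) > 150).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)

# Probability that each new shopper turns out to be high-LTV
probs = model.predict_proba(X_te)[:, 1]
print(f"holdout accuracy: {model.score(X_te, y_te):.3f}")
```

In practice the feature-reduction step described above would come from inspecting `model.feature_importances_` across many candidate models, keeping only the features that contribute most.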
The end result is a model that can predict with very high accuracy the likelihood that a new zulily shopper will have a high lifetime value. We are then able to tie these predictions to marketing datasets living in BigQuery and provide the combined marketing performance data in highly dynamic and customizable dashboards, internally dubbed the Channel Metrics Dashboard (CMD), on our Tableau server. The CMD allows marketing channel managers and specialists to easily generate their own insights from historical and predicted data by selecting the metrics, date ranges, and even granularity they need, and by creating custom graphs and reports right on the server. This solution enables the zulily marketing team to quickly make decisions that optimize ads, emails, and offers towards customers who best respond to zulily engagements.
When we moved over to this new data platform, we had a lot of help and support from our zulily tech team as well as Google Cloud engineers and Tableau customer engineers, who provided some tips and tricks for integrating BigQuery with Tableau. Below are some best practices we learned, which you may want to try when building your own self-serve analytics platform with BigQuery and Tableau:
zulily’s top 10 Tableau-on-BigQuery tips
- Reduce latency by using Tableau Server on Google Compute Engine.
Compute Engine also offers helpful features like live migration, which can greatly increase Tableau Server’s uptime since your Tableau Server virtual machine keeps running even when a host event occurs. Tableau administrators can also take advantage of custom instance sizing, which means you can right-size your Tableau deployments without having to over-provision to a “one size fits all” instance size.
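For illustration, a custom machine type can be requested at instance creation; the machine name, zone, and image values below are placeholders, not a recommendation:

```shell
# Example only: sizes the VM to Tableau Server's needs
# (here 16 vCPUs / 64 GB RAM) instead of a fixed predefined shape.
gcloud compute instances create tableau-server \
    --zone=us-central1-a \
    --custom-cpu=16 \
    --custom-memory=64GB \
    --image-family=windows-2016 \
    --image-project=windows-cloud
```

Check Tableau's published hardware recommendations for your workload before choosing the vCPU and memory values.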
- Use Federated Sources and point Tableau at BigQuery.
In scenarios where you want to combine Google BigQuery data with data from non-Google data sources for ad hoc exploration, leverage Tableau’s cross-database joins feature.
- Create tables in BigQuery that match your dashboard requirements.
Keep in mind that in addition to properly structuring the underlying data, the design of the dashboard itself can also play a major role in performance. This whitepaper offers some excellent tips on how you can use clever design principles to create workbooks that are visual, interactive, fast and easy to understand.
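One common pattern is to pre-aggregate to the exact grain the dashboard displays so Tableau never scans raw event data. A sketch, with hypothetical names, using a query whose results are written to a mart table (for example via a scheduled query with a destination table):

```sql
-- Hypothetical names; the point is to match the dashboard's grain
-- (one row per day per channel) rather than raw event grain.
SELECT
  DATE(event_time) AS report_date,
  channel,
  SUM(cost) AS cost,
  SUM(clicks) AS clicks,
  COUNT(DISTINCT customer_id) AS customers
FROM marketing.touchpoints
GROUP BY report_date, channel
```

A dashboard pointed at the resulting table reads a few thousand pre-aggregated rows instead of billions of raw events.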
- Take advantage of Cloud Datalab to structure data for Tableau.
- Process large datasets with a live connection to BigQuery.
With this live connection, you may also want to consider turning off automatic updates. When building a dashboard, as you place fields on a shelf, Tableau generates the view by automatically querying the data source. If you are creating a dense data view, the queries might be time consuming and significantly degrade system performance. In this case, you can instruct Tableau to turn off queries while you build the view. You can then turn queries back on when you are ready to see the result.
- Take advantage of BigQuery’s caching when analyzing data in Tableau.
BigQuery writes all query results to a table, which is either explicitly identified by the user (a destination table) or a temporary, cached results table. Cached results are retained for approximately 24 hours. Take advantage of this in your analysis phase by first running a wide-scoped query that contains all the data you plan to analyze. For example, say you plan to run an analysis across three types of records: a, b, and c. Instead of running three separate queries for
WHERE col1 = 'a',
WHERE col1 = 'b', and
WHERE col1 = 'c', simply start your analysis process by running one query against your source table and output all the records into a cached destination table:
SELECT * FROM source WHERE col1 = 'a' OR col1 = 'b' OR col1 = 'c'
- Use Tableau’s Custom SQL feature to take advantage of partitions.
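For an ingestion-time partitioned table, a Custom SQL connection can filter on the `_PARTITIONTIME` pseudo-column so BigQuery only scans the partitions needed (project, dataset, table, and dates below are placeholders):

```sql
-- Tableau Custom SQL against a day-partitioned table.
-- The _PARTITIONTIME filter limits the scan to one month of partitions.
SELECT *
FROM `project.dataset.events`
WHERE _PARTITIONTIME >= TIMESTAMP('2017-01-01')
  AND _PARTITIONTIME <  TIMESTAMP('2017-02-01')
```

This both speeds up queries and reduces the bytes billed, since BigQuery prices on-demand queries by bytes scanned.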
- Limit data transfer with predicate pushdown.
If you know that a particular workbook will always have certain filters applied, you can also set those at the data source level.
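You can also combine Custom SQL with a Tableau parameter so the filter value chosen in the workbook is substituted into the query and applied by BigQuery itself rather than by Tableau after transfer. A sketch (table and parameter names are hypothetical):

```sql
-- Tableau replaces <Parameters.Channel> with the user's selection,
-- so BigQuery applies the filter server-side before returning rows.
SELECT report_date, channel, cost, clicks
FROM `project.dataset.cmd_daily_channel`
WHERE channel = <Parameters.Channel>
```
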
- Build Tableau extracts outside of Tableau.
Since the Tableau-to-BigQuery connector is optimized for live connections, you may experience slower than usual extract times from within Tableau for larger BigQuery datasets. To vastly improve this, set up a process that exports the BigQuery data to a destination like Google Cloud Storage, and then use the Tableau Extract API to write a custom extract from the flat files in Google Cloud Storage.
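The export step can be done with the `bq` command-line tool; the dataset, table, and bucket names below are placeholders:

```shell
# Exports the table as compressed CSV shards to Cloud Storage,
# ready for a custom extract-building job with the Tableau Extract API.
bq extract \
    --destination_format=CSV \
    --compression=GZIP \
    'mydataset.cmd_daily_channel' \
    'gs://my-bucket/extracts/cmd_daily_channel-*.csv.gz'
```

The wildcard in the destination URI lets BigQuery shard large tables across multiple files, which a downstream extract builder can read in parallel.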
For an overview of when you may need to use an extract over a live connection, read this post by Tableau customer and evangelist Jonathan Drummey. There is one important point to make about data extracts: they are not a replacement for the data in Google BigQuery. While they can be used to collect and aggregate data over time, this should be a tactical, rather than long-term, solution. Incremental updates do not support update or delete actions on records that have already been processed; any data change other than an append requires a full reload of the extract.
- Use ODBC if you require service accounts (Tableau server connection).
Simply install the ODBC driver on the Tableau Server machine or on the workstation running Tableau Desktop, configure the DSN with the service account information, and connect using “Other Databases (ODBC)” in Tableau.
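An illustrative DSN entry for a Linux `odbc.ini` is sketched below. The key names follow the Simba BigQuery ODBC driver's options, but exact names and paths vary by driver version, so treat this as an assumption and check your driver's documentation; the project ID, service account email, and file paths are placeholders.

```ini
; Hypothetical DSN for service-account authentication to BigQuery.
[BigQueryServiceAccount]
Driver=/opt/simba/googlebigqueryodbc/lib/64/libgooglebigqueryodbc_sb64.so
Catalog=my-project-id
OAuthMechanism=0
Email=tableau-sa@my-project-id.iam.gserviceaccount.com
KeyFilePath=/etc/tableau/bq-service-account.json
```
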