The switch to self-service marketing analytics at zulily: best practices for using Tableau with BigQuery
By Sasha Bartashnik, Marketing Analytics Manager, zulily
Editor’s note: Today’s post comes from Sasha Bartashnik, zulily’s Marketing Analytics Manager. zulily is a $1.5 billion fast-growing e-commerce company that connects families with special finds like clothing, home decor, and toys.
Every day zulily launches more than 9,000 product styles and 100 new sales, converting thousands of customers and processing millions of user actions. As a company with a customer first mindset, we’re always seeking to better understand our customers and optimize their experience. With so much data streaming in from internal and external sources, quickly accessing the data necessary to make marketing decisions was getting complicated and time-consuming for the average business user. As a result, analysts were spending more time writing code to answer ad hoc data requests than providing true actionable insights to drive the business. We needed to find a way to reduce the time it took to deliver the right message, at the right time, in the right channel, and to every customer.
In order to do this, we created a self-service marketing analytics platform which makes use of the Google BigQuery integration with Tableau. For the first time, this platform gave us a single view of data across various marketing touchpoints including Facebook Ads, Google Ads, and email, allowing us to blend data with in-house sources such as clickstream, product catalog, and order transactions to make faster and better, business decisions. This platform has made BigQuery and Tableau our marketing analytics’ team’s bread and butter—our custom implementation of these two technologies allows us to quickly analyze complex data and make recommendations to drive strategic, real-time decision making. This post will give you guidance on how you can apply some of the same strategies in your organization.
In this post, I’ll walk you through how BigQuery and Tableau have helped us better understand our customer acquisition process, as well as share our team’s top ten tips for using BigQuery data to feed intuitive and actionable visualizations in Tableau.
Moving to a self-service data platform
In my job at zulily, I lead a team of analysts and data scientists to optimize our marketing strategy using data-driven insights. We work closely with a tech team that is made up of data and software engineers. This collaboration is one of the best aspects of working in Marketing Analytics at zulily: I get to work with a talented and creative tech team. Our tech team needs to focus on constantly building new features for an ultra-scalable consumer experience that handles millions of transactions a day, across a variety of platforms, as well as creating the infrastructure and tools used to manage our sophisticated vendor and marketing platforms. This means our tech team doesn’t have time to get bogged down in the mundane operational IT requests that are commonly found in traditional enterprise IT departments. Instead, our tech team builds tools that enable and empower us to do our jobs without being reliant on tech time and prioritization. An example of this is zuFlow, a query workflow and scheduling tool for BigQuery that lets analysts across zulily generate our own complex extract-transform-load (ETL) workflows using a point-and-click web interface.
Our tech team also shares my team’s vision of using data to drive every decision, at every level in the organization. This vision allows our teams to push innovative tools and techniques within zulily that expand the ways our company uses data. So in 2014, when our tech team realized that our existing data platform—a combination of SQL server for structured data and a Hadoop cluster for unstructured data—would soon be too limiting, they had the foresight to build a new big data platform. This platform uses BigQuery as the business data warehouse to provide a highly scalable analytics service and Tableau for data access and visual analytics so we can quickly make decisions based on the output.
In our new data platform, the process is reduced to the following:
- Push all data—structured and unstructured, real-time and batch—into BigQuery.
- Marketing analysts and data scientists join multiple data sources using BigQuery’s SQL.
- Develop models on BigQuery business data marts using a variety of common data science tools (Ex. Anaconda, RStudio, Google Cloud Datalab) across on-site and Google Cloud.
- Tableau is zulily’s reporting and visual analytics platform. Marketing analysts and data scientists use Tableau for self-service analytics on data stored in BigQuery as well as to report insights from the models. This helps us make real-time decisions every day, and offers long-term strategic insights.
Enabling marketing decisions with self-service analytics
The integration of Tableau and BigQuery enables two key efficiency improvements due to the self-service model:
- Allows the analytics team to move fast on acquiring, ingesting, and using data to build reports and models without needing to involve IT in everyday activities.
- Provides business users with real-time access to key data used to make decisions quickly, without needing to involve analysts to generate basic insights.
Because all data lives in BigQuery tables, my team can query data of practically any size and at any level of granularity, or slice it to create features to feed our models as we wish. We can then further explore this data, and build reports and visualizations in Tableau to share data and models with end users. With Tableau, business users are also able to drill down to key metrics in order to understand the performance of all ads, programs, and channels.
To achieve these two above efficiency improvements, the marketing analytics team needed to create a single and centralized view of the customer. First, we combine all the marketing data the tech team drops into BigQuery—including cost and click data for Facebook Ads, Google Ads, email sends, and all other customer marketing touchpoints. Then we tie it to clickstream, order transactions, and other metadata about customer actions at the customer and ad level using BigQuery SQL. Prior to BigQuery, combining such massive amounts of data was unreasonably time-intensive. Now, we get results within minutes or seconds, depending on the query. This speed enables the team to create a data model based on all aspects of customer behavior, including features like customer spend over time, frequency of engagement with email, and browsing habits on site. We can now tie this comprehensive customer data model back to the specific advertising and marketing campaigns that drives the customer behavior.
Unlocking customer acquisition insights with Google BigQuery and Tableau
With this self-service approach and customer model in place, not only is our tech team freed to focus on higher value items like building a robust system architecture for optimizing ads in real-time, but marketing analytics also has the opportunity to go deeper in areas that are key to understanding our business. Since Tableau lets our business users ask and answer many of their own questions, or make their own data manipulations, our team is relieved of the daily back and forth of report updates or simple data change requests. Now, we can use the zulily data platform for developing advanced machine learning models that help the business drive customer acquisition and better understand the customer experience.
With all of zulily’s data now in BigQuery, we are able to build a robust machine learning model to predict customer lifetime value (LTV) using a variety of customer behaviors as inputs, and tie the results to specific marketing campaigns to measure long-term performance. To accomplish this, we assign a historical lifetime value to find existing high value customers. We then use gradient boosting to consider over 1,000 transactional and behavioral variables and test hundreds of models. We eventually reduce to around 30 key features that are contributing factors for predicting the lifetime value of a customer.
The end result is a model that can predict with very high accuracy the likelihood that a new zulily shopper will have a high lifetime value. We are then able to tie these predictions to marketing datasets living in BigQuery and provide the combined marketing performance data in highly dynamic and customizable dashboards, internally dubbed the Channel Metrics Dashboard (CMD), on our Tableau server. The CMD allows marketing channel manager and specialists easily generate their own insights using historical and predicted data by selecting the metrics, date ranges, and even granularity they need by creating custom graphs and reports right on the server. This solution enables the zulily marketing team to quickly make decisions that optimize ads, emails, and offers towards customers who best respond to zulily engagements.
When we moved over to this new data platform, we had a lot of help and support from both our zulily tech team as well as Google Cloud engineers and Tableau customer engineers that provided some tips and and tricks for integrating BigQuery with Tableau. Below are some best practices we learned, which you may want to try when building your own self-serve analytics platform with BigQuery and Tableau:
zuily’s top 10 Tableau-on-BigQuery tips
- Reduce latency by using Tableau Server on Google Compute Engine.
In a traditional model, regions are separate VPCs and you have to cross the Internet to reach them. Running your Tableau server on Compute Engine leverages the same network Google uses for applications like Search, YouTube, and Gmail. That means your Tableau Server talks to BigQuery over Google’s private backbone without going to the Internet, and without any additional setup.
|Traditional VPC||Google Cloud|
Compute Engine also offers helpful features like live migration which can greatly increase Tableau Server’s uptime since your Tableau Server virtual machine keeps running even when a host event occurs. Tableau administrators can also take advantage of custom instance sizing, which means you can right-size your Tableau deployments without having to over-provision to a “one size fits all” instance size.
- Use Federated Sources and point Tableau at BigQuery.
Tableau users often find this model particularly useful when they need to produce manually updated spreadsheet data in conjunction with their their source data systems in reports.In scenarios, where you want to combine Google BigQuery data with data from non-Google data sources for ad hoc exploration, leverage Tableau’s cross-database joins feature.
- Create tables in BigQuery that match your dashboard requirements.
In BigQuery, data storage is inexpensive! Take advantage of this by creating additional tables that can be queried by your dashboard without having to process the data. In other words, if you know what data will go into your dashboard, materialize those `JOIN`s and pre-aggregate anything that your business users won’t need to drill into. You’ll also want to unnest any nested and repeated fields that you might otherwise have trouble getting into Tableau’s flat data reporting structure.
For ongoing updates of these tables, Google Apps Script has access to the BigQuery API and can be a quick and easy way to schedule BigQuery queries on an automated schedule.
Keep in mind that in addition to properly structuring the underlying data, the design of the dashboard itself can also play a major role in performance. This whitepaper offers some excellent tips on how you can use clever design principles to create workbooks that are visual, interactive, fast and easy to understand.
- Take advantage of Cloud Datalab to structure data for Tableau.
When you need to do the data preparation described in the previous step, make use of Google Cloud Datalab. This is a powerful interactive tool based on Jupyter notebooks that can be used to explore, analyze, transform data and build machine learning models. It connects to BigQuery seamlessly and can easily be stopped and started when you need to use it.
- Process large datasets with a live connection in BigQuery.
When connecting to BigQuery from Tableau, you will want to take advantage of BigQuery’s ability to process large datasets and only bring results across the network. You should default into setting your Tableau connection against BigQuery as “Live” unless you have a specific reason to extract the data.
With this live connection, you may also want to consider turning off automatic updates. When building a dashboard, as you place fields on a shelf, Tableau generates the view by automatically querying the data source. If you are creating a dense data view, the queries might be time consuming and significantly degrade system performance. In this case, you can instruct Tableau to turn off queries while you build the view. You can then turn queries back on when you are ready to see the result.
- Take advantage of BigQuery’s caching when analyzing data in Tableau.
Often, before we’re ready to build a dashboard, we use Tableau to do interactive analysis and “play” with the data in order to develop an understanding of it before trying to make it consumable by business users. During this phase, you’ll want to take advantage of BigQuery’s caching.
BigQuery writes all query results to a table which is either explicitly identified by the user (a destination table) or is a temporary, cached results table. Once data is read, it will be reused for about 24 hours. Take advantage of this in your analysis phase by first running a very wide scoped query that contains all the data you plan to analyze. For example, let’s say you plan to run an analysis across three three types of records: a, b, and c. Instead of running three separate queries for
WHERE col1 = ‘a’,
WHERE col1 = ‘b’, and
WHERE col1=’c’, simply start your analysis process by running a query against your source table and output all the records into a cached destination table.
SELECT col1 FROM source WHERE col1=’a’ OR col =’b’ OR col1=’c’
- Use Tableau’s Custom SQL feature to take advantage of partitions.
BigQuery supports partitioning tables by date. However, at time of writing, Tableau does not recognize the pseudo column necessary for properly filtering by the date-based timestamp. To get around this, when working with partitioned tables in Tableau, you should use Tableau’s Custom SQL option to specify the date range of interest, in order to avoid scanning the entire table.
- Limit data transfer with predicate pushdown.
When building your Tableau dashboard, you should add context filters before adding fields to the report. As long as your have “live connection” set to On, this will push the filter (`WHERE` clause) to BigQuery. While this will not change the amount of data that gets scanned in BigQuery, it will greatly reduce the data that needs to come back over the network.
If you know that a particular workbook will always have certain filters applied, you can also set those at the data source level.
- Build Tableau extracts outside of Tableau.
While the vast majority of tasks are better accomplished with Live Connections to Google BigQuery, there may be rare situations which call for extracting the data into a Tableau Data Extract (TDE).
Since the Tableau to BigQuery connector is optimized for Live Connections, you may experience slower than usual extract times from within Tableau for larger BigQuery datasets.To vastly improve this, you should setup a process that exports the BigQuery data into something like Google Cloud Storage and then use the Tableau Extract API to write a custom extract from the flat files in Google Cloud Storage.
For an overview of reasons when you may need to use an extract over a live connection, read this post by Tableau customer and evangelist Jonathan Drummey. There is one important point to make about data extracts – they are not a replacement for data that might be in Google BigQuery. While they can be used to collect and aggregate data over time this should be used as a tactical, rather than long term, solution. Incremental updates do not support update or delete actions to records that have already been processed. Data changes other than an append, would require a full reload of the extract.
- Use ODBC if you require service accounts (Tableau server connection).
Google Cloud Service Accounts are an excellent way to provide application to application security. Authentication between Tableau Server and Google BigQuery is a common use case for this. However, at the time of this writing, service account authentication is not supported in the Tableau BigQuery connector.
However, Tableau does support generic ODBC drivers and you can use the Simba Drivers for Google BigQuery to connect to BigQuery from Tableau using service accounts.
Simply install the ODBC driver on the Tableau server or workstation of the Tableau desktop client, configure the DNS with the service account information, and use “Other Database (ODBC) in Tableau”.
I hope that this post shares the value of what we did—and what you can do—with Tableau and BigQuery in a self-service analytics model. I also hope you’ve found some tips that you can use, whether you are just getting started with Tableau and BigQuery or have already started using this combination on your own journey to self-service analytics.