Built with BigQuery: How True Fit's data journey unlocks partner growth
Senior Software Architect, True Fit
Dr. Ali Arsanjani
Director, Cloud Partner Engineering
Editor’s note: The post is part of a series highlighting our awesome partners, and their solutions, that are Built with BigQuery.
“True Fit is in a unique position where we help shoppers find the right sizes of apparel and footwear through near real time, adaptable ML models thereby rendering a great shopping experience while also enabling retailers with higher conversion, revenue potential and more importantly access to actionable key business metrics. Our ability to process volumes of data, adapt our core ML models, reduce complex & slower ecosystems, exchange data easily with our customers etc, have been propelled multi-fold via BigQuery and Analytics Hub” - says Raj Chandrasekaran, CTO, True Fit
True Fit has built the world’s largest dataset for apparel and footwear retailers over the last twelve years, connecting purchasing and preference information for more than 80 million active shoppers to data for 17,000 brands across its global network of retail partners. This dataset powers fit personalization for shoppers as they buy apparel and footwear online and connects retailers with powerful data and insight to inform marketing, merchandising, product development and ecommerce strategy.
Gathering, correlating and analyzing data is the foundation for making smart business decisions that grow and scale a retailer’s business. This is especially important for retailers that use data packages to decide which consumers to target with their brands and products. Deriving meaningful insights from data has become a larger focus for retailers as the market grows digitally, competition for share of wallet increases and consumer expectations for more personalized shopping experiences continue to rise.
But how do companies share datasets of any size with each other in a scalable, secure way while keeping infrastructure costs in check? How can companies access and use the data in their own environment without a complicated, time-consuming process to physically move it? How would a company know how to use this data to suit its own business needs? True Fit partnered with Google and the Built with BigQuery initiative to answer these questions.
Google Cloud services such as BigQuery and Analytics Hub have become vital to how True Fit optimizes the entire lifecycle of its data, from ingestion to the distribution of data packages to its retail partners. BigQuery is a fully managed, serverless data warehouse that scales to very large datasets and integrates tightly with several Google Cloud products. Analytics Hub, powered by BigQuery, lets producers create data exchanges easily and simplifies the discovery and consumption of that data for consumers. Data shared via the exchanges can be further enriched with other datasets available in the Analytics Hub marketplace.
Using the above diagram, let us take a look at how the process works across different stages:
Event Ingestion - True Fit leverages Cloud Logging with Fluentd to stream logging events into BigQuery as datasets. BigQuery’s streaming ingestion allows real-time debugging and analysis of all activity across the True Fit ecosystem.
Denormalization - Scheduled queries are set up to take the normalized data in the event logs and convert it into denormalized core tables. These tables are easy to query and contain all the information BI analysts and data scientists need for their research, without the need for complicated table joins.
Aggregations - Aggregations are created and updated on the fly as data is ingested using a mix of scheduled queries and direct BigQuery table updates. Reports are always fresh and can be delivered without ever having to worry about stale data.
Alerting - Alerts that leverage the real-time aggregations are set up all across the True Fit architecture. These alerts not only inform True Fit when there are data discrepancies or missing data, but are also configured to inform our partners when the data they provide contains errors. For example, True Fit will notify a retailer if the product catalog they provide drops below specific thresholds we’ve previously seen from them. Alerts can take the form of an email, an SMS message, or even a real-time toast message in the True Fit UI that a retailer uses to provide their data.
Secure Distribution - Exchanges are created in Analytics Hub. The datasets are published as one or more listings in an exchange. Partners subscribe to a listing as a linked dataset to get instant access to the data and act upon it accordingly. This unlocks use cases that range from marketing to shopper chat bots and even real-time product recommendations based on a shopper’s browsing behavior. Analytics Hub allows True Fit to expose only the data it intends to share with specific partners, using simple-to-understand IAM roles. Adding the built-in Analytics Hub Subscriber role to a partner’s service account on a specific listing created just for them ensures that they are the only ones who get access to that data. Gone are the days of dealing with API keys or credential management!
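The denormalization, aggregation and alerting stages above can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite (from the Python standard library) stands in for BigQuery, the table and column names are hypothetical, and the alert rule (flag a catalog that is smaller than 80% of the smallest catalog seen previously) is an assumed example of a threshold rather than True Fit’s actual rule.

```python
import sqlite3

# Hypothetical normalized event-log tables. SQLite is a stand-in for BigQuery.
SCHEMA = """
CREATE TABLE events   (event_id INTEGER, shopper_id INTEGER,
                       product_id INTEGER, event_type TEXT);
CREATE TABLE products (product_id INTEGER, brand TEXT, category TEXT);
"""

# Denormalization: build one wide table so analysts do not need joins.
DENORMALIZE = """
CREATE TABLE core_events AS
SELECT e.event_id, e.shopper_id, e.event_type,
       p.product_id, p.brand, p.category
FROM events AS e
JOIN products AS p ON p.product_id = e.product_id
"""

# Aggregation: event counts per category and event type.
AGGREGATE = """
SELECT category, event_type, COUNT(*) AS n
FROM core_events
GROUP BY category, event_type
ORDER BY category, event_type
"""


def run_pipeline(events, products):
    """Load sample rows, denormalize them, and return the aggregated counts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
        conn.executemany("INSERT INTO products VALUES (?, ?, ?)", products)
        conn.execute(DENORMALIZE)
        return conn.execute(AGGREGATE).fetchall()
    finally:
        conn.close()


def catalog_alert(current_count, previous_counts, tolerance=0.8):
    """Return True when the catalog is smaller than `tolerance` times the
    smallest catalog size seen before (an assumed, illustrative rule)."""
    if not previous_counts:
        return False  # no history yet, so there is nothing to compare against
    return current_count < tolerance * min(previous_counts)


if __name__ == "__main__":
    products = [(1, "BrandA", "jeans"), (2, "BrandB", "shoes")]
    events = [(10, 100, 1, "view"), (11, 100, 2, "view"), (12, 101, 1, "purchase")]
    print(run_pipeline(events, products))
    print(catalog_alert(700, [1000, 980, 1020]))
```

In BigQuery, the two SQL statements would more likely run as scheduled queries, and the alert check would read from the resulting aggregate tables.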
Before switching to BigQuery, True Fit’s original data lake was built using Apache Hive. At roughly 450TiB, extracting data from this data lake in real time became quite a challenge. It took approximately 24 hours before data packages became available to core retail partners, which limited our ability to produce reports and data packages at scale. Even after the data packages were available, partners had difficulty downloading them and importing them into their own data warehouses because of their size and formats. The usefulness of the data packages was occasionally called into question because the data had become stale, and the delay before the packages became available made it difficult to alert on data discrepancies.
BigQuery has allowed True Fit to produce these same data packages in real time as events occur, unlocking new marketing opportunities. Retail partners have also praised how easy Analytics Hub makes the data to consume, because it “just appears” alongside their existing data warehouse as linked datasets.
True Fit publishes a number of BigQuery data packages for its retail partners via Analytics Hub, which allows them to generate personalized onsite and email campaigns for their own shoppers in ways that were not possible in the past.
Below is just a sample of the ways in which True Fit partners personalize their campaigns using the True Fit data packages. Partners have the ability to:
Find, in near real time, the True Fit shoppers of a desired category who have been browsing specific products in the last couple of weeks
Enhance their understanding of their shopper demographic data and category affinities
Retrieve size and fit recommendations for specific in-stock products for a provided set of shoppers or have True Fit determine what the ideal set of shoppers for those products would be
Match their in-stock, limited run styles and sizes to applicable True Fit shoppers
Enhance emails and on-site triggers based on products the shopper has recently viewed or purchased across the True Fit network
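As an illustration of the first use case in the list above, the sketch below selects shoppers who viewed products in a given category within the last two weeks. The post does not describe the schema of the data packages, so the record fields (`shopper_id`, `category`, `event_type`, `occurred_at`) and the function name are hypothetical. A partner would more likely express this as a SQL query against the linked dataset; plain Python is used here only to keep the logic visible.

```python
from datetime import datetime, timedelta, timezone


def recent_category_browsers(events, category, now, window=timedelta(days=14)):
    """Return sorted shopper IDs with a view event in `category`
    that occurred within `window` before `now`.

    `events` is an iterable of dicts with the hypothetical keys
    shopper_id, category, event_type and occurred_at.
    """
    cutoff = now - window
    return sorted({
        e["shopper_id"]
        for e in events
        if e["category"] == category
        and e["event_type"] == "view"
        and cutoff <= e["occurred_at"] <= now
    })


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"shopper_id": "s1", "category": "denim", "event_type": "view",
         "occurred_at": now - timedelta(days=3)},
        {"shopper_id": "s2", "category": "denim", "event_type": "view",
         "occurred_at": now - timedelta(days=30)},  # too old, excluded
    ]
    print(recent_category_browsers(sample, "denim", now))
```

Using a set before sorting means a shopper with several qualifying views appears only once in the result.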
If you’re a retailer looking to unlock your own growth using real-world data in real-time, be sure to check out the data packages offered by True Fit!
To learn more about True Fit on Google Cloud, visit https://www.truefit.com/business
The Built with BigQuery advantage for ISVs
Through the Built with BigQuery Program launched in April ‘22 as part of the Google Data Cloud Summit, Google is helping data-driven companies like True Fit build innovative applications on Google’s data cloud with simplified access to technology, helpful and dedicated engineering support, and joint go-to-market programs. Participating companies can:
Get started fast with a Google-funded, pre-configured sandbox.
Accelerate product design and architecture through access to designated experts from the ISV Center of Excellence who can provide insight into key use cases, architectural patterns, and best practices.
Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption.
BigQuery gives ISVs the advantage of a powerful, highly scalable data warehouse that’s integrated with Google Cloud’s open, secure, sustainable platform. And with a huge partner ecosystem and support for multi-cloud, open source tools, and APIs, Google provides technology companies the portability and extensibility they need to avoid data lock-in.
We thank the many Google Cloud and True Fit team members who contributed to this ongoing collaboration and review, especially Raj Chandrasekaran, CTO, True Fit, and Sujit Khasnis, Cloud Partner Engineering.