The L’Oréal Beauty Tech Data Platform - A data story of terabytes and serverless
Editor's note: In today's guest post we hear from beauty leader L'Oréal about their approach to building a modern data platform on fully managed services: managing the ingest of diverse datasets into BigQuery with Cloud Run, and orchestrating transformations into relevant business domain representations for stakeholders across the organization. Learn more about how businesses have benefited from Cloud Run in Forrester's report on Total Economic Impact.
L’Oréal was born out of science. For over 100 years, we have shaped the future of beauty and taken its eternal quest to new horizons. This has earned us our current position as the world’s uncontested beauty leader (approximately €32 billion in annual sales in 2021), present in 150 countries with over 85,000 employees.
Today, with the power of our game-changing science, multiplied by cutting-edge technologies, we continue our lifelong journey of shaping the future of beauty.
As a Beauty Tech company, we leverage our decades-long heritage of rich data assets to empower our decision-making with instant, sophisticated analysis.
Because we oversee global brands, which must adapt to local requirements, we need to maintain a deep understanding of what each brand's data represents, while managing disparate legal and regulatory requirements for different countries. Our end goal is to run a safe, compliant and sustainable data warehouse as efficiently and effectively as possible.
We sync and aggregate internal and external data from a wide variety of sources across organizations and retail stores. Before Google Cloud, this made our data warehouse infrastructure very complex and hard to manage. L’Oréal's footprint was so large that we once found it impossible to have a standardized method to handle data. Every process was vendor-specific, and the infrastructure was brittle. We went looking for a solution to our complex data infrastructure needs, and defined the following non-negotiable principles:
No Ops: The job of a developer at L’Oréal is not to manage servers. We need an elastic infrastructure that scales on demand, so that our developers can focus on delivering customized and inclusive beauty experiences to all consumers, rather than focusing on managing servers.
Secure: We have strict security and compliance requirements which vary by country, and we employ a zero-trust security strategy. We must keep both our own internal data and customer data safe and encrypted.
Sustainable: Our data lives in multiple environments, including on-prem data centers and public cloud services. We must be able to securely access and analyze this data while minimizing the complexity and environmental impact of moving and duplicating data.
End-to-end supervision: Because developers shouldn’t be managing servers, we need a “single pane of glass” dashboard to monitor and triage the system if something goes wrong.
Easy-to-deploy: Deploying code safely should not compromise velocity. We are constantly developing innovations that push the boundaries of science and reinvent beauty rituals. We need integrated tools to make our code deployment process seamless and safe.
Event-driven architecture: Our data is used globally by research, product, business and engineering teams with high expectations on data quality and timeliness. Many of our internal processes and analyses are based on near real-time data.
Data products delivered “as a service”: We want to empower our employees to drive business value at record speed. To that end, we need solutions that enable us to remove the developers from the critical path of solution delivery as much as possible.
Extract-load-transform (ELT): Our goal is to implement the pattern to load data as soon as possible into the data warehouse to take advantage of SQL transformations.
After considering multiple vendors on the market, with these principles in mind, we landed on end-to-end Google Cloud serverless and data tooling. We were already using Google Cloud for a few processes, including BigQuery, and loved the experience.
We’ve now expanded our use of Google Cloud to fully support the L’Oréal Beauty Tech Data Platform.
L’Oréal’s Beauty Tech Data Platform incorporates data from two types of sources: directly via API, which is data that adapts easily to our schema and is inserted directly into BigQuery, and bulk data from integrations, which require event-driven transformations using Eventarc mechanisms. These transformations are performed in Cloud Run and Cloud Functions (2nd gen), or directly in SQL. With Google Cloud, we can adapt very quickly.
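To make the event-driven path more concrete, here is a minimal sketch of the kind of logic a Cloud Run service could run when Eventarc delivers an event. It is an illustration under stated assumptions, not L’Oréal's actual code: it assumes the bulk files land in a Cloud Storage bucket and that Eventarc delivers the "object finalized" event as a binary-mode CloudEvent over HTTP. The `LANDING_TABLES` routing table, the bucket name, and the table names are hypothetical.

```python
import json

# Event type Eventarc emits when a Cloud Storage object is created or overwritten.
FINALIZED = "google.cloud.storage.object.v1.finalized"

# Hypothetical routing table: object-name prefix -> BigQuery landing table.
LANDING_TABLES = {
    "sales/": "raw.sales_files",
    "stock/": "raw.stock_files",
}


def plan_load(headers, body):
    """Turn one Eventarc delivery into a load plan, or None if it is skipped."""
    # Binary-mode CloudEvents carry their attributes in "ce-" prefixed HTTP
    # headers. Header names are case-insensitive, so normalize them first.
    attrs = {key.lower(): value for key, value in headers.items()}
    if attrs.get("ce-type") != FINALIZED:
        return None

    # The request body holds the Cloud Storage object metadata as JSON.
    obj = json.loads(body)
    for prefix, table in LANDING_TABLES.items():
        if obj["name"].startswith(prefix):
            return {
                "event_id": attrs.get("ce-id"),
                "source_uri": f"gs://{obj['bucket']}/{obj['name']}",
                "destination_table": table,
            }
    return None  # No landing table is configured for this object.


if __name__ == "__main__":
    example_headers = {"ce-type": FINALIZED, "ce-id": "example-event-1"}
    example_body = json.dumps(
        {"bucket": "example-landing", "name": "sales/2022-06-01.csv"}
    )
    print(plan_load(example_headers, example_body))
```

Keeping the routing decision in a pure function like this makes it easy to test without any cloud resources. In a real service, the returned plan would be handed to a BigQuery load job or to a transformation step.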
Today, we have 8,500 flows for ~5,000 users, using the native zero-trust capabilities offered by Google Cloud. The flows come from Google Cloud as well as third-party services.
BigQuery enabled us to adopt standard SQL as our universal language in our data warehouse and meet all expectations for queries and reporting. We were also able to load original data using features like federated queries, and efficiently transitioned from ETL to ELT data ingestion by handling semi-structured data with SQL. This approach of loading original data from sources into BigQuery with non-destructive transformations allows us to reprocess data for new use-cases easily, directly within BigQuery.
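As an illustration of this ELT pattern, the sketch below generates a BigQuery SQL statement that leaves a raw table untouched and exposes typed columns through a view. It assumes the raw table keeps each record as a JSON string in a column named `payload`. That column name, the table and view names, and the `typed_view_sql` helper are all hypothetical. `JSON_VALUE` and `SAFE_CAST` are standard BigQuery SQL functions. The script only builds the statement and does not execute it.

```python
def typed_view_sql(raw_table, view, fields, payload_column="payload"):
    """Build a BigQuery statement exposing typed columns over a raw table.

    `fields` maps each output column to a (JSONPath, BigQuery type) pair.
    All names are interpolated as-is, so they must come from trusted
    configuration, never from user input.
    """
    columns = []
    for name, (path, bq_type) in fields.items():
        # JSON_VALUE extracts a scalar from the JSON payload as a STRING.
        extracted = f"JSON_VALUE({payload_column}, '{path}')"
        if bq_type != "STRING":
            # SAFE_CAST returns NULL instead of failing on malformed values,
            # so one bad record does not break the whole view.
            extracted = f"SAFE_CAST({extracted} AS {bq_type})"
        columns.append(f"  {extracted} AS {name}")
    select_list = ",\n".join(columns)
    return (
        f"CREATE OR REPLACE VIEW `{view}` AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM `{raw_table}`"
    )


if __name__ == "__main__":
    print(
        typed_view_sql(
            raw_table="raw.orders",          # hypothetical landing table
            view="analytics.orders",         # hypothetical business view
            fields={
                "order_id": ("$.order_id", "STRING"),
                "amount": ("$.amount", "NUMERIC"),
            },
        )
    )
```

Because the view is derived and the raw table is never modified, a new use case only needs a new view over the same original data.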
Our applications are hosted on multiple environments – on-premises, in Google Cloud, and in other public clouds. This made it difficult for our data engineers and analysts to natively analyze data across clouds until we started using BigQuery Omni. This capability of BigQuery allowed us to globally access and analyze data across clouds through a single pane of glass using the native BigQuery user interface itself. Without BigQuery Omni, it would’ve been impossible for our teams to natively do cross-cloud analytics. Moreover, it eliminated the need for us to move sensitive data, which is not only expensive because of local tax and subsea transport, but also incredibly risky – sometimes even forbidden – because of local regulations.
Today, Google Cloud powers our Beauty Tech Data Platform, which stores 100TB of production data in BigQuery and processes 20TB of data each month. We have more than 8,000 governed datasets and 2 million BigQuery tables coming from multiple data sources such as Salesforce, SAP, Microsoft, and Google Ads.
For more complex transformations where custom and specific libraries are required, Cloud Workflows helps us manage the complexity very efficiently by orchestrating steps in containers through Cloud Run and Cloud Functions, and even BigQuery jobs, which are the most used way to transform and add value to the L’Oréal data.
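The orchestration described here can be sketched as a small workflow definition. Workflows accepts definitions in YAML or JSON, and the Python snippet below builds one as a dictionary and prints it as JSON. It is a hedged illustration rather than L’Oréal's actual workflow: the step names, the service URL, the project ID, and the query are hypothetical, and the request body sent to the container is an assumed contract. `http.post` and the `googleapis.bigquery.v2.jobs.query` connector are standard Workflows calls.

```python
import json


def build_workflow(service_url, project_id, query):
    """Return a Workflows definition: a container step, then a BigQuery job."""
    return {
        "main": {
            "params": ["args"],
            "steps": [
                {
                    # Step 1: run custom-library logic in a Cloud Run service.
                    "transform_in_container": {
                        "call": "http.post",
                        "args": {
                            "url": service_url,
                            # OIDC auth lets the workflow call a private service.
                            "auth": {"type": "OIDC"},
                            "body": {"source_uri": "${args.source_uri}"},
                        },
                        "result": "transform_result",
                    }
                },
                {
                    # Step 2: finish the transformation in SQL with a BigQuery job.
                    "transform_in_sql": {
                        "call": "googleapis.bigquery.v2.jobs.query",
                        "args": {
                            "projectId": project_id,
                            "body": {"query": query, "useLegacySql": False},
                        },
                        "result": "query_result",
                    }
                },
                # Step 3: report whether the query job completed.
                {"done": {"return": "${query_result.jobComplete}"}},
            ],
        }
    }


if __name__ == "__main__":
    workflow = build_workflow(
        service_url="https://transform-example.a.run.app/run",  # hypothetical
        project_id="example-project",                           # hypothetical
        query="SELECT 1",                                       # placeholder
    )
    print(json.dumps(workflow, indent=2))
```

Generating the definition from code keeps the step list reviewable and testable. The printed JSON could then be deployed as the workflow source.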
Additionally, by using BigQuery and Google Cloud’s serverless compute for API ingestion, bulk data loading, and post-loading transformations, we can keep the entire system in a single boundary of trust at a fraction of the cost. With ingest, queries, and transformations all being fully elastic and on-demand, we no longer have to perform capacity planning for either the compute or analytics components of the system. And of course these services' pay-as-you-go model perfectly aligns with L’Oréal's strategy of only paying for something when you use it.
Google Cloud fulfilled the requirements of our Beauty Tech Data Platform. And as if offering us a no-ops, secure, easy-to-deploy, custom-development-free, event-based platform with end-to-end supervision wasn't enough, Google Cloud also helped us with our sustainability efforts.
Being able to measure and understand the environmental footprint of our public cloud usage is also a key part of our sustainable tech roadmap. With Google Cloud Carbon Footprint, we can easily see the impact of our sustainable infrastructure approach and architecture principles. Our Beauty Tech platform is a strategic ambition for L’Oréal: inventing the beauty products of the future while becoming the company of the future.
Sustainable tech is an imperative and a very important step towards this ambition of creating responsible beauty for our consumers, and sustainable-by-design tech services for our employees. We all have a role to play, and by joining forces, we can have a positive impact.
Google Cloud’s data ecosystem and serverless tools are highly complementary, and made it possible to build a next-generation data analytics platform that met all our needs.