How RealTruck drives data reliability and business growth with Masthead and BigQuery
Jobin George
Staff Technical Solutions Architect, Google Cloud
Yuliia Tkachova
Co-founder & CEO, Masthead Data
One challenge many organizations face today is realizing the full potential of the data they collect. Doing so requires investing in powerful data platforms that can efficiently manage, control, and coordinate complex data flows and access across business domains.
RealTruck, a leader in aftermarket accessories for trucks and off-road vehicles, stands out for its omnichannel approach, which successfully integrates over 12,000 dealers and a robust online presence at RealTruck.com. Operating from 47 locations across North America, the company initially faced significant data challenges due to its extensive offline network and diverse customer touchpoints. To address these complexities, RealTruck’s data team decided to build a data platform that could serve as a single source of truth for executives and managers across the organization, supporting business decision-making with data. The goal was to gain visibility into and control over all collected assets, monitor data flows, manage costs, and ensure high reliability of the data platform.
RealTruck’s data team chose BigQuery as the centerpiece of their data platform for its high security standards, scalability, and ease of use. As a serverless data platform, BigQuery allows the team to focus on strategic analysis and insights rather than on managing infrastructure, making them more efficient at handling large volumes of data.
RealTruck data is gathered from various sources, including manufacturers, dealers, marketing campaigns, web and app customer interactions, and sales transactions. These sources, and the pipelines that ingest them, vary in format, structure, and cadence. The diversity and number of external data sources present significant maintenance challenges and operational complexity.
RealTruck also added Masthead Data, a Google Cloud Ready partner for BigQuery, to help its data team identify any pipeline or data issues that affect business users or data consumers. When selecting a partner to integrate with BigQuery, RealTruck needed one that could also monitor for errors in the other services used to build its data platform, since failures in any of them could result in downtime. These included Cloud Storage, BigQuery Data Transfer Service, Dataform, and other Google Cloud services.
Together, BigQuery and Masthead enabled RealTruck’s data team to deliver on two of its biggest commitments: ensuring the accuracy of the company’s data and resolving any doubts about the performance of data pipelines.
Mastering data platform complexity: Visibility, cost efficiency, and anomaly detection
As RealTruck began building out its data platform with BigQuery, the data team realized that several complexity-related issues still needed to be solved:
- Limited visibility of pipeline performance: Ingesting data into the platform from numerous sources using various tools made it difficult to track pipeline failures or data system errors. This limitation hindered RealTruck's ability to maintain reliable data.
- Cost control: BigQuery enabled the data team to develop a decentralized data platform, giving teams more agility to create data pipelines and assets. However, because processing capacity scales on demand, this approach requires tighter resource management to stay cost-effective. To sustain efficiency, the team sought granular visibility into every process and its associated costs.
- Anomaly detection across the data platform: Tables in BigQuery are regularly used as sources for data products, which requires vigilant monitoring for issues like stale data, volume spikes, or missing values. The ability to automatically identify outliers or unexpected behavior is key to building trust in the data platform among business users.
Masthead and BigQuery: Achieving data platform reliability for RealTruck
To overcome these challenges, RealTruck implemented Masthead Data to enhance the reliability of BigQuery data pipelines and assets in its data platform.
Masthead provided visibility into potential syntax errors and system issues caused by using various ingestion tools. Automating observability enabled RealTruck to detect pipeline or data environment issues in real time, allowing the team to address them before they impacted downstream data products or platform users.
For example, Masthead provided real-time alerts along with robust column-level lineage and data dictionary features that helped the team troubleshoot downtime in the data platform within minutes. As a result, the RealTruck data team was able to trace an error or anomaly and assess its full impact on pipelines or BigQuery tables. Column-level lineage also made it easier for the team to respond quickly and collaborate more effectively when resolving issues.
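To make the lineage-based impact assessment concrete, here is a minimal sketch of how a downstream "blast radius" can be computed once column-level lineage is available as a graph. The article does not describe how Masthead implements this; the `downstream_impact` function, the `Lineage` shape, and the column names are all hypothetical, chosen only to illustrate a breadth-first traversal from a broken column to everything that depends on it.

```python
from collections import deque

# Hypothetical column-level lineage: each key is a "dataset.table.column" name,
# and each value lists the columns computed directly from it.
Lineage = dict[str, list[str]]


def downstream_impact(lineage: Lineage, broken: str) -> list[str]:
    """Return every column that depends, directly or transitively, on `broken`.

    Breadth-first traversal lists nearer dependents before farther ones.
    """
    seen = {broken}
    impacted: list[str] = []
    queue = deque([broken])
    while queue:
        column = queue.popleft()
        for child in lineage.get(column, []):
            if child not in seen:  # guard against shared parents and cycles
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    lineage: Lineage = {
        "raw.orders.amount": ["stg.orders.amount_usd"],
        "stg.orders.amount_usd": [
            "mart.daily_sales.revenue",
            "mart.dealer_kpis.avg_order_value",
        ],
        "mart.daily_sales.revenue": ["bi.exec_dashboard.revenue"],
    }
    for column in downstream_impact(lineage, "raw.orders.amount"):
        print(column)
```

Breadth-first order is a deliberate choice here: the columns closest to the failure are listed first, which is usually where a responder starts.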
In addition, Masthead's unique approach of using logs to monitor time-series tables for freshness, volume, and schema changes allowed RealTruck to have an overarching view of the health of all its BigQuery tables without increasing compute costs. Masthead also integrates with Google Dataplex, enabling the RealTruck team to implement rule-based data quality checks to catch any anomalies in metrics.
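The idea behind log-based monitoring is that freshness, volume, and schema signals can be derived from metadata about each write, without scanning table contents. The sketch below illustrates that idea under stated assumptions: it presumes per-table write events (timestamp, rows written, column names) have already been extracted from logs, and the `TableWrite` shape, `check_table` function, and thresholds are hypothetical rather than Masthead's actual method. Volume is checked with a simple z-score against earlier writes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, pstdev


@dataclass(frozen=True)
class TableWrite:
    """One write to a table, as it might be reconstructed from log metadata (hypothetical shape)."""
    at: datetime
    rows_written: int
    columns: tuple[str, ...]


def check_table(history: list[TableWrite], now: datetime,
                max_staleness: timedelta, z_threshold: float = 3.0) -> list[str]:
    """Flag freshness, volume, and schema issues using write metadata only."""
    if not history:
        return ["no_writes"]
    issues: list[str] = []
    latest = history[-1]

    # Freshness: the most recent write is older than the allowed staleness.
    if now - latest.at > max_staleness:
        issues.append("stale")

    # Volume: compare the latest write against earlier writes with a z-score.
    baseline = [w.rows_written for w in history[:-1]]
    if len(baseline) >= 2:
        mu, sigma = mean(baseline), pstdev(baseline)
        if sigma == 0:
            anomalous = latest.rows_written != mu  # flat baseline: any change stands out
        else:
            anomalous = abs(latest.rows_written - mu) / sigma > z_threshold
        if anomalous:
            issues.append("volume_anomaly")

    # Schema: the column list changed since the previous write.
    if len(history) >= 2 and latest.columns != history[-2].columns:
        issues.append("schema_change")
    return issues


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    cols = ("order_id", "amount")
    history = [TableWrite(start + timedelta(hours=h), rows, cols)
               for h, rows in enumerate([1000, 1010, 990, 1005, 995])]
    history.append(TableWrite(start + timedelta(hours=5), 10, cols + ("currency",)))
    now = start + timedelta(hours=10)
    print(check_table(history, now, max_staleness=timedelta(hours=2)))
    # prints ['stale', 'volume_anomaly', 'schema_change']
```

Because every check reads only write metadata, none of them queries the monitored table, which is consistent with the article's point that this style of monitoring need not add compute cost.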
RealTruck also leveraged Masthead’s Compute Cost Insights for BigQuery to gain granular visibility into BigQuery storage and pipeline costs, as well as the costs of any third-party solutions used in the data platform. These features have helped the data team identify and clean up orphan processes and expired assets, making the costs of the data platform more manageable and transparent.
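As a rough illustration of per-process cost attribution and orphan detection, the sketch below works over simplified job records loosely modeled on BigQuery's `INFORMATION_SCHEMA.JOBS` view, which exposes per-job fields such as `total_bytes_billed`, `destination_table`, and `referenced_tables`. The `JobRecord` shape, the pipeline label, both functions, and the flat `usd_per_tib` rate are assumptions for illustration only; the rate in the example is a placeholder, not a quoted price, and the calculation only approximates on-demand billing (capacity-based pricing works differently). This is not how Masthead computes its Cost Insights.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

TIB = 1024 ** 4  # bytes in one tebibyte


@dataclass(frozen=True)
class JobRecord:
    """A simplified query-job record (hypothetical shape)."""
    pipeline: str                   # hypothetical label naming the process that ran the job
    bytes_billed: int
    destination: Optional[str]      # table the job wrote, if any
    sources: tuple[str, ...] = ()   # tables the job read


def cost_by_pipeline(jobs: list[JobRecord], usd_per_tib: float) -> dict[str, float]:
    """Attribute query cost to each pipeline at an assumed flat per-TiB rate."""
    totals: defaultdict[str, float] = defaultdict(float)
    for job in jobs:
        totals[job.pipeline] += job.bytes_billed / TIB * usd_per_tib
    return dict(totals)


def orphan_pipelines(jobs: list[JobRecord]) -> list[str]:
    """Pipelines whose output tables were never read by any job in the window."""
    read = {table for job in jobs for table in job.sources}
    outputs: defaultdict[str, set[str]] = defaultdict(set)
    for job in jobs:
        if job.destination is not None:
            outputs[job.pipeline].add(job.destination)
    return sorted(p for p, tables in outputs.items() if not tables & read)


if __name__ == "__main__":
    jobs = [
        JobRecord("ingest_dealers", 2 * TIB, "stg.dealers", ("raw.dealers",)),
        JobRecord("build_kpis", TIB // 2, "mart.dealer_kpis", ("stg.dealers",)),
        JobRecord("legacy_export", TIB, "tmp.old_export", ("stg.dealers",)),
        JobRecord("dashboard", TIB // 4, None, ("mart.dealer_kpis",)),
    ]
    print(cost_by_pipeline(jobs, usd_per_tib=4.0))  # placeholder rate
    print(orphan_pipelines(jobs))  # prints ['legacy_export']
```

The orphan check is only a heuristic: a table with no readers among query jobs may still be consumed by exports or external tools, so flagged pipelines are candidates for review rather than automatic deletion.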
"One of the main reasons RealTruck chose Masthead was its unique architecture, which does not access our data. This was a critical factor in our decision, especially given our ambitious global growth plans and the increasingly complex data privacy regulations worldwide. Masthead, as a Google Cloud Partner complementary to Google Cloud BigQuery, is compliant with data privacy and security regulations at the architectural level, ensuring that our data remains secure and aligning perfectly with our strategic objectives.
The ability to achieve comprehensive observability of all our BigQuery data pipelines and tables through a no-code integration, which was set up in just 15 minutes and began delivering value within a few hours, has been transformative. It has enabled the RealTruck team to gain valuable insights into pipeline costs and data flows swiftly across our entire data platform, reinforcing the reliability and strategic value of our data-driven initiatives." – Chris Wall, Director of BI & Analytics, RealTruck
Google Cloud has become the backbone of RealTruck’s data infrastructure, providing efficient data governance and management with minimal configuration. BigQuery offers Google Cloud’s world-class default encryption and sophisticated user access management features, which allow RealTruck to distribute, store, and process its data with confidence that it is secure.
Masthead’s approach to processing logs and metadata also aligns well with Google Cloud’s approach to security and privacy, offering a single view of pipeline and data health across RealTruck’s entire data environment. This consolidated view has enabled the data team to shift from ad-hoc fixes to strategic improvements to the data platform. This enhanced perspective has been vital for building a data platform that business users trust, allowing RealTruck to efficiently tackle data errors and manage costs. Using BigQuery in combination with Masthead significantly reduced the risk of unnoticed issues impacting business operations, reinforcing the importance of data in decision-making.
If you’re interested in using Masthead Data with BigQuery, visit the Google Cloud partner directory or Masthead Data’s Marketplace offerings. We also recommend checking out Google Cloud Ready - BigQuery to learn more about our Google Cloud Ready partners.