What is data mesh?

Data mesh is an architectural framework for managing data in complex organizations. Unlike centralized models, data mesh decentralizes data ownership to domain-specific teams. This approach can help to eliminate bottlenecks by treating data as a product, but it also introduces new resource requirements. Success with data mesh depends on domain teams possessing specific data engineering skills and governance capabilities. For organizations that have the resources to support distributed teams, data mesh can improve agility. For others, centralized models like data warehouses or data lakes may remain a more efficient solution.

Foundational principles of data mesh

Data mesh isn't just about a new set of tools or technologies; it's a shift in how companies think about their data. There are four core principles that guide the data mesh approach. These principles are what make the approach so effective at solving the problems of traditional, centralized data architectures.

Domain-oriented ownership

In a traditional data architecture, a single central team, like an IT or data engineering team, is responsible for all data. In a data mesh, data ownership is spread out to the business domains that create the data. For example, a sales team would own the customer data they generate, and a marketing team would own the campaign data they create. This makes teams more responsible and accountable for the data they produce.

Data as a product

With domain-oriented ownership, the teams that create data must also treat it like a product. Just as a company would provide a high-quality product to a customer, a data domain team needs to provide high-quality data to other teams that need it. This means the data is easy to discover, understand, and use. It also has to be trustworthy, secure, and well documented, with built-in access controls so that the right people access only the data intended for their use case.
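One way to make "data as a product" concrete is to think of each product as a published contract: an addressable name, an accountable owner, documentation, a schema, and explicit access rules. The following is a minimal, illustrative sketch; the `DataProduct` class and its fields are hypothetical, not part of any particular framework:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Illustrative metadata record a domain team might publish for a data product."""

    name: str            # discoverable, addressable identifier
    owner_domain: str    # the team accountable for quality
    description: str     # self-describing documentation
    schema: dict         # column name -> type, so consumers know what to expect
    allowed_consumers: frozenset = field(default_factory=frozenset)  # built-in access control

    def can_access(self, consumer: str) -> bool:
        """Only the intended consumers may read this product."""
        return consumer in self.allowed_consumers


# A sales team publishing its order data as a product:
orders = DataProduct(
    name="sales.orders_daily",
    owner_domain="sales",
    description="One row per completed order, refreshed daily.",
    schema={"order_id": "STRING", "amount": "NUMERIC", "order_date": "DATE"},
    allowed_consumers=frozenset({"marketing", "finance"}),
)

print(orders.can_access("marketing"))  # True
print(orders.can_access("hr"))         # False
```

In practice this contract would live in a catalog or metadata service rather than application code, but the shape is the same: consumers discover the product by name, read its documentation and schema, and are granted access according to the owner's rules.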

Self-serve data infrastructure as a platform

To make treating data as a product possible, a data mesh uses a self-serve platform. This platform is a set of tools and services that allows data domain teams to easily create and manage their data products without needing help from a central data team. It can be a simple, easy-to-use platform that automates many of the technical tasks involved in data management, like data storage, security, and governance.

Federated computational governance

Since data is decentralized and spread across many different teams, there needs to be a way to ensure everyone follows the same rules. This is where federated computational governance comes in. It's a model where a small, central team sets the global rules and standards for all data. However, the enforcement of these rules is handled by the data domain teams themselves. This combines the best of both worlds: centralized policies with decentralized execution.
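The split between centrally defined rules and domain-level enforcement can be sketched in a few lines of Python. The rule set and check below are illustrative assumptions, not a real governance specification:

```python
# Global rules defined once by a central governance team (illustrative policy).
GLOBAL_RULES = {
    "required_metadata": {"owner", "description", "pii_classification"},
    "allowed_pii_classifications": {"none", "masked"},
}


def check_compliance(product_metadata: dict, rules: dict = GLOBAL_RULES) -> list:
    """Run the centrally defined checks; each domain team runs this
    against its own data products before publishing them."""
    violations = []
    missing = rules["required_metadata"] - product_metadata.keys()
    if missing:
        violations.append(f"missing metadata: {sorted(missing)}")
    pii = product_metadata.get("pii_classification")
    if pii is not None and pii not in rules["allowed_pii_classifications"]:
        violations.append(f"disallowed PII classification: {pii}")
    return violations


compliant = {"owner": "sales", "description": "Daily orders", "pii_classification": "masked"}
print(check_compliance(compliant))  # []

noncompliant = {"owner": "sales"}
print(check_compliance(noncompliant))  # lists the missing metadata fields
```

The design point is that the central team ships the checks, not the data: each domain runs the same validation against its own products, so policies stay consistent without a central team becoming a review bottleneck.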

Data mesh frequently asked questions

What makes a good data product?

A data product in a data mesh should be findable, addressable, trustworthy, self-describing, and secure. It should be easy for data consumers to discover the data, understand what it is, and know that it's high quality. It should also have clear and consistent access rules to ensure security.

How do you get started with a data mesh?

Starting a data mesh is an incremental process. It's often best to start with a small pilot project and a few willing domain teams. Begin by identifying a business domain that can benefit from greater data autonomy. Then, create a minimal self-serve platform that allows that team to create a data product. As the project succeeds, you can use the results as a proof of concept to get the broader organization on board with the data mesh architecture.

What are the challenges of implementing a data mesh?

One of the biggest challenges is the cultural shift. It can be difficult for a centralized data team to give up control. There are also technical challenges, such as ensuring data security and managing a distributed system. However, with careful planning and a clear communication strategy, these challenges can be overcome.

Does a data mesh replace data lakes and data warehouses?

Data mesh is designed to work with existing data systems. It doesn't require you to throw out your current data lakes or data warehouses. Instead, it can be implemented on top of them. A data mesh can act as a new layer that provides a unified, self-serve way for teams to access data from different sources.

What are common misconceptions about data mesh?

A common misconception is that data mesh is a product you can buy. It's not. It's a new way of organizing and managing data. Another myth is that it's only for large enterprises. While it's most common in large companies, the principles can be applied to smaller organizations as well.

How do you measure the success of a data mesh?

Measuring the success of a data mesh can be tricky because the benefits are often not financial at first. Instead, you can measure success by looking at things like the speed of data delivery, the number of teams using the data platform, and the trust that teams have in the data they are consuming. Over time, these improvements can lead to better business outcomes and a higher return on investment (ROI).

Data mesh versus traditional data architectures

The data mesh approach was created to solve some of the common problems with traditional data architectures. These models, such as data warehouses or data lakes owned by individual departments or teams, can create data silos and governance risks, especially as a company grows. Data mesh tackles these issues by distributing ownership and empowering individual teams while still maintaining central controls for governing and monitoring the data across domains.

| Feature | Data mesh | Traditional architectures |
| --- | --- | --- |
| Architectural model | Decentralized and distributed across business domains. | Centralized and monolithic, managed by a single team. |
| Data ownership | Data is owned by the domain teams that create and use it. | Data is owned and managed by a central data team. |
| Data access | Teams access data through standardized data products. | Teams must go through a central team to get data. |
| Scalability | Can scale easily as new domain teams and data products are added. | Can become a bottleneck as the organization and data volume grow. |
| Data quality | Domain teams are accountable for their own data quality, which can increase trust and accuracy. | Data quality can be inconsistent, as the central team may lack the context of each domain. |
| Data governance | Governance is federated, with global standards and rules set centrally but enforced by domain teams. | Governance is centralized and handled entirely by one team. |
| Use case | Can be best for large, complex organizations with diverse data and independent business units. | Can be best for smaller organizations or for specific use cases that require a single source of truth. |
| Technical expertise/resources needed | Requires distributed technical skills (engineering, governance) within each domain team. | Centralizes technical expertise in one core IT or data engineering team. |


Use cases for data mesh

The data mesh approach can be particularly useful for large, complex organizations that have multiple business units and a large amount of data. Here are a few common use cases where a data mesh can provide significant value.

A data mesh can help an organization get more value from its data analytics and business intelligence (BI) initiatives. With data products from different domains, data scientists and analysts can get a more complete view of the business. For example, a retail company can combine customer data from its sales domain with web traffic data from its marketing domain to better understand customer behavior.

A customer 360 initiative aims to create a complete view of a customer by combining data from different sources. This can be challenging in a centralized data architecture because data is often siloed in different departments. A data mesh makes this much easier by providing a standardized way to access and combine data products from different domains, such as sales, marketing, and support.

In financial services, a data mesh can be used for real-time monitoring and fraud detection. A bank, for instance, could have a data product for transactions and another for customer login data. A fraud detection system can then access both data products to identify suspicious activity. The decentralized nature of a data mesh can help with the speed and reliability needed for these kinds of applications.

As data privacy regulations become more complex, it can be difficult to ensure compliance in a centralized data model. A data mesh can help with regulatory compliance by allowing domain teams to manage their own data products and ensure they are compliant with local laws. This is particularly important for multinational companies that need to adhere to different data sovereignty rules in different countries.

Advanced AI applications and agents require high-quality, context-rich data to function effectively. In a data mesh, domain teams curate data specifically for consumption, ensuring that it is clean, labeled, and documented. This allows data scientists to train models on reliable inputs without spending excessive time on data preparation. Furthermore, AI agents can access these modular data products via APIs to retrieve real-time information, enabling them to perform complex tasks across different business domains with greater accuracy.

Benefits of adopting a data mesh

Adopting a data mesh can provide significant benefits for an organization. By moving to a decentralized model, companies can overcome the bottlenecks of traditional architectures and achieve better business outcomes.


Agility and scalability

A data mesh can be more agile. Each data domain can work independently, which allows the organization to scale and evolve more quickly. It can make it easier to add new data products and services without causing disruptions.

Data quality and trust

A data mesh can assign accountability to the domain teams that produce the data. Since the domain teams are also the primary consumers of their own data, they have a strong incentive to ensure its quality. This can lead to more trustworthy data.

Cost efficiency

A data mesh can also help a company become more cost efficient. With a centralized data platform, teams often have to wait for a central data team to help them with their data needs, which can lead to delays and wasted resources. By letting domain teams serve themselves, a data mesh can reduce this idle time and free the central team to focus on higher-value platform work.

How to implement a data mesh on Google Cloud

Google Cloud offers services that map to each of the data mesh principles, from cataloging and governance to self-serve data products.
Establish a unified data fabric and centralized governance

Dataplex Universal Catalog acts as a unified data fabric and provides a central governance layer over your data mesh. It can help you discover, manage, and govern your distributed data across various environments, ensuring you have a single source of truth for metadata and policies. To get started, you'll need to create a Dataplex lake. A Dataplex lake is a top-level container that holds your data and is typically mapped to a business domain.

Here are the steps to create a lake:

  1. In the Google Cloud console, navigate to the Dataplex Universal Catalog Lakes page.
  2. After clicking "Create," name your new lake something descriptive, such as "Sales Data Domain" or "Marketing Data Mesh."
  3. Choose a region for your lake.
  4. Once the lake is created, you can add zones. A zone is a subdomain within your lake that represents a specific team or data contract. For example, within the "Sales Data Domain" lake, you might create a "Raw" zone for unprocessed data and a "Curated" zone for cleaned, production-ready data.
  5. After creating zones, you can attach assets to them. An asset is the actual data stored in a service like Cloud Storage or BigQuery. You simply point the Dataplex zone to the location of your data.

Dataplex then automatically scans these assets to discover and catalog metadata.
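The lake, zone, and asset steps above can be sketched with the gcloud CLI. The project, region, lake, zone, and bucket names below are placeholders, and flags can change between releases, so check `gcloud dataplex --help` for the current syntax:

```shell
# Create the lake (the top-level container for a business domain).
gcloud dataplex lakes create sales-data-domain \
    --location=us-central1 \
    --display-name="Sales Data Domain"

# Add a raw zone for unprocessed data and a curated zone for
# cleaned, production-ready data.
gcloud dataplex zones create raw-zone \
    --lake=sales-data-domain --location=us-central1 \
    --type=RAW --resource-location-type=SINGLE_REGION

gcloud dataplex zones create curated-zone \
    --lake=sales-data-domain --location=us-central1 \
    --type=CURATED --resource-location-type=SINGLE_REGION

# Attach an existing Cloud Storage bucket as an asset; Dataplex then
# scans it to discover and catalog metadata.
gcloud dataplex assets create raw-orders \
    --lake=sales-data-domain --zone=raw-zone --location=us-central1 \
    --resource-type=STORAGE_BUCKET \
    --resource-name=projects/my-project/buckets/sales-raw-orders \
    --discovery-enabled
```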

Accelerate discovery via a data product marketplace

A key part of the "data as a product" principle is making data easily discoverable. BigQuery data sharing allows you to build a data product marketplace. This lets domain teams securely share data products with other teams without copying or moving the data. It can help data consumers find the data they need and provides them with a clear, well-defined interface to access it.

Build and share data products on a serverless platform

Google Cloud's serverless services empower domain teams to create and manage their own data products with minimal overhead. BigQuery is a powerful, serverless data warehouse that allows teams to analyze large datasets quickly and efficiently. Dataflow is a serverless data processing service that can be used to build and automate data pipelines for data products. These services reduce the need for a central data engineering team to manage infrastructure, making domain teams more autonomous and agile.
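From a consumer's point of view, a data product in BigQuery is just a fully qualified table that can be queried with the `google-cloud-bigquery` client library. The sketch below assumes hypothetical project, dataset, and table names; the `fetch_rows` call requires Google Cloud credentials and is shown but not invoked:

```python
def data_product_query(project: str, dataset: str, table: str, limit: int = 10) -> str:
    """Build a query against a data product's fully qualified BigQuery table name."""
    return f"SELECT * FROM `{project}.{dataset}.{table}` LIMIT {limit}"


def fetch_rows(sql: str) -> list:
    """Run the query with the BigQuery client (requires credentials and
    `pip install google-cloud-bigquery`)."""
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row) for row in client.query(sql).result()]


# The sales domain's daily orders product, addressed by name:
sql = data_product_query("my-project", "sales_domain", "orders_daily")
print(sql)
# SELECT * FROM `my-project.sales_domain.orders_daily` LIMIT 10
```

Because the product is addressed by a stable, fully qualified name, consuming teams never need to copy the data or ask the owning team for an export; they query the shared table directly.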

Ensure compliance with attribute-based access control

Federated computational governance is the principle of having a central team define global rules, but allowing domain teams to enforce them. Google Cloud's Identity and Access Management (IAM) conditions provide the tools to implement this. IAM conditions allow for attribute-based access control (ABAC), where you can set up fine-grained permissions based on data attributes. For example, you can create a policy that only allows a user to access customer data from their specific region, helping ensure compliance with data sovereignty regulations like GDPR.
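As a sketch, a conditional IAM binding granting access only to one regional data product might look like the following. The project, group, and dataset names are placeholders, and IAM Conditions support varies by service, so verify against the IAM Conditions documentation:

```shell
# Grant BigQuery read access, but only on resources under the EU sales
# dataset (member, project, and dataset names are placeholders).
gcloud projects add-iam-policy-binding my-project \
    --member="group:eu-analysts@example.com" \
    --role="roles/bigquery.dataViewer" \
    --condition='expression=resource.name.startsWith("projects/my-project/datasets/sales_eu"),title=eu-sales-only'
```

The condition expression is written in CEL (Common Expression Language); the central team can standardize condition templates like this one, while each domain applies them to its own products.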
