What is data as a product (DaaP)?

Data as a product (DaaP) is a methodology that applies product thinking to data management. It shifts the focus from simply collecting data to serving it. In this model, data is viewed as a standalone asset with specific "consumers" (developers, data scientists, analysts) and a "product owner" responsible for its utility.

This concept is a foundational element of the data mesh architecture. It suggests that the teams most familiar with the data (domains like sales, inventory, or logistics) should be responsible for curating and serving it to the rest of the organization, rather than relying on a central team to manage everything.

Difference between DaaP and data products

While often used interchangeably, these terms represent different aspects of the same framework.

  • Data as a product (DaaP) is the strategy. It is the mindset of treating data with the same rigor as software. It involves applying product management principles—such as versioning, documentation, and user support—to data assets.
  • Data products are the outputs. A data product is a trusted, accessible unit of data designed to solve a specific problem.

For example, the strategy (DaaP) leads a team to build a clean, documented dataset containing "quarterly sales figures." That specific dataset, equipped with access controls and usage guides, is the data product.

Key components of data as a product

For a dataset to function as a product, it generally needs to meet specific usability standards (often derived from data mesh principles). A file simply existing in a storage bucket does not qualify. Key characteristics include:

  • Discoverable: Consumers must be able to find the data easily, typically through a centralized catalog or search engine.
  • Addressable: The data should have a unique, permanent address (URI) so developers can programmatically access it.
  • Trustworthy: The data must be reliable. This is often enforced through service level agreements (SLAs) regarding data freshness and accuracy.
  • Self-describing: The data must carry metadata that explains the schema, origin, and intended usage, reducing the need for consumers to ask the owner questions.
  • Secure: Access policies should be attached directly to the data product, ensuring that security is enforced automatically regardless of who accesses it.
  • Interoperable: The data should adhere to global standards, allowing it to be joined or analyzed alongside data products from other domains.
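These characteristics can be thought of as a "contract" attached to each data product. The sketch below models such a contract as a plain Python dataclass; the field names and example values are illustrative, not a real platform API (Dataplex represents this as catalog metadata):

```python
from dataclasses import dataclass, field

# A minimal sketch of a data product "contract". Field names and values
# are illustrative, not the Dataplex API.
@dataclass
class DataProductContract:
    name: str                      # discoverable: indexed by the catalog
    uri: str                       # addressable: stable, programmatic address
    schema: dict                   # self-describing: column -> type
    owner: str                     # accountable domain team
    freshness_sla_hours: int       # trustworthy: maximum allowed staleness
    allowed_roles: list = field(default_factory=list)  # secure: attached policy

contract = DataProductContract(
    name="quarterly-sales-figures",
    uri="bq://example-project/sales_domain/quarterly_sales",
    schema={"quarter": "STRING", "region": "STRING", "revenue": "NUMERIC"},
    owner="sales-domain-team",
    freshness_sla_hours=24,
    allowed_roles=["roles/bigquery.dataViewer"],
)
print(contract.name)  # → quarterly-sales-figures
```

A dataset that cannot fill in every field of such a contract (no owner, no address, no schema) is closer to a file in a bucket than to a product.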

Use cases of DaaP in enterprises

Enterprises typically adopt a product mindset for data to reduce friction between data producers and consumers.

Marketing and support teams often require data from disparate sources (web logs, CRM, transaction history). Instead of each team building their own extraction pipelines, a central "customer domain" team can release a single, validated "customer profile" data product. Developers building apps can subscribe to this standard product, ensuring consistency across the company.

In logistics, data regarding inventory levels often fluctuates rapidly. A "real-time inventory" data product allows downstream teams (like website frontend or procurement) to query accurate stock levels without needing to understand the complexity of the underlying warehouse management systems.

Data scientists spend a significant amount of time cleaning data before training models. A "feature store" data product provides pre-computed, validated features (like user churn risk or average basket size). This allows data scientists to skip the cleaning phase and focus on model architecture.
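From the data scientist's point of view, a feature store behaves like a lookup service: given an entity ID, it returns pre-computed, validated features. The toy sketch below illustrates the access pattern with an in-memory dictionary; the feature names and values are invented for illustration:

```python
# A toy sketch of a feature-store lookup. Feature names and values are
# illustrative; a real feature store serves these at scale with versioning.
FEATURES = {
    "user_123": {"churn_risk": 0.82, "avg_basket_size": 3.4},
    "user_456": {"churn_risk": 0.11, "avg_basket_size": 7.9},
}

def get_features(user_id: str, names: list) -> list:
    """Return pre-computed, validated features ready for model input."""
    row = FEATURES[user_id]
    return [row[n] for n in names]

print(get_features("user_123", ["churn_risk", "avg_basket_size"]))
# → [0.82, 3.4]
```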

Benefits of adopting DaaP

Shifting to a product-based mindset requires effort, but it can offer several structural advantages for technical teams and the broader organization.

Higher quality data

Because data products have assigned owners and SLAs, there is accountability for errors. This typically leads to cleaner, more reliable datasets compared to unowned "data dumps."
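A freshness SLA is one of the simplest such guarantees to express. The sketch below shows the underlying check, with illustrative timestamps; in practice the platform evaluates this automatically and alerts the product owner:

```python
# A toy freshness check against a product's SLA. Timestamps and the SLA
# window are illustrative; real enforcement happens in the platform.
def within_freshness_sla(last_updated: float, sla_hours: int, now: float) -> bool:
    """True if the product was refreshed within its SLA window."""
    return (now - last_updated) <= sla_hours * 3600

# Product last refreshed 3 hours ago; the SLA allows 24 hours of staleness.
print(within_freshness_sla(last_updated=1_700_000_000, sla_hours=24,
                           now=1_700_010_800))  # → True
```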

Faster development

Developers spend less time searching for data or trying to decipher cryptic column names. With self-service access and clear documentation, they can integrate data into applications more quickly.

Reduced bottlenecks

By decentralizing data ownership to domain teams, organizations remove the bottleneck of a single central data team. This allows multiple teams to build and publish data products in parallel.

Clearer governance

Security and access policies are defined at the product level. This makes it easier to audit who has access to what data, ensuring compliance without slowing down access for legitimate users.

Implementing DaaP with Google Cloud

To ground these concepts, let's walk through how DaaP can be implemented in practice using Dataplex and BigQuery.

The scenario: A developer needs to build a recommendation system and requires a reliable dataset of user viewing history.

Step 1: Discovery (catalog)

In a DaaP environment, the developer avoids manual inquiries. They use Dataplex, which functions as a central platform for data-to-AI governance. By searching the catalog for "user pageviews," they locate a registered asset named ecommerce-user-activity. This fulfills the discoverable requirement.
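Conceptually, this discovery step is a keyword search over registered assets and their descriptions. The toy sketch below shows the idea with an in-memory catalog; the asset descriptions are invented, and Dataplex provides this as a managed search:

```python
# A toy sketch of catalog discovery: keyword search over registered assets.
# Descriptions are illustrative; Dataplex provides this as a managed search.
CATALOG = {
    "ecommerce-user-activity": "User pageviews and clicks from the web store",
    "quarterly-sales-figures": "Validated quarterly revenue by region",
}

def search(query: str) -> list:
    """Return asset names whose description matches the query."""
    q = query.lower()
    return [name for name, desc in CATALOG.items() if q in desc.lower()]

print(search("pageviews"))  # → ['ecommerce-user-activity']
```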

Step 2: Assessment (metadata)

Before writing code, the developer reviews the entry. They see:

  • Schema: Defined columns (such as user_id, item_id, and timestamp).
  • Lineage: Information showing where the data originated.
  • Quality metrics: A score generated by automated scans (for example, Dataplex AutoDQ) confirming that 99.9% of rows meet the defined quality rules. This fulfills the trustworthy and self-describing requirements.
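A quality score like "99.9% of rows pass" boils down to the fraction of rows satisfying the product's declared rules. The sketch below shows that computation on a few hand-made rows; the rule and sample data are illustrative, not the Dataplex AutoDQ implementation:

```python
# A minimal sketch of a quality score: the fraction of rows that satisfy
# every declared rule. Sample rows and the rule are illustrative.
def quality_score(rows, rules):
    """Fraction of rows that pass all quality rules."""
    passing = sum(1 for r in rows if all(rule(r) for rule in rules))
    return passing / len(rows)

rows = [
    {"user_id": "u1", "item_id": "i9", "timestamp": 1700000000},
    {"user_id": "u2", "item_id": "i7", "timestamp": 1700000050},
    {"user_id": None, "item_id": "i3", "timestamp": 1700000100},  # fails rule
    {"user_id": "u4", "item_id": "i1", "timestamp": 1700000150},
]
rules = [lambda r: r["user_id"] is not None]  # e.g., a non-null rule
print(quality_score(rows, rules))  # → 0.75
```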

Step 3: Access (security)

The developer requests access through the catalog interface. The data owner (a specific domain team) approves the request, granting the "data viewer" role for that specific resource. This fulfills the secure requirement.

Step 4: Consumption (integration)

Once access is granted, the developer queries the data product directly in BigQuery.

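A minimal sketch of such a query, with the SQL held in a Python string: the project and dataset names are placeholders, while the columns (user_id, item_id, timestamp) come from the product's published schema.

```python
# Illustrative BigQuery SQL for the ecommerce-user-activity product.
# Project and dataset names are placeholders, not real resources.
QUERY = """
SELECT user_id, item_id, timestamp
FROM `example-project.ecommerce_domain.ecommerce_user_activity`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
"""

# With access granted, this runs as a normal BigQuery job, for example:
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(QUERY).result()
print(QUERY.strip().splitlines()[0])  # → SELECT user_id, item_id, timestamp
```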

Because the data is treated as a product, the developer relies on the schema remaining stable (or being notified of version changes), allowing the application to run reliably.

Quick start: building a first data product

For developers looking to transition from ad-hoc data creation to building managed data products, Dataplex provides a dedicated service to package, govern, and share assets. The following workflow provides a baseline for the managed lifecycle.

  1. Identify and curate assets: Select the technical resources—such as BigQuery tables, authorized views, or AI models—that solve a specific consumer need. These assets should be registered in the Dataplex Universal Catalog.
  2. Create the data product: Use the Dataplex console to create a new Data Product resource. This acts as a logical container that packages your curated assets into a single unit for discovery and sharing.
  3. Attach business context (Aspects): Enrich your product with metadata by attaching Aspects. Use the Data Product Template to define the product name, business purpose, ownership, and intended usage, ensuring the product is self-describing for consumers.
  4. Define governance and access: Configure the approval workflows and required roles for the product. This allows you to manage access requests centrally, ensuring that only authorized users can subscribe to and query the product.
  5. Publish and share: Once published, the data product is discoverable in the catalog. Consumers can search for it, review its documentation and quality metrics, and request access to start querying the data directly in BigQuery.
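The five steps above can be sketched as a plain data structure. The field names below are illustrative and do not correspond to the actual Dataplex API surface; they simply show what a packaged product carries:

```python
# A plain-Python sketch of the managed lifecycle above. Field names are
# illustrative and are NOT the Dataplex API.
data_product = {
    "name": "customer-profile",                       # step 2: the container
    "assets": [                                       # step 1: curated assets
        "bq://example-project/customer_domain/profiles_view",
        "bq://example-project/customer_domain/churn_model",
    ],
    "aspects": {                                      # step 3: business context
        "owner": "customer-domain-team",
        "purpose": "Single validated customer profile for app developers",
    },
    "access": {                                       # step 4: governance
        "approval_required": True,
        "granted_role": "roles/bigquery.dataViewer",
    },
    "published": True,                                # step 5: discoverable
}
print(sorted(data_product))
# → ['access', 'aspects', 'assets', 'name', 'published']
```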

Solve your business challenges with Google Cloud

New customers get $300 in free credits to spend on Google Cloud.

Additional resources

  • What is data mesh?: A high-level overview of the data mesh architecture that underpins the data as a product methodology.
  • Dataplex documentation: Technical guides with detailed instructions on configuring the universal catalog in the Google Cloud console.
  • Build a data product with Dataplex: A step-by-step walkthrough of building a data product.
  • BigQuery sharing documentation: Guides to the data exchange platform that enables secure sharing of data assets across organizational boundaries.
