What is medallion architecture?

Last updated: 04/30/26

Medallion architecture is a data design pattern used to organize data in a data lake or lakehouse. Its main job is to improve the quality and structure of data as it moves through different stages. Instead of having one giant pile of data, this framework breaks it into layers. Each layer adds more value, making the data more reliable and easier to use for business decisions.

This is a logical framework rather than one specific piece of software. It’s designed to help teams manage data pipelines, and it’s usually paired with storage and table formats that support ACID (atomicity, consistency, isolation, and durability) transactions. By moving data through these validation tiers, companies can place more trust in the accuracy of their information, even when dealing with massive amounts of raw data.

Medallion architecture versus traditional data warehousing

In the past, traditional data warehouses used a "schema-on-write" approach. This meant you had to perfectly define the structure of your data before you could save it. While this kept things organized, it was often slow and rigid. If your data format changed slightly, the whole system might stop working until an engineer manually fixed it.

The medallion architecture follows the modern lakehouse paradigm, which embraces "schema-on-read" in the raw data layer. This allows data to land immediately in its raw form. You only worry about the structure as the business value of the data becomes clear. It’s a more flexible way to work that lets companies collect data now and decide exactly how to use it later.
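As a rough illustration of schema-on-read, the sketch below stores raw JSON lines untouched and applies a structure only when they are read. The records, field names, and the `read_with_schema` helper are hypothetical and not tied to any particular product.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced on write).
raw_lines = [
    '{"user_id": "u1", "amount": "19.99"}',
    '{"user_id": "u2", "amount": "5", "coupon": "SPRING"}',  # extra field is accepted
    '{"user_id": "u3"}',                                      # missing field is accepted
]

def read_with_schema(line):
    """Apply a structure at read time: keep the fields we care about and cast them."""
    record = json.loads(line)
    amount = record.get("amount")
    return {
        "user_id": str(record["user_id"]),
        "amount": float(amount) if amount is not None else None,
    }

rows = [read_with_schema(line) for line in raw_lines]
```

Because the raw lines are never rewritten, a later reader could apply a different schema (for example, one that also keeps `coupon`) without re-ingesting anything.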

The three layers of the medallion architecture

In a medallion architecture, data flows like a river through three main stages: bronze, silver, and gold. As it moves from one stage to the next, the data becomes cleaner and more organized. This setup allows engineers to fix mistakes early and gives different users the right version of data for their specific tasks.

The bronze layer is the landing zone for all incoming data. This is where you store raw, unprocessed information from sources like web APIs, IoT sensors, or transactional databases. You don’t change the data here; you keep it in its original format, such as JSON, CSV, or Parquet.

Engineers usually add a timestamp to these files to show when they arrived. The bronze layer acts as a permanent, historical archive. If you ever make a mistake in your calculations later on, you can always go back to this raw data and start over. It ensures you never lose the "ground truth" of your business operations.
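One way to picture a bronze record is a thin envelope around the untouched payload. This is a minimal sketch in plain Python; the `to_bronze_record` helper, the `orders_api` source name, and the in-memory list standing in for bronze storage are all assumptions made for illustration.

```python
from datetime import datetime, timezone

def to_bronze_record(raw_payload, source, ingested_at=None):
    """Wrap a raw payload with arrival metadata, leaving the payload itself untouched."""
    ingested_at = ingested_at or datetime.now(timezone.utc)
    return {
        "source": source,
        "ingested_at": ingested_at.isoformat(),
        "raw": raw_payload,  # kept as the original string, never parsed or cleaned here
    }

bronze_log = []  # append-only list standing in for bronze storage
payload = '{"order_id": 17, "total": " 42.50 "}'
bronze_log.append(to_bronze_record(payload, source="orders_api"))
```

Note that the stray spaces inside `total` are preserved on purpose: cleaning belongs to a later layer, so the archive stays a faithful copy of what the source sent.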

Think of the silver layer as your "source of truth." In this stage, data from the bronze layer gets a thorough cleaning. Engineers use tools to filter out junk, remove duplicate records, and fix missing values. They also standardize formats, such as making sure every date follows the same pattern.

The silver layer joins different data sets together to create a clear view of core business items, like "customers" or "products." This layer is perfect for self-service analytics. It's clean enough for most people to use, but it's still detailed enough to answer a wide variety of questions without being limited to just one specific report.
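The cleaning steps described above can be sketched in a few lines of plain Python. The sample rows, the two accepted date styles, and the `UNKNOWN` placeholder for missing values are assumptions chosen for the example; a real pipeline would more likely run the same logic in an engine such as Spark or SQL.

```python
from datetime import datetime

# Hypothetical bronze rows with typical problems.
bronze_rows = [
    {"customer_id": "c1", "signup": "2026-01-05", "country": "US"},
    {"customer_id": "c1", "signup": "2026-01-05", "country": "US"},  # duplicate
    {"customer_id": "c2", "signup": "01/07/2026", "country": None},  # other date style, missing value
    {"customer_id": None, "signup": "2026-01-09", "country": "DE"},  # junk: no key
]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed input styles

def standardize_date(text):
    """Return the date as YYYY-MM-DD, trying each accepted input format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def to_silver(rows):
    """Drop rows without a key, keep the first row per key, and normalize fields."""
    seen, silver = set(), []
    for row in rows:
        key = row["customer_id"]
        if key is None or key in seen:
            continue
        seen.add(key)
        silver.append({
            "customer_id": key,
            "signup": standardize_date(row["signup"]),
            "country": row["country"] or "UNKNOWN",
        })
    return silver

silver_customers = to_silver(bronze_rows)
```

Keeping the first row per key is only one possible deduplication rule; keeping the most recent row is equally common and depends on the source.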

The gold layer contains the most refined data. This information is tailored for specific business needs, like a monthly sales report or a machine learning model that predicts customer churn. Instead of raw lists, the gold layer often uses "star schemas" or tables optimized for fast reading.

Because this data is already aggregated and calculated, it loads very quickly in dashboards. A data scientist might use a gold table to see "Daily Active Users per Region" without having to join millions of raw rows themselves. It’s the final product designed to provide immediate value to the company.
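To make the "Daily Active Users per Region" example concrete, here is a small sketch that aggregates hypothetical silver events into a gold-style summary. The event fields and the `daily_active_users` function are illustrative names, not part of any specific tool.

```python
from collections import defaultdict

# Hypothetical silver events: one row per user action.
silver_events = [
    {"day": "2026-04-01", "region": "EMEA", "user_id": "u1"},
    {"day": "2026-04-01", "region": "EMEA", "user_id": "u1"},  # same user, second event
    {"day": "2026-04-01", "region": "EMEA", "user_id": "u2"},
    {"day": "2026-04-01", "region": "APAC", "user_id": "u3"},
    {"day": "2026-04-02", "region": "EMEA", "user_id": "u1"},
]

def daily_active_users(events):
    """Count distinct users per (day, region) pair."""
    users = defaultdict(set)
    for event in events:
        users[(event["day"], event["region"])].add(event["user_id"])
    return {key: len(ids) for key, ids in sorted(users.items())}

gold_dau = daily_active_users(silver_events)
```

A dashboard reading `gold_dau` only touches three pre-computed rows instead of scanning every event, which is the point of the gold layer.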

Figure: The three layers of the medallion architecture

Key benefits of the medallion architecture

Using a layered approach helps teams move faster and reduces the risk of making expensive mistakes. It provides a clear roadmap for how data should be handled, which makes the whole system easier to manage as the company grows.

Incremental data quality

Breaking data pipelines into distinct steps makes debugging much simpler. If a dashboard shows a wrong number, an engineer can check the silver layer first. If the silver data is correct, they know the error is in the gold transformation logic. This isolation narrows the search to a single stage, so teams can find and fix bugs faster and keep the data flow steady and reliable.
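The debugging flow described above can be sketched as a simple check: validate silver first, then recompute the gold number from silver and compare. The validity rules (unique order IDs, non-negative amounts) and the function names are assumptions for illustration only.

```python
# Hypothetical silver rows and a dashboard total under investigation.
silver_orders = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": 30.0},
]

def silver_is_valid(rows):
    """Assumed quality rules: order IDs are unique and amounts are non-negative."""
    ids = [row["order_id"] for row in rows]
    return len(ids) == len(set(ids)) and all(row["amount"] >= 0 for row in rows)

def locate_fault(silver_rows, published_gold_total):
    """Point to the stage worth inspecting first."""
    if not silver_is_valid(silver_rows):
        return "silver or earlier"
    recomputed = sum(row["amount"] for row in silver_rows)
    if abs(recomputed - published_gold_total) > 1e-9:
        return "gold transformation"
    return "no mismatch found"

verdict = locate_fault(silver_orders, published_gold_total=70.0)
```

Here the silver rows pass their checks but sum to 50.0, not the published 70.0, so the sketch points at the gold transformation.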

Time travel and reproducibility

By keeping a full history in the bronze layer, organizations can re-run their entire data process whenever they want. This is helpful if business rules change. For example, if the finance team changes how they calculate "profit," you can re-process all your old data to match the new rule. This "time travel" also helps with audits, as you can prove exactly what the data looked like at any point in the past.
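As a small illustration of re-processing, the sketch below rebuilds a profit figure from the same bronze rows under two different rules. Both rules and the sample numbers are hypothetical; the point is only that the raw rows are never modified, so either version can be regenerated at any time.

```python
# Hypothetical bronze rows, kept unchanged across every rebuild.
bronze_sales = [
    {"sale_id": 1, "revenue": 100.0, "cost": 60.0, "shipping": 5.0},
    {"sale_id": 2, "revenue": 80.0, "cost": 50.0, "shipping": 10.0},
]

def profit_v1(sale):
    """Original rule (assumed): revenue minus cost."""
    return sale["revenue"] - sale["cost"]

def profit_v2(sale):
    """Revised rule (assumed): shipping is subtracted as well."""
    return sale["revenue"] - sale["cost"] - sale["shipping"]

def rebuild(bronze, rule):
    """Recompute the downstream table from raw rows using the given rule."""
    return {sale["sale_id"]: rule(sale) for sale in bronze}

before = rebuild(bronze_sales, profit_v1)
after = rebuild(bronze_sales, profit_v2)
```

Keeping each rule as its own function also helps with audits, because the old result can be reproduced by running the old rule against the same raw data.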

Democratized data access

Different people in a company need different types of data. Data scientists often want the raw data from the bronze layer to train complex AI models. Meanwhile, business analysts just want the clean, finished numbers from the gold layer for their charts. The medallion architecture lets everyone access the specific layer they need without getting in each other's way.

How to implement a medallion architecture

Building this architecture requires a clear plan for moving data between storage and compute tools. You can follow these four steps to set up a reliable pipeline.

Step 1: Ingestion to bronze

First, set up automated pipelines to move data from your sources into a scalable storage area. You can use tools like Dataflow for real-time data or batch loads for daily updates. The goal here is to dump the data into "buckets" without changing it. This ensures no information is dropped or accidentally altered during the move.
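A minimal sketch of the landing step, using a local temporary folder as a stand-in for a storage bucket. The `bronze/<source>/<arrival date>/` layout, the hash-based file name, and the `land_in_bronze` helper are assumptions for the example, not a prescribed convention.

```python
import hashlib
import tempfile
from datetime import date
from pathlib import Path

def land_in_bronze(root, source, payload, arrival):
    """Write the payload byte-for-byte under bronze/<source>/<arrival date>/ and return its path."""
    digest = hashlib.sha256(payload).hexdigest()
    target_dir = Path(root) / "bronze" / source / arrival.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{digest[:16]}.raw"  # content-derived name, so retries do not create copies
    target.write_bytes(payload)
    return target

bucket = tempfile.mkdtemp()  # local folder standing in for a storage bucket
payload = b'{"sensor": "t-7", "temp_c": 21.4}\n'
path = land_in_bronze(bucket, "iot", payload, date(2026, 4, 30))
```

Naming the file after a hash of its content means a retried upload overwrites the same file with identical bytes, which is one simple way to keep ingestion safe to repeat.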

Step 2: Transformation to silver

Next, use a processing engine like Spark, or a SQL engine, to clean the bronze data. During this step, you apply data quality rules. You might convert all timestamps to a standard time zone, like UTC, or remove test accounts from your user lists. Many teams write these results to open table formats like Delta Lake or Apache Iceberg, which help keep the data organized and searchable.
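The two rules mentioned above (UTC timestamps and removing test accounts) might look like this in plain Python. The sample users and the `@test.example` convention for spotting test accounts are assumptions; real pipelines would use whatever marker their own systems provide.

```python
from datetime import datetime, timezone

# Hypothetical bronze rows with timestamps in mixed UTC offsets.
bronze_users = [
    {"email": "ana@example.com", "created": "2026-04-30T09:00:00+02:00"},
    {"email": "qa-bot@test.example", "created": "2026-04-30T09:05:00+02:00"},
    {"email": "li@example.com", "created": "2026-04-29T23:30:00-05:00"},
]

def to_utc(text):
    """Parse an ISO 8601 timestamp with an offset and express it in UTC."""
    return datetime.fromisoformat(text).astimezone(timezone.utc).isoformat()

def is_test_account(user):
    """Assumed convention: test accounts use the @test.example domain."""
    return user["email"].endswith("@test.example")

silver_users = [
    {"email": user["email"], "created_utc": to_utc(user["created"])}
    for user in bronze_users
    if not is_test_account(user)
]
```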

Step 3: Aggregation to gold

Once the data is clean in the silver layer, you can create specialized SQL queries to build your gold tables. These queries join different tables together to create specific metrics, such as "Revenue by Category." These are often stored in high-performance views that can be plugged directly into a visualization tool like Looker.
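A "Revenue by Category" gold view can be sketched with an in-memory SQLite database from Python's standard library, used here only as a stand-in for a real warehouse. The table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database standing in for a warehouse
conn.executescript("""
    CREATE TABLE silver_products (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE silver_orders (order_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
    INSERT INTO silver_products VALUES (1, 'Books'), (2, 'Games'), (3, 'Books');
    INSERT INTO silver_orders VALUES (10, 1, 12.5), (11, 2, 40.0), (12, 3, 7.5);

    -- Gold view: joins the silver tables and aggregates to one row per category.
    CREATE VIEW gold_revenue_by_category AS
    SELECT p.category AS category, SUM(o.amount) AS revenue
    FROM silver_orders AS o
    JOIN silver_products AS p ON p.product_id = o.product_id
    GROUP BY p.category;
""")

rows = conn.execute(
    "SELECT category, revenue FROM gold_revenue_by_category ORDER BY category"
).fetchall()
```

A visualization tool would query the view by name, so the join and aggregation logic stays in one place instead of being repeated in every chart.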

Step 4: Orchestration and governance

Finally, you need a way to schedule these steps so they happen automatically. Tools like Google Cloud Managed Service for Apache Airflow can manage the timing of your jobs. You should also apply strict security policies. For example, you might let everyone read the gold layer, but only a few authorized service accounts should be allowed to write or change data in the silver and gold tiers.
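The scheduling and access ideas above can be sketched without any orchestration service. The example uses Python's standard `graphlib` module to derive a run order from task dependencies, plus a small permission table. The task names, the `svc-ingest` and `svc-transform` service accounts, and the policy itself are hypothetical; a managed Airflow service would express the same dependencies in its own DAG format.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "ingest_bronze": set(),
    "build_silver": {"ingest_bronze"},
    "build_gold": {"build_silver"},
}
run_order = list(TopologicalSorter(dependencies).static_order())

# Hypothetical policy: named service accounts write, everyone may read gold.
WRITERS = {
    "bronze": {"svc-ingest"},
    "silver": {"svc-transform"},
    "gold": {"svc-transform"},
}
READABLE_BY_ALL = {"gold"}

def can_write(principal, layer):
    """Only the service accounts listed for a layer may change it."""
    return principal in WRITERS.get(layer, set())

def can_read(principal, layer):
    """Gold is open to everyone; other layers are readable by their writers."""
    return layer in READABLE_BY_ALL or can_write(principal, layer)
```

In practice the platform's own access controls would enforce this kind of policy; the functions here only show the shape of the rule.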
