What is database normalization?

Database normalization is a process used in database design to organize data efficiently. It can help to reduce data redundancy (duplicate data) and improve data integrity (data accuracy and consistency). It's like organizing a messy filing cabinet: instead of having the same information in multiple places, you put each piece of information in one spot and then use a system of cross-references to connect them.

What do we mean by "database"?

A database is simply an organized collection of data, usually stored electronically in a computer system. Think of it as a digital filing cabinet. Instead of paper folders and drawers, you use structured tables (or other data organization methods) that allow you to store, manage, and retrieve information quickly and efficiently.

Modern businesses use databases to track everything from customer orders and inventory levels to user account details and financial transactions, and many choose to run their databases in the cloud.

What is a relational database?

A relational database organizes data into one or more tables of columns and rows. It's called "relational" because it establishes specific, predefined relationships between these tables. The core idea is to break complex information into smaller, manageable pieces, avoiding the need to store the same information multiple times.

Database normalization example

Imagine a simple database for an online store. You'd have a table for Customers (name, address, phone) and another for Orders (date, total). When a customer places an order, you don't have to copy their entire address into the Orders table; you just use their unique Customer ID to connect the order back to the customer's full details.

If the customer moves and changes their address, you only have to update it in one place: the Customers table. If you had copied it into 100 order records, you'd have to update all 100, which would likely lead to messy, inconsistent data. This problem of needing to change the same information in many places is called an update anomaly, one of several kinds of data anomaly.
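The update-in-one-place idea can be sketched with Python's built-in sqlite3 module. The table and column names here are illustrative assumptions, not a prescribed schema: orders store only a Customer ID, so a single UPDATE fixes the address for every order.

```python
import sqlite3

# Minimal sketch of the store example; schema names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    address TEXT NOT NULL)""")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date TEXT NOT NULL)""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada', '1 Old Street')")
# 100 orders, each holding only the customer's ID, never the address.
conn.executemany("INSERT INTO orders VALUES (?, 1, '2024-01-15')",
                 [(i,) for i in range(1, 101)])

# The customer moves: one UPDATE is enough for all 100 orders.
conn.execute("UPDATE customers SET address = '2 New Avenue' WHERE customer_id = 1")

addresses = conn.execute("""
    SELECT DISTINCT c.address
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
""").fetchall()
print(addresses)
```

Joining orders back to customers now resolves every order to the single, current address.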

However, you do want to copy the price of a product into the order record at the time of purchase. Why? Because the price of the product might change in the future in your main Products table, but the order record needs to reflect the price the customer actually paid on the date of the transaction. In this case, copying and freezing the data (or creating a snapshot) is the correct design choice.
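The price-snapshot case can be sketched the same way (again with assumed, illustrative table names): the order line copies the catalog price at purchase time, so a later price change in Products does not rewrite history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    product_id INTEGER PRIMARY KEY, name TEXT, price REAL)""")
# order_items deliberately copies unit_price at purchase time (a snapshot).
conn.execute("""CREATE TABLE order_items (
    order_id INTEGER, product_id INTEGER, unit_price REAL)""")

conn.execute("INSERT INTO products VALUES (10, 'Widget', 9.99)")

# At purchase time, freeze the current catalog price into the order line.
price_now = conn.execute(
    "SELECT price FROM products WHERE product_id = 10").fetchone()[0]
conn.execute("INSERT INTO order_items VALUES (500, 10, ?)", (price_now,))

# The catalog price later changes...
conn.execute("UPDATE products SET price = 12.49 WHERE product_id = 10")

# ...but the order still records what the customer actually paid.
paid = conn.execute(
    "SELECT unit_price FROM order_items WHERE order_id = 500").fetchone()[0]
catalog = conn.execute(
    "SELECT price FROM products WHERE product_id = 10").fetchone()[0]
print(paid, catalog)
```

The two values diverge on purpose: the order reflects the transaction, the catalog reflects the present.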

Normalization is the systematic process of designing your relational tables and the relationships between them to eliminate these inconsistencies and save storage space. The "normal forms" (1NF, 2NF, 3NF, etc.) are a series of prescriptive rules. They are a solution to data redundancy and the data anomalies it creates, providing a clear path to organizing data efficiently and reliably based on your application needs.

Different normal forms (1NF, 2NF, 3NF)

Normalization is a step-by-step guide to structuring your tables, with each step (or "form") building on the last. To be in Third Normal Form (3NF), a table must pass the tests for 1NF and 2NF. Most operational databases are designed to meet at least the 3NF standard because it can provide a balance of data integrity and performance.

First normal form (1NF)

The rule for 1NF is about making sure your tables are properly structured from the start, like setting up a clean spreadsheet.

Rule: Every column must have a unique name, and every cell must contain only a single, indivisible value.

What it solves: You can't put a list of items into a single cell. For example, in an "Orders" table, you can't list "Milk, Eggs, Bread" in one cell under a "Products Ordered" column. Instead, each product must get its own row, which ensures the data is searchable and manageable.
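A small sketch of the 1NF fix, assuming a hypothetical order_items table: the comma-separated list is split so that each product gets its own row and becomes individually queryable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Violates 1NF: one cell holds a whole list of products.
unnormalized = [(1, "Milk, Eggs, Bread")]

# 1NF fix: each product gets its own row.
conn.execute("""CREATE TABLE order_items (
    order_id INTEGER NOT NULL,
    product TEXT NOT NULL)""")
for order_id, product_list in unnormalized:
    for product in product_list.split(", "):
        conn.execute("INSERT INTO order_items VALUES (?, ?)",
                     (order_id, product))

# Individual items are now searchable with ordinary SQL.
rows = conn.execute(
    "SELECT product FROM order_items WHERE order_id = 1 ORDER BY product"
).fetchall()
print(rows)
```

With the list in a single cell, a query like "which orders contain Eggs?" would need fragile string matching; after the 1NF fix it is a plain equality filter.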

Second normal form (2NF)

The 2NF rule only applies if your table uses a composite key—a primary key made up of two or more columns combined (like an Order ID plus a Product ID). A primary key is the column or set of columns whose values uniquely identify every row in a table. A non-key column is any column that is not part of the primary key.

Rule: A table must already be in 1NF, and all non-key columns must rely on the entire composite key, not just part of it.

What it solves: You should only store data where it completely belongs. If you have a table where the key is (OrderID, ProductID), a column like Product Price shouldn't be in it because the price only depends on the ProductID, not the OrderID. The solution is to move the ProductID and Product Price into a separate Products table, where the ProductID is the single primary key. This prevents the product's price from being unnecessarily repeated for every order that contains that product.
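The 2NF fix above can be sketched as two tables (schema names are assumptions): price lives in Products, keyed by ProductID alone, while order_items keeps only the composite key and the quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 2NF fix: price depends on product_id alone, so it lives in products,
# not in order_items, whose key is (order_id, product_id).
conn.execute("""CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    price REAL NOT NULL)""")
conn.execute("""CREATE TABLE order_items (
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id))""")

conn.execute("INSERT INTO products VALUES (7, 3.50)")
# Three different orders contain product 7...
conn.executemany("INSERT INTO order_items VALUES (?, 7, 1)",
                 [(o,) for o in (100, 101, 102)])

# ...but its catalog price is stored exactly once.
stored_prices = conn.execute(
    "SELECT COUNT(*) FROM products WHERE product_id = 7").fetchone()[0]
print(stored_prices)
```

However many orders reference the product, its catalog price exists in one row.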

Third normal form (3NF)

The 3NF rule is the most common target for database design and is about removing indirect relationships between data points.

Rule: A table must be in 2NF, and non-key columns must depend only on the primary key, not on any other non-key column.

What it solves: Avoid having one piece of non-key data determine the value of another non-key piece of data. Consider an "Employees" table that stores an Office ID (a non-key column) and the Office Location (another non-key column). The Office Location is determined by the Office ID, not the employee's ID (the table's primary key). This indirect link is a transitive dependency. To fix it, you create a new Offices table containing only the Office ID and Office Location, and then you link the two tables using the Office ID. This ensures you only have to update the office's location in one place if it ever changes.
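The Employees/Offices fix can be sketched like this (table names follow the example above; everything else is illustrative): once the transitive dependency is removed, an office relocation is one UPDATE, and every employee row picks it up through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 3NF fix: office_location depends on office_id, not on employee_id,
# so it moves into its own offices table.
conn.execute("""CREATE TABLE offices (
    office_id INTEGER PRIMARY KEY,
    office_location TEXT NOT NULL)""")
conn.execute("""CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    office_id INTEGER NOT NULL REFERENCES offices(office_id))""")

conn.execute("INSERT INTO offices VALUES (1, 'London')")
conn.executemany("INSERT INTO employees VALUES (?, ?, 1)",
                 [(1, 'Ada'), (2, 'Grace'), (3, 'Edsger')])

# The office relocates: a single UPDATE is correct for all three employees.
conn.execute("UPDATE offices SET office_location = 'Dublin' WHERE office_id = 1")

locations = conn.execute("""
    SELECT DISTINCT o.office_location
    FROM employees e JOIN offices o ON e.office_id = o.office_id
""").fetchall()
print(locations)
```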

Normalization vs. denormalization

| Feature | Normalization | Denormalization |
| --- | --- | --- |
| Primary goal | Reduce redundancy, improve data integrity. | Improve read performance. |
| Use case examples | Transactional databases (frequent updates). | Analytical databases and data warehouses (frequent reads); data that must not change after creation (for example, a contract or invoice snapshot). |
| Result | More tables, less data duplication. | Fewer tables, intentional data duplication. |

Denormalization is the intentional addition of redundant data to a database, often to improve query performance for reporting or analysis. It's a trade-off: you sacrifice some integrity and increase storage space for faster data retrieval. However, in scenarios like a legal contract, you may want this intentional redundancy to create a snapshot of the data that's independent of future changes. This ensures that the terms, names, and prices recorded at the time of the contract signing remain permanently fixed and available, even if the primary customer or product data is later updated.
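The contract-snapshot case can be sketched as deliberate denormalization (all names here are hypothetical): the invoice copies the customer's name at signing time, so a later rebrand in the Customers table does not alter the legal record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY, name TEXT)""")
# Deliberately denormalized: the invoice freezes the name at signing time.
conn.execute("""CREATE TABLE invoices (
    invoice_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    customer_name_at_signing TEXT)""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd')")
name = conn.execute(
    "SELECT name FROM customers WHERE customer_id = 1").fetchone()[0]
conn.execute("INSERT INTO invoices VALUES (900, 1, ?)", (name,))

# The company later rebrands...
conn.execute("UPDATE customers SET name = 'Acme Global plc' WHERE customer_id = 1")

# ...but the invoice still shows the name recorded at signing.
frozen = conn.execute(
    "SELECT customer_name_at_signing FROM invoices WHERE invoice_id = 900"
).fetchone()[0]
print(frozen)
```

Here the redundancy is the feature: the snapshot column is intentionally decoupled from the live customer record.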

Why is database normalization important?

Normalization makes relational databases (like Cloud SQL or Spanner) more efficient, reliable, and easier to manage by using "normal forms" to structure data and avoid common problems. 

Reduce data redundancy

Store each piece of data, such as a customer's address, in only one place to save storage space and increase efficiency.

Eliminate data anomalies

Avoid inconsistencies that can occur with redundant data, such as insertion, deletion, or update anomalies.

Improve data integrity

Ensure data is accurate and consistent across the database by guaranteeing each piece of data is correct and stored in only one location.

If your priority is ultra-high performance, massive scale, or a flexible schema, you might instead choose a non-relational (NoSQL) database, such as Bigtable or Firestore. NoSQL databases are designed with different principles that intentionally include data redundancy to optimize for fast reads and availability.
