2) Normalization and Denormalization - From the Absolute Basics to the Advanced Level

If you haven't read the PART I - Data Types in SQL, read here!

Normalization and Denormalization

Now, let’s break down Normalization and Denormalization from the absolute basics to the advanced level so you understand why, how, and when to use them (and when not to).

We’ll do this in three layers:

  1. Foundations: What they are & why they matter
  2. Step-by-step Normal Forms with examples
  3. Denormalization: When & how to break the rules

1. What is Normalization?

Definition:

Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.
It’s like decluttering your house, put things where they belong so you don’t keep duplicates all over.

Goals of normalization:

  • Avoid duplicate data (data redundancy)
  • Ensure data dependencies make sense
  • Make data easier to maintain and update

2. What is Denormalization?

Definition:

Denormalization is intentionally adding redundancy to speed up read performance, even if it sacrifices storage efficiency.

In short:
  • Normalization = Clean and efficient data structure
  • Denormalization = Fast access, but can create duplicates

3. Example Setup

Let’s say we have this unorganized table:
unorganized table

Problems here:
  • Alice’s name & address repeated
  • Product price stored in every order (if price changes, multiple rows need updating)

4. The Normal Forms (Beginner → Advanced)

1NF - First Normal Form

Rules:

  1. No repeating groups (columns should be atomic values)
  2. Each column should store a single value
Bad (not 1NF)❌:

Bad (not 1NF)

Good (1NF)✅:

Good (1NF)

2NF — Second Normal Form

Rules:
  • Must be in 1NF
  • No partial dependency (non-key columns must depend on the whole primary key)Fix in our example
Fix in our example:
Split into:
Customers table
Customers table

Orders table

Orders table

Products table
Products table

3NF - Third Normal Form

Rules:

  • Must be in 2NF

  • No transitive dependency (non-key columns must depend only on primary key, not on another non-key column)

Bad❌:
If CustomerAddress is stored in the Orders table, it depends on CustomerID (which depends on OrderID indirectly).
Good✅:
Keep CustomerAddress only in Customers table.

BCNF - Boyce-Codd Normal Form

  • Stricter than 3NF

  • Every determinant (column that defines another) must be a candidate key

Higher Forms (Advanced):

  • 4NF: No multi-valued dependencies

  • 5NF: No join dependencies
  • 6NF: Mostly theoretical, used in temporal databases

5. Denormalization in Practice

We denormalize when:
  • We need faster read performance
  • We’re okay with some duplication to avoid complex joins
Example: Instead of always joining Orders and Customers to get CustomerName,
we might store CustomerName in Orders for quick reporting.

Trade-offs:
✅ Faster queries
❌ More storage
❌ Risk of inconsistent data if updates are not synced

6. Summary Table

Summary Table of Normalization

7. Normalization & Denormalization Practice Set

Now, let’s build a Normalization & Denormalization practice set that checks:

  • Concept understanding (theory questions)

  • Coding ability (SQL table design & queries)

  • Practical decision-making (when to normalize/denormalize)

Easy (Concept Basics)

1. Which normal form eliminates repeating groups in a table?

a) 1NF
b) 2NF
c) 3NF
d) BCNF

Answer: a) 1NF
Explanation:
1NF ensures each cell holds a single atomic value, removing repeating groups and arrays in a single column.

2. True/False:
In 3NF, every non-key column should depend only on the primary key.

Answer: True
Explanation:
3NF removes transitive dependencies so no non-key column depends on another non-key column.

3. Given this table:


Which normal form is violated?

Answer: 2NF
Explanation:
CustomerName is repeated for each order, should be moved to a Customers table. That’s a partial dependency on part of the key.

Medium (SQL Design & Fixing)

4. Rewrite this table in 2NF:

Answer:

Explanation:
We separated student details and course details into different tables to avoid repetition.

5. Why is the following table not in 3NF?


Answer:
Because DepartmentName depends on DepartmentID (a non-key), not directly on EmployeeID. This is a transitive dependency.

6. Create a table in BCNF for storing library book loans.

Answer: 

Explanation:
In BCNF, every determinant is a candidate key, no non-trivial functional dependencies on non-candidate keys.

Hard (Practical Decision Making)

7. Your e-commerce database needs to display order history very quickly on the dashboard. The normalized version requires 4 joins. What’s a practical denormalization strategy?

Answer:
Store customer name, product name, and price directly in the Orders table (even if duplicated) for quick retrieval.
Explanation:
This reduces join complexity and improves query speed, at the cost of redundancy.

8. You find that updating a customer’s address takes too long because it’s stored in 5 different tables. Which principle is being violated and which normalization form would fix it?

Answer:
Violation: Data redundancy → Update anomaly.
Fix: 3NF: Store the address in a single Customers table.

9. Design a denormalized SalesSummary table from these normalized tables for analytics:

Normalized:

Orders(OrderID, CustomerID, OrderDate)

Customers(CustomerID, CustomerName)

OrderDetails(OrderID, ProductID, Quantity)

Products(ProductID, ProductName, Price)

Answer:

Explanation:
We flattened multiple joins into one table for reporting, faster reads, but duplicated CustomerName and ProductName.

10. Advanced challenge: Given the table below, identify the highest normal form it satisfies:


Answer:
2NF only.
Explanation:

  • 1NF: ✅ Atomic values

  • 2NF: ✅ No partial dependencies (EmpName → EmpID, ProjectName → ProjectID)

  • 3NF: ❌ Manager depends on ProjectName (transitive dependency)


Next Part: Indexing in SQL

Comments

Popular Posts