4) Query Optimization - From the Absolute Basics to the Advanced Level

August 12, 2025

4) Query Optimization - From the Absolute Basics to the Advanced Level

Let’s go step-by-step into Query Optimization so you’ll understand how to write fast, efficient SQL and tune it for real-world scenarios.

We’ll start with basic optimization for beginners, move to intermediate strategies, and finish with advanced performance tuning.

1. What is Query Optimization?

Query optimization is the process of improving the speed and efficiency of a SQL query so it uses the least possible time, memory, and resources while still returning correct results.

In small datasets → you may not notice slow queries.
In large datasets → bad queries can take minutes or hours, block other users, or crash servers.

Goal: Reduce execution time, minimize reads/writes, and lower CPU/memory load.

2. Beginner Level – Writing Efficient Queries

2.1 Select Only What You Need

❌ Bad:

SELECT * FROM Orders;

✅ Good:

SELECT order_id, order_date, customer_id FROM Orders;

Why: SELECT * pulls all columns (even unused), causing extra I/O.

2.2 Use WHERE Clauses to Filter Early

❌ Bad:

SELECT * FROM Orders;
-- Application filters data later

✅ Good:

SELECT * FROM Orders WHERE order_date >= '2024-01-01';

Why: Filtering at the DB reduces rows transferred.

2.3 Limit Results

❌ Bad:

SELECT * FROM Orders ORDER BY order_date DESC;

✅ Good:

SELECT * FROM Orders ORDER BY order_date DESC LIMIT 10;

Why: Prevents reading & sorting more rows than needed.

3. Intermediate Level – Using Indexes & Joins Effectively

3.1 Ensure Filtering Columns are Indexed

CREATE INDEX idx_order_date ON Orders(order_date);

Helps:

SELECT * FROM Orders WHERE order_date >= '2024-01-01';

3.2 Avoid Functions on Indexed Columns

❌ Bad:

SELECT * FROM Orders WHERE YEAR(order_date) = 2024;

✅ Good:

SELECT * FROM Orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

Why: Wrapping columns in functions makes indexes useless.

3.3 Use Proper JOINs

❌ Bad:

SELECT * 
FROM Orders, Customers 
WHERE Orders.customer_id = Customers.customer_id;

✅ Good:

SELECT o.order_id, c.customer_name
FROM Orders o
JOIN Customers c ON o.customer_id = c.customer_id;

Why: Explicit JOIN syntax is clearer & lets the optimizer plan better.

3.4 Avoid Unnecessary DISTINCT

If you have correct joins and keys, DISTINCT should not be used just to “remove duplicates” - fix the join instead.

4. Advanced Level – Execution Plans, Subqueries, and Caching

4.1 Use EXPLAIN to See the Query Plan

Example (MySQL):

EXPLAIN SELECT * FROM Orders WHERE customer_id = 42;

Shows:

Whether indexes are used
Estimated rows scanned
Join method

4.2 Replace Correlated Subqueries with JOINs

❌ Bad:

SELECT c.customer_name,
       (SELECT COUNT(*) FROM Orders o WHERE o.customer_id = c.customer_id) AS total_orders
FROM Customers c;

✅ Good:

SELECT c.customer_name, COUNT(o.order_id) AS total_orders
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name;

Why: Correlated subqueries run once per row → very slow.

4.3 Use Covering Indexes for Read-Heavy Queries

CREATE INDEX idx_customer_date ON Orders(customer_id, order_date);

Now:

SELECT customer_id, order_date FROM Orders WHERE customer_id = 42;

Reads entirely from the index, avoiding table access.

4.4 Avoid ORDER BY on Large Unindexed Columns

If sorting is necessary, index the column used in ORDER BY.

4.5 Partition Large Tables

If your table has billions of rows, split into partitions:

PARTITION BY RANGE (YEAR(order_date))

Why: Only relevant partitions are scanned.

4.6 Use Query Caching

Some databases (like MySQL query cache, Redis caching layer) store results of frequent queries to serve them instantly.

5. Query Optimization Checklist

✅ Basics:

Avoid SELECT *
Filter early with WHERE
Use LIMIT
Write clean JOINs

✅ Intermediate:

Index filtering/sorting columns
Avoid functions on indexed columns
Eliminate unnecessary DISTINCT
Optimize join order

✅ Advanced:

Use EXPLAIN to analyze
Replace correlated subqueries
Use covering indexes
Partition large tables
Cache results where possible

6. Example: Before & After Optimization

Slow Query:

SELECT * FROM Orders
WHERE YEAR(order_date) = 2024
AND status = 'Shipped'
ORDER BY order_date DESC;

Problems:

Function on indexed column (YEAR(order_date))
No limit
Possibly missing index on (status, order_date)

Optimized:

CREATE INDEX idx_status_date ON Orders(status, order_date);

SELECT order_id, order_date, customer_id
FROM Orders
WHERE order_date >= '2024-01-01'
  AND order_date < '2025-01-01'
  AND status = 'Shipped'
ORDER BY order_date DESC
LIMIT 100;

Gains:

Index used for filter & sort
No unnecessary columns fetched
Fewer rows scanned
Capped results for faster delivery

Next Part: Schema Design

Summary:

Now, here’s a Beginner → Advanced Query Optimization Practice Quiz with answers and explanations so you can test your knowledge and also see the reasoning behind each solution.

📚 SQL Query Optimization Practice Quiz

🟢 Easy Level - Fundamentals

1. Which query is faster and why?
a) SELECT * FROM Customers;

b) SELECT customer_id, customer_name FROM Customers;

Answer: b) — Selecting only needed columns reduces I/O and network transfer.
Explanation: The DB reads fewer bytes per row, so queries are faster, especially with wide tables.

2. True/False: Using LIMIT always speeds up queries.

Answer: False.
Explanation: LIMIT reduces the result set returned, but the DB might still scan/sort the whole dataset before applying it, unless proper indexes exist.

🟡 Medium Level - Indexing & Joins

3. You have this index:

CREATE INDEX idx_status_date ON Orders(status, order_date);

Which of these will fully use the index?
a) WHERE status = 'Shipped'
b) WHERE order_date >= '2024-01-01'
c) WHERE status = 'Shipped' AND order_date >= '2024-01-01'

Answer: a and c.
Explanation: Indexes are used in left-to-right order of columns. Filtering only by order_date won’t use status as the first index key.

4. Rewrite this query for better performance:

SELECT * FROM Orders WHERE YEAR(order_date) = 2024;

Answer:

SELECT * 
FROM Orders
WHERE order_date >= '2024-01-01'
  AND order_date < '2025-01-01';

Explanation: Avoid wrapping indexed columns in functions, it forces a full scan.

5. The query:

SELECT * FROM Orders o
JOIN Customers c ON o.customer_id = c.customer_id
WHERE c.country = 'USA';

is slow. What index should you add?

Answer:

CREATE INDEX idx_country ON Customers(country);

Explanation: This speeds up filtering in the Customers table before the join.

🔴 Hard Level — Execution Plans & Advanced Tuning

6. You run:

SELECT * FROM Sales ORDER BY sale_date DESC LIMIT 10;

It’s slow. Suggest 2 optimizations.

Answer:

Add index:

CREATE INDEX idx_sale_date ON Sales(sale_date DESC);

Select only needed columns instead of *.

Explanation: Index on sale_date helps avoid full sort. Limiting columns reduces I/O.

7. Identify the problem in this query:

SELECT c.customer_name, COUNT(o.order_id)
FROM Customers c
WHERE c.is_active = 1
AND c.customer_id IN (
    SELECT customer_id FROM Orders WHERE total_amount > 500
)
GROUP BY c.customer_name;

Answer: The IN (SELECT ...) subquery is correlated and may run repeatedly.
Optimized Version:

SELECT c.customer_name, COUNT(o.order_id)
FROM Customers c
JOIN Orders o ON c.customer_id = o.customer_id
WHERE c.is_active = 1 AND o.total_amount > 500
GROUP BY c.customer_name;

Explanation: Joins are often faster than subqueries when filtering large datasets.

8. You have:

EXPLAIN SELECT * FROM Orders WHERE customer_id = 42;

Output shows Using full table scan despite having an index on customer_id. Why?

Answer:

Table is small → optimizer chooses scan over index lookup.
Statistics may be outdated → run ANALYZE TABLE Orders; to refresh.

9. Rewrite for better indexing:

SELECT * FROM Products WHERE price * quantity > 1000;

Answer: Move calculation outside the WHERE:

ALTER TABLE Products ADD total_value DECIMAL(10,2) GENERATED ALWAYS AS (price * quantity) STORED;
CREATE INDEX idx_total_value ON Products(total_value);

SELECT * FROM Products WHERE total_value > 1000;

Explanation: Indexed computed columns allow the DB to avoid calculating values for every row.

10. A query on a Logs table (10 billion rows) for the last 24 hours is slow. Suggest two strategies without deleting old data.

Answer:

Partitioning by date → only the recent partition is scanned.
Partial index on recent dates:

CREATE INDEX idx_recent_logs ON Logs(log_date) WHERE log_date >= CURRENT_DATE - INTERVAL '1 day';

Explanation: Reduces the index size and scan range.

✅ Key Takeaways from the Quiz:

Always filter early and only select what you need.
Understand index usage order.
Avoid functions on columns in WHERE.
Use EXPLAIN to find bottlenecks.
Partition or partially index big tables.

Next Part: Schema Design

Search This Blog

Curious Mind