🚀 SQL: Optimization & Design
1. Overview and Problem Statement 📋
SQL optimization and design focus on improving query performance, reducing resource consumption, and establishing efficient database structures. This documentation covers both design-time and runtime optimization strategies.
Business Value
- Reduced query response times
- Lower server resource usage
- Improved application scalability
- Reduced infrastructure costs
- Better user experience
- Higher system throughput
2. Query Optimization Fundamentals 🎯
Key Components
- Query Plan Analysis
- Index Optimization
- Statistics Management
- Join Optimization
- Data Access Patterns
Query Execution Pipeline
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT
3. Technical Implementation 💻
Index Design
B-Tree Index Creation
Purpose: Optimize data retrieval for specific columns
Expected Outcome: Faster query execution for indexed columns
-- Single column index
CREATE INDEX idx_customers_email ON customers(email);
-- Composite index
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
-- Partial index
CREATE INDEX idx_orders_status ON orders(status)
WHERE status IN ('PENDING', 'PROCESSING');
-- Include columns
CREATE INDEX idx_products_category ON products(category_id)
INCLUDE (name, price);
Query Optimization Examples
Join Optimization
Purpose: Improve join performance
Before (inefficient):
SELECT c.name, o.order_date, p.name as product_name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
LEFT JOIN order_items oi ON o.order_id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2024-01-01';
After (optimized):
SELECT c.name, o.order_date, p.name as product_name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2024-01-01';
Subquery Optimization
Purpose: Reduce nested query overhead
Before (inefficient):
SELECT
product_id,
name,
(SELECT AVG(price) FROM order_items WHERE product_id = p.product_id) as avg_price
FROM products p;
After (optimized):
SELECT
p.product_id,
p.name,
AVG(oi.price) as avg_price
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name;
Statistics Management
-- Update statistics
ANALYZE customers;
-- Update specific columns
ANALYZE customers(email, status);
-- Set statistics target
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
4. Performance Metrics & Optimization 📊
Key Performance Indicators
- Query execution time
- I/O operations
- CPU usage
- Memory consumption
- Cache hit ratio
Monitoring Queries
-- PostgreSQL query analysis
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE order_date >= '2024-01-01';
-- Index usage statistics
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes;
Common Table Design Patterns
Partitioning Example
-- Range partitioning
CREATE TABLE orders (
order_id INT,
order_date DATE,
total_amount DECIMAL(10,2)
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
5. Anti-Patterns 🚫
Query Anti-Patterns
- SELECT * Wrong:
SELECT * FROM customers
JOIN orders ON customers.customer_id = orders.customer_id;
Correct:
SELECT
c.customer_id,
c.name,
o.order_date,
o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
- Implicit Type Conversion Wrong:
SELECT * FROM orders WHERE order_id = '1000';
Correct:
SELECT * FROM orders WHERE order_id = 1000;
Index Anti-Patterns
- Over-Indexing Wrong:
-- Creating redundant indexes
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
Correct:
-- Create one composite index that can serve multiple queries
CREATE INDEX idx_orders_customer_date_status
ON orders(customer_id, order_date, status);
6. Best Practices & Guidelines 📚
Database Design
- Normalize to appropriate level
- Use appropriate data types
- Implement constraints
- Design for scalability
- Consider data access patterns
Query Optimization
- Use covering indexes
- Avoid function calls on indexed columns
- Use appropriate join types
- Minimize subqueries
- Use parameterized queries
Example of Optimized Table Design
CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL,
order_date DATE NOT NULL DEFAULT CURRENT_DATE,
status VARCHAR(20) NOT NULL,
total_amount DECIMAL(10,2) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT fk_customer
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id),
CONSTRAINT chk_status
CHECK (status IN ('PENDING', 'PROCESSING', 'COMPLETED', 'CANCELLED')),
CONSTRAINT chk_amount
CHECK (total_amount >= 0)
);
-- Optimize for common queries
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, order_date DESC);
CREATE INDEX idx_orders_status_date
ON orders(status, order_date DESC);
-- Partial index for active orders
CREATE INDEX idx_active_orders
ON orders(order_date DESC)
WHERE status IN ('PENDING', 'PROCESSING');
7. Troubleshooting Guide 🔧
Common Performance Issues
- Missing Indexes
-- Find missing indexes (PostgreSQL)
SELECT
schemaname || '.' || relname as table,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
AND seq_scan > 1000
ORDER BY seq_scan DESC;
- Index Bloat
-- Check index bloat
SELECT
current_database(), schemaname, tablename,
ROUND((CASE WHEN otta=0 THEN 0.0
ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
CASE WHEN relpages < otta THEN 0
ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes
FROM (
SELECT schemaname, tablename, bs,
CEIL((reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta,
relpages
FROM ...
) AS sml;
8. Real-world Optimization Examples 🌐
E-commerce Order Processing
-- Optimized order summary query
WITH daily_stats AS (
SELECT
DATE_TRUNC('day', order_date) as order_day,
COUNT(*) as order_count,
SUM(total_amount) as daily_revenue,
COUNT(DISTINCT customer_id) as unique_customers
FROM orders
WHERE status = 'COMPLETED'
AND order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE_TRUNC('day', order_date)
),
customer_segments AS (
SELECT
customer_id,
SUM(total_amount) as total_spent,
COUNT(*) as order_count,
MAX(order_date) as last_order_date
FROM orders
WHERE status = 'COMPLETED'
GROUP BY customer_id
)
SELECT
ds.order_day,
ds.order_count,
ds.daily_revenue,
ds.unique_customers,
ROUND(ds.daily_revenue / NULLIF(ds.order_count, 0), 2) as avg_order_value
FROM daily_stats ds
ORDER BY ds.order_day DESC;
9. References and Additional Resources 📚
- Performance Tuning Guides
- Use The Index, Luke!
- PostgreSQL Performance Optimization
- MySQL Query Optimization
- Books
- SQL Performance Explained
- High Performance MySQL
- PostgreSQL 9.0 High Performance
- Tools
- pgHero
- MySQL Workbench
- SolarWinds Database Performance Analyzer
- Community Resources
- Database Administrators Stack Exchange
- PostgreSQL Performance Mailing List