Skip to main content

🚀 SQL: Optimization & Design

1. Overview and Problem Statement 📋

SQL optimization and design focus on improving query performance, reducing resource consumption, and establishing efficient database structures. This documentation covers both design-time and runtime optimization strategies.

Business Value

  • Reduced query response times
  • Lower server resource usage
  • Improved application scalability
  • Reduced infrastructure costs
  • Better user experience
  • Higher system throughput

2. Query Optimization Fundamentals 🎯

Key Components

  1. Query Plan Analysis
  2. Index Optimization
  3. Statistics Management
  4. Join Optimization
  5. Data Access Patterns

Query Execution Pipeline

FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT

3. Technical Implementation 💻

Index Design

B-Tree Index Creation

Purpose: Optimize data retrieval for specific columns

Expected Outcome: Faster query execution for indexed columns

-- Single column index
CREATE INDEX idx_customers_email ON customers(email);

-- Composite index
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Partial index
CREATE INDEX idx_orders_status ON orders(status)
WHERE status IN ('PENDING', 'PROCESSING');

-- Include columns
CREATE INDEX idx_products_category ON products(category_id)
INCLUDE (name, price);

Query Optimization Examples

Join Optimization

Purpose: Improve join performance

Before (inefficient):

SELECT c.name, o.order_date, p.name as product_name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
LEFT JOIN order_items oi ON o.order_id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2024-01-01';

After (optimized):

SELECT c.name, o.order_date, p.name as product_name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2024-01-01';

Subquery Optimization

Purpose: Reduce nested query overhead

Before (inefficient):

SELECT 
product_id,
name,
(SELECT AVG(price) FROM order_items WHERE product_id = p.product_id) as avg_price
FROM products p;

After (optimized):

SELECT 
p.product_id,
p.name,
AVG(oi.price) as avg_price
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name;

Statistics Management

-- Update statistics
ANALYZE customers;

-- Update specific columns
ANALYZE customers(email, status);

-- Set statistics target
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;

4. Performance Metrics & Optimization 📊

Key Performance Indicators

  1. Query execution time
  2. I/O operations
  3. CPU usage
  4. Memory consumption
  5. Cache hit ratio

Monitoring Queries

-- PostgreSQL query analysis
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE order_date >= '2024-01-01';

-- Index usage statistics
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes;

Common Table Design Patterns

Partitioning Example

-- Range partitioning
CREATE TABLE orders (
order_id INT,
order_date DATE,
total_amount DECIMAL(10,2)
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

5. Anti-Patterns 🚫

Query Anti-Patterns

  1. SELECT * Wrong:
SELECT * FROM customers 
JOIN orders ON customers.customer_id = orders.customer_id;

Correct:

SELECT 
c.customer_id,
c.name,
o.order_date,
o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
  1. Implicit Type Conversion Wrong:
SELECT * FROM orders WHERE order_id = '1000';

Correct:

SELECT * FROM orders WHERE order_id = 1000;

Index Anti-Patterns

  1. Over-Indexing Wrong:
-- Creating redundant indexes
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);

Correct:

-- Create one composite index that can serve multiple queries
CREATE INDEX idx_orders_customer_date_status
ON orders(customer_id, order_date, status);

6. Best Practices & Guidelines 📚

Database Design

  1. Normalize to appropriate level
  2. Use appropriate data types
  3. Implement constraints
  4. Design for scalability
  5. Consider data access patterns

Query Optimization

  1. Use covering indexes
  2. Avoid function calls on indexed columns
  3. Use appropriate join types
  4. Minimize subqueries
  5. Use parameterized queries

Example of Optimized Table Design

CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL,
order_date DATE NOT NULL DEFAULT CURRENT_DATE,
status VARCHAR(20) NOT NULL,
total_amount DECIMAL(10,2) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

CONSTRAINT fk_customer
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id),

CONSTRAINT chk_status
CHECK (status IN ('PENDING', 'PROCESSING', 'COMPLETED', 'CANCELLED')),

CONSTRAINT chk_amount
CHECK (total_amount >= 0)
);

-- Optimize for common queries
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, order_date DESC);

CREATE INDEX idx_orders_status_date
ON orders(status, order_date DESC);

-- Partial index for active orders
CREATE INDEX idx_active_orders
ON orders(order_date DESC)
WHERE status IN ('PENDING', 'PROCESSING');

7. Troubleshooting Guide 🔧

Common Performance Issues

  1. Missing Indexes
-- Find missing indexes (PostgreSQL)
SELECT
schemaname || '.' || relname as table,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
AND seq_scan > 1000
ORDER BY seq_scan DESC;
  1. Index Bloat
-- Check index bloat
SELECT
current_database(), schemaname, tablename,
ROUND((CASE WHEN otta=0 THEN 0.0
ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
CASE WHEN relpages < otta THEN 0
ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes
FROM (
SELECT schemaname, tablename, bs,
CEIL((reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta,
relpages
FROM ...
) AS sml;

8. Real-world Optimization Examples 🌐

E-commerce Order Processing

-- Optimized order summary query
WITH daily_stats AS (
SELECT
DATE_TRUNC('day', order_date) as order_day,
COUNT(*) as order_count,
SUM(total_amount) as daily_revenue,
COUNT(DISTINCT customer_id) as unique_customers
FROM orders
WHERE status = 'COMPLETED'
AND order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE_TRUNC('day', order_date)
),
customer_segments AS (
SELECT
customer_id,
SUM(total_amount) as total_spent,
COUNT(*) as order_count,
MAX(order_date) as last_order_date
FROM orders
WHERE status = 'COMPLETED'
GROUP BY customer_id
)
SELECT
ds.order_day,
ds.order_count,
ds.daily_revenue,
ds.unique_customers,
ROUND(ds.daily_revenue / NULLIF(ds.order_count, 0), 2) as avg_order_value
FROM daily_stats ds
ORDER BY ds.order_day DESC;

9. References and Additional Resources 📚

  1. Performance Tuning Guides
  • Use The Index, Luke!
  • PostgreSQL Performance Optimization
  • MySQL Query Optimization
  1. Books
  • SQL Performance Explained
  • High Performance MySQL
  • PostgreSQL 9.0 High Performance
  1. Tools
  • pgHero
  • MySQL Workbench
  • SolarWinds Database Performance Analyzer
  1. Community Resources
  • Database Administrators Stack Exchange
  • PostgreSQL Performance Mailing List