DB Sharding & Partitioning — Splitting Data Across Nodes
Split data horizontally across nodes (sharding) or within one node (partitioning) to distribute load and storage.
When to use
- Single-node DB is the bottleneck (write throughput or storage)
- Tables grow so large that indexes no longer fit in memory and lookups degrade
Tradeoffs
- Cross-shard queries are expensive (scatter-gather pattern)
- Rebalancing shards is operationally hard and risky
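To make the scatter-gather cost concrete: once rows are sharded by one key (order ID below), any query on a different key must hit every shard and merge the results. A minimal sketch, assuming a hypothetical `db_pool` list of shard connections, each exposing a `query(sql, params)` method:

```python
def orders_for_customer(customer_id, db_pool):
    # Orders are sharded by order ID, so one customer's orders may live
    # on any shard: query every shard and merge (scatter-gather).
    results = []
    for db in db_pool:
        results.extend(db.query(
            "SELECT * FROM orders WHERE customer_id = %s", (customer_id,)))
    return results
```

Latency is bounded by the slowest shard, and the merge (plus any sort/limit) happens in the application, which is why cross-shard queries are expensive.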
Go
```go
import (
	"context"
	"database/sql"
	"hash/fnv"
)

const numShards = 4

// Hash shard routing: compute shard ID from key, return the right DB conn.
// dbPool ([]*sql.DB of length numShards), Order, and scanOrder are assumed
// to be defined elsewhere.
func ShardFor(key string) *sql.DB {
	h := fnv.New32a()
	h.Write([]byte(key))
	shardID := int(h.Sum32()) % numShards
	return dbPool[shardID]
}

func GetOrder(ctx context.Context, orderID string) (*Order, error) {
	db := ShardFor(orderID)
	row := db.QueryRowContext(ctx, "SELECT * FROM orders WHERE id = $1", orderID)
	return scanOrder(row)
}
```
Python

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key: str):
    # Stable digest of the key picks the shard. md5 is fine here: we need
    # uniform distribution, not cryptographic strength.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    shard_id = digest % NUM_SHARDS
    return db_pool[shard_id]  # db_pool: list of NUM_SHARDS connections

def get_order(order_id: str) -> dict:
    db = shard_for(order_id)
    return db.query("SELECT * FROM orders WHERE id = %s", (order_id,))
```
Gotcha: Never shard prematurely. Exhaust read replicas, caching, connection pooling, and index optimization — then shard as a last resort.
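One reason rebalancing hurts with the modulo routing shown above: changing the shard count remaps almost every key. Rendezvous (HRW) hashing is one common mitigation; a minimal sketch (the `owner` helper and `nodes` list are illustrative, not part of the code above):

```python
import hashlib

def owner(key: str, nodes: list[str]) -> str:
    # Rendezvous (HRW) hashing: every node scores the key; highest wins.
    # Adding or removing a node remaps only the keys that node owned,
    # unlike modulo routing, which remaps most keys on any resize.
    def score(node: str) -> int:
        return int(hashlib.md5(f"{node}:{key}".encode()).hexdigest(), 16)
    return max(nodes, key=score)
```

Consistent hashing with virtual nodes achieves the same property; either way, data still has to be physically moved, so the operational risk remains.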