DB Sharding & Partitioning — Splitting Data Across Nodes
Split data horizontally across nodes (sharding) or within one node (partitioning) to distribute load and storage.
When to use
- Single-node DB is the bottleneck (write throughput or storage)
- Tables grow so large that indexes no longer fit in memory and lookups degrade
Tradeoffs
- Cross-shard queries are expensive (scatter-gather pattern)
- Rebalancing shards is operationally hard and risky
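To make the scatter-gather cost concrete: once rows are sharded by one key (order ID below), any query on a different key must hit every shard and merge the results. A minimal sketch, assuming a hypothetical `db_pool` list of shard connections, each exposing a `query(sql, params)` method:

```python
def orders_for_customer(customer_id, db_pool):
    # Orders are sharded by order ID, so one customer's orders may live
    # on any shard: query every shard and merge (scatter-gather).
    results = []
    for db in db_pool:
        results.extend(db.query(
            "SELECT * FROM orders WHERE customer_id = %s", (customer_id,)))
    return results
```

Latency is bounded by the slowest shard, and the merge (plus any sort/limit) happens in the application, which is why cross-shard queries are expensive.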
Go
```go
import (
	"context"
	"database/sql"
	"hash/fnv"
)

const numShards = 4

// Hash shard routing: compute shard ID from key, return the right DB conn.
// dbPool ([]*sql.DB of length numShards), Order, and scanOrder are assumed
// to be defined elsewhere.
func ShardFor(key string) *sql.DB {
	h := fnv.New32a()
	h.Write([]byte(key))
	shardID := int(h.Sum32()) % numShards
	return dbPool[shardID]
}

func GetOrder(ctx context.Context, orderID string) (*Order, error) {
	db := ShardFor(orderID)
	row := db.QueryRowContext(ctx, "SELECT * FROM orders WHERE id = $1", orderID)
	return scanOrder(row)
}
```
Python

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key: str):
    # Stable digest of the key picks the shard. md5 is fine here: we need
    # uniform distribution, not cryptographic strength.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    shard_id = digest % NUM_SHARDS
    return db_pool[shard_id]  # db_pool: list of NUM_SHARDS connections

def get_order(order_id: str) -> dict:
    db = shard_for(order_id)
    return db.query("SELECT * FROM orders WHERE id = %s", (order_id,))
```
Gotcha: Never shard prematurely. Exhaust read replicas, caching, connection pooling, and index optimization — then shard as a last resort.
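One reason rebalancing hurts with the modulo routing shown above: changing the shard count remaps almost every key. Rendezvous (HRW) hashing is one common mitigation; a minimal sketch (the `owner` helper and `nodes` list are illustrative, not part of the code above):

```python
import hashlib

def owner(key: str, nodes: list[str]) -> str:
    # Rendezvous (HRW) hashing: every node scores the key; highest wins.
    # Adding or removing a node remaps only the keys that node owned,
    # unlike modulo routing, which remaps most keys on any resize.
    def score(node: str) -> int:
        return int(hashlib.md5(f"{node}:{key}".encode()).hexdigest(), 16)
    return max(nodes, key=score)
```

Consistent hashing with virtual nodes achieves the same property; either way, data still has to be physically moved, so the operational risk remains.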