โš ๏ธ This guide is AI-generated and may contain inaccuracies. Always verify against authoritative sources and real-world documentation.

Architecture Diagram

[Diagram: an application sends a query with user_id = 42 to a shard router (e.g., Vitess or ProxySQL) that computes user_id % 3. Shards 0, 1, and 2 each hold ~33% of the data (user_ids 0, 3, 6…; 1, 4, 7…; 2, 5, 8…). Shard 2 is a 🔥 hot shard: celebrity user id=2 receives 80% of all traffic. ⚠️ Cross-shard JOIN = scatter-gather (expensive!)]

How It Works

Sharding splits data across multiple database instances (shards) so no single machine bears the full data volume or write load. Each shard has the same schema but holds a different subset of the data, determined by a shard key.

Horizontal vs Vertical Partitioning

Horizontal Partitioning (Sharding)

Split rows across databases by a shard key (e.g., user_id % N). Each shard has the same schema, different data. This is what people mean by "sharding."

Vertical Partitioning

Split columns โ€” move large or rarely-used columns to a separate table/database (e.g., user profile vs user activity log). Reduces row size for hot data.

Shard Key Selection (The Critical Decision)

The shard key determines which shard holds each row. A good key gives even data distribution AND keeps most queries on a single shard.

  1. Hash-based — hash(user_id) % N. Even distribution, but range queries over the key must touch every shard. The most common choice.
  2. Range-based — user_id 1–1M → shard 0, 1M–2M → shard 1. Supports range scans but is prone to hotspots (newly created users always land on the newest shard).
  3. Directory-based — a lookup table maps each key to a shard. The most flexible, but the directory is a single point of failure and a potential bottleneck.
  4. Geo-based — shard by region, e.g., all EU users on an EU shard. Gives data locality and helps with GDPR residency requirements.
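As a minimal Python sketch of the first three strategies (the shard count, range boundaries, and directory are made up for illustration):

```python
import hashlib

NUM_SHARDS = 3

def hash_shard(user_id: int) -> int:
    """Hash-based: even spread, but range scans over user_id hit every shard."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def range_shard(user_id: int, boundaries=(1_000_000, 2_000_000)) -> int:
    """Range-based: shard 0 holds IDs below 1M, shard 1 below 2M, and so on.
    Monotonically growing IDs pile all new users onto the last shard."""
    for shard, upper in enumerate(boundaries):
        if user_id < upper:
            return shard
    return len(boundaries)

DIRECTORY: dict[int, int] = {}  # in practice a replicated metadata store

def directory_shard(user_id: int) -> int:
    """Directory-based: an explicit lookup table; flexible placement, but
    must be highly available since every query consults it."""
    return DIRECTORY.setdefault(user_id, hash_shard(user_id))
```

Note that hash placement makes "all users created last week" a scatter-gather, while range placement makes it a single-shard scan — the trade-off the list above describes.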

The Hot Shard Problem

Even with a "good" shard key, celebrity users or viral content can overwhelm a single shard. When user_id=2 has 100M followers and posts frequently, shard 2 gets 80% of read traffic. Fixes: sub-sharding hot keys, caching hot data, or routing celebrity reads to dedicated replicas.
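One common form of sub-sharding is to "salt" a hot key into several sub-keys so its rows spread across shards, with reads fanning out over the salted keys. A hedged sketch — the hot-key set and sub-shard count here are illustrative, not from any particular system:

```python
import random

HOT_KEYS = {2}    # would be discovered from per-key traffic metrics
SUB_SHARDS = 4    # each hot key is split into 4 salted sub-keys

def write_key(user_id: int) -> str:
    """Writes for a hot key pick a random salt, spreading write load."""
    if user_id in HOT_KEYS:
        return f"{user_id}#{random.randrange(SUB_SHARDS)}"
    return str(user_id)

def read_keys(user_id: int) -> list[str]:
    """Reads for a hot key must query every salted sub-key and merge."""
    if user_id in HOT_KEYS:
        return [f"{user_id}#{salt}" for salt in range(SUB_SHARDS)]
    return [str(user_id)]
```

The cost is asymmetric: writes stay cheap, but every read of the hot key becomes a small fan-out — usually a good trade when one key dominates traffic.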

Key Design Decisions

๐Ÿ”‘

Shard key: user_id vs entity_id: user_id keeps all of a user's data on one shard (fast profile page) but creates hotspots for celebrity users. entity_id (e.g., photo_id) distributes evenly but makes "show all my photos" a scatter-gather across all shards. Instagram chose user_id + special handling for hot users.

๐Ÿ”„

Resharding strategy: Adding shards means rebalancing data. With key % N, changing N remaps almost everything. Use consistent hashing or logical shards (4096 logical โ†’ N physical) to minimize data movement. Plan resharding from day one.
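The logical-shard indirection can be sketched as follows; the rebalancing policy here is a simplified illustration (steal shards from overloaded servers), not any particular system's algorithm:

```python
from collections import Counter

LOGICAL_SHARDS = 4096

def logical_shard(user_id: int) -> int:
    """A key's logical shard never changes, even as servers are added."""
    return user_id % LOGICAL_SHARDS

def initial_mapping(n_physical: int) -> dict[int, int]:
    """Assign the 4096 logical shards evenly across physical servers."""
    return {l: l % n_physical for l in range(LOGICAL_SHARDS)}

def add_server(mapping: dict[int, int], new_server: int) -> dict[int, int]:
    """Grow the cluster by stealing just enough logical shards from
    overloaded servers; only the stolen shards' data has to move."""
    mapping = dict(mapping)
    target = LOGICAL_SHARDS // (new_server + 1)  # new server's fair share
    load = Counter(mapping.values())
    moved = 0
    for l in sorted(mapping):
        if moved == target:
            break
        if load[mapping[l]] > target:
            load[mapping[l]] -= 1
            mapping[l] = new_server
            moved += 1
    return mapping
```

Going from 12 to 13 servers this way moves only 4096 // 13 = 315 logical shards (~8% of the data), whereas naive user_id % N remaps roughly 12/13 of all keys.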

๐Ÿ†”

ID generation: Auto-increment IDs break across shards. Use Snowflake IDs (timestamp + machine + sequence), ULIDs, or embed shard hints in the ID itself (Instagram's approach: timestamp + shard_id + sequence).
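A simplified sketch of a shard-aware ID in that style — the epoch is made up, and the bit layout (millisecond timestamp, 13-bit shard, 10-bit sequence) follows the scheme Instagram has described publicly, without claiming to be their implementation:

```python
import time

EPOCH_MS = 1_577_836_800_000  # illustrative custom epoch (2020-01-01 UTC)
SHARD_BITS = 13               # up to 8192 logical shards
SEQ_BITS = 10                 # up to 1024 IDs per shard per millisecond

_sequence = 0

def next_id(shard_id: int) -> int:
    """Pack [ms since epoch | shard_id | sequence] into one 64-bit integer."""
    global _sequence
    _sequence = (_sequence + 1) % (1 << SEQ_BITS)
    ms = int(time.time() * 1000) - EPOCH_MS
    return (ms << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | _sequence

def shard_of(item_id: int) -> int:
    """Recover the shard from the ID itself — no directory lookup needed."""
    return (item_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
```

IDs built this way sort roughly by creation time, and the router can extract the owning shard with two bit operations.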

๐Ÿ’ฅ

Cross-shard queries: JOINs across shards require scatter-gather โ€” query all shards, merge results. Expensive and slow. Alternatives: denormalize data, use a separate search index (Elasticsearch), or redesign the shard key to keep related data together.
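Where scatter-gather is unavoidable, the router queries all shards in parallel, merges, re-sorts, and truncates. A hedged sketch — the in-memory "shards" and query function stand in for real per-shard database connections:

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(shards, query_shard, limit=10):
    """Fan out to every shard in parallel, then merge, re-sort, truncate.
    Each shard must return its own top `limit` rows so the global
    top `limit` is guaranteed to be in the merged set."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: query_shard(s, limit), shards)
    merged = [row for part in partials for row in part]
    merged.sort(key=lambda row: row["created_at"], reverse=True)
    return merged[:limit]

# Illustrative in-memory "shards" standing in for real databases:
SHARDS = [
    [{"photo_id": 1, "created_at": 100}, {"photo_id": 4, "created_at": 70}],
    [{"photo_id": 2, "created_at": 90}],
    [{"photo_id": 3, "created_at": 110}],
]

def fake_query(shard, limit):
    return sorted(shard, key=lambda r: r["created_at"], reverse=True)[:limit]
```

Note the cost model: latency is set by the slowest shard, and every shard does work even when it contributes nothing to the final page — which is why the alternatives above are usually preferred.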

When to Use

Sharding adds massive complexity. Only shard when simpler solutions (read replicas, caching, vertical scaling) aren't enough.

  • "Data doesn't fit on one machine" โ€” Single-digit TB+ datasets that exceed a single node's storage.
  • "Writes are too high for a single database" โ€” Read replicas don't help with write bottlenecks.
  • "Users are globally distributed" โ€” Geo-sharding puts data close to users for lower latency.
  • "Design Instagram/Twitter/Uber" โ€” Any system with billions of rows needs partitioning.

Interview signal: The interviewer wants to see you discuss shard key selection FIRST, then address cross-shard queries and resharding. Mention what queries become expensive and how you'd handle them.

Real-World Examples

  • Instagram โ€” Shards by user_id on PostgreSQL. Logical shards mapped to physical servers. Snowflake-style IDs with embedded shard hints.
  • Vitess (YouTube) โ€” Sharding middleware for MySQL. Handles routing, resharding, and schema migrations transparently. Open-sourced by Google.
  • MongoDB โ€” Built-in sharding by shard key with automatic chunk balancing. Supports hash-based and range-based partitioning.
  • DynamoDB โ€” Partition key determines data placement. Automatic splitting when partitions get hot. Adaptive capacity redistributes throughput.

Back-of-Envelope Numbers

Metric                                     Value
Instagram photo metadata (2B+ photos)      ~1 TB total at ~500 bytes/row
Growth rate (100M new photos/day)          ~50 GB/day of new metadata
Per-shard data (12 physical servers)       ~83 GB/server (fits in memory)
Writes per shard                           ~96 writes/sec (100M/day ÷ 12 servers)
Resharding trigger                         Server exceeds 500 GB or 5K writes/sec
Cross-shard query overhead                 ~10–100× a single-shard query
Logical shards (Instagram approach)        4096 logical → 12 physical (room to grow)
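The figures above follow from simple arithmetic, re-derived here from the row size and counts in the table:

```python
PHOTOS = 2_000_000_000       # 2B photos
ROW_BYTES = 500              # metadata per photo
NEW_PER_DAY = 100_000_000    # 100M new photos/day
SERVERS = 12

total_gb = PHOTOS * ROW_BYTES / 1e9                  # 1000 GB ≈ 1 TB total
growth_gb_per_day = NEW_PER_DAY * ROW_BYTES / 1e9    # 50 GB/day
per_server_gb = total_gb / SERVERS                   # ~83 GB/server
writes_per_shard = NEW_PER_DAY / SERVERS / 86_400    # ~96 writes/sec
```

Each shard handling under 100 writes/sec is the key takeaway: sharding is driven here by the data volume and growth rate, not by raw write throughput.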