Database

Database Sharding

DatabaseScalingPartitioningSystem Design

Horizontally partition data across multiple databases. Learn sharding strategies, consistent hashing, and rebalancing techniques.

Sharding Strategies

📏

Range Sharding

Data is split by a contiguous range of key values. Simple to implement and supports efficient range queries.

Range QueriesSimple RouterHot-spot Risk

Use case:Use case: time-series data, ordered user IDs, date-partitioned event logs

#️⃣

Hash Sharding

hash(key) % N assigns each record to a shard. Guarantees even distribution but makes range queries inefficient.

Uniform LoadO(1) RouteReshard Pain

Use case:Use case: user profiles, social media posts, key-value stores

🗂️

Directory Sharding

A shard lookup table maps keys or tenants to specific shards. Maximum flexibility — relocate data without hash changes.

FlexibleMulti-TenantSPOF Risk

Use case:Use case: SaaS multi-tenancy, compliance isolation, VIP customer data placement

🔄

Consistent Hashing

Keys and nodes share a virtual ring. Adding/removing a shard only moves ~K/N keys — minimal data migration.

Elastic ScaleHash RingVNodes

Use case:Use case: Cassandra, DynamoDB, Redis cluster, CDN edge caches

Architecture Overview

How a shard router distributes requests across database shards with replicas

flowchart TD
    Client([👤 Client Request])
    Router{🔀 Shard Router}
    Client --> Router
    Router -->|Range Map| ShardA[(🗄 Shard A\nID 1–1M)]
    Router -->|Hash Slot 0–5460| ShardB[(🗄 Shard B\nSlots 0–5460)]
    Router -->|Directory Lookup| ShardC[(🗄 Shard C\nTenant acme-corp)]
    ShardA --> RA[(🔁 Replica A)]
    ShardB --> RB[(🔁 Replica B)]
    ShardC --> RC[(🔁 Replica C)]

    style Client fill:#1a1f2b,stroke:#7c4dff,color:#fff
    style Router fill:#1a1f2b,stroke:#7c4dff,color:#fff
    style ShardA fill:#12161f,stroke:#00e5ff,color:#fff
    style ShardB fill:#12161f,stroke:#7c4dff,color:#fff
    style ShardC fill:#12161f,stroke:#ff9f1c,color:#fff
    style RA fill:#0d1117,stroke:#00e5ff,color:#aaa
    style RB fill:#0d1117,stroke:#7c4dff,color:#aaa
    style RC fill:#0d1117,stroke:#ff9f1c,color:#aaa

Interactive Sharding Flow

Step through each strategy to see how a write is routed

Waiting...

Range-Based Sharding

Split data by value ranges (e.g., user IDs 1–1M on Shard A)

👤Client

↓

🔀Shard Router

↓

🗄️Shard AIDs 1–1M

🗄️Shard BIDs 1M–2M

🗄️Shard CIDs 2M+

Client sends a write for user_id = 750,000

Router checks range map: 1–1M → Shard A

Write routed directly to Shard A (O(1) lookup)

⚠ Hot-spot risk: if most active users are 1–1M, Shard A gets overloaded

✓ Great for range scans (e.g., "all orders between Jan–Mar"). Needs reshard plan for skew.

Routed Target

Processing

Overloaded

Migrating

Key Concepts

Critical insights every engineer should understand about sharding

Why Shard at All?

A single DB node has hard limits on storage, connections, and write throughput. Sharding horizontally scales the data tier by splitting the dataset across N independent nodes — each owns a fraction of the keyspace.

Rule of thumb: shard when single-node query latency degrades or storage exceeds 70% capacity

Cross-Shard Queries

Queries that span multiple shards require scatter-gather: fan out to all shards in parallel, then merge. This adds latency and complexity — good schema design minimises cross-shard joins.

Strategy: co-locate related data (user + orders) on the same shard using a tenant/user key

Rebalancing & Hotspots

Load skew happens when one shard receives far more traffic than others. Mitigation: consistent hashing with virtual nodes, or pre-splitting with range sharding. Plan for rebalancing from day one.

Hotspot example: viral posts, celebrity users, high-frequency trading keys

Real-World Examples

How popular databases implement sharding

💎Cassandraconsistent

Uses consistent hashing with configurable virtual nodes (vnodes) per node. Fully peer-to-peer — no single coordinator.

🟠DynamoDBconsistent

AWS's managed NoSQL uses a variant of consistent hashing. Partition key maps to a token range; multiple replicas per token.

🐬MySQL (Vitess)range

Vitess shards MySQL with a keyspace range approach. Used by YouTube and Slack for billions of rows across many shards.

🍃MongoDBrange/hash

Supports both range and hash (hashed index) sharding. A config server stores chunk-to-shard mappings.

⚡Redis Clusterhash

Divides keyspace into 16,384 hash slots. Each master owns a subset. Gossip protocol tracks slot ownership.

🐦Twitter Snowflakedirectory

ID generation service distributes ID blocks to hosts — a form of directory-based pre-allocation sharding.

Strategy Comparison

Trade-offs at a glance across all four sharding strategies

Strategy	Hotspot Resistance	Range Query	Reshard Ease	Complexity	Best For
Range					Time-series, ordered IDs
Hash					Uniform key-value stores
Directory					Multi-tenant SaaS, compliance
Consistent Hash					Distributed caches, NoSQL

Resources

Title	Type	Link
Database Sharding Explained by ByteByteGo	youtube	Open
Database Sharding: A System Design Concept	web	Open
Understanding Database Sharding	web	Open
Cassandra Consistent Hashing Docs	web	Open

Guide Contributors

ObydulSenior Software Engineer

Edit this page on GitHub↗

Init guide environment...