Scalability is a system's ability to handle growing amounts of work by adding resources, without degrading performance. Every large-scale system, from Netflix to Twitter, was built with scalability at its core, and understanding vertical vs. horizontal scaling is the foundation of every system design interview.
There are two primary scaling strategies:
**Vertical scaling (scaling up)** adds more power to an existing machine: more CPU, more RAM, faster SSDs.
Pros:
- Simple: no changes to application code or architecture
- No distributed-systems complexity (one machine, one copy of the data)
Cons:
- Hard upper limit: even the largest machine available is finite
- Single point of failure
- Upgrades usually require downtime, and high-end hardware costs grow non-linearly
Example: Upgrading a database server from 32GB RAM to 256GB RAM.
**Horizontal scaling (scaling out)** adds more machines to distribute the load across multiple nodes.
Pros:
- Near-unlimited capacity: keep adding commodity machines
- Fault tolerance: the system survives individual node failures
- Capacity can be added without downtime
Cons:
- Operational complexity: load balancing, service discovery, deployments
- Data consistency and coordination become hard problems
- Extra network hops add latency
Example: Running 10 application servers behind a load balancer instead of one powerful server.
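The load-balancer setup above can be sketched as a simple round-robin dispatcher. This is an illustrative toy, not a real load balancer's API; the server names and `route` method are assumptions for the example.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distributes requests evenly across a pool of backend servers."""

    def __init__(self, servers):
        # cycle() yields servers in order, wrapping around forever.
        self._pool = cycle(servers)

    def route(self, request):
        # Pick the next server in rotation and dispatch to it.
        server = next(self._pool)
        return f"{server} handled {request}"

lb = RoundRobinBalancer([f"app-{i}" for i in range(1, 4)])
print(lb.route("GET /home"))  # app-1 handled GET /home
print(lb.route("GET /home"))  # app-2 handled GET /home
print(lb.route("GET /home"))  # app-3 handled GET /home
print(lb.route("GET /home"))  # app-1 handled GET /home
```

Because each backend is interchangeable, adding capacity is just appending another server to the pool.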
Stateless services are far easier to scale horizontally. Each request contains all the information needed to process it — no server-side session state. Stateful services (like databases) require careful coordination when scaled out.
Rule of thumb: keep application logic stateless, and push state to dedicated storage layers.
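A minimal sketch of that rule of thumb, assuming a hypothetical `SessionStore` that stands in for a dedicated state layer such as Redis: the handler itself holds no state, so any replica can serve any request.

```python
from dataclasses import dataclass, field

@dataclass
class SessionStore:
    """Stand-in for a dedicated state layer (e.g. Redis)."""
    _data: dict = field(default_factory=dict)

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

def handle_request(store: SessionStore, session_id: str, increment: int) -> int:
    """Stateless handler: all state lives in the external store,
    so any server replica can process this request."""
    count = store.get(session_id) or 0
    count += increment
    store.set(session_id, count)
    return count

# Two "replicas" sharing the same store behave identically.
store = SessionStore()
handle_request(store, "user-42", 1)            # handled by replica A
result = handle_request(store, "user-42", 2)   # handled by replica B
print(result)  # 3
```

Swapping the in-memory dict for a networked store changes nothing about the handler, which is exactly why stateless services scale out so easily.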
Elasticity is the ability to automatically scale up or down based on real-time demand. Cloud platforms (AWS Auto Scaling, Kubernetes HPA) make this straightforward. Design systems to take advantage of it.
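The elasticity idea can be sketched as a toy target-tracking policy with the same shape as the Kubernetes HPA scaling rule (desired replicas scale with the ratio of observed to target utilization); the utilization numbers and replica bounds here are illustrative.

```python
import math

def desired_replicas(current_replicas: int, current_util: float,
                     target_util: float, min_r: int = 1, max_r: int = 20) -> int:
    """Target-tracking scale rule: grow or shrink the replica count by
    the ratio of observed utilization to the target, clamped to bounds."""
    desired = math.ceil(current_replicas * current_util / target_util)
    return max(min_r, min(max_r, desired))

# Overloaded: 4 replicas at 90% utilization, targeting 60% -> scale out.
print(desired_replicas(4, current_util=0.90, target_util=0.60))  # 6
# Underloaded: 6 replicas at 30% utilization -> scale in.
print(desired_replicas(6, current_util=0.30, target_util=0.60))  # 3
```

Real autoscalers add stabilization windows and cooldowns so the replica count doesn't thrash on noisy metrics.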
Scaling the wrong layer wastes money. Before scaling anything, profile to find the actual bottleneck: is the system CPU-bound, memory-bound, I/O-bound (disk or network), or waiting on the database? Then apply the pattern that targets it:
| Pattern | Description | Use Case |
|---|---|---|
| Load Balancing | Distribute traffic across servers | Any stateless service |
| Caching | Serve data from fast in-memory stores | Read-heavy workloads |
| Sharding | Partition data across multiple DBs | Large datasets |
| Async Processing | Offload work to queues | Long-running jobs |
| CDN | Serve static assets from edge nodes | Global user base |
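As one example from the table, the caching pattern for read-heavy workloads is often implemented cache-aside: check the cache first, fall back to the database on a miss, and populate the cache for the next reader. The `db_lookup` callable and the TTL-free dict cache below are simplifications for the sketch.

```python
def make_cached_reader(db_lookup):
    """Cache-aside reader: serve from cache when possible,
    fall back to the database and populate the cache on a miss."""
    cache = {}
    stats = {"hits": 0, "misses": 0}

    def read(key):
        if key in cache:
            stats["hits"] += 1
            return cache[key]
        stats["misses"] += 1
        value = db_lookup(key)   # slow path: hit the database
        cache[key] = value       # populate for future reads
        return value

    return read, stats

# Fake database lookup standing in for a real query.
read, stats = make_cached_reader(lambda k: f"row-for-{k}")
read("user:1"); read("user:1"); read("user:2")
print(stats)  # {'hits': 1, 'misses': 2}
```

Production caches add eviction and expiry (TTLs, LRU) so stale or cold entries don't accumulate, but the read path keeps this same shape.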
Interviewers expect you to reason about scale quantitatively. A quick cheat sheet: