Scalability is a system's ability to handle growing amounts of work by adding resources, without degrading performance. Every large-scale system, from Netflix to Twitter, was built with scalability at its core, and understanding vertical vs. horizontal scaling is the foundation of every system design interview.
There are two primary scaling strategies:
**Vertical scaling (scaling up)** adds more power to an existing machine: more CPU, more RAM, faster SSDs.
Pros:
- Simple: no changes to application code or architecture
- No distributed-systems complexity (one machine, one copy of the data)
Cons:
- Hard upper limit: even the largest machine available is finite
- Single point of failure
- Upgrades usually require downtime, and high-end hardware costs grow non-linearly
Example: Upgrading a database server from 32GB RAM to 256GB RAM.
**Horizontal scaling (scaling out)** adds more machines to distribute the load across multiple nodes.
Pros:
- Near-unlimited capacity: keep adding commodity machines
- Fault tolerance: the system survives individual node failures
- Capacity can be added without downtime
Cons:
- Operational complexity: load balancing, service discovery, deployments
- Data consistency and coordination become hard problems
- Extra network hops add latency
Example: Running 10 application servers behind a load balancer instead of one powerful server.
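The load-balancer setup above can be sketched as a simple round-robin dispatcher. This is an illustrative toy, not a real load balancer's API; the server names and `route` method are assumptions for the example.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distributes requests evenly across a pool of backend servers."""

    def __init__(self, servers):
        # cycle() yields servers in order, wrapping around forever.
        self._pool = cycle(servers)

    def route(self, request):
        # Pick the next server in rotation and dispatch to it.
        server = next(self._pool)
        return f"{server} handled {request}"

lb = RoundRobinBalancer([f"app-{i}" for i in range(1, 4)])
print(lb.route("GET /home"))  # app-1 handled GET /home
print(lb.route("GET /home"))  # app-2 handled GET /home
print(lb.route("GET /home"))  # app-3 handled GET /home
print(lb.route("GET /home"))  # app-1 handled GET /home
```

Because each backend is interchangeable, adding capacity is just appending another server to the pool.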
Stateless services are far easier to scale horizontally. Each request contains all the information needed to process it — no server-side session state. Stateful services (like databases) require careful coordination when scaled out.
Rule of thumb: keep application logic stateless, and push state to dedicated storage layers.
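A minimal sketch of that rule of thumb, assuming a hypothetical `SessionStore` that stands in for a dedicated state layer such as Redis: the handler itself holds no state, so any replica can serve any request.

```python
from dataclasses import dataclass, field

@dataclass
class SessionStore:
    """Stand-in for a dedicated state layer (e.g. Redis)."""
    _data: dict = field(default_factory=dict)

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

def handle_request(store: SessionStore, session_id: str, increment: int) -> int:
    """Stateless handler: all state lives in the external store,
    so any server replica can process this request."""
    count = store.get(session_id) or 0
    count += increment
    store.set(session_id, count)
    return count

# Two "replicas" sharing the same store behave identically.
store = SessionStore()
handle_request(store, "user-42", 1)            # handled by replica A
result = handle_request(store, "user-42", 2)   # handled by replica B
print(result)  # 3
```

Swapping the in-memory dict for a networked store changes nothing about the handler, which is exactly why stateless services scale out so easily.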
Elasticity is the ability to automatically scale up or down based on real-time demand. Cloud platforms (AWS Auto Scaling, Kubernetes HPA) make this straightforward. Design systems to take advantage of it.
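The elasticity idea can be sketched as a toy target-tracking policy with the same shape as the Kubernetes HPA scaling rule (desired replicas scale with the ratio of observed to target utilization); the utilization numbers and replica bounds here are illustrative.

```python
import math

def desired_replicas(current_replicas: int, current_util: float,
                     target_util: float, min_r: int = 1, max_r: int = 20) -> int:
    """Target-tracking scale rule: grow or shrink the replica count by
    the ratio of observed utilization to the target, clamped to bounds."""
    desired = math.ceil(current_replicas * current_util / target_util)
    return max(min_r, min(max_r, desired))

# Overloaded: 4 replicas at 90% utilization, targeting 60% -> scale out.
print(desired_replicas(4, current_util=0.90, target_util=0.60))  # 6
# Underloaded: 6 replicas at 30% utilization -> scale in.
print(desired_replicas(6, current_util=0.30, target_util=0.60))  # 3
```

Real autoscalers add stabilization windows and cooldowns so the replica count doesn't thrash on noisy metrics.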
Scaling the wrong layer wastes money. Before scaling anything, profile to find the actual bottleneck: is the system CPU-bound, memory-bound, I/O-bound (disk or network), or waiting on the database? Then apply the pattern that targets it:
| Pattern | Description | Use Case |
|---|---|---|
| Load Balancing | Distribute traffic across servers | Any stateless service |
| Caching | Serve data from fast in-memory stores | Read-heavy workloads |
| Sharding | Partition data across multiple DBs | Large datasets |
| Async Processing | Offload work to queues | Long-running jobs |
| CDN | Serve static assets from edge nodes | Global user base |
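As one example from the table, the caching pattern for read-heavy workloads is often implemented cache-aside: check the cache first, fall back to the database on a miss, and populate the cache for the next reader. The `db_lookup` callable and the TTL-free dict cache below are simplifications for the sketch.

```python
def make_cached_reader(db_lookup):
    """Cache-aside reader: serve from cache when possible,
    fall back to the database and populate the cache on a miss."""
    cache = {}
    stats = {"hits": 0, "misses": 0}

    def read(key):
        if key in cache:
            stats["hits"] += 1
            return cache[key]
        stats["misses"] += 1
        value = db_lookup(key)   # slow path: hit the database
        cache[key] = value       # populate for future reads
        return value

    return read, stats

# Fake database lookup standing in for a real query.
read, stats = make_cached_reader(lambda k: f"row-for-{k}")
read("user:1"); read("user:1"); read("user:2")
print(stats)  # {'hits': 1, 'misses': 2}
```

Production caches add eviction and expiry (TTLs, LRU) so stale or cold entries don't accumulate, but the read path keeps this same shape.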
Interviewers expect you to reason about scale quantitatively. A quick cheat sheet: