System Design Fundamentals: Scalability Basics

Scalability is a system's ability to handle growing amounts of work by adding resources. Understanding vertical vs. horizontal scaling is the foundation of every system design interview.

What is Scalability?

Scalability refers to a system's capacity to handle increased load without degrading performance. Every large-scale system — from Netflix to Twitter — was built with scalability at its core.

There are two primary scaling strategies:

Vertical Scaling (Scale Up)

Add more power to an existing machine: more CPU, more RAM, faster SSDs.

Pros:

  • Simple — no code changes required
  • No distributed system complexity
  • Lower latency (no network hops between services)

Cons:

  • Hard limit — a single machine can only get so powerful
  • Single point of failure
  • Expensive at the high end
  • Requires downtime to upgrade hardware

Example: Upgrading a database server from 32GB RAM to 256GB RAM.

Horizontal Scaling (Scale Out)

Add more machines to distribute the load across multiple nodes.

Pros:

  • Theoretically unlimited scale
  • Increased fault tolerance — losing one node doesn't kill the system
  • Can use commodity hardware

Cons:

  • Added complexity (load balancing, distributed state, consistency)
  • Network overhead between nodes
  • Harder to reason about correctness

Example: Running 10 application servers behind a load balancer instead of one powerful server.
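
The round-robin strategy behind such a load balancer can be sketched in a few lines of Python (server names are hypothetical); real balancers like NGINX or HAProxy layer health checks and weighting on top of the same idea:

```python
import itertools

# Hypothetical pool of application servers sitting behind the balancer.
SERVERS = ["app-1:8080", "app-2:8080", "app-3:8080"]

def round_robin(servers):
    """Yield servers in rotation -- the simplest load-balancing strategy."""
    for server in itertools.cycle(servers):
        yield server

balancer = round_robin(SERVERS)
first_four = [next(balancer) for _ in range(4)]
# Requests cycle through the pool and wrap: app-1, app-2, app-3, app-1
```

Because each server is interchangeable, adding an eleventh server is just one more entry in the pool.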

Key Scalability Concepts

Stateless vs. Stateful Services

Stateless services are far easier to scale horizontally. Each request contains all the information needed to process it — no server-side session state. Stateful services (like databases) require careful coordination when scaled out.

Rule of thumb: keep application logic stateless and push state to dedicated storage layers.
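
A minimal sketch of that rule, assuming the shared store is reachable from every server (a plain dict stands in for an external store such as Redis here):

```python
# Stand-in for an external store shared by all servers (e.g. Redis).
SESSION_STORE = {}

def handle_request(session_id: str, action: str) -> dict:
    """Process a request using only its inputs plus the shared store.
    No per-user state lives in this process, so any server in the
    pool can handle any request."""
    session = SESSION_STORE.setdefault(session_id, {"actions": []})
    session["actions"].append(action)
    return {"session": session_id, "count": len(session["actions"])}

# Two calls -- which could run on two different servers -- serve the
# same user interchangeably, because state lives outside the handler.
handle_request("u42", "login")
result = handle_request("u42", "view_cart")
```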

Elasticity

Elasticity is the ability to automatically scale up or down based on real-time demand. Cloud platforms (AWS Auto Scaling, Kubernetes HPA) make this straightforward. Design systems to take advantage of it.
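
The core rule behind such autoscalers can be sketched as target tracking: scale the replica count in proportion to observed vs. target utilization. The parameter names below are illustrative, not the actual HPA API:

```python
import math

def desired_replicas(current: int, utilization_pct: int, target_pct: int = 60,
                     floor: int = 2, ceiling: int = 20) -> int:
    """Target-tracking rule (same spirit as Kubernetes HPA): if utilization
    is above target, add replicas proportionally; if below, shed them,
    clamped to a configured floor and ceiling."""
    desired = math.ceil(current * utilization_pct / target_pct)
    return max(floor, min(ceiling, desired))

# At 90% CPU across 4 replicas with a 60% target, scale out to 6.
# At 10% CPU, the proportional answer is 1, but the floor keeps 2 running.
```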

Bottleneck Identification

Scaling the wrong layer wastes money. Before scaling, profile to find the bottleneck:

  • CPU-bound: More cores or horizontal scaling helps
  • Memory-bound: More RAM or caching
  • I/O-bound: Faster storage, read replicas, caching
  • Network-bound: CDN, compression, batching
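
As a toy illustration of profiling before scaling (stage names and workloads are made up), timing each stage of a request pipeline reveals which one dominates:

```python
import time

def profile_stages(stages):
    """Time each named stage and report the slowest one."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return max(timings, key=timings.get), timings

# Hypothetical pipeline: light CPU work vs. a simulated slow disk/DB call.
bottleneck, timings = profile_stages([
    ("parse",   lambda: sum(i * i for i in range(100_000))),  # CPU-bound
    ("storage", lambda: time.sleep(0.05)),                    # I/O-bound wait
])
# Here "storage" dominates, so a cache or read replica helps more than extra CPU.
```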

Scalability Patterns

| Pattern | Description | Use Case |
|---|---|---|
| Load Balancing | Distribute traffic across servers | Any stateless service |
| Caching | Serve data from fast in-memory stores | Read-heavy workloads |
| Sharding | Partition data across multiple DBs | Large datasets |
| Async Processing | Offload work to queues | Long-running jobs |
| CDN | Serve static assets from edge nodes | Global user base |
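
Of these, sharding is the easiest to sketch: a deterministic hash of the key routes it to a shard, so every app server agrees on where a given row lives. The shard count below is hypothetical; md5 is used because Python's built-in hash() is randomized per process:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a key to a shard deterministically via a stable hash,
    so reads and writes for the same key always hit the same DB."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Note that naive modulo sharding reshuffles most keys when the shard count changes; consistent hashing is the usual fix for that.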

Estimating Scale

Interviewers expect you to reason about scale quantitatively. A quick cheat sheet:

  • 1 server handles ~1,000–10,000 req/sec depending on complexity
  • MySQL can handle ~1,000–5,000 writes/sec
  • Redis handles ~100,000–1,000,000 ops/sec
  • A gigabit network link = ~125 MB/s throughput
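
Putting those numbers to work, here is a back-of-envelope calculation for a hypothetical service (all traffic figures are assumptions for illustration):

```python
import math

# Hypothetical scenario: 10M daily active users, 20 requests each per day,
# with peak traffic about 3x the daily average.
dau = 10_000_000
requests_per_user = 20
peak_factor = 3
seconds_per_day = 86_400

avg_rps = dau * requests_per_user / seconds_per_day   # ~2,315 req/sec
peak_rps = avg_rps * peak_factor                      # ~6,944 req/sec

# Assuming one server sustains ~2,000 req/sec for this workload:
servers_needed = math.ceil(peak_rps / 2_000)          # 4 servers
```

In an interview, these estimates only need to be order-of-magnitude correct; the point is to show you can translate user counts into hardware.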

Interview Tips

  • Always clarify scale requirements first: "How many users? Reads vs. writes ratio? Peak vs. average load?"
  • Start with a simple architecture that works, then evolve it to handle scale
  • Identify and call out bottlenecks proactively — don't wait for the interviewer to ask
  • Know the difference between latency optimization (making individual requests faster) and throughput optimization (handling more requests)