Circuit Breaker Pattern

The circuit breaker pattern prevents cascading failures in distributed systems by stopping calls to a failing service and giving it time to recover. It's a critical resilience pattern in microservices architectures.

The Problem: Cascading Failures

In a microservices architecture, services call each other synchronously. When one service is slow or down:

  1. Callers wait for responses, tying up threads and connections
  2. Callers start timing out and retrying
  3. The overloaded service gets even more traffic
  4. The caller's thread pool is exhausted, causing the caller to fail too
  5. Its callers fail in turn — a cascading failure

A single struggling service can take down an entire cluster.

Circuit Breaker States

The circuit breaker wraps calls to a remote service and tracks failure rates.

Closed (Normal Operation)

Requests pass through normally. Failures are counted.

If the failure rate exceeds a threshold (e.g., 50% failures in 60 seconds), the circuit opens.

Open (Failing Fast)

Requests are immediately rejected with an error — no call made to the failing service.

The circuit stays open for a timeout period (e.g., 30 seconds), giving the downstream service time to recover.

Half-Open (Testing Recovery)

After the timeout, the circuit allows a limited number of test requests through.

  • If they succeed → circuit closes (recovery confirmed)
  • If they fail → circuit opens again (still broken)

[CLOSED] → failure threshold exceeded → [OPEN]
[OPEN]   → timeout elapsed          → [HALF-OPEN]
[HALF-OPEN] → test succeeds         → [CLOSED]
[HALF-OPEN] → test fails            → [OPEN]
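
The state transitions above can be sketched as a small class. This is illustrative, not any particular library's API: the names (`CircuitBreaker`, `call`, `failureThreshold`, `openTimeoutMs`) are assumptions, and it counts consecutive failures rather than the rolling failure rate a production implementation would track.

```javascript
// Minimal circuit breaker sketch. All names and thresholds are illustrative.
class CircuitBreaker {
  constructor({ failureThreshold = 5, openTimeoutMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold; // failures before opening
    this.openTimeoutMs = openTimeoutMs;       // how long to stay open
    this.state = 'CLOSED';
    this.failures = 0;
    this.openedAt = 0;
  }

  call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.openTimeoutMs) {
        // Fail fast: no call is made to the failing service.
        throw new Error('circuit open: failing fast');
      }
      this.state = 'HALF_OPEN'; // timeout elapsed: allow a test request
    }
    try {
      const result = fn();
      // Success (including a HALF_OPEN test request) closes the circuit.
      this.state = 'CLOSED';
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
```

Real libraries (Resilience4j, opossum) use a sliding window for the failure-rate check, as in the "50% failures in 60 seconds" example above, and limit how many test requests the half-open state admits.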

Benefits

  • Fail fast: Instead of waiting for timeouts, callers get immediate errors
  • Protect struggling services: Reduces load on a service that's trying to recover
  • Prevent cascading failures: Stops failure propagation through the call chain
  • Self-healing: Automatically reconnects when the service recovers

Fallback Strategies

When a circuit is open, what does the caller do?

  • Return cached data: Serve the last known good response
  • Return a default: Show generic content instead of failing
  • Degrade gracefully: Disable the feature entirely with a clear message
  • Queue for later: Persist the request and process it once service recovers

// Example: Product recommendations
function getRecommendations(userId) {
  if (circuitBreaker.isOpen('recommendation-service')) {
    return getDefaultRecommendations(); // Fallback
  }
  return recommendationService.get(userId);
}

Related Patterns

Retry with Exponential Backoff

Retry failed requests, but wait progressively longer between attempts (1s, 2s, 4s, 8s...). Add jitter (random delay) to prevent all retries from slamming the service simultaneously.
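
A sketch of the idea, using "full jitter" (each delay is drawn uniformly between zero and the capped exponential value); the helper names and defaults are assumptions for illustration:

```javascript
// Exponential backoff with full jitter. Names and defaults are illustrative.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt); // 1s, 2s, 4s, 8s... capped
  return Math.random() * exp;                          // jitter spreads out retries
}

async function retryWithBackoff(fn, { retries = 4, baseMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts: propagate
      await new Promise(r => setTimeout(r, backoffDelayMs(attempt, baseMs)));
    }
  }
}
```

Without jitter, clients that failed at the same moment all retry at the same moment, recreating the spike that caused the failure.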

Bulkhead

Isolate different services into separate thread pools / connection pools. If one service is slow and exhausts its pool, other services are unaffected.

Named after the watertight compartments in a ship — one flooded compartment doesn't sink the ship.
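
In a language without per-service thread pools, the same isolation can be sketched as a concurrency cap per downstream dependency (the class name and pool size here are assumptions, not a library API):

```javascript
// Bulkhead sketch: cap in-flight calls per downstream service so one slow
// dependency can't exhaust shared resources. Names are illustrative.
class Bulkhead {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0; // calls currently in flight
  }

  async run(fn) {
    if (this.active >= this.maxConcurrent) {
      // Reject immediately rather than queue: protect the rest of the system.
      throw new Error('bulkhead full: rejecting call');
    }
    this.active += 1;
    try {
      return await fn();
    } finally {
      this.active -= 1; // always free the slot
    }
  }
}
```

Each downstream service gets its own `Bulkhead` instance, so a slow recommendation service can fill its own pool without starving calls to, say, the payment service.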

Timeout

Always set timeouts on network calls. Without timeouts, a slow service holds your threads forever. A good default: 500ms for synchronous user-facing calls.
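
One common way to bolt a timeout onto any promise-returning call is `Promise.race`; the wrapper name is an assumption, and the 500ms default mirrors the figure above:

```javascript
// Timeout sketch: reject if the wrapped promise doesn't settle in time.
function withTimeout(promise, ms = 500) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Note that this abandons the slow call rather than cancelling it; truly cancelling the underlying request needs support from the client (e.g. an `AbortController` for `fetch`).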

Implementation

Libraries:

  • Resilience4j (Java) — modern circuit breaker library
  • Hystrix (Java) — Netflix's original (now in maintenance mode)
  • Polly (.NET)
  • opossum (Node.js)

Service meshes like Istio and Linkerd provide circuit breaking as infrastructure — no library code needed.
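
As a rough sketch of what that looks like in Istio, outlier detection on a DestinationRule ejects failing hosts from the load-balancing pool (the service name and all field values below are illustrative, not recommendations):

```yaml
# Illustrative Istio DestinationRule: eject hosts that keep returning 5xx.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: recommendation-service
spec:
  host: recommendation-service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # failures before ejection
      interval: 30s              # analysis sweep interval
      baseEjectionTime: 30s      # comparable to the open-state timeout
      maxEjectionPercent: 100
```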

Interview Tips

  • Always mention circuit breakers in microservices designs — it shows production maturity
  • Discuss the three states (closed, open, half-open) to demonstrate you understand the pattern deeply
  • Combine with: timeout, retry with backoff, and bulkhead. These patterns work together
  • Key phrase: "To prevent cascading failures, I'd wrap calls to the Payment Service in a circuit breaker with a fallback that queues the payment for retry"
  • For the fallback, think about what a degraded but functional experience looks like — graceful degradation over hard failures