The circuit breaker pattern prevents cascading failures in distributed systems by stopping calls to a failing service and giving it time to recover. It's a critical resilience pattern in microservices architectures.
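In a microservices architecture, services often call each other synchronously. When one service is slow or down, every caller waiting on it ties up threads and connections, becomes slow itself, and passes the problem on to its own callers.
A single struggling service can take down an entire cluster.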
The circuit breaker wraps calls to a remote service and tracks failure rates.
Closed: Requests pass through normally. Failures are counted.
If the failure rate exceeds a threshold (e.g., 50% failures in 60 seconds), the circuit opens.
Open: Requests are immediately rejected with an error, and no call is made to the failing service.
The circuit stays open for a timeout period (e.g., 30 seconds), giving the downstream service time to recover.
Half-open: After the timeout, the circuit allows a limited number of test requests through. If they succeed, the circuit closes; if they fail, it opens again.
[CLOSED] → failure threshold exceeded → [OPEN]
[OPEN] → timeout elapsed → [HALF-OPEN]
[HALF-OPEN] → test succeeds → [CLOSED]
[HALF-OPEN] → test fails → [OPEN]
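The transitions above can be sketched as a small state machine. This is a minimal illustration, not a production implementation: the class name, option names, the `minimumCalls` guard, the injectable `now` clock, and the choice to allow exactly one test request while half-open are all assumptions made for the example. The thresholds default to the figures mentioned in the text (50% failures in 60 seconds, 30 second open timeout).

```javascript
const State = Object.freeze({ CLOSED: 'CLOSED', OPEN: 'OPEN', HALF_OPEN: 'HALF-OPEN' });

class CircuitBreaker {
  constructor({
    failureRateThreshold = 0.5, // open when more than 50% of recent calls fail
    windowMs = 60_000,          // only consider the last 60 seconds
    openTimeoutMs = 30_000,     // stay open for 30 seconds
    minimumCalls = 10,          // assumption: do not judge on too few calls
    now = Date.now,             // assumption: injectable clock, handy for tests
  } = {}) {
    Object.assign(this, { failureRateThreshold, windowMs, openTimeoutMs, minimumCalls, now });
    this.state = State.CLOSED;
    this.results = [];          // recent outcomes: { time, ok }
    this.openedAt = 0;
    this.probeInFlight = false;
  }

  // Returns true if a call may be attempted right now.
  allowRequest() {
    if (this.state === State.OPEN && this.now() - this.openedAt >= this.openTimeoutMs) {
      this.state = State.HALF_OPEN; // [OPEN] -> timeout elapsed -> [HALF-OPEN]
      this.probeInFlight = false;
    }
    if (this.state === State.CLOSED) return true;
    if (this.state === State.OPEN) return false;
    if (this.probeInFlight) return false; // half-open: one test request at a time
    this.probeInFlight = true;
    return true;
  }

  recordSuccess() {
    if (this.state === State.HALF_OPEN) {
      this.state = State.CLOSED;    // [HALF-OPEN] -> test succeeds -> [CLOSED]
      this.results = [];
      this.probeInFlight = false;
    } else if (this.state === State.CLOSED) {
      this.record(true);
    }
  }

  recordFailure() {
    if (this.state === State.HALF_OPEN) {
      this.trip();                  // [HALF-OPEN] -> test fails -> [OPEN]
    } else if (this.state === State.CLOSED) {
      this.record(false);
      const failures = this.results.filter((r) => !r.ok).length;
      const rate = failures / this.results.length;
      if (this.results.length >= this.minimumCalls && rate > this.failureRateThreshold) {
        this.trip();                // [CLOSED] -> failure threshold exceeded -> [OPEN]
      }
    }
  }

  record(ok) {
    const time = this.now();
    this.results.push({ time, ok });
    // Drop outcomes that have fallen out of the sliding window.
    this.results = this.results.filter((r) => time - r.time <= this.windowMs);
  }

  trip() {
    this.state = State.OPEN;
    this.openedAt = this.now();
    this.results = [];
    this.probeInFlight = false;
  }

  // Wraps an async call: rejects immediately while open, otherwise records the outcome.
  async call(operation) {
    if (!this.allowRequest()) throw new Error('Circuit is open');
    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}
```

A caller would typically keep one breaker per downstream service and route every call through `breaker.call(...)`, catching the "Circuit is open" error to decide what to do instead.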
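When a circuit is open, what does the caller do? Typically it returns a fallback, such as a default response, instead of waiting on the failing service.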
// Example: Product recommendations
function getRecommendations(userId) {
  if (circuitBreaker.isOpen('recommendation-service')) {
    return getDefaultRecommendations(); // Fallback
  }
  return recommendationService.get(userId);
}
Retry failed requests with exponential backoff: wait progressively longer between attempts (1s, 2s, 4s, 8s...). Add jitter (a random extra delay) so that clients do not all retry at the same moment and slam the service simultaneously.
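As a rough sketch of the retry schedule described above, the delay can be computed separately from the retry loop. The function names, the 30 second cap, the additive jitter of up to one second, and the default of five attempts are assumptions chosen for illustration; other jitter strategies are equally valid.

```javascript
// Delay before retry number `attempt` (0-based): 1s, 2s, 4s, 8s... plus random jitter.
function backoffDelayMs(attempt, {
  baseMs = 1000,        // first delay
  maxMs = 30_000,       // assumption: cap so delays do not grow without bound
  jitterMs = 1000,      // assumption: up to 1s of extra random delay
  random = Math.random, // injectable so the schedule can be tested deterministically
} = {}) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  return exponential + random() * jitterMs;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` until it succeeds or `maxAttempts` is reached.
async function retryWithBackoff(operation, { maxAttempts = 5, ...delayOptions } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break; // no point sleeping after the last try
      await sleep(backoffDelayMs(attempt, delayOptions));
    }
  }
  throw lastError;
}
```

Retries are usually only safe for idempotent requests, and they work best behind a circuit breaker so that a service that is clearly down is not retried at all.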
Isolate different services into separate thread pools / connection pools. If one service is slow and exhausts its pool, other services are unaffected.
This is the bulkhead pattern, named after the watertight compartments in a ship: one flooded compartment doesn't sink the ship.
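The isolation described above can be approximated in JavaScript with a per-service concurrency limit. This is a sketch under stated assumptions: a single-threaded runtime has no thread pools to separate, so a cap on in-flight calls stands in for a dedicated pool, and the `Bulkhead` class, its method names, and the limits used below are hypothetical.

```javascript
class Bulkhead {
  constructor(name, maxConcurrent) {
    this.name = name;
    this.maxConcurrent = maxConcurrent;
    this.inFlight = 0;
  }

  // Claims a slot if one is free; returns false when the compartment is full.
  tryAcquire() {
    if (this.inFlight >= this.maxConcurrent) return false;
    this.inFlight += 1;
    return true;
  }

  release() {
    if (this.inFlight > 0) this.inFlight -= 1;
  }

  // Runs `operation` inside this compartment, rejecting immediately when it is full.
  async run(operation) {
    if (!this.tryAcquire()) {
      throw new Error(`Bulkhead "${this.name}" is full`);
    }
    try {
      return await operation();
    } finally {
      this.release(); // always free the slot, even if the call failed
    }
  }
}

// One compartment per downstream service (names and limits are illustrative).
const bulkheads = {
  recommendations: new Bulkhead('recommendation-service', 10),
  payments: new Bulkhead('payment-service', 20),
};
```

With this layout, a slow recommendation service can fill at most its own 10 slots; calls through `bulkheads.payments` are unaffected.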
Always set timeouts on network calls. Without timeouts, a slow service holds your threads forever. A reasonable starting point is 500ms for synchronous user-facing calls; tune it against the service's observed latency.
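One way to enforce such a limit on any promise-returning call is to race it against a timer. The helper below is a minimal sketch: `withTimeout` and `TimeoutError` are names invented for this example, and the 500ms default simply mirrors the figure in the text.

```javascript
class TimeoutError extends Error {
  constructor(ms) {
    super(`Operation timed out after ${ms}ms`);
    this.name = 'TimeoutError';
  }
}

// Rejects with TimeoutError if `operation` has not settled within `ms`.
function withTimeout(operation, ms = 500) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Promise.resolve().then(...) turns a synchronous throw into a rejection.
  const work = Promise.resolve().then(operation);
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Note that this only stops the caller from waiting; the underlying request keeps running unless it is cancelled separately, for example by passing an `AbortSignal` to an API that supports one.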
Libraries:
Service meshes like Istio and Linkerd provide circuit breaking as infrastructure, so no library code is needed.