How to Use the Circuit Breaker Pattern of Netflix’s Hystrix Library to Handle Cascading Failures in Microservices

Apr 28, 2021

Problem Statement: One of our microservices (say X) depends on a third-party service (say Y) for its functionality. We observed that when service Y became unhealthy, every request to X that involved a call to Y took longer to respond, because X kept calling Y without handling the failures. X’s threads were tied up processing these slow requests, which raised CPU usage and reduced the number of free threads available for other requests. Eventually the service became unresponsive in production, resulting in an outage for our business.

Solution: Use the Netflix Hystrix library to handle external service failures, so that our application doesn’t waste resources by continuously calling an unhealthy external service. Instead, it skips the call based on configured threshold parameters, which keeps the application’s threads and overall health in a good state. We maintain a small Hystrix thread pool for external calls, with a maximum size of ten threads, which limits the impact if the external service is unhealthy. The breaker is set to open if 60% of the requests within a ten-second window fail. The circuit then stays open for five seconds, moves to the half-open state, and finally either reopens or closes depending on whether the next request fails or succeeds.

What Can Fail in a Microservice Architecture?

A microservice architecture has many moving components, and therefore many points of failure. Failures can have a variety of causes: errors and exceptions in code, bad deployments, releases of new code, hardware failures, data center failures, poor architecture, lack of unit tests, dependent services, and so on.

Why Do You Need to Make Services Resilient?

The main problem with distributed applications is that they communicate over a network, which is often unreliable. You therefore need to build your microservices so that they are fault-tolerant and handle failures gracefully. In a microservice architecture there can be many services talking to one another, so you have to make sure that one failed service doesn’t bring down the whole system.

Circuit Breaker Pattern

You wrap a protected call in a circuit breaker object, which monitors for failures. Once the failures reach a predefined threshold, the breaker trips, and all further calls to the breaker return with an exception, a response from an alternative service, or a default message, without the protected call being made at all. This keeps the system responsive, and threads do not wait on an unresponsive call.

The Different States of the Circuit Breaker: The breaker has three states: Closed, Open, and Half-Open.

Closed: When everything is fine, the breaker stays in the closed state and all calls pass through to the service. When the number of failures exceeds the predefined threshold, the breaker trips and moves to the open state.

Open: The breaker returns an error for calls without executing the protected function.

Half-Open: After a timeout period, the circuit switches to the half-open state to check whether the underlying problem still exists. If a single call fails in this state, the breaker trips again. If it succeeds, the breaker returns to the normal, closed state.
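The three states can be sketched as a small state machine in plain Java. This is only a minimal illustration, not how Hystrix implements its breaker: the class name SimpleCircuitBreaker, its parameters, and the injectable clock are all assumptions made for this example, and it counts consecutive failures rather than a failure rate.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Illustrative only: a hand-rolled breaker, not Hystrix's implementation.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before tripping
    private final long openMillis;      // how long to stay open before a trial call
    private final LongSupplier clock;   // injectable time source, in milliseconds

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized <T> T call(Supplier<T> protectedCall, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get();   // short-circuit: skip the protected call
            }
            state = State.HALF_OPEN;     // timeout elapsed: allow one trial call
        }
        try {
            T result = protectedCall.get();
            failures = 0;                // a success closes the breaker
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;      // trip, or re-trip after a failed trial
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    synchronized State getState() {
        return state;
    }
}
```

A caller would pass the real dependency call as the first argument and a default response as the second, for example `breaker.call(() -> client.fetch(), () -> "default")`, where `client.fetch()` stands in for whatever remote call is being protected.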

What Is Hystrix?

Hystrix is a latency and fault tolerance library for distributed architectures. It is designed to isolate points of access to remote systems, services, and third-party libraries in a distributed environment. It helps prevent cascading failures and enables resilience in complex distributed microservice architectures.

How Does Hystrix Accomplish Its Goals?

Hystrix does this by:

Wrapping all calls to external or internal systems (or “dependencies”) in a HystrixCommand or HystrixObservableCommand object, which typically executes in a separate thread.
Timing out calls that take longer than the thresholds (in milliseconds) you define through HystrixCommand.
Maintaining a small thread pool (or semaphore) for each dependency; if it becomes full, requests destined for that dependency are immediately rejected instead of being queued up.
Tripping a circuit breaker to stop all requests to a particular service for a period of time, either manually or automatically if the error percentage for the service exceeds the defined threshold.
Performing fallback logic when a call fails, is rejected, times out, or is short-circuited.
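To make the isolation, timeout, and fallback ideas above concrete, here is a rough sketch that uses only java.util.concurrent. It is not Hystrix code and does not use the Hystrix API: the class IsolatedCall and its parameters are hypothetical names chosen for this illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Illustrative only: bounded pool + timeout + fallback for one dependency.
class IsolatedCall {
    private final ExecutorService pool;
    private final long timeoutMillis;

    IsolatedCall(int poolSize, long timeoutMillis) {
        // SynchronousQueue holds no tasks, so work is rejected rather than
        // queued when every thread in the pool is busy.
        this.pool = new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), new ThreadPoolExecutor.AbortPolicy());
        this.timeoutMillis = timeoutMillis;
    }

    <T> T execute(Callable<T> dependencyCall, Supplier<T> fallback) {
        Future<T> future;
        try {
            future = pool.submit(dependencyCall); // runs on a separate thread
        } catch (RejectedExecutionException e) {
            return fallback.get();                // pool exhausted: reject immediately
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                  // interrupt the slow call
            return fallback.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt flag
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();                // the dependency call threw
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

The caller's own thread waits at most for the timeout, and a saturated pool fails fast, which is the behavior the list above describes. Hystrix adds metrics and the circuit breaker on top of this kind of isolation.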

Using Hystrix with a Spring Boot Application:

Add the dependency below to the POM file:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
    <version>{version}</version>
</dependency>

Choose an appropriate version, for example Spring Cloud Starter Netflix 2.0.1.RELEASE.

Add the @EnableCircuitBreaker annotation to enable the Hystrix circuit breaker in your microservice.

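import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;

@SpringBootApplication
@EnableCircuitBreaker
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}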

One way to wrap a call in a Hystrix command is shown in the example below:

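public static final String DISPOSE_KEY = "disposeKey";
public static final String DISPOSE_POOL = "disposePool";

// commandKey and threadPoolKey tie this method to the properties configured below
@HystrixCommand(commandKey = DISPOSE_KEY, threadPoolKey = DISPOSE_POOL)
public Future<Void> disposeCall(InteractioData interactionData) {
    // body omitted: the call to the external service goes here
}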

Hystrix Properties

hystrix.command.disposeKey.circuitBreaker.sleepWindowInMilliseconds=5000
hystrix.command.disposeKey.circuitBreaker.requestVolumeThreshold=5
hystrix.command.disposeKey.circuitBreaker.errorThresholdPercentage=60
hystrix.command.disposeKey.execution.isolation.thread.timeoutInMilliseconds=10000

hystrix.threadpool.disposePool.maxQueueSize=10
hystrix.threadpool.disposePool.queueSizeRejectionThreshold=10
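To see how requestVolumeThreshold and errorThresholdPercentage work together, the check can be approximated as below. This is a simplified sketch of the decision as I understand it, not Hystrix's source: the class TripDecision and the method shouldTrip are hypothetical names, and the counts are assumed to come from the current rolling statistics window.

```java
// Illustrative only: approximates when a breaker with these settings would trip.
class TripDecision {
    static boolean shouldTrip(int totalRequests, int failedRequests,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        // Too little traffic in the window: never trip, whatever the error rate.
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false;
        }
        // Integer percentage of failed requests in the window.
        int errorPercentage = failedRequests * 100 / totalRequests;
        return errorPercentage >= errorThresholdPercentage;
    }
}
```

With the values above (requestVolumeThreshold=5, errorThresholdPercentage=60), five requests with three failures reach 60% and would trip the breaker, while four requests that all fail would not, because the volume threshold has not been met.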

Hystrix Dashboard:

The Hystrix Dashboard lets us monitor Hystrix metrics in real time.



Written by Arnab Roy