Guava RateLimiter rate limiting (Part 1)
This article introduces the concepts and algorithms behind rate limiting for Internet services, with Java implementations based on Guava RateLimiter: the counter method, sliding-window counting, the leaky bucket algorithm, and the token bucket algorithm. At the end, a custom rate-limiting annotation combined with AOP interception is used to rate-limit an interface.
Before introducing rate limiting, let's clarify a few easily confused concepts: circuit breaking, service degradation, and service isolation.
Before discussing circuit breaking, it helps to understand another concept: the avalanche effect in microservices, because circuit breaking is usually employed as a call-chain protection mechanism against it. In a microservice architecture, a microservice is typically an independent application that implements a single business function. The benefit is maximal decoupling between business functions, so each microservice can evolve independently. An application is usually composed of many microservices that call each other via RPC. Suppose the following call chain: A and B depend on C, and C calls E and F. If service E stops responding normally, C's timeout-and-retry mechanism kicks in while new calls keep arriving, so calls from C to E pile up, producing large numbers of waiting and retried calls. These gradually exhaust C's resources, such as memory or CPU, which in turn degrades C's calls to F, until the whole application becomes unavailable. In this example, a failure of E on the chain causes the calls from microservices A and B to consume more and more system resources until the system collapses; this is the so-called "avalanche effect". Circuit breaking is a call-chain protection mechanism designed to deal with it.
There are many everyday analogies for circuit breakers. If the voltage somewhere in a circuit is too high, the fuse blows to protect the circuit from overload. In the stock market, if an index rises or falls past a preset circuit-breaker threshold, trading is suspended for a period of time. The circuit breaker in a microservice architecture works similarly: when a microservice on the call chain becomes unavailable, responds too slowly, or its error count reaches a threshold, calls to that service are broken, i.e. a fallback response is returned immediately. Once the microservice on that node is detected to respond normally again, the normal call chain is gradually restored.
Service degradation means that, when server pressure spikes, some non-core services or pages are skipped or handled in a simplified way according to a predefined strategy, or a particular service is rate limited, thereby freeing server resources to keep the core business running correctly and efficiently. For example, during JD.com's 618 sale, services unrelated to transactions are degraded: some are shut down entirely, while non-core features such as order history and product review history show only the latest 100 items.
Isolation refers to isolating services or resources. Service isolation limits the blast radius of a single service failure so that other services remain available. Resource isolation generally means reducing resource contention between services, and it comes in many granularities: thread isolation, process isolation, machine-room isolation, and so on. Thread isolation isolates thread-pool resources: different services execute on different thread pools, so even if one service's pool is exhausted, the other services are unaffected. For example, as shown in the figure below, Tomcat processes requests by allocating a separate thread pool to each microservice.
Service rate limiting restricts the number of requests, i.e. the request rate within a time window. Once the limit is reached, the service can reject requests (redirect to an error page or report that the system is busy), queue them (e.g. flash sales, user comments, order placement), or degrade (return fallback or default data).
Circuit breaking and service degradation are both protective measures taken from the perspective of system availability, to prevent slow responses or outright crashes. Circuit breaking is generally triggered by a downstream service failure, while degradation is generally decided from the overall business load, in order to reduce system pressure. Rate limiting caps the number of requests per unit of time. All three use some means to keep the system available when traffic is overloaded. Service isolation gives different services their own independent resource pools, such as thread pools, to avoid the impact of cross-service resource contention.
Common rate-limiting techniques include: limiting total concurrency (e.g. database connection pools, thread pools); limiting instantaneous concurrency (e.g. nginx's limit_conn module, which caps concurrent connections); and limiting the average rate within a time window (e.g. Guava's RateLimiter, nginx's limit_req module). Others include limiting the frequency of RPC calls, limiting MQ consumption rates, and so on.
The counter algorithm accumulates the number of requests within a period; when the configured threshold is reached, the rate-limiting strategy is triggered. At the start of the next period, the counter is cleared and counting restarts. For example, limit the total number of requests in 1 minute to 100 and return failure for anything beyond that.
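A minimal sketch of this fixed-window counter in Java (class and method names here are illustrative, not from any library):

```java
// Fixed-window counter: at most `limit` requests are allowed per
// `windowMillis`; the counter resets when a new window begins.
public class CounterLimiter {
    private final int limit;
    private final long windowMillis;
    private int count = 0;
    private long windowStart = System.currentTimeMillis();

    public CounterLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now; // roll over to a new window
            count = 0;         // clear and count again
        }
        return ++count <= limit;
    }
}
```

With `limit = 100` and `windowMillis = 60_000` this matches the 100-requests-per-minute example above.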
Simple counting is easy to implement, but it has a fatal flaw: the boundary problem. In the same 100-requests-per-minute scenario, suppose 100 requests arrive at the very end of one minute and another 100 arrive immediately at the start of the next. Although they fall into two different one-minute windows, in reality 200 requests arrived within less than a minute, so the counter limit fails to protect the system.
The sliding window algorithm further divides the time period into N small sub-periods, records the number of requests in each, and discards expired sub-periods as time slides forward.
As shown in the figure below, suppose the period is 1 minute, divided into 6 sub-periods, each counting its own requests. If 100 requests arrive in the 6th sub-period and another 100 in the 7th, the limit over the sliding window (outlined by the red dotted line) is triggered. The finer the sub-periods, the more smoothly the window slides and the more accurate the rate-limiting statistics become.
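A compact sketch of the sliding-window idea, here using a timestamp log rather than bucketed sub-periods (the names are illustrative; the bucketed version described above trades a little accuracy for lower memory):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window limiter: keep timestamps of recent requests and evict
// those older than the window before checking the limit, so the window
// truly slides instead of resetting at fixed boundaries.
public class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // drop requests that have slid out of the window
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}
```

Because expired entries are evicted continuously, the boundary problem of the fixed-window counter disappears: 100 requests at the end of one minute plus 100 at the start of the next would exceed the window limit and be rejected.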
The leaky bucket algorithm, as the name suggests, maintains a container inside the algorithm, similar to a funnel in daily life. An incoming request is like water poured into the funnel, which then flows out of the outlet at a constant rate. No matter how the inflow rate varies, the outflow rate stays the same until the bucket is empty. Since the inflow rate is unknown, burst traffic accumulates in the bucket before it can be processed; if the bucket's capacity is exceeded, the water overflows, i.e. the request is discarded. The leaky bucket diagram is as follows:
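The mechanics above can be sketched as follows (an illustrative, simplified implementation; names are not from any library):

```java
// Leaky bucket: "water" (pending requests) drains at a fixed rate.
// A request is rejected when adding it would overflow the bucket.
public class LeakyBucketLimiter {
    private final long capacity;       // max water the bucket holds
    private final double leakPerMilli; // constant drain rate
    private double water = 0;
    private long lastLeakTime = System.currentTimeMillis();

    public LeakyBucketLimiter(long capacity, double leakPerSecond) {
        this.capacity = capacity;
        this.leakPerMilli = leakPerSecond / 1000.0;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // drain water for the elapsed time, never below zero
        water = Math.max(0, water - (now - lastLeakTime) * leakPerMilli);
        lastLeakTime = now;
        if (water + 1 <= capacity) {
            water += 1;  // the request enters the bucket
            return true;
        }
        return false;    // bucket full: overflow, discard the request
    }
}
```

Note that accepted requests still exit at the drain rate, which is why the leaky bucket smooths bursts but cannot pass them through quickly; the token bucket below relaxes exactly this restriction.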
The token bucket algorithm is, to some extent, an improvement on the leaky bucket. It limits the average request rate while still permitting a certain degree of burstiness. A bucket holds up to a fixed number of tokens, and the algorithm adds tokens to the bucket at a constant rate. Each request must obtain a token from the bucket before proceeding; otherwise it either waits for an available token or is rejected outright.
Tokens are added continuously; once the bucket reaches its upper limit, new tokens are discarded, so the bucket may sit holding a large number of available tokens. When requests arrive at that point, they take tokens and execute immediately. For example, with QPS set to 100, one second after the limiter is initialized the bucket already holds 100 tokens; if no requests have arrived and 100 suddenly come in, the limiter absorbs the instantaneous burst. A request only waits when the bucket is empty, so the long-run effect is execution at a fixed rate. The token bucket diagram is as follows:
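A hand-rolled sketch of the token bucket (illustrative names, not a library API); Guava's production implementation is `RateLimiter.create(permitsPerSecond)` with `acquire()` / `tryAcquire()`, which adds smoothing on top of this basic idea:

```java
// Token bucket: tokens refill at a fixed rate up to `capacity`.
// A request succeeds only if a token is available, so bursts up to the
// bucket capacity pass through while the long-run rate stays bounded.
public class TokenBucketLimiter {
    private final long capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillTime = System.currentTimeMillis();

    public TokenBucketLimiter(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full: allows an initial burst
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // add tokens for the elapsed time, capped at capacity
        tokens = Math.min(capacity, tokens + (now - lastRefillTime) * refillPerMilli);
        lastRefillTime = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false; // bucket empty: caller waits or is rejected
    }
}
```

With `capacity = 100` and `refillPerSecond = 100`, a full bucket absorbs the 100-request burst described above, then settles back to roughly 100 requests per second.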
Besides rate limiting at the application layer, you can also limit at the network layer, for example using nginx's rate-limiting modules to cap access from a single client IP; this article will not go into that further.
The next article, "RateLimiter Internet Rate Limiting (Part 2)", will present Java implementations of the common rate-limiting algorithms: the counter method, sliding-window counting, the leaky bucket algorithm, and the token bucket algorithm, as well as a custom rate-limiting annotation with AOP interception to rate-limit an interface.
Code download address: https://github.com/bigbirditedu/learn-ratelimit