A rate limiter is a very useful component when building applications: it throttles unwanted or excess requests that the server receives. Common rate-limiting algorithms include Token Bucket, Leaky Bucket, Fixed Window Counter, Sliding Window Log, and Sliding Window Counter. Which one to use depends on the use case and the requirements.
Here are a few of these algorithms that I implemented in order to understand how they work. I performance-tested each one in Postman with virtual users and studied the resulting patterns of success rates.
Token Bucket algorithm
It seemed the simplest of all the algorithms. A bucket of size N is used to store tokens, and the bucket is refilled at a rate of R tokens per time period. When a request arrives, a token is removed; if no tokens are left, the request is rejected. Usually a bucket is assigned per user or per IP address.
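The idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the exact code I tested; the class and method names are my own.

```python
import time

class TokenBucket:
    """Minimal token bucket sketch: capacity N, refill rate R tokens/second."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity          # max tokens the bucket can hold
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = capacity            # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self):
        # Add tokens proportional to the elapsed time, capped at capacity.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self):
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Capacity 10, refill 1 token/second (the same values as in the test below).
bucket = TokenBucket(capacity=10, refill_rate=1)
results = [bucket.allow() for _ in range(12)]
# The first 10 calls drain the bucket; further calls fail until a refill.
```

In a real deployment you would keep one bucket per user or IP, typically in a shared store such as Redis rather than in process memory.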
I tested with a fixed 10 virtual users for over a minute, a bucket capacity of 10 tokens, and a refill rate of 1 token per second.
As you can see, initially there are no errors. The bucket had a capacity of 10 tokens, which were exhausted within a second, and thus about 90 percent of the requests fail: 10 users are hitting the API, but only 1 token is added per second.
Fixed Window Counter algorithm
Time is divided into fixed windows of length N, and requests are counted within each window. Each request increments the counter; if the counter exceeds the threshold, the request is rejected. The counter resets at the start of a new window.
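A minimal Python sketch of the counter logic might look like this (again, an illustration with names of my own choosing, not the tested code):

```python
import time

class FixedWindowCounter:
    """Counts requests per fixed window; rejects once the threshold is hit."""

    def __init__(self, window_size, threshold):
        self.window_size = window_size  # window length in seconds
        self.threshold = threshold      # max requests allowed per window
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window_size:
            # A new window has begun: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.threshold:
            self.count += 1
            return True
        return False

# 60-second window, 60-request threshold (the values used in the test below).
limiter = FixedWindowCounter(window_size=60, threshold=60)
results = [limiter.allow() for _ in range(61)]
# 60 requests pass, then the counter blocks the rest of the window.
```

The weakness visible in the graph also shows in the code: a burst at a window boundary can let up to 2x the threshold through within a short span, since the counter simply resets.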
I tested this with 10 virtual users, a 60-second window, and a threshold of 60 requests.
It clearly shows how the success rates are high at the start of each window, followed by a run of rejections. At the start of the window the counter is zero, so all requests are allowed; but as soon as the counter exceeds the threshold, requests are rejected until the window resets.
Sliding Window Log algorithm
Each request's timestamp is stored in a log. Whenever a new request arrives, log entries older than the window size are removed. If the number of remaining entries is within the threshold, the request is accepted and its timestamp is stored; otherwise it is rejected. You can observe that the window is not fixed here: each new request is compared against the requests that fall within the window size behind it.
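A sketch of the log using a deque of timestamps (illustrative only; some variants also log rejected requests, but here only accepted ones are recorded):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keeps timestamps of accepted requests; the window slides per request."""

    def __init__(self, window_size, threshold):
        self.window_size = window_size  # window length in seconds
        self.threshold = threshold      # max requests within any window
        self.log = deque()              # timestamps of accepted requests

    def allow(self):
        now = time.monotonic()
        # Evict timestamps that have fallen out of the sliding window.
        while self.log and now - self.log[0] >= self.window_size:
            self.log.popleft()
        if len(self.log) < self.threshold:
            self.log.append(now)
            return True
        return False

# 60-second window, 60-request threshold.
limiter = SlidingWindowLog(window_size=60, threshold=60)
results = [limiter.allow() for _ in range(61)]
# The 61st request in the same window is rejected.
```

The trade-off is memory: one timestamp per accepted request, which is why the sliding window counter below approximates this with just two counters.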
I tested this also with 10 virtual users, a 60-second window, and a threshold of 60 requests. One observation is that it avoids the sudden spike in requests seen with the fixed window counter, exhibiting a smoother distribution of rejected requests.
Sliding Window Counter algorithm
It is a combination of the fixed window counter and sliding window log algorithms. Here, we store the number of requests made in the previous window and in the current window. Whenever a request arrives, we weight the previous window's count by how much of it still overlaps the sliding window ending at the current timestamp. If the weighted number of requests is within the threshold, the request is accepted and the count is incremented; otherwise it is rejected.
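The weighting step is the part that deserves code. Below is a sketch under my own naming; the weight is the fraction of the previous window that still overlaps the sliding window ending now.

```python
import time

class SlidingWindowCounter:
    """Approximates a sliding window using the previous and current counts."""

    def __init__(self, window_size, threshold):
        self.window_size = window_size
        self.threshold = threshold
        self.current_start = time.monotonic()
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = time.monotonic()
        elapsed = now - self.current_start
        if elapsed >= self.window_size:
            # Roll the windows forward; if more than one full window has
            # passed, the previous window's count is effectively zero.
            windows_passed = int(elapsed // self.window_size)
            self.previous_count = self.current_count if windows_passed == 1 else 0
            self.current_count = 0
            self.current_start += windows_passed * self.window_size
            elapsed = now - self.current_start
        # Fraction of the previous window still inside the sliding window.
        weight = (self.window_size - elapsed) / self.window_size
        estimated = self.previous_count * weight + self.current_count
        if estimated < self.threshold:
            self.current_count += 1
            return True
        return False

# 60-second window, 60-request threshold.
limiter = SlidingWindowCounter(window_size=60, threshold=60)
results = [limiter.allow() for _ in range(61)]
```

This estimate assumes requests were spread evenly across the previous window, which is why the algorithm is an approximation of the exact log-based approach while using constant memory.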
For this too, I tested with 10 virtual users, a 60-second window, and a threshold of 60 requests. The graph shows a smooth distribution of error and success rates, and it avoids the spike in requests observed with the fixed window counter.
Conclusion
These implementations are just to understand how things work. The optimal window size or threshold value depends heavily on the individual system and its behaviour; the values above were just for demonstration.