
Better Rate Limiting With Kong

A More Intelligent Rate Limiting Strategy

There are many methods you can use to protect your backend servers from being overloaded with too much traffic: static pages, caching, CDNs. Sometimes, though, such as when you are serving dynamic or personalized pages, you have no choice but to let requests actually hit your server.

Most of the time you will have enough resources to process the requests; when you don't, your servers will start to overload and may return 500 errors.

This is where rate limiting can help. You can configure your proxy server to return a 429 (Too Many Requests) response to protect your backend server from going down.

How can Kong help avoid this?

The Rate Limiting plugin provided by Kong allows you to limit the number of requests sent to your backend server. You set the number of requests to allow within a specified time window, for example, 1000 requests per second. While clients are within this limit, Kong passes their requests through to the server. When the limit is exceeded, however, Kong returns a 429 status code without forwarding the request, ensuring your backend server is not overwhelmed. Clients should respect this code and back off, allowing your server to recover. Even if they don't back off, they won't get a request through until the time window has expired.
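The counting logic described above can be sketched as a fixed-window counter. This is an illustrative sketch of the technique, not Kong's actual implementation, and all names here are made up:

```python
import time

class FixedWindowLimiter:
    """Illustrative fixed-window rate limiter: allow up to `limit`
    requests per `window` seconds, then reject until the window rolls over."""

    def __init__(self, limit, window=1.0):
        self.limit = limit
        self.window = window
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # A new time window has started: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True   # proxy passes the request through
        return False      # proxy responds with 429

# Mirroring the example above: 1000 requests per second.
limiter = FixedWindowLimiter(limit=1000, window=1.0)
status = 200 if limiter.allow() else 429
```

Kong additionally supports counters shared across nodes (via Redis or its database), which this single-process sketch does not attempt to show.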

```mermaid
sequenceDiagram
    participant c as Client
    participant p as Kong Proxy
    participant s as Web Server
    alt Rate limit not exceeded
        c->>p: HTTP Request
        Note over p: Increment rate limit counter<br/>Not exceeded rate limit
        p->>s: Pass through request
        s-->>p: Server response
        p-->>c: Proxy passes through response
    else Rate limit exceeded
        c->>p: HTTP Request
        Note over p: Exceeded rate limit
        p-->>c: Respond with 429 status code
    end
```

This will ensure your server won't go down, but if the traffic is coming, you really want to be able to serve it; turning requests away is a waste. So your monitoring system should alert when you are rate limiting clients, enabling your on-call engineer to scale up your servers to handle the traffic. Of course, they will also have to adjust the settings in your Kong rate limiting plugin to allow more traffic through. If you think that sounds like an accident waiting to happen, you are not wrong. There must be a better way to do this without manual intervention.

Enter Kong’s Response Rate Limiting

By using Kong's Response Rate Limiting plugin, we can get the server to tell us when we should rate limit the clients. This gives you more control, and allows you to scale up without Kong needing to be adjusted. The plugin works in a similar way to the normal rate limiting plugin, except that instead of counting requests from the clients, it counts responses from the server that contain a preset header. This means you can use middleware to detect when your resources are low and add the header to your responses; the plugin will detect it and enable the rate limit.
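As a sketch of the server side, here is a minimal WSGI middleware that adds the header when the host is under load. Kong's Response Rate Limiting plugin looks for the `X-Kong-Limit` header by default; the limit name `backend` and the overload check are assumptions for illustration:

```python
# Sketch of WSGI middleware that tells Kong to count this response against
# a rate limit when the host is under load. The limit name "backend" is
# made up; X-Kong-Limit is the plugin's default header name.

def make_limit_middleware(app, is_overloaded):
    """Wrap a WSGI app; `is_overloaded` is a callable returning True when
    resources are low (e.g. based on CPU usage or load average)."""
    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            if is_overloaded():
                # The response-ratelimiting plugin increments its "backend"
                # counter by 1 when it sees this header on a response.
                headers = headers + [("X-Kong-Limit", "backend=1")]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return middleware
```

Kong strips this header before the response reaches the client, so it stays an internal signal between your server and the proxy.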

```mermaid
sequenceDiagram
    participant c as Client
    participant p as Kong Proxy
    participant s as Web Server
    alt Resources within limits
        c->>p: HTTP Request
        Note over p: Not exceeded rate limit
        p->>s: Pass through request
        Note over s: Resources within limits
        s-->>p: Server response
        p-->>c: Proxy passes through response
    else Resources exceeded
        loop Until rate limit exceeded
            c->>p: HTTP Request
            p->>s: Pass through request
            Note over s: Resource limit exceeded
            s-->>p: Server response with rate limit header
            Note over p: Increment rate limit counter
            p-->>c: Proxy passes through response
        end
        loop While in rate limit time window
            c->>p: HTTP Request
            Note over p: Rate limit exceeded
            p-->>c: Respond with 429 status code
        end
    end
```

The trick is in how you set up your autoscaling rules (you are running in the cloud, aren't you?) to scale up your server before you need to set that header. For example, if you know you can't handle more than 90% CPU usage, you want to set the header at ~80% and autoscale your server at 70%. That way, you should never see the rate limit being hit. Your on-call engineers can rest easy, knowing that, should you receive an increase in traffic, the system will automatically handle it.
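The staggered thresholds above can be written down as a tiny policy function. The numbers mirror the example in the text; the function and action names are purely illustrative:

```python
# Staggered thresholds: add capacity well before the rate limit header
# would ever be set, which in turn fires before the known breaking point.
SCALE_UP_AT = 0.70      # autoscaling adds servers here
SET_HEADER_AT = 0.80    # middleware starts asking Kong to rate limit here
BREAKING_POINT = 0.90   # the server cannot cope beyond this

def pressure_action(cpu_fraction):
    """Map current CPU usage (0.0-1.0) to the action the system takes."""
    if cpu_fraction >= SET_HEADER_AT:
        return "set-rate-limit-header"
    if cpu_fraction >= SCALE_UP_AT:
        return "scale-up"
    return "serve-normally"
```

Because scaling triggers a full 10 percentage points below the header threshold, new capacity normally comes online before any client ever sees a 429.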
