API Rage Limiting, I mean Rate Limiting

Rate limiting is a fact of life when working with third-party APIs. If your usage (burst or otherwise) stays below the rate limit, great! However, if you are like me and are building distributed, elastic consumers, Redis offers some interesting features to help coordinate your workers and keep things running smoothly.

As far as rate limiting in APIs goes, Google AdWords' implementation is not easily predictable. Google has an entire article dedicated to rate limiting, but other than a 10,000 operations per day limit for the Basic tier, the crystal ball is hazy.

In short, Google can limit you at any time based on a number of factors, some of which are entirely out of your control (such as overall usage). Chances are that if your usage of the AdWords API only requires a Basic account, rate limiting isn't much of an issue. For platforms like Sightly's TargetView, where scaling (in particular, scaling AdWords usage) is in the DNA of the product's use case, rate limiting is a huge concern.

The knee-jerk reaction to rate limiting is to arbitrarily slow the code down in order to avoid the issue altogether:

var page = budgetService.get(selector);
Thread.Sleep(500);

Done. Right?

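Most likely not. Every call now pays a fixed 500 ms penalty whether or not we are actually being limited, and if the API pushes back anyway, nothing handles the error. We need a response that only slows down when the API tells us to.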

So the next step was to make Thread.Sleep smarter. This should involve some sort of escalating sleep time, triggered by the rate-limiting exception provided by the API we are accessing. The Google AdWords API provides us with RateExceededError, and it even calculates a recommended wait time, which escalates on subsequent errors. Why don't we just handle that error, wait for the recommended time, and try again?

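// Requires System, System.Linq and System.Threading, plus the AdWords client library types.
private static void ApiCallWithRetries(Action call, int maxTries)
{
    bool rateErrored;
    var tries = 0;
    do
    {
        try
        {
            tries++;
            call();
            rateErrored = false;
        }
        catch (AdWordsApiException ex)
        {
            // The fault arrives as an ApiException; the RateExceededError is one
            // of the entries in its errors array (as in the class further down).
            var apiException = ex.ApiException as ApiException;
            var rer = apiException?.errors?.OfType<RateExceededError>().FirstOrDefault();
            if (rer == null) throw;         // not a rate-limit error
            if (tries >= maxTries) throw;   // out of retries

            rateErrored = true;

            // Use Google's recommended wait, falling back to one second.
            var wait = TimeSpan.FromSeconds(rer.retryAfterSecondsSpecified && rer.retryAfterSeconds > 0
                ? rer.retryAfterSeconds
                : 1);
            Thread.Sleep(wait);
        }
    } while (rateErrored);
}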

Here is how you would use the function above:

BudgetPage page = null;

ApiCallWithRetries(() =>
{
    page = budgetService.get(selector);
}, 5);
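Because the loop above depends on AdWords types, it can't be run on its own. Here is a minimal sketch of the same retry-after pattern, generalized for illustration: the hypothetical getRetryAfter delegate stands in for the RateExceededError check, and the sleep is injected so the loop can be exercised without actually waiting. The demo's exceptions and their "RetryAfterSeconds" data entry are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Retries `call` while getRetryAfter(ex) returns a suggested delay (a rate-limit
// error); any other exception propagates immediately. `sleep` is injected
// (hypothetical) so the demo can record waits instead of blocking.
static void CallWithRetries(
    Action call,
    Func<Exception, TimeSpan?> getRetryAfter,
    Action<TimeSpan> sleep,
    int maxTries)
{
    var tries = 0;
    while (true)
    {
        tries++;
        try
        {
            call();
            return; // success: stop retrying
        }
        catch (Exception ex) when (getRetryAfter(ex) is TimeSpan retryAfter)
        {
            if (tries >= maxTries) throw; // out of retries: surface the error
            // Honour the suggested delay, falling back to one second.
            sleep(retryAfter > TimeSpan.Zero ? retryAfter : TimeSpan.FromSeconds(1));
        }
    }
}

// Demo: a hypothetical call that is rate limited twice with an escalating
// suggested delay, then succeeds.
var attempts = 0;
var waits = new List<TimeSpan>();
CallWithRetries(
    () =>
    {
        attempts++;
        if (attempts < 3)
        {
            var ex = new InvalidOperationException("rate exceeded");
            ex.Data["RetryAfterSeconds"] = attempts * 10;
            throw ex;
        }
    },
    ex => ex.Data["RetryAfterSeconds"] is int s ? TimeSpan.FromSeconds(s) : (TimeSpan?)null,
    waits.Add,
    maxTries: 5);

Console.WriteLine($"{attempts} attempts, waits: {string.Join(", ", waits)}");
// prints "3 attempts, waits: 00:00:10, 00:00:20"
```

Keeping the wait policy out of the retry loop is also what makes it easy to replace a local Thread.Sleep with a coordinated wait later on.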

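This worked fine for a single-threaded system, but we ran into issues when we tried to scale our workers horizontally. Each worker only learns about a rate limit when it hits the error itself, so while one worker backs off, the others keep calling the API and keep getting limited.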

[Image: Rate Limited. Caption: I thought buses were supposed to make traffic better!]

It's obvious that the workers all need a foreman to tell them when to slow down and when it is OK to speed back up. We want this system to be simple and fast so that it adds as little overhead as possible. For this I settled on using Redis Cache.

Redis is a simple key/value cache and database. It is fast. Like, RAM fast (because the working data set is kept in memory). That is fine here because I don't need to store much data to make this work. In addition, it supports a couple of features that make it interesting for our use case:

Replication, which keeps uptime higher; this matters because Redis now becomes another potential single point of failure in our system.
Keys can be added with an expiration duration.

It is this second feature that is most interesting. We can use it as part of the timeout process: the coordinated wait time and the key's expiration can coincide, so that the mere existence of the key becomes a traffic signal to the workers.
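To make the traffic-signal idea concrete without a Redis server, here is a minimal sketch that uses a ConcurrentDictionary as a hypothetical in-memory stand-in for keys with an expiry. In Redis itself the key would be written with StringSet and an expiry (as the class further down does), and Redis would delete it on its own; here an entry simply counts as absent once its expiry time has passed. The key name and times are made up for the example.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory stand-in for Redis keys with an expiry.
// An entry counts as "present" only until its expiry time.
var signals = new ConcurrentDictionary<string, DateTime>();

// A worker that hits a rate limit turns the light red for retryAfter.
void RaiseSignal(string key, TimeSpan retryAfter, DateTime now) =>
    signals[key] = now + retryAfter;

// Any worker checks the light before calling: wait out the remainder, or go.
TimeSpan WaitFor(string key, DateTime now) =>
    signals.TryGetValue(key, out var expiresAt) && expiresAt > now
        ? expiresAt - now
        : TimeSpan.Zero;

var t0 = new DateTime(2015, 1, 1, 12, 0, 0, DateTimeKind.Utc);
RaiseSignal("ratelimit:1234567890", TimeSpan.FromSeconds(30), t0);

Console.WriteLine(WaitFor("ratelimit:1234567890", t0.AddSeconds(10))); // 00:00:20
Console.WriteLine(WaitFor("ratelimit:1234567890", t0.AddSeconds(31))); // 00:00:00
```

Because the expiry and the wait are the same moment, no worker ever has to clean up the signal: once the wait is over, the key (and the red light) is simply gone.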

So let's start.

First we need to set up the Redis Cache. For this example we are going to use the new Redis Cache service in Azure.

Setting it up is simple, but it is only available via the new beta Portal:

[Screenshot: Setting Up Redis Cache]

Set a name, choose your pricing tier, subscription, and location. In a few moments you have a new cache ready to go.

Once it is set up, record the full DNS name from Properties:

[Screenshot: Redis Cache Properties]

Also record your primary key:

[Screenshot: Redis Cache Keys]

Now you can create a connection in code. I am using the StackExchange.Redis library to connect to Redis from C#.

You just need to supply the connection string to the ConnectionMultiplexer class:

var redisConnection = ConnectionMultiplexer.Connect("<dnsname>.redis.cache.windows.net,ssl=true,password=<primary or secondary key>");

The connection string is hard-coded here for simplicity's sake. Also, the connection is meant to be used as a singleton, as described in the documentation:

... ConnectionMultiplexer implements IDisposable and can be disposed when no longer required ... it is exceptionally rare that you would want to use a ConnectionMultiplexer briefly, as the idea is to re-use this object.
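One common way to share a single instance across threads is Lazy<T>. The sketch below shows that pattern with a hypothetical stand-in factory (returning a string) in place of ConnectionMultiplexer.Connect, so it runs without a Redis server; the counter only exists to show that the factory runs once.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for ConnectionMultiplexer.Connect(...).
// In real code the factory would be: () => ConnectionMultiplexer.Connect(connectionString)
var connectCalls = 0;
string Connect()
{
    Interlocked.Increment(ref connectCalls);
    return "connection"; // placeholder for the multiplexer
}

// Lazy<T> defaults to LazyThreadSafetyMode.ExecutionAndPublication:
// the factory runs at most once, even when many threads race for .Value.
var lazyConnection = new Lazy<string>(Connect);

Parallel.For(0, 100, i => { _ = lazyConnection.Value; });

Console.WriteLine(connectCalls); // 1
```

Every worker then reads lazyConnection.Value instead of connecting on its own.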

Now that we have our connection, let's show how I am using it to coordinate several workers.

using System;
using System.Linq;
using System.Threading;
using StackExchange.Redis;
// Plus the Google AdWords client library namespaces that define
// AdWordsApiException, ApiException and RateExceededError for your API version.

public class ApiCallWithRateLimits
{
    private readonly ConnectionMultiplexer _redis;
    private readonly string _developerToken;
    private const string KeyNamespace = "ratelimit:";

    public ApiCallWithRateLimits(ConnectionMultiplexer redis, string developerToken)
    {
        _redis = redis;
        _developerToken = developerToken;
    }

    public void Execute(long customerId, Action call, int maxTries = 1)
    {
        maxTries = maxTries > 0 ? maxTries : 1;

        var db = _redis.GetDatabase();

        // Each key holds the UTC time (in ticks) until which that scope is limited.
        // A missing (expired) key reads as 0, i.e. "no limit".
        var accountLimit = (long)db.StringGet(KeyNamespace + customerId);
        var developerLimit = (long)db.StringGet(KeyNamespace + _developerToken);

        // Honour whichever limit lasts longer.
        var rateLimitUntil = Math.Max(accountLimit, developerLimit);

        // Read the clock once so the computed timeout can never be negative.
        var now = DateTime.UtcNow.Ticks;
        if (rateLimitUntil > now)
        {
            Thread.Sleep(TimeSpan.FromTicks(rateLimitUntil - now));
        }

        bool rateErrored;
        var tries = 0;

        do
        {
            try
            {
                tries++;
                call();
                rateErrored = false;
            }
            catch (AdWordsApiException ex)
            {
                // Only handle faults made up entirely of rate-limit errors.
                var apiException = ex.ApiException as ApiException;
                if (apiException?.errors == null ||
                    !apiException.errors.All(err => err is RateExceededError)) throw;

                rateErrored = true;

                var sleepTime = TimeSpan.Zero;

                foreach (var rer in apiException.errors.Cast<RateExceededError>())
                {
                    string rateScope;
                    switch ((rer.rateScope ?? string.Empty).ToUpperInvariant())
                    {
                        case "ACCOUNT":
                            rateScope = KeyNamespace + customerId;
                            break;
                        case "DEVELOPER":
                            rateScope = KeyNamespace + _developerToken;
                            break;
                        default:
                            rateScope = null; // unknown scope: still wait, but publish no key
                            break;
                    }

                    // Google's recommended wait, falling back to one second.
                    var retryAfter = TimeSpan.FromSeconds(rer.retryAfterSecondsSpecified && rer.retryAfterSeconds > 0
                        ? rer.retryAfterSeconds
                        : 1);
                    var expireAt = DateTime.UtcNow + retryAfter;

                    // Publish the limit so other workers back off too; the key
                    // expires (clearing the signal) when the wait is over.
                    if (rateScope != null)
                    {
                        db.StringSet(rateScope, expireAt.Ticks, retryAfter, When.Always,
                            CommandFlags.HighPriority | CommandFlags.FireAndForget);
                    }

                    // Compare whole durations (TimeSpan.Milliseconds is only the 0-999 ms component).
                    if (retryAfter > sleepTime) sleepTime = retryAfter;
                }

                if (tries >= maxTries) throw;

                Thread.Sleep(sleepTime);
            }
        } while (rateErrored);
    }
}

We create keys based on the scope of the rate limit reported by Google: either DEVELOPER, a global scope shared by all of my workers no matter what work they are doing, or ACCOUNT, which applies to the particular AdWords customer account the work is being done for. The value of each key is the time (in ticks) at which we want to try again, and the key expires at that same moment. When a worker first encounters the error (in the catch block), it sets the keys so that other workers know about the limit, then sleeps for the recommended time before either trying again or rethrowing the error, as appropriate. Before doing any work, each worker checks the cache for the relevant keys and, if it finds one, waits until that time has passed.
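The two decisions described above (which key a scope maps to, and how long to wait given both keys) can be pulled out as pure functions and checked without Redis or AdWords. This is a sketch under that assumption: ScopeKey and WaitBeforeCall are hypothetical helper names, and the customer id and "dev-token" values are made up.

```csharp
using System;

// Maps a RateExceededError rateScope value to the key naming used in the class
// above ("ratelimit:" + customer id or developer token); null for other scopes.
static string? ScopeKey(string rateScope, long customerId, string developerToken) =>
    rateScope.ToUpperInvariant() switch
    {
        "ACCOUNT" => "ratelimit:" + customerId,
        "DEVELOPER" => "ratelimit:" + developerToken,
        _ => null,
    };

// accountTicks/developerTicks are the values read from the two keys (0 when a
// key is missing). The later deadline wins; no future deadline means no wait.
static TimeSpan WaitBeforeCall(long accountTicks, long developerTicks, long nowTicks)
{
    var until = Math.Max(accountTicks, developerTicks);
    return until > nowTicks ? TimeSpan.FromTicks(until - nowTicks) : TimeSpan.Zero;
}

var now = new DateTime(2015, 1, 1, 12, 0, 0, DateTimeKind.Utc).Ticks;
var accountUntil = now + TimeSpan.FromSeconds(5).Ticks;
var developerUntil = now + TimeSpan.FromSeconds(30).Ticks;

Console.WriteLine(ScopeKey("Account", 1234567890, "dev-token"));      // ratelimit:1234567890
Console.WriteLine(WaitBeforeCall(accountUntil, developerUntil, now)); // 00:00:30
```

Storing an absolute deadline rather than a duration is what lets a worker that arrives partway through the limit compute only the remaining wait.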

Keep in mind that parts of this example are specific to the Google AdWords API, so you will need to tailor it to your own usage. Let me know what you think @thetinomen or on the Flapstack community Facebook page.
