A Rate-limited Sidekiq Job - Part 2

In the last post, I talked about an initial approach to a rate-limited Sidekiq job. Sadly, it didn’t scale for us: thousands of jobs were being tried and retried, and all of them exited early because the rate limit had been exceeded.

The implementation we have settled on is slightly different.

Instead of raising an exception when the rate limit has been met (forcing the job back into the queue, only for the next job to be tried immediately), we wait.

Our original `ratelimited` method changes from

```ruby
# Before: raise when the limit is met, sending the job back to be retried
def ratelimited(&block)
  raise "Ratelimit met" if ratelimit.exceeded?(RL_SUBJECT, interval: RL_INTERVAL, threshold: RL_THRESHOLD)
  ratelimit.add(RL_SUBJECT)
  block.call
end
```

to

```ruby
# After: wait until the subject is back under the threshold, then do the work
def ratelimited(&block)
  ratelimit.exec_within_threshold(RL_SUBJECT, interval: RL_INTERVAL, threshold: RL_THRESHOLD) do
    ratelimit.add(RL_SUBJECT)
    block.call
  end
end
```

Here, the Ratelimit gem’s `exec_within_threshold` method takes a block and runs it once the subject is under the threshold. While the limit is exceeded, it sleeps for the gem’s bucket interval and re-checks in a loop, so the job waits instead of failing.
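To make the waiting behaviour concrete, here is a minimal, self-contained sketch of the same check, sleep, re-check loop. `InMemoryRateLimit` and its `check_interval` option are hypothetical stand-ins written for this illustration, not the Ratelimit gem’s API: they keep timestamps in one process’s memory, whereas the gem keeps its counts in Redis so the limit is shared across processes.

```ruby
# Hypothetical in-memory stand-in for the wait-then-run pattern.
# Single process only; the real Ratelimit gem stores its counts in Redis.
class InMemoryRateLimit
  def initialize(check_interval: 0.05)
    @events = Hash.new { |hash, key| hash[key] = [] } # subject => action timestamps
    @check_interval = check_interval                   # seconds to sleep between re-checks
    @mutex = Mutex.new
  end

  # Record one action for the subject.
  def add(subject)
    @mutex.synchronize { @events[subject] << now }
  end

  # Number of actions recorded in the last `interval` seconds.
  def count(subject, interval)
    cutoff = now - interval
    @mutex.synchronize do
      @events[subject].reject! { |t| t <= cutoff }
      @events[subject].size
    end
  end

  def exceeded?(subject, interval:, threshold:)
    count(subject, interval) >= threshold
  end

  # Sleep and re-check while over the limit, then run the block.
  def exec_within_threshold(subject, interval:, threshold:)
    sleep @check_interval while exceeded?(subject, interval: interval, threshold: threshold)
    yield self
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Usage: allow 2 actions per 0.5 s and try to run 4 of them.
limiter = InMemoryRateLimit.new
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
ran_at = []
4.times do
  limiter.exec_within_threshold("strava", interval: 0.5, threshold: 2) do |rl|
    rl.add("strava")
    ran_at << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end
```

The first two calls run at once; the third and fourth wait roughly half a second, until the earlier timestamps fall out of the window. Polling keeps the logic simple, at the cost of a thread sitting idle while it waits.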

Sleeping until the rate limit is no longer exceeded stops Sidekiq from thrashing through jobs, but it has a drawback: a sleeping job still occupies a Sidekiq worker thread, so other jobs in the queue can be blocked too.

We solve this by using separate Sidekiq queues: one for the rate-limited task (in our case, Strava user sync) and another for everything else. That alone is not enough, because a single Sidekiq process serving both queues can still have all of its threads asleep on Strava jobs. To fix that, we run multiple Sidekiq processes, each working a different set of queues.
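As a rough model of why separate processes help, the sketch below stands in for two Sidekiq processes with two plain Ruby threads, each draining only its own `Queue`. It is a hypothetical simulation built on the standard library, not Sidekiq itself; the `sleep` stands in for a Strava job waiting on the rate limit.

```ruby
# Hypothetical simulation: two "processes" (threads), each draining only its own queue.
def simulate(strava_jobs:, default_jobs:, strava_sleep:)
  strava = Queue.new
  default = Queue.new
  strava_jobs.times { |i| strava << "strava-#{i}" }
  default_jobs.times { |i| default << "default-#{i}" }
  [strava, default].each(&:close) # pop returns nil once a closed queue is empty

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  finished = Queue.new
  worker = lambda do |queue, delay|
    Thread.new do
      while (job = queue.pop)
        sleep delay # stands in for waiting on the rate limit
        finished << [job, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
      end
    end
  end

  [worker.call(strava, strava_sleep), worker.call(default, 0)].each(&:join)
  Array.new(finished.size) { finished.pop }.to_h # job name => seconds until it finished
end

timings = simulate(strava_jobs: 3, default_jobs: 3, strava_sleep: 0.2)
```

Because the default worker never picks up Strava jobs, its jobs finish straight away while the Strava jobs trickle through one at a time. With a single worker serving both queues, the default jobs could instead sit behind the sleeping Strava jobs.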

As we deploy with Capistrano, the Sidekiq processes are configured using the capistrano-sidekiq gem.

In `config/deploy/production.rb`:

```ruby
set :sidekiq_processes, 2
set :sidekiq_options_per_process, [
  "--concurrency 1 --queue strava",
  "--concurrency 10 --queue default --queue some_other_queue"
]
```

Above, we set capistrano-sidekiq properties so that two separate Sidekiq processes run. Each works a different set of queues, and we ask the process serving the strava queue to use only one thread. More would be fine, but unless the jobs take a long time, one thread can probably saturate the Strava API rate limit.
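A quick way to check the one-thread claim is to compare how long a job takes with how often the limit lets a request through. The helper below is a hypothetical back-of-envelope calculation, assuming each job makes exactly one rate-limited API call; the 100 requests per 15 minutes in the example are placeholder figures, not a statement of Strava’s current limits.

```ruby
# Hypothetical helper: threads needed to use a rate limit fully,
# assuming one rate-limited request per job.
def threads_to_saturate(threshold:, interval:, job_seconds:)
  seconds_per_request = interval.to_f / threshold # spacing the limit allows between requests
  [(job_seconds / seconds_per_request).ceil, 1].max
end

threads_to_saturate(threshold: 100, interval: 900, job_seconds: 2)  # => 1
threads_to_saturate(threshold: 100, interval: 900, job_seconds: 20) # => 3
```

With a limit of one request every 9 seconds, a single thread keeps up with jobs that finish in under 9 seconds; only slower jobs would justify raising `--concurrency` for the strava process.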