In nearly any web application above a certain size, eventually you will need to perform background jobs. At iGoDigital, that point came for us when we realized that some of our new user event tracking features needed about a second to run - far too long to run in the foreground, but small and urgent enough that we didn't want to wait and run them in a batch overnight. We already use Twitter's starling/workling in another app, so it made sense to start there. Unfortunately, the traffic level quickly became unwieldy, and starling's reaction to being pushed so far was unpleasant to say the least. After some research, we decided to use resque from github, but to adapt it to use mongodb instead of redis. The advantage here is that we already have a significant investment in mongodb, so we would not be introducing a new type of server to our infrastructure. Mongodb also has some features that redis does not, and we used those to build some interesting new features into resque.
Enough talk, here's some code:
Let's say you have a model somewhere that does something that takes entirely too long to do at user request time, and it's time to start backgrounding. I'll assume you already know how to make mongodb available to your application; if not, please check out the mongo-ruby-driver on github for specifics.

    class Simulation < ActiveRecord::Base
      def simulate
        sleep 5
        # parenthesized so the result is 1..100; "rand 100 + 1" parses as rand(101)
        self.results = rand(100) + 1
        save
      end
    end
First, you'll need to change your class to tell resque what queue to use. You can use multiple classes in the same queue, or one per queue. Github, for example, suggests using queues based on priority, like [critical, archive, high, low]. We tend to use a queue per class, because we have a small number of queue classes, and some of the advanced features we implemented rely on a bit more isolation. Any consistent strategy will work.
In addition, you'll need to implement self.perform to do your actual work. You can easily make this a find-and-call, or do the actual work in this method. Using our above example, it will look something like this:
    class Simulation < ActiveRecord::Base
      @queue = :simulation

      def simulate
        Resque.enqueue(Simulation, {:id => self.id})
      end

      def self.perform(options)
        sleep 5
        sim = Simulation.find options['id']
        sim.results = rand(100) + 1
        sim.save
      end
    end

Or, if you prefer, you can do the work in an instance method:

    def self.perform(options)
      sim = Simulation.find options['id']
      sim.actually_simulate
    end

    def actually_simulate
      sleep 5
      self.results = rand(100) + 1
      save
    end
You should see the call to Resque.enqueue start putting documents into the resque database in mongo, in this case into the resque.simulation collection:
{ "_id" : ObjectId("4d5165a29426b56825000001"),
"class" : "Simulation",
"args" : [ { "id" : "1" } ],
"resque_enqueue_timestamp" : "Tue Feb 08 2011 10:47:46 GMT-0500 (EST)" }
To process the job, run "QUEUE=simulation rake environment resque:work" at the command line. All of the advice on running workers in resque also applies to resque-igo, and I strongly encourage reading through defunkt's writeup. We use god to manage our workers, but there are many strategies that work.
A couple of things to notice about the structure here: first, args is an array, not just a single document. We strongly encourage using a single options hash as your only argument (and all of our implementations follow this convention), but you can pass arbitrary arguments. If you want to take advantage of the features specific to resque-igo, you will need to pass a hash as your first argument, as some specific keys need to be set.
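As a rough illustration of why args is an array, here is a minimal sketch of how a worker might hand a job document to your class. EchoJob and dispatch are made-up names for this example, and the lookup-and-splat pattern is the general resque approach rather than code lifted from resque-igo.

```ruby
# Toy stand-in for a job class; in the article this would be Simulation.
class EchoJob
  def self.perform(options, *extras)
    "id=#{options['id']} extras=#{extras.inspect}"
  end
end

# Hypothetical dispatcher: look the class up by name, then splat the
# args array so each element becomes one argument to perform.
def dispatch(job)
  Object.const_get(job["class"]).perform(*job["args"])
end

# A job document shaped like the one above (_id and timestamp left out),
# with a second, arbitrary argument added after the options hash.
job = { "class" => "EchoJob", "args" => [{ "id" => "1" }, "verbose"] }

puts dispatch(job)
```

With a single options hash as the only element, perform receives exactly one argument, which is the convention we recommend.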
Also, mongodb does not know or care about ruby symbols, so your keys will be strings when they come back. If you've written code against the ruby driver before, you are already used to managing this. If you're using MongoMapper or Mongoid (or another ruby-mongo orm), be wary!
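To make the symbol problem concrete, here is a small sketch that uses a JSON round trip as a stand-in for the trip through mongodb (an assumption made so the example runs without a database). stringify_keys is a hypothetical helper written for this example, not something resque-igo provides.

```ruby
require 'json'

# Keys go in as symbols...
enqueued = { :id => 1, :note => "run me" }

# ...and come back as strings. JSON stands in for mongodb here.
received = JSON.parse(JSON.generate(enqueued))

# Hypothetical helper: normalize keys once, so code that reads
# options['id'] works whether the hash came from the queue or not.
def stringify_keys(hash)
  hash.each_with_object({}) { |(key, value), out| out[key.to_s] = value }
end

puts received.key?("id")                  # the string key is there
puts received.key?(:id)                   # the symbol key is not
puts stringify_keys(enqueued) == received # normalized hashes match
```

This is why the perform methods in this post read options['id'] rather than options[:id].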
I keep mentioning advanced features, and there are only two that are important: unique jobs and scheduled jobs.
To create a unique job, set @unique_jobs = true in your class, and set an _id key in your options hash. Any later call to enqueue with the same _id will update the pending job's arguments, rather than creating a new job. Your _id key will need to follow mongodb's _id rules, but as long as it's unique, it should work just fine.
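The behavior is easier to see in a toy model. The sketch below uses a plain Hash in place of the mongodb collection, and UniqueQueue is a hypothetical class invented for this example; it only mimics the update-instead-of-insert behavior described above and is not how resque-igo is implemented.

```ruby
# Toy in-memory queue keyed by _id. A Hash stands in for the collection.
class UniqueQueue
  def initialize
    @jobs = {}
  end

  # Enqueuing with an _id that is already pending replaces the
  # arguments instead of adding a second job.
  def enqueue(klass_name, options)
    @jobs[options["_id"]] = { "class" => klass_name, "args" => [options] }
  end

  def size
    @jobs.size
  end

  def pending(id)
    @jobs[id]
  end
end

queue = UniqueQueue.new
queue.enqueue("Simulation", { "_id" => 42, "runs" => 10 })
queue.enqueue("Simulation", { "_id" => 42, "runs" => 500 })

puts queue.size                               # one pending job, not two
puts queue.pending(42)["args"].first["runs"]  # the later arguments win
```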
To create a scheduled job, set @delayed_jobs = true in your class, and call Resque.enable_delay(:simulation) at some point - we call it at startup, in the initializers. Resque-igo will remember which queues are delayed, so you really should only need to call this once, ever. To enqueue a scheduled job, set :delay_until in your options hash. Using unique jobs and scheduled jobs together allows you to do fancy things like schedule a session expiration task without using sessions in your app.
So, our simulation class, modified to only enqueue one job per simulation, and to run them 30 minutes after enqueuing, looks like this:
    class Simulation < ActiveRecord::Base
      @queue = :simulation
      @unique_jobs = true
      @delayed_jobs = true

      def simulate
        Resque.enable_delay(:simulation) # call this elsewhere in a production app
        Resque.enqueue(Simulation, {:_id => self.id, :delay_until => Time.now + 1800})
      end

      def self.perform(options)
        sim = Simulation.find options['_id']
        sim.actually_simulate
      end

      def actually_simulate
        sleep 5
        self.results = rand(100) + 1
        save
      end
    end
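The session expiration trick mentioned earlier falls out of combining the two features. Here is a toy, in-memory sketch of the idea; DelayedUniqueQueue, pop_ready and the "user-7" id are all hypothetical names for illustration, and nothing here is resque-igo's actual code.

```ruby
# Toy model of a queue that is both unique and delayed.
class DelayedUniqueQueue
  def initialize
    @jobs = {}
  end

  # Same _id => overwrite the pending job, including its delay_until.
  def enqueue(options)
    @jobs[options["_id"]] = options
  end

  # Remove and return the jobs whose delay_until has passed as of `now`.
  def pop_ready(now)
    ready, waiting = @jobs.values.partition { |job| job["delay_until"] <= now }
    @jobs = waiting.each_with_object({}) { |job, out| out[job["_id"]] = job }
    ready
  end
end

TIMEOUT = 1800 # 30 minutes, in seconds
queue = DelayedUniqueQueue.new
start = Time.at(0)

# The user is active at t=0 and again at t=20min. Each request
# re-enqueues the same expiration job with a later deadline.
queue.enqueue({ "_id" => "user-7", "delay_until" => start + TIMEOUT })
queue.enqueue({ "_id" => "user-7", "delay_until" => start + 1200 + TIMEOUT })

puts queue.pop_ready(start + 1800).size  # nothing yet: the deadline moved
puts queue.pop_ready(start + 3000).size  # fires 30 minutes after last activity
```

Because every request pushes the deadline back, the job only runs once the user has been idle for the full timeout.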
That's just about it. There are some limitations to resque-igo of which you should be aware. Because of mongodb's locking implementation, it may have difficulty scaling to very high loads, and will increase the lock time of your database a little more than you might think. If you're running a few hundred jobs a minute, or even a couple thousand, this shouldn't be a problem. As that rate rises, you will want to move your queues to a different database or use a different mechanism altogether. We have discussed this issue with 10gen, and the collection-level locking that should be coming with mongodb 2.0, combined with the currently unused queue-splitting feature in resque-igo, should help with this problem. Because mongodb's collection.drop() is very fast, it should also be possible to change the underlying implementation to dramatically improve performance: enqueue to a new queue every hour or so, do an update instead of a remove on dequeue, and then drop completed queues after they are done processing - this was 10gen's recommendation.
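To give a feel for the rotating-queue idea, here is a speculative sketch. It is not implemented in resque-igo; RotatingQueue and all of its methods are hypothetical, Hashes and Arrays stand in for mongodb collections, and the hourly naming scheme is just one assumption about how the buckets could be named.

```ruby
# Hypothetical sketch: one "collection" per hour, jobs are marked done
# rather than removed, and finished collections are dropped whole.
class RotatingQueue
  def initialize(base_name)
    @base_name = base_name
    @collections = Hash.new { |hash, name| hash[name] = [] }
  end

  # Bucket name for a given time, e.g. "simulation_2011020815".
  def collection_name(time)
    "#{@base_name}_#{time.getutc.strftime('%Y%m%d%H')}"
  end

  def enqueue(args, now)
    @collections[collection_name(now)] << { "args" => args, "done" => false }
  end

  # Dequeue is an update (mark done), not a remove.
  def dequeue
    job = @collections.values.flatten(1).find { |j| !j["done"] }
    job["done"] = true if job
    job
  end

  # Drop every collection that is older than the current hour and
  # has no unfinished jobs left in it.
  def drop_completed(now)
    current = collection_name(now)
    @collections.delete_if do |name, jobs|
      name < current && jobs.all? { |j| j["done"] }
    end
  end

  def collection_names
    @collections.keys
  end
end

queue = RotatingQueue.new("simulation")
t0 = Time.utc(2011, 2, 8, 15, 47)
queue.enqueue([{ "id" => "1" }], t0)
queue.enqueue([{ "id" => "2" }], t0 + 3600)
queue.dequeue                    # marks job 1 done
queue.drop_completed(t0 + 3600)  # the 15:00 bucket is old and finished
puts queue.collection_names.inspect
```

The appeal is that the expensive per-job removes go away, and cleanup becomes one cheap drop per hour.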
You can install the gem as resque-igo, or get the source code from github.com/iGoDigital-LLC/resque-mongo. If you have any questions or are using resque-igo in your application, drop me a line on github at github.com/mediocretes. Enjoy!