mathematism

Feb 16

Message Queues, Django and Celery Quick Start

Learning about, setting up and making use of a message queue can be an overwhelming experience. In this article I'll show you how to get up and running quickly with RabbitMQ, Django and Celery on a Mac.

What this article is not

Before we get started I’d like to point out the things this article does not provide:

  • A thorough explanation of the concept of message queues
  • An overview of the production architecture needed to use message queues
  • Detailed documentation on Celery

Assumptions

I’ll describe below how to install the necessary components on a Mac. I assume you’ve got some experience with the tools used along the way: Homebrew, virtualenvwrapper, pip and Django.

For my Unix and Windows friends, the only part of this tutorial that is (just slightly) more difficult is installing the RabbitMQ server. Check out the RabbitMQ server page to install from source or download Unix or Windows packages.

Simplification and use cases

So why use a message queue? A simplified explanation is that you need to do something in your application that may be computationally expensive but shouldn’t impact the user experience: a feature that is necessary but doesn’t require immediate execution.

At Discovery one of my responsibilities is maintaining our press site, which contains hundreds of thousands of images. Each image has four versions: high resolution, large slideshow, medium display and thumbnail. When our photo editors upload an image we don’t want to have all four versions generated on the fly because there would be a large delay, possibly a timeout, when they click save.

This is a perfect use case for a message queue. Whenever a photo editor uploads an image we want to quickly save the original image and add a task to the queue for generating the various versions needed for the site. This results in a much snappier experience for the photo editors and a dramatic improvement to their workflow.
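The pattern can be sketched with nothing but Python’s standard library. This is an illustration of the idea only, not how RabbitMQ or Celery work internally: the in-process queue stands in for the broker, and the names handle_upload, generate_versions and shark.jpg are made up for the example.

```python
import queue
import threading

# Hypothetical stand-in for the broker: an in-process queue.
task_queue = queue.Queue()
generated = []  # records the work done by the worker

VERSIONS = ["high resolution", "large slideshow", "medium display", "thumbnail"]

def generate_versions(image_name):
    """Pretend to do the slow resizing work for one uploaded image."""
    for version in VERSIONS:
        generated.append((image_name, version))

def worker():
    """Pull image names off the queue until a None sentinel arrives."""
    while True:
        image_name = task_queue.get()
        if image_name is None:
            task_queue.task_done()
            break
        generate_versions(image_name)
        task_queue.task_done()

def handle_upload(image_name):
    """Queue the slow work and return right away."""
    task_queue.put(image_name)
    return "saved %s" % image_name

worker_thread = threading.Thread(target=worker)
worker_thread.start()
response = handle_upload("shark.jpg")  # returns without waiting for the resizing
task_queue.put(None)                   # tell the worker to stop
task_queue.join()                      # wait until all queued work is done
worker_thread.join()
```

The point is that handle_upload returns as soon as the job is queued; the four versions are produced later by the worker.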

Some other possible use cases include:

  • Perform complex math to determine the rank of all users in a system for a leaderboard
  • Re-generate static CSS files by examining when certain items in the admin have been modified
  • Generate graphs based on a large data set for display on a site’s home page every 15 minutes
  • Send blog comments through a spam filter

Now that we have some real-world examples of what a message queue might be used for, let’s install the components we’ll need to work with our message queue: RabbitMQ, Django and Celery.

Installation and setup

We’ll need the following components:

  • RabbitMQ, our message queue server (aka broker)
  • Django, our web framework of choice
  • Celery, our Django app (or standalone Python app) and our task handler/message passer

A quick note before we begin: we’re going to be opening up several tabs in terminal. I’ll prefix the instructions with NEW TAB when necessary. So fire up your terminal and let’s get started.

Let’s install RabbitMQ using homebrew:

brew install rabbitmq

Launch the RabbitMQ server:

sudo rabbitmq-server

NEW TAB: Create a RabbitMQ user and vhost (substitute myusername and mypassword with whatever you like):

rabbitmqctl add_user myusername mypassword
rabbitmqctl add_vhost myvhost
rabbitmqctl set_permissions -p myvhost myusername "" ".*" ".*"

Let’s create a new virtual environment and install our dependencies:

mkvirtualenv celery-test
workon celery-test
pip install Django==1.1.1 celery

I’ve set up a sample project that you can use as a starting point. Grab a copy of the project using mercurial or git and sync the database as follows.

Using mercurial:

cd ~/Desktop/
hg clone http://bitbucket.org/richleland/celery-test-project/
cd celery-test-project
./manage.py syncdb

Using git:

cd ~/Desktop/
git clone git://github.com/richleland/celery-test-project.git
cd celery-test-project
./manage.py syncdb

Now we have a RabbitMQ server up and running with a user and vhost, a virtual environment with Django and Celery, and a sample project to work with. Before we dive in to creating tasks and watching the magic take place, let’s have a look at our settings.py file.

Django settings

Within the sample project’s settings.py there are six lines to take note of. First we add celery to INSTALLED_APPS. The other five lines are Celery settings:

BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "myusername"
BROKER_PASSWORD = "mypassword"
BROKER_VHOST = "myvhost"

Briefly, in the first two settings we tell Django our RabbitMQ server is running locally on port 5672. Replace the remaining settings with the username, password, and vhost you set up previously with the rabbitmqctl commands.
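For orientation, these five values together describe a single AMQP connection. The sketch below (Python 3) assembles them into the amqp:// URL form that newer Celery releases accept as one broker URL setting; check the documentation for the version you are running. The broker_url function is a hypothetical helper written for this example, not part of Celery or Django.

```python
from urllib.parse import quote

def broker_url(host, port, user, password, vhost):
    """Combine the individual broker settings into one amqp:// URL."""
    return "amqp://%s:%s@%s:%d/%s" % (
        quote(user, safe=""),      # percent-encode any special characters
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),
    )

url = broker_url("localhost", 5672, "myusername", "mypassword", "myvhost")
print(url)  # amqp://myusername:mypassword@localhost:5672/myvhost
```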

When you’re ready to dig deeper into Celery, the full list of available configuration settings is explained in the Celery documentation. For now let’s move on and define some tasks.

Create some tasks

The main reason for using a message queue in the first place is to be able to execute some task or set of tasks asynchronously. These could be tasks that we want to queue up when a user performs some action (e.g. saving a model instance) or tasks that run periodically. You can create these tasks with Celery using the Task and PeriodicTask classes.

For simplicity’s sake we’ll be doing some rudimentary tasks here with Celery: generating a person’s full name and determining if they are of legal drinking age in the U.S. In practice you’d be creating tasks for something much more complex or expensive.

Open the file people/tasks.py and review the following code:

from datetime import date, timedelta
from celery.task import Task, PeriodicTask
from people.models import Person

class CanDrinkTask(Task):
    """
    A task that determines if a person is 21 years of age or older.
    """
    def run(self, person_id, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running determine_can_drink task for person %s" % person_id)
        
        person = Person.objects.get(pk=person_id)
        now = date.today()
        diff = now - person.date_of_birth
        # i know, i know, this doesn't account for leap year
        age = diff.days / 365
        if age >= 21:
            person.can_drink = True
            person.save()
        else:
            person.can_drink = False
            person.save()
        return True

class FullNameTask(PeriodicTask):
    """
    A periodic task that concatenates fields to form a person's full name.
    """
    run_every = timedelta(seconds=60)

    def run(self, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running full name task.")

        for person in Person.objects.all():
            person.full_name = " ".join([person.prefix, person.first_name,
                                         person.middle_name, person.last_name,
                                         person.suffix]).strip()
            person.save()
        return True

Our tasks.py file defines two different types of tasks: CanDrinkTask, which is a task that needs to be manually added to the queue, and FullNameTask, which is a task that is added to the queue once each minute.
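As the comment in CanDrinkTask admits, dividing the number of days by 365 ignores leap years. Here is a minimal sketch of an exact calculation, assuming date_of_birth is a datetime.date as the subtraction in the task implies. The age_in_years function is a hypothetical helper, not part of the sample project.

```python
from datetime import date

def age_in_years(date_of_birth, today):
    """Return the number of completed years between date_of_birth and today."""
    # Has this year's birthday already happened (or is it today)?
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

born = date(2000, 6, 15)
three_days_early = date(2021, 6, 12)

exact = age_in_years(born, three_days_early)       # birthday-based age
estimate = (three_days_early - born).days // 365   # the shortcut used in the task
```

Three days before this person’s 21st birthday the exact age is still 20, while the days-divided-by-365 estimate already reports 21, because the leap days accumulated since birth push the day count past 21 × 365.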

To understand how and when these tasks get executed, let’s launch our application and interact with it.

Fire everything up

Let’s get all of our services up and running, ready to pass messages around and perform tasks.

Start a Celery worker:

./manage.py celeryd --verbosity=2 --loglevel=DEBUG

NEW TAB: Start celerybeat to periodically send registered tasks to RabbitMQ:

cd ~/Desktop/celery-test-project
./manage.py celerybeat --verbosity=2 --loglevel=DEBUG

NEW TAB: Start Django’s development server:

cd ~/Desktop/celery-test-project
./manage.py runserver

We’re in great shape now. We’ve got a RabbitMQ server, a Celery worker, celerybeat and our Django application running. Next we’ll perform some actions and watch our message queue do its thang!

The message queue and Celery in action

This is the point where everything really came together for me. We’re going to interact with our Django application and keep an eye on those terminal tabs.

Open your web browser and go to http://localhost:8000/admin/people/person/. Add a new person with a prefix, first name, last name and suffix and click save to return to the list of people. You’ll notice that the list shows only the person’s last name, not their full name, and the can drink column shows false.

Keep an eye on two tabs in terminal: the celerybeat tab and the celeryd tab. You’ll see output similar to the following in the celerybeat tab:

[2010-02-14 21:51:00,899: DEBUG/MainProcess] ClockService: Waking up in 1 minute.
[2010-02-14 21:52:00,901: DEBUG/MainProcess] Scheduler: Sending due task people.tasks.FullNameTask
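The “Sending due task” line is celerybeat deciding that a task’s interval has elapsed. Conceptually the check looks something like the sketch below. This is not celerybeat’s actual implementation; due_tasks and the two dictionaries are assumptions made up to illustrate the idea of comparing run_every against the time since the last run.

```python
from datetime import datetime, timedelta

def due_tasks(schedule, last_run, now):
    """Return the names of tasks whose interval has elapsed since their last run."""
    due = []
    for name, run_every in schedule.items():
        if now - last_run[name] >= run_every:
            due.append(name)
    return due

# One periodic task that should run every 60 seconds, like FullNameTask.
schedule = {"people.tasks.FullNameTask": timedelta(seconds=60)}
start = datetime(2010, 2, 14, 21, 51, 0)
last_run = {"people.tasks.FullNameTask": start}

after_30s = due_tasks(schedule, last_run, start + timedelta(seconds=30))  # nothing due yet
after_60s = due_tasks(schedule, last_run, start + timedelta(seconds=60))  # task is due
```

In this picture the scheduler only decides when a task is due and sends a message; the work itself still happens in the worker.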

Now keep an eye on the celeryd tab and you’ll see output similar to the following when celeryd processes the periodic task:

[2010-02-14 21:58:01,069: INFO/MainProcess] Got task from broker: people.tasks.FullNameTask[72996c6f-b318-4e7a-82fa-ac85605cf4c2]
[2010-02-14 21:58:01,158: INFO/MainProcess] Task people.tasks.FullNameTask[72996c6f-b318-4e7a-82fa-ac85605cf4c2] processed: True

Once you see that output our user’s full name has been generated. Return to the list of people in the admin. You should now see your user’s full name instead of just their last name. Neat, huh?
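One detail to watch for in your own projects: joining the fields with a single space leaves a double space wherever a field is blank, as with the middle name we skipped when adding our person. Below is a small sketch of one way to avoid that, assuming blank fields are stored as empty strings. The build_full_name function and the sample name are hypothetical, not part of the sample project.

```python
def build_full_name(*parts):
    """Join only the non-empty name parts with single spaces."""
    return " ".join(part.strip() for part in parts if part and part.strip())

# The approach used in FullNameTask, with an empty middle name.
naive = " ".join(["Dr.", "Jane", "", "Doe", "Jr."]).strip()

# The same fields with blank parts filtered out first.
clean = build_full_name("Dr.", "Jane", "", "Doe", "Jr.")
```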

How about that CanDrinkTask we created earlier? You should be on the list of people in the admin now. Click on the link that reads “Call Celery Task.” This sends a message to the queue that the CanDrinkTask task needs to be run. Take a look at your celeryd tab. Because this isn’t a production environment, the task gets handled immediately and you’ll see output similar to the following:

[2010-02-14 21:58:25,405: INFO/MainProcess] Task people.tasks.CanDrinkTask[d889b48c-ac07-40d2-9911-a2ffee19e5d3] processed: True

Return to the list of people in the admin and, if your user’s age is 21 or older, you’ll notice that the can drink column shows true. When you click on the “Call Celery Task” link, the view people.views.call_celery_delay is executed. It calls the delay method on our CanDrinkTask, adding it to the queue:

from django.http import HttpResponse
from people.tasks import CanDrinkTask

def call_celery_delay(request, person_id):
    CanDrinkTask.delay(person_id)
    return HttpResponse("Task set to execute.")
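To make the division of labour concrete, here is a toy model of what happens between the call to delay and the worker’s log line. It is a conceptual sketch only: the message fields, the registry, the list standing in for RabbitMQ and the can_drink function are all assumptions for illustration, and Celery’s real message format and internals differ.

```python
import json
import uuid

registry = {}   # task name -> callable, filled in by the decorator below
broker = []     # hypothetical stand-in for the RabbitMQ queue
results = {}    # stands in for the database the task updates

def task(func):
    """Register a function under its name and give it a delay() method."""
    name = func.__name__
    registry[name] = func

    def delay(*args):
        # Describe the call as data: which task, which arguments, a unique id.
        message = json.dumps({"id": str(uuid.uuid4()), "task": name, "args": args})
        broker.append(message)   # publish and return immediately
        return message

    func.delay = delay
    return func

@task
def can_drink(person_id, age):
    results[person_id] = age >= 21
    return True

def work_one():
    """What a worker does: take a message, look up the task by name, run it."""
    message = json.loads(broker.pop(0))
    return registry[message["task"]](*message["args"])

can_drink.delay(1, 34)       # the "view" side: a message is queued, nothing has run
queued_before = len(broker)
processed = work_one()       # the "worker" side: the task actually executes
```

The view only ever touches delay, which is why the HTTP response comes back before the task has done any work.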

To summarize, this is the sequence of events that occurred:

  • We added a person in the admin
  • Our periodic task, FullNameTask, which executes once a minute, generated our user’s full name
  • We clicked on the “Call Celery Task” link in the admin, adding our CanDrinkTask to the queue
  • Our user’s age was calculated and the can_drink field of our model was set to True or False

Takeaways

Although these tasks were very rudimentary I hope they give you an idea of what is possible with RabbitMQ, Django and Celery. From an end-user standpoint we can create a smoother experience while our services work to process some asynchronous tasks behind the scenes.

Further reading and thanks

I’d like to thank Ask Solem for his work in creating Celery.

Below are some more resources for learning about message queues, RabbitMQ and Celery.