Jun 03

Joining the National Geographic Team

It's official. Later this month I'll become a member of the team at National Geographic.

For nearly four years I've had the honor of working with an extremely talented team at Discovery Creative. It's with mixed emotions that I announce I'll be heading downtown to join the National Geographic team on June 21.

Reflection

I can say without a doubt that my time at Discovery made me a better developer in every sense of the word. When I started at Discovery, our team consisted of me, our creative director, and our web producer. Our small size meant each of us wore many hats, from architecture and systems administration to design and development.

Over the next four years we'd produce some great work and the team would grow to a small army of skilled individuals, known within the company for quality and innovation. I wouldn't trade the lessons learned, experience gained or friendships made during my time at Discovery for anything.

What lies ahead

As a member of the National Geographic Global Media group I'll be working on the various natgeo.com properties. The current team has done an amazing job of taking the history-rich content National Geographic is known for and putting it online. I'm hoping to bring my background in development, planning, and design to an already skilled team.

My position at National Geographic will be a collection of firsts for me including working for a non-profit, dealing with scalability and working in a non-agency environment, to name a few. I'm looking forward to all of the challenges that lie ahead.

Moving on

I'd like to wish my current colleagues at Discovery nothing but the best. I'll miss the team here for sure. But hey, now we can have cross-company mustache-offs* and make excuses to get together and drink. To my future colleagues at National Geographic: I'm looking forward to joining the crew… see you in a few weeks!

* Due to the slow growth of my facial hair, I won't be able to help you win the mustache-off.

Mar 02

Results from The ambiguity of “web developer”

A presentation and discussion of survey results

In early January I posted an article titled The ambiguity of “web developer”[1] and asked readers to fill out a short survey to provide some industry insight. Before we jump into the actual data I'd like to point out a few things I took away from the results.

  • Web Developer, and its ambiguous nature, is the most common job title
  • More than half of the respondents felt their job title is accurate
  • A majority of respondents would not change their job title
  • It's hard to answer the question "What do you do for a living?"
  • Most respondents describe themselves as generalists and wear the hats to prove it

And now… on to the data!

What is your job title?

Web Developer is clearly the most common job title among respondents. Equally interesting is that no one held the job title of Application Developer or Interactive Developer.

Do you believe your job title accurately describes the work you perform on a daily basis?

Which tasks do you perform on a monthly basis?

Answers to this question seem to contradict the previous question. If the job title Web Developer accurately describes the work performed on a daily basis, does that mean a Web Developer is expected to know system administration, DBA, front-end coding, back-end coding, IA, SEO/SEM, and a few other things?

I'm not arguing for or against a general approach to development. I personally try to learn what will make me a better developer. If that means doing a little bit of each of these tasks, then so be it.

Would you change your job title if you could?

Not much of a shocker here. Those who feel their job title is accurate would not change it. Those who don't feel their job title is accurate would.

If you answered yes to the previous question, what would you change your job title to?

These answers may have been the actual highlight of the survey. Among the responses were Emperor, Wizard of Internet and Yoda. Wands at the ready!

Do you feel that your co-workers have a good understanding of your role?

This is very encouraging. In my opinion, a team that understands one another's roles is critical to success.

When friends and family ask you what you do for a living, do you feel like you have a clear, concise answer?

I'm not sure if this is encouraging or discouraging. It seems like we generally have a hard time explaining what we do. Perhaps the real issue is that we don't have a simple way to encapsulate all that we do in a day into some kind of sentence that means something to those outside the industry.

Does your current position provide you with a clear career ladder?

I find the percentage of individuals who replied N/A of particular interest here. It's almost assumed in our culture that everyone should have a career ladder. I knew there would be a mix of yes and no. But what about people who are perfectly happy in their role and would stay in it for the foreseeable future?

If you had to classify yourself as a generalist or specialist, which would you choose?

Interesting insight here, but not too surprising based on the "daily tasks" question above. In today's development culture it's assumed that you're going to be able to wear many hats. It seems most respondents already classify themselves that way.

Closing thoughts

Thank you to everyone who took the time to fill out the survey. The results gave me some insight into how individuals view themselves and their roles within the industry. I'm still trying to digest what this means for the actual ambiguity of the title of Web Developer. I find myself at odds because it seems like most individuals are happy with Web Developer as their title, but there is clearly confusion both within and outside the community. I'd love to hear your thoughts. Please discuss.

Feb 26

django-district February Recap

Notes from our February meetup, where we discussed PyCon.

We had some excellent discussions this month on everything from version control to parabolic reflectors, most of which started as tangents from a PyCon session topic. Following is a recap of the gathering.

Upcoming Events

DSF Tip Jar

This month I introduced the Django Software Foundation Tip Jar to collect donations to the DSF. Every three months I'll send the DSF a check for the amount donated. Look for it at the March meetup!

PyCon discussion

PyCon was an epic event for all Pythonkind. There were nearly 1,200 attendees—an all-time record for PyCon. Between the incredible talks, open space sessions and poster sessions it was hard to decide what to do in Atlanta. Luckily the A/V crew for PyCon did an amazing job and has already posted video of all talks.

Of the talks that I attended, I would recommend watching the following:

  • The State of Packaging
  • Dude Where's My Database?
  • Understanding the Python GIL
  • The Python and the Elephant: Large Scale Natural Language Processing with NLTK and Dumbo
  • Why not run all your tests all the time? A study of continuous integration systems.

Based on what others mentioned at the meeting, it looks like these sessions are worth watching as well:

  • The Mighty Dictionary
  • Using Python to Create Robotic Simulations for Planetary Exploration
  • Scaling your Python application on EC2
  • Demystifying Non-Blocking and Asynchronous I/O
  • Small acts make great revolutions: crafting Python and Open Source communities in Rio de Janeiro
  • Python Metaprogramming
  • New and Improved: Coming changes to unittest, the standard library test framework

Be sure to watch the lightning talks too. They were chock full of great content. If you didn't go this year, go to PyCon 2011, which will be in the same hotel in Atlanta.

Lots of tangents

As is the usual case with django-district meetings, some great tangents took place. There are too many to discuss each so I thought I'd just provide a list of the tabs I happened to still have open in Chrome when I got home:

What's next

Our next meeting will take place Thursday, March 25. Steve Holden, Python Software Foundation Chairman, will be visiting us to talk about the PSF and Chris Adams, NASA Developer, will talk testing. I'm also looking into holding various hack nights and tutorials in the months to come.

Thanks to everyone who managed to make it out. You're what makes the meetings a success!

Feb 16

Message Queues, Django and Celery Quick Start

Learning about, setting up and making use of a message queue can be an overwhelming experience. In this article I'll show you how to get up and running quickly with RabbitMQ, Django and Celery on a Mac.

What this article is not

Before we get started I'd like to point out the things this article does not provide:

  • A thorough explanation of the concept of message queues
  • An overview of the production architecture needed to use message queues
  • Detailed documentation on Celery

Assumptions

I'll describe below how to install the necessary components on a Mac. I assume you've got some experience with the following:

For my Unix and Windows friends, the only part of this tutorial that's (slightly) more difficult is installing the RabbitMQ server. Check out the RabbitMQ server page to install from source or download Unix or Windows packages.

Simplification and use cases

So why use a message queue? A simplified explanation is that you need to do something in your application that may be computationally expensive but shouldn't impact the user experience: a feature that is necessary but doesn't require immediate execution.

At Discovery one of my responsibilities is maintaining our press site, which contains hundreds of thousands of images. Each image has four versions: high resolution, large slideshow, medium display and thumbnail. When our photo editors upload an image we don't want to have all four versions generated on the fly because there would be a large delay, possibly a timeout, when they click save.

This is a perfect use case for a message queue. Whenever a photo editor uploads an image we want to quickly save the original image and add a task to the queue for generating the various versions needed for the site. This results in a much snappier experience for the photo editors and a dramatic improvement to their workflow.
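To make the pattern concrete without RabbitMQ or Celery, here's a minimal in-process sketch using Python's standard-library queue.Queue and a single worker thread. It's only an analogy, assuming one worker and an in-memory queue: a real broker persists messages and workers usually run as separate processes. The function names (handle_uploads, worker) are hypothetical, and the resize step is a placeholder.

```python
import queue
import threading

# The four versions the press site stores for every image (per the article).
VERSIONS = ("high resolution", "large slideshow", "medium display", "thumbnail")


def worker(tasks, results):
    """Consume image names from the queue and build each version."""
    while True:
        image_name = tasks.get()
        if image_name is None:  # sentinel value tells the worker to stop
            tasks.task_done()
            return
        for version in VERSIONS:
            # Placeholder for the real, slow resize step.
            results.append((image_name, version))
        tasks.task_done()


def handle_uploads(image_names):
    """Simulate editors saving images while a background worker resizes them."""
    tasks = queue.Queue()
    results = []
    consumer = threading.Thread(target=worker, args=(tasks, results))
    consumer.start()

    responses = []
    for name in image_names:
        # "Saving" only enqueues the work, so each response comes back right away.
        tasks.put(name)
        responses.append("saved " + name)

    tasks.put(None)  # no more uploads
    consumer.join()  # wait for the background work to finish (demo only)
    return responses, results


responses, results = handle_uploads(["shark.jpg", "lion.jpg"])
print(responses)     # ['saved shark.jpg', 'saved lion.jpg']
print(len(results))  # 8
```

Each "saved" response is produced as soon as the image is queued; the four versions per image are built by the worker afterward, which is the split between fast save and slow processing that the press site needs.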

Some other possible use cases include:

  • Perform complex math to determine the rank of all users in a system for a leaderboard
  • Re-generate static CSS files by examining when certain items in the admin have been modified
  • Generate graphs based on a large data set for display on a site's home page every 15 minutes
  • Send blog comments through a spam filter

Now that we have some real-world examples of what a message queue might be used for, let's install the components we'll need to work with our message queue: RabbitMQ, Django and Celery.

Installation and setup

We'll need the following components:

  • RabbitMQ, our message queue server (aka broker)
  • Django, our web framework of choice
  • Celery, our task handler and message passer (which can run as a Django app or a standalone Python app)

A quick note before we begin: we're going to be opening up several tabs in terminal. I'll prefix the instructions with NEW TAB when necessary. So fire up your terminal and let's get started.

Let's install RabbitMQ using homebrew:

brew install rabbitmq

Launch the RabbitMQ server:

sudo rabbitmq-server

NEW TAB: Create a RabbitMQ user and vhost (substitute myusername and mypassword with whatever you like):

rabbitmqctl add_user myusername mypassword
rabbitmqctl add_vhost myvhost
rabbitmqctl set_permissions -p myvhost myusername ".*" ".*" ".*"

Let's create a new virtual environment and install our dependencies:

mkvirtualenv celery-test
workon celery-test
pip install Django==1.1.1 celery

I've set up a sample project that you can use as a starting point. Grab a copy of the project using mercurial or git and sync the database as follows.

Using mercurial:

cd ~/Desktop/
hg clone http://bitbucket.org/richleland/celery-test-project/
cd celery-test-project
./manage.py syncdb

Using git:

cd ~/Desktop/
git clone git://github.com/richleland/celery-test-project.git
cd celery-test-project
./manage.py syncdb

Now we have a RabbitMQ server up and running with a user and vhost, a virtual environment with Django and Celery, and a sample project to work with. Before we dive into creating tasks and watching the magic take place, let's have a look at our settings.py file.

Django settings

Within the sample project's settings.py there are six lines to take note of. First, we add celery to INSTALLED_APPS. The other five lines are Celery settings:

BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "myusername"
BROKER_PASSWORD = "mypassword"
BROKER_VHOST = "myvhost"

Briefly, the first two settings tell Celery that our RabbitMQ server is running locally on port 5672. Replace the values of the remaining three settings with the username, password, and vhost you set up previously with the rabbitmqctl commands.
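Those five settings describe a single connection. Later Celery releases also accept the same details as one broker URL (the BROKER_URL setting, lowercase broker_url from Celery 4 on), so it can help to see how the pieces map onto it. The amqp_url helper below is a hypothetical sketch, not part of Celery; it percent-encodes the credentials and vhost so characters such as "@" or "/" can't be misread as URL delimiters.

```python
from urllib.parse import quote

# Values from the sample project's settings.py.
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "myusername"
BROKER_PASSWORD = "mypassword"
BROKER_VHOST = "myvhost"


def amqp_url(host, port, user, password, vhost):
    """Combine separate broker settings into a single amqp:// URL."""
    # safe="" makes quote() encode "/" too, which matters for a vhost like "/".
    return "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),
    )


print(amqp_url(BROKER_HOST, BROKER_PORT, BROKER_USER,
               BROKER_PASSWORD, BROKER_VHOST))
# amqp://myusername:mypassword@localhost:5672/myvhost
```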

When you're ready to dig deeper into Celery, the full list of available configuration settings is explained in the Celery documentation. For now let's move on and define some tasks.

Create some tasks

The main reason for using a message queue in the first place is to be able to execute some task or set of tasks asynchronously. These could be tasks that we want to queue up when a user performs some action (e.g. saving a model instance) or tasks that run periodically. You can create these tasks with Celery using the Task and PeriodicTask classes.

For simplicity's sake we'll be doing some rudimentary tasks here with Celery: generating a person's full name and determining if they are of legal drinking age in the U.S. In practice you'd be creating tasks for something much more complex or expensive.

Open the file people/tasks.py and review the following code:

from datetime import date, timedelta
from celery.task import Task, PeriodicTask
from people.models import Person

class CanDrinkTask(Task):
    """
    A task that determines if a person is 21 years of age or older.
    """
    def run(self, person_id, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running determine_can_drink task for person %s" % person_id)

        person = Person.objects.get(pk=person_id)
        now = date.today()
        diff = now - person.date_of_birth
        # i know, i know, this doesn't account for leap years
        age = diff.days // 365  # floor division: whole years only
        person.can_drink = age >= 21
        person.save()
        return True

class FullNameTask(PeriodicTask):
    """
    A periodic task that concatenates fields to form a person's full name.
    """
    run_every = timedelta(seconds=60)

    def run(self, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running full name task.")

        for person in Person.objects.all():
            parts = [person.prefix, person.first_name, person.middle_name,
                     person.last_name, person.suffix]
            # Skip blank parts (e.g. no middle name) to avoid double spaces
            person.full_name = " ".join(part for part in parts if part)
            person.save()
        return True

Our tasks.py file defines two different types of tasks: CanDrinkTask, which is a task that needs to be manually added to the queue, and FullNameTask, which is a task that is added to the queue once each minute.
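The comment in CanDrinkTask admits the age math ignores leap years. Dividing diff.days by 365 drifts by about a day every four years, so a person can be reported as 21 a few days before their 21st birthday. Below is a sketch of a calendar-based alternative, plus a full-name helper that skips blank parts. Both are plain Python with hypothetical names (age_on, can_drink, full_name) that you could call from the tasks' run methods; they aren't part of Celery or the sample project.

```python
from datetime import date

LEGAL_DRINKING_AGE = 21  # U.S. legal drinking age, as in CanDrinkTask


def age_on(date_of_birth, today):
    """Return the number of completed years between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday hasn't happened yet. Comparing
    # (month, day) tuples avoids the leap-year drift of days / 365.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def can_drink(date_of_birth, today):
    return age_on(date_of_birth, today) >= LEGAL_DRINKING_AGE


def full_name(*parts):
    """Join name parts with single spaces, skipping blank or missing ones."""
    return " ".join(part.strip() for part in parts if part and part.strip())


dob = date(1990, 6, 3)
print((date(2011, 6, 2) - dob).days // 365)        # 21: the days/365 estimate
print(age_on(dob, date(2011, 6, 2)))               # 20: birthday is tomorrow
print(full_name("Mr.", "John", "", "Doe", "Jr."))  # Mr. John Doe Jr.
```

Under this convention someone born on February 29 gains a year on March 1 in non-leap years; adjust the comparison if your rules treat February 28 as the birthday.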

To understand how and when these tasks get executed, let's launch our application and interact with it.

Fire everything up

Let's get all of our services up and running, ready to pass messages around and perform tasks.

Start a Celery worker:

./manage.py celeryd --verbosity=2 --loglevel=DEBUG

NEW TAB: Start celerybeat to periodically send registered tasks to RabbitMQ:

cd ~/Desktop/celery-test-project
./manage.py celerybeat --verbosity=2 --loglevel=DEBUG

NEW TAB: Start Django's development server:

cd ~/Desktop/celery-test-project
./manage.py runserver

We're in great shape now. We've got a RabbitMQ server, a Celery worker, celerybeat and our Django application running. Next we'll perform some actions and watch our message queue do its thang!
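Roughly speaking, celerybeat's job is to wake up, send any periodic tasks that are due to the broker, then sleep until the next one is due. The toy scheduler below models that loop for run_every-style intervals. The Schedule class and tick function are hypothetical and much simpler than celerybeat itself; send stands in for publishing a message to RabbitMQ.

```python
from datetime import datetime, timedelta


class Schedule:
    """Toy model of one periodic task entry (not celerybeat's real class)."""

    def __init__(self, name, run_every, last_run=None):
        self.name = name
        self.run_every = run_every  # a timedelta, like FullNameTask.run_every
        self.last_run = last_run

    def is_due(self, now):
        """Return (due, seconds until this entry should be checked again)."""
        if self.last_run is None:
            return True, self.run_every.total_seconds()
        next_run = self.last_run + self.run_every
        if now >= next_run:
            return True, self.run_every.total_seconds()
        return False, (next_run - now).total_seconds()


def tick(schedules, now, send):
    """Send every due task via send() and return how long to sleep."""
    sleep_for = None
    for entry in schedules:
        due, wait = entry.is_due(now)
        if due:
            send(entry.name)  # stand-in for publishing the task to the broker
            entry.last_run = now
        sleep_for = wait if sleep_for is None else min(sleep_for, wait)
    return sleep_for


sent = []
entry = Schedule("people.tasks.FullNameTask", timedelta(seconds=60))
start = datetime(2010, 2, 14, 21, 51)
print(tick([entry], start, sent.append))                          # 60.0
print(tick([entry], start + timedelta(seconds=30), sent.append))  # 30.0
print(tick([entry], start + timedelta(seconds=60), sent.append))  # 60.0
print(sent)  # ['people.tasks.FullNameTask', 'people.tasks.FullNameTask']
```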

The message queue and Celery in action

This is the point where everything really came together for me. We're going to interact with our Django application and keep an eye on those terminal tabs.

Open your web browser and go to http://localhost:8000/admin/people/person/. Add a new person with a prefix, first name, last name and suffix, then click save to return to the list of people. You'll notice that the list shows only the person's last name rather than their full name, and the can drink column shows false.

Keep an eye on two tabs in terminal: the celerybeat tab and the celeryd tab. You'll see output similar to the following in the celerybeat tab:

[2010-02-14 21:51:00,899: DEBUG/MainProcess] ClockService: Waking up in 1 minute.
[2010-02-14 21:52:00,901: DEBUG/MainProcess] Scheduler: Sending due task people.tasks.FullNameTask

Now keep an eye on the celeryd tab and you'll see output similar to the following when celeryd processes the periodic task:

[2010-02-14 21:58:01,069: INFO/MainProcess] Got task from broker: people.tasks.FullNameTask[72996c6f-b318-4e7a-82fa-ac85605cf4c2]
[2010-02-14 21:58:01,158: INFO/MainProcess] Task people.tasks.FullNameTask[72996c6f-b318-4e7a-82fa-ac85605cf4c2] processed: True

Once you see that output, our user's full name has been generated. Return to the list of people in the admin. You should now see your user's full name instead of just their last name. Neat, huh?

How about that CanDrinkTask we created earlier? You should be on the list of people in the admin now. Click on the link that reads "Call Celery Task." This sends a message to the queue saying that CanDrinkTask needs to be run. Take a look at your celeryd tab. Because the worker isn't busy with anything else in this test setup, the task gets handled almost immediately and you'll see output similar to the following:

[2010-02-14 21:58:25,405: INFO/MainProcess] Task people.tasks.CanDrinkTask[d889b48c-ac07-40d2-9911-a2ffee19e5d3] processed: True

Return to the list of people in the admin and, if your user's age is 21 or older, you'll notice that the can drink column shows true. Clicking the "Call Celery Task" link executes the view people.views.call_celery_delay, which calls the delay method on our CanDrinkTask, adding it to the queue:

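from django.http import HttpResponse
from people.tasks import CanDrinkTask

def call_celery_delay(request, person_id):
    CanDrinkTask.delay(person_id)  # queue the task; a worker runs it later
    return HttpResponse("Task set to execute.")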
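Conceptually, delay doesn't run the task in the view at all; it publishes a message naming the task and its arguments, and a worker later looks that name up and calls it. The sketch below models that round trip with an in-memory queue standing in for RabbitMQ and JSON as the message format. The task decorator, REGISTRY and run_one_message are hypothetical, and Celery's real messages carry more fields than the id, task, args and kwargs shown here.

```python
import json
import queue
import uuid

REGISTRY = {}           # task name -> function, filled in by @task below
broker = queue.Queue()  # in-memory stand-in for RabbitMQ


def task(name):
    """Register a function under a dotted task name and give it a delay()."""
    def register(func):
        REGISTRY[name] = func

        def delay(*args, **kwargs):
            # Serialize the call instead of running it: this is the message.
            task_id = str(uuid.uuid4())
            broker.put(json.dumps({"id": task_id, "task": name,
                                   "args": args, "kwargs": kwargs}))
            return task_id

        func.delay = delay
        return func
    return register


def run_one_message():
    """What a worker does: take a message, look up the task, run it."""
    message = json.loads(broker.get())
    func = REGISTRY[message["task"]]
    result = func(*message["args"], **message["kwargs"])
    return message["id"], message["task"], result


@task("people.tasks.CanDrinkTask")
def can_drink_task(person_id):
    # Placeholder body; the real task loads a Person and saves can_drink.
    return True


task_id = can_drink_task.delay(42)
print(broker.qsize())         # 1: queued, not yet run
print(run_one_message()[1:])  # ('people.tasks.CanDrinkTask', True)
```

Because only the task's name and arguments travel through the queue, the view returns immediately and any worker that has the same task registered can pick up the message.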

To summarize, this is the sequence of events that occurred:

  • We added a person in the admin
  • Our periodic task, FullNameTask, which runs once a minute, generated our user's full name
  • We clicked on the "Call Celery Task" link in the admin, adding our CanDrinkTask to the queue
  • Our user's age was calculated and the can_drink field of our model was set to True or False

Takeaways

Although these tasks were very rudimentary, I hope they give you an idea of what is possible with RabbitMQ, Django and Celery. From an end-user standpoint we can create a smoother experience while our services process asynchronous tasks behind the scenes.

Further reading and thanks

I'd like to thank Ask Solem for his work in creating Celery.

Below are some more resources for learning about message queues, RabbitMQ and Celery.

Feb 15

Lessons Learned from django-cumulus

While building this reusable application for Django I gained some valuable experience. Below is a history of the short life of django-cumulus and insight into what I learned while preparing it for release.

History

In April of 2009 I needed a custom file storage backend for Rackspace's Cloud Files within Django. I used the Googlewebs and couldn't find anything so I decided to build my own, which Django makes pretty darn easy. Within a few hours I had a working application and released version 0.1 of the code.

Over the next week I received some messages about possibly integrating my custom storage into django-storages, an application that pulls together multiple custom storage backends including Amazon S3 and MogileFS among others. I thought about it and it made sense at the time to add my custom storage. In May of 2009 I added the Cloud Files storage backend to django-storages.

Over the next few months a few forks of django-storages popped up and I pulled in some solid tweaks to the application. I had a pretty busy second half of 2009 at work and admittedly neglected maintenance of my part of django-storages. When 2010 rolled around I decided that one of my resolutions would be to create, contribute to, and maintain several open-source applications. Among them was django-storages.

With my newfound focus I started to dive into the open tickets on django-storages. There were (and remain) quite a few open tickets that posed pretty big issues for users (including a broken setup.py). Over the next week or so I began tackling some of these tickets, and it quickly became apparent that things were at a standstill.

There seemed to be no interaction between the application and its users. I took on and updated several tickets, aiming to prepare django-storages for a 1.1 release. Not one of the ticket creators responded to me. At that point I made a few observations about django-storages:

  1. The interaction was weak between the community and the application authors
  2. There were, in my opinion, too many things going on with django-storages; it seemed to violate Django's reusable app mantra of "do one thing and do it well"
  3. The application was missing a proper setup.py file, documentation and tests

At this point I decided to focus my efforts on pulling my custom storage out of django-storages and back into my own application, django-cumulus. I took some time in late January and early February to get up to speed on the latest Python packaging practices, documentation and testing. In early February I released django-cumulus 0.2, complete with documentation, tests, and a proper setup.py.

Packaging

I'll admit that before django-cumulus I considered myself borderline idiot when it came to packaging. I didn't understand why everyone had different setup.py files and felt like there was no good resource for learning how to properly build a setup.py file. Then I took a look at Distribute because that's what the cool kids told me to do.

Distribute is very well documented and provided me with the first setup.py overview that actually helped me understand packaging. I used Distribute to build my setup.py for django-cumulus and it worked like a charm. It's what I'll be using for any of my Django apps moving forward, and I'll be updating my currently released Django apps to use it as well.

Documentation

I knew of Sphinx and had used it a few times in the past but never really took the time to dig in. This was a good opportunity for me to get all buddy-buddy with properly documenting my work.

Sphinx is insanely easy to learn and use. The documentation is solid. I went through the tutorial once and had a good starting point. From there on I just referred back to their docs for detailed information.

Another tool I'd like to mention is Jannis Leidel's sphinx-pypi-upload. This allows you to easily upload your package's documentation to packages.python.org using the following commands:

python setup.py build_sphinx
python setup.py upload_sphinx

The documentation for django-cumulus was created using Sphinx and uploaded using sphinx-pypi-upload to http://packages.python.org/django-cumulus/.

Thumbnailing

One last item I had to look into was the ability to use popular Django thumbnail applications alongside django-cumulus. The two main thumbnail libraries I focused on were sorl thumbnail and django-imagekit.

First I looked at sorl. Let me preface what I'm about to say with this: I've used sorl in a TON of projects. It's great and does a fantastic job for local thumbnail creation. When it comes to using custom file storage, however, sorl is a mess. First of all there are several repos: subversion (official), git, and mercurial. Then there are the forks. It's like a never-ending spiral of discussions, tickets and code to dig through just to find out whether there is a solution or someone is working on one. It's also hard to tell if anything is progressing with sorl because the latest commits to some of the versions are as much as 11 months old.

After hours of searching, reading, and fiddling with various forks of sorl, I decided to postpone support for sorl because it would mean digging in and modifying sorl itself.

django-imagekit was another story. It's actively maintained by its author and already supported custom storage! So I fired up a test project using django-imagekit, and it turned out I had to make some tweaks to django-cumulus to get it to play nice with the temporary files that django-imagekit uses. Those tweaks were incorporated in django-cumulus 0.2.2.

What's next

Building django-cumulus has been a great experience thus far. I learned how important active maintenance, documentation and packaging are to the success of your application. I'm looking forward to maintaining a high level of quality for my releases moving forward by including documentation, tests, and proper packaging.

I'd love to hear of any features you think would be useful in django-cumulus. Feel free to create a ticket or leave a comment below.