import this – a blog about python & more

Django/Celery – From Development to Production

daemonization using systemd

When building web applications with Python, you will most likely find yourself using Celery to handle tasks outside the request-response cycle. In my case, I was building a website which had a "quote of the day" section, and I needed a way to automatically change the quote every day at midnight. It wasn't very hard to get up and running with Celery during development, but I had a tough time getting it to work correctly in production. I decided to document the process for future reference, in the hope that it will be helpful not only to me, but to anyone else trying to figure out how to use Celery in production.

So, without further ado, let's dive in. I created a simple Django project called celery_project to accompany this post. The source code is available at github.com/engineervix/django-celery-sample.

Initial project setup

My Django project setup is heavily influenced by Pydanny's cookiecutter-django. The initial project setup is as shown below (see commit 975b982 on GitHub):

./django-celery-sample
├── celery_project
│   ├── conftest.py
│   ├── __init__.py
│   ├── static
│   ├── tests
│   │   ├── factories.py
│   │   └── __init__.py
│   └── users
│       ├── admin.py
│       ├── apps.py
│       ├── forms.py
│       ├── __init__.py
│       ├── managers.py
│       ├── migrations
│       │   └── __init__.py
│       ├── models.py
│       ├── tests.py
│       ├── urls.py
│       └── views.py
├── CHANGELOG.md
├── .circleci
│   └── config.yml
├── config
│   ├── celery_app.py
│   ├── __init__.py
│   ├── settings
│   │   ├── base.py
│   │   ├── development.py
│   │   ├── production.py
│   │   └── test.py
│   ├── urls.py
│   ├── wsgi_production.py
│   └── wsgi.py
├── .coveragerc
├── .editorconfig
├── .envs.sample
│   ├── .dev.env.sample
│   ├── .prod.env.sample
│   └── .test.env.sample
├── .flake8
├── .git
├── .gitattributes
├── .github
│   └── dependabot.yml
├── .gitignore
├── gulpfile.js
├── LICENSE
├── manage.py
├── package.json
├── .prettierignore
├── pyproject.toml
├── pytest.ini
├── README.md
├── requirements.in
├── requirements.txt
├── .stylelintignore
├── .stylelintrc.json
└── yarn.lock

If you would like to take a peek at the initial project structure, you can clone the repo and check out commit 975b982:

git clone https://github.com/engineervix/django-celery-sample.git
cd django-celery-sample
git checkout 975b982

I will not go into further details of setting up the project, as this is well covered in the README.md file. It is, however, worth mentioning that,

  • we are using a PostgreSQL database
  • we are using a custom Django User model, as recommended in the Django docs. My custom user model is based on Michael Herman's excellent write-up
  • we have split our settings into multiple files, which should be self-explanatory: base.py, development.py, test.py and production.py.
  • we have multiple .env files, one for each of the three environments (development, testing and production).
  • we are using Node.js and have wrapped some commands into Yarn scripts. Gulp has been set up as the JavaScript task runner.

At this point, you will observe that we have already configured the project to work with Celery. There's a celery_app.py file in the config module, and config/__init__.py has some Celery configuration. In addition, we have a bunch of Celery settings in the config/settings/base.py file, including a Celery beat entry for a task that has not yet been defined anywhere in the project. The last line of config/settings/development.py has one Celery setting: CELERY_TASK_EAGER_PROPAGATES = True.

The daily_quote app

I created a Django app called daily_quote, which provides the main functionality of our simple Django project. The directory structure of the app is as follows:

./daily_quote
├── admin.py
├── apps.py
├── __init__.py
├── migrations
│   ├── 0001_initial.py
│   └── __init__.py
├── models.py
├── quotes.json
├── quotes.py
├── tasks.py
├── templates
│   └── daily_quote
│       └── home.html
├── tests.py
└── views.py

Okay, so this is a simple app that mimics fetching a quote from some API. I know there are several quote APIs out there (most of them are not completely free), but because I had specific quote requirements and didn't have the money to pay for a subscription, I quickly put together a bunch of quotes in a JSON file (a temporary measure until I settle on a good quote API such as They Said So®), and defined a Celery task to grab one quote every day at midnight and save it to the database. This quote is displayed on the homepage. Once again, you can take a peek at the code either on GitHub (see commit 997bc88) or on your machine:

# skip the cloning part and `cd`ing into the cloned directory if you have already done so
git clone https://github.com/engineervix/django-celery-sample.git
cd django-celery-sample
git checkout 997bc88

At this point, we are ready to run our app, including the Celery task task_fetch_daily_quote. Once again, see the README.md file for more details. I have observed that many tutorials online suggest opening two (or three) terminal windows – one for ./manage.py runserver, another for the Celery worker and a third for Celery beat. I find this rather cumbersome, so I use npm scripts in my package.json file, powered by concurrently, which, as the name suggests, allows one to run multiple commands concurrently. So I run just one command:

yarn dev:celery

which does several things concurrently, in ONE terminal window:

  • export ENV_PATH=.envs/.dev.env – tell Django that this is the .env file we're using
  • ./manage.py runserver_plus – run the Django dev server (you've gotta love django-extensions!)
  • gulp – run browser-sync and automatically compile SCSS files, transpile ES6, minify CSS / JS files and reload the browser whenever there are changes.
  • maildev -o – a simple way to test your project's generated emails during development with an easy-to-use web interface. Pydanny's cookiecutter-django uses mailhog, which requires one to download the binary and place it somewhere, probably in your PATH. I prefer maildev because it leverages Node.js and I don't have to download additional binaries.
  • celery -A config worker -l info -E -B – run your celery worker and celery beat in one command. In production, it is recommended to split the two.

Running yarn dev will do all the above except the celery part.
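For readers who have not used concurrently, the idea can be sketched in plain shell. This is only an illustration of the concept, not how concurrently itself is implemented; run_all is a hypothetical helper name, and the commands in the usage comment are simply the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): start every command given
# as an argument in the background, then wait for all of them to finish.
run_all() {
  pids=""
  for cmd in "$@"; do
    sh -c "$cmd" &            # launch the command without blocking
    pids="$pids $!"           # remember its process ID
  done
  status=0
  for pid in $pids; do
    wait "$pid" || status=1   # record a failure if any command exits non-zero
  done
  return "$status"
}

# Example usage, mirroring the list above:
#   export ENV_PATH=.envs/.dev.env
#   run_all "./manage.py runserver_plus" "gulp" "maildev -o" \
#           "celery -A config worker -l info -E -B"
```

Unlike this sketch, concurrently also prefixes and colours each command's output, which is what makes a single terminal window pleasant to work in.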

Okay, so if we run yarn dev:celery we should see something like this:

[Screenshot: terminal output of yarn dev:celery]

We can run tests via yarn test, and the tests should pass. At the time of publishing this post, test coverage stood at 93%.

Enough with development, time to go into production

Well, at this point, we're ready to deploy our project. With so many deployment options available, the question is, "which option works best for me?" I typically use the "traditional" approach – a GNU/Linux server in "the cloud". This entails:

  • Setting up a GNU/Linux server (DigitalOcean, Linode, AWS EC2, etc.). I prefer using the latest Ubuntu LTS version.
  • Securing your server
  • Installing and configuring Nginx, PostgreSQL, uWSGI/Gunicorn, Redis, Certbot and other dependencies

I am exploring other deployment options, and will consider documenting my journey in a future post.

I will not cover setting up a GNU/Linux server for deployment in this post. Because setting up servers can be a tedious task, it is better to automate this process, and I found Jason Hee's setup script to be awesome. It automates the setup and provisioning of Ubuntu servers. I forked it to add my own customizations and additional packages.

I mostly use uWSGI as the application server for my Django/Flask projects, and systemd to manage not only the uWSGI server but also other services such as Redis. It therefore seemed fitting to use systemd to daemonize Celery in production as well. The official Celery docs describe several ways of daemonizing Celery, including generic init scripts, systemd and supervisor.

While the docs provide good examples to get one started with systemd, I struggled to turn these examples into something that actually worked. I had a situation where Celery and Celery beat "worked on my machine", but didn't work in production! So, I searched the web, and came across this excellent article on the Will & Skill website. The article gives a practical guide on how to set up Celery in your Django project and daemonize it on Ubuntu (it doesn't cover Celery beat, though). Even after following the article, I still didn't get things right the first time. The problems I encountered had more to do with UNIX permissions and roles, as well as the intricacies of systemd and Celery. In my troubleshooting, I found the following resources helpful:

  • This and this are how I fixed the Celery beat production configuration, which was hard to get right.
  • Here's some useful information on systemd's Restart= option under the [Service] stanza. Also see this post for reference.

Okay, enough with the talking, show me the code already! Alrighty then, here is my setup ...

As described in the Will & Skill article, (1) create a new user named celery, then (2) create the necessary pid and log folders and set the right permissions. For the latter, I tried the systemd-tmpfiles approach, but it didn't seem to work, so I did it manually. I also had to run sudo mkhomedir_helper celery after the useradd command, because I realised that a home directory had not been created for the celery user.

sudo useradd celery -d /home/celery -s /bin/bash
sudo mkhomedir_helper celery

sudo mkdir /var/log/celery
sudo chown -R celery:celery /var/log/celery
sudo chmod -R 755 /var/log/celery

sudo mkdir /var/run/celery
sudo chown -R celery:celery /var/run/celery
sudo chmod -R 755 /var/run/celery
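Because permission problems are easy to miss at this stage, it can help to confirm the result of the commands above before moving on. The following is a minimal sketch that assumes GNU stat (as shipped with Ubuntu); check_dir is a hypothetical helper, not something provided by Celery or systemd.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR exists and has the expected
# owner and octal mode. Assumes GNU coreutils `stat` (the -c option).
check_dir() {
  dir=$1
  owner=$2
  mode=$3
  [ -d "$dir" ] || { echo "missing directory: $dir" >&2; return 1; }
  actual_owner=$(stat -c '%U' "$dir")   # owning user name
  actual_mode=$(stat -c '%a' "$dir")    # permissions in octal, e.g. 755
  [ "$actual_owner" = "$owner" ] || {
    echo "$dir: owner is $actual_owner, expected $owner" >&2; return 1; }
  [ "$actual_mode" = "$mode" ] || {
    echo "$dir: mode is $actual_mode, expected $mode" >&2; return 1; }
}

# Example usage with the directories created above:
#   check_dir /var/log/celery celery 755 && check_dir /var/run/celery celery 755
```

On Ubuntu, /var/run is a symlink to /run, which is a tmpfs that is emptied on reboot, so a check like this may be worth repeating after a restart.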

Next, we create a file /etc/conf.d/celery-project with the following content (take note of the comments):

# /etc/conf.d/celery-project

# See
# http://docs.celeryproject.org/en/latest/userguide/daemonizing.html#usage-systemd
# and https://www.willandskill.se/en/celery-4-with-django-on-ubuntu-18-04/

# App instance to use
# comment out this line if you don't use an app
CELERY_APP="config"

# Name of nodes to start
# here we have a single node
CELERYD_NODES="celeryproject"
# or we could have three nodes:
#CELERYD_NODES="celeryproject1 celeryproject2 celeryproject3"

# Extra command-line arguments to the worker
CELERYD_OPTS="--time-limit=300 --concurrency=8"

# Absolute or relative path to the 'celery' command:
# I'm using virtualenvwrapper, and celery is installed in the 'celery_project' virtual environment
CELERY_BIN="/home/username/Env/celery_project/bin/celery"

# How to call manage.py
# CELERYD_MULTI="multi"

# - %n will be replaced with the first part of the nodename.
# - %I will be replaced with the current child process index
#   and is important when using the prefork pool to avoid race conditions.
CELERYD_PID_FILE="/var/run/celery/%n.pid"
CELERYD_LOG_FILE="/var/log/celery/%n%I.log"
CELERYD_LOG_LEVEL="INFO"

# The lines below are needed by the celerybeat-project.service
# unit file, but are unnecessary otherwise

CELERYBEAT_PID_FILE="/var/run/celery/celeryproject_beat.pid"
CELERYBEAT_LOG_FILE="/var/log/celery/celeryproject_beat.log"
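The file above consists of plain KEY="value" lines, so it can also be read by a shell. As a rough pre-flight check under that assumption, the sketch below sources the file in a subshell and confirms that the variables used by the worker unit are set and that CELERY_BIN points to an executable. check_celery_env is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a Celery environment file.
# Note: sourcing executes the file as shell code, so only use it on
# files you trust. Pass an absolute path (or one containing a slash).
check_celery_env() {
  env_file=$1
  [ -r "$env_file" ] || { echo "cannot read $env_file" >&2; return 1; }
  (
    # Load the variables inside a subshell so the caller's environment
    # is left untouched.
    . "$env_file"
    for name in CELERY_APP CELERYD_NODES CELERY_BIN \
                CELERYD_PID_FILE CELERYD_LOG_FILE CELERYD_LOG_LEVEL; do
      eval "value=\${$name:-}"
      [ -n "$value" ] || { echo "$name is not set" >&2; exit 1; }
    done
    [ -x "$CELERY_BIN" ] || {
      echo "CELERY_BIN is not executable: $CELERY_BIN" >&2; exit 1; }
  )
}

# Example usage:
#   check_celery_env /etc/conf.d/celery-project
```

A wrong virtualenv path in CELERY_BIN is the kind of mistake this would catch before systemd reports a less obvious failure.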

Next, we create /etc/systemd/system/celery-project.service with the following content:

[Unit]
Description=Celery Service for celeryproject.example.com
After=network.target

[Service]
Type=forking
User=celery
Group=celery
Environment="ENV_PATH=.envs/.prod.env"
EnvironmentFile=/etc/conf.d/celery-project
WorkingDirectory=/path/to/your/django-project
ExecStart=/bin/sh -c '${CELERY_BIN} multi start ${CELERYD_NODES} \
  -A ${CELERY_APP} --pidfile=${CELERYD_PID_FILE} \
  --logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} ${CELERYD_OPTS}'
ExecStop=/bin/sh -c '${CELERY_BIN} multi stopwait ${CELERYD_NODES} \
  --pidfile=${CELERYD_PID_FILE}'
ExecReload=/bin/sh -c '${CELERY_BIN} multi restart ${CELERYD_NODES} \
  -A ${CELERY_APP} --pidfile=${CELERYD_PID_FILE} \
  --logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} ${CELERYD_OPTS}'
Restart=always

[Install]
WantedBy=multi-user.target

We then create /etc/systemd/system/celerybeat-project.service with the following:

[Unit]
Description=Celery Beat Service for celeryproject.example.com
After=network.target

[Service]
Type=simple
User=celery
Group=celery
Environment="ENV_PATH=.envs/.prod.env"
EnvironmentFile=/etc/conf.d/celery-project
WorkingDirectory=/path/to/your/django-project
ExecStart=/bin/sh -c '${CELERY_BIN} -A ${CELERY_APP} beat  \
    --pidfile=${CELERYBEAT_PID_FILE} \
    --logfile=${CELERYBEAT_LOG_FILE} \
    --loglevel=${CELERYD_LOG_LEVEL} \
    --schedule=/home/celery/celerybeat-schedule'
Restart=always

[Install]
WantedBy=multi-user.target
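Both unit files refer to variables such as ${CELERY_BIN} that must be defined in the EnvironmentFile, and a missing one is easy to overlook. As a hedged illustration, the sketch below lists every ${NAME} used in a unit file that has no NAME= line in the environment file. undefined_unit_vars is a hypothetical helper, and it relies on grep -o, which GNU and BSD grep both provide.

```shell
#!/bin/sh
# Hypothetical helper: print each ${NAME} referenced in UNIT_FILE that is
# not assigned (as NAME=...) in ENV_FILE. Prints nothing if all are defined.
undefined_unit_vars() {
  unit_file=$1
  env_file=$2
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$unit_file" |
    tr -d '${}' |             # strip the ${ } wrapper, leaving the name
    sort -u |                 # report each variable once
    while read -r name; do
      grep -q "^$name=" "$env_file" || echo "$name"
    done
}

# Example usage:
#   undefined_unit_vars /etc/systemd/system/celerybeat-project.service \
#                       /etc/conf.d/celery-project
```

Commented-out assignments in the environment file are treated as undefined, which matches how systemd reads the file.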

With this setup, we can have multiple Django projects, each with its own set of the three configuration files above (of course, they should be aptly named to distinguish between your projects).

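Okay, now that everything has been configured, we can reload systemd so that it picks up the new unit files (do this again whenever you modify them), and then fire up Celery and Celery beat:

sudo systemctl daemon-reload
sudo systemctl start celery-project.service
sudo systemctl start celerybeat-project.service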

Check to ensure that everything is working correctly:

sudo systemctl status celery-project.service
sudo systemctl status celerybeat-project.service
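If you manage several projects this way, the status check can be wrapped in a small loop. The following is a sketch only: check_services is a hypothetical helper, built on systemctl is-active --quiet, which exits with status 0 when a unit is active.

```shell
#!/bin/sh
# Hypothetical helper: report whether each given systemd unit is active,
# and return non-zero if at least one of them is not.
check_services() {
  failed=0
  for unit in "$@"; do
    if systemctl is-active --quiet "$unit"; then
      echo "ok: $unit"
    else
      echo "NOT active: $unit"
      failed=1
    fi
  done
  return "$failed"
}

# Example usage:
#   check_services celery-project.service celerybeat-project.service
```

For a unit that is reported as not active, sudo systemctl status on that unit shows the most recent log lines and the exit code.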

If you don't encounter any errors, you can then "persist" your services so that they automatically run on boot:

sudo systemctl enable celery-project.service
sudo systemctl enable celerybeat-project.service

If you encounter problems, check your uWSGI/Gunicorn logs, and also check the log files defined in the above configuration files (see CELERYD_LOG_FILE and CELERYBEAT_LOG_FILE).
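When a log file is long, filtering for the interesting lines helps. The sketch below assumes Celery's default log format, in which the level appears as, for example, ERROR/MainProcess; recent_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: show the last COUNT (default 20) lines of LOG_FILE
# that mention an error, a critical message or a Python traceback.
recent_errors() {
  log_file=$1
  count=${2:-20}
  grep -iE 'error|critical|traceback' "$log_file" | tail -n "$count"
}

# Example usage with the beat log defined in /etc/conf.d/celery-project:
#   recent_errors /var/log/celery/celeryproject_beat.log
# Messages from systemd itself are available via:
#   sudo journalctl -u celery-project.service
```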

For convenience, I have added these three configuration files to the .envs.example/celery directory in the GitHub repo.

Well, that's all folks! Happy deploying 🚀!


Cover image by Ineta Lidace from Pixabay
