python-cicero: A Python Packaging Tutorial

As mentioned in my last Labs post, last month I released python-cicero, a Python wrapper library for Azavea’s Cicero API. You might recall me mentioning in that post that the Python packaging process is a bit of a mess. You might also remember that the talk I gave about the project at the GeoPhilly Meetup group drew praise from some attendees for conveying the man-versus-machine conflict inherent in the packaging journey. If you took a look at those slides, you may have noticed pictures of rubber ducks and horses holding cats, which I shamelessly stole from another fantastic but exasperated-feeling programming talk that accurately captured my sentiments towards Python packages at the time:

Wat – wæt: n. The only proper response to something that makes absolutely no sense.

We wrote our Python wrapper. We have docs. Even unit tests. One would think we’d be past most of the hurdles that stand between us and shipped Python code, but there is one final harder-than-it-should-be section of the journey to overcome: How do we make it so other people can install and use our wrapper?

The answer is to turn our wrapper into a Python package and upload it to the Python Package Index (PyPI). Why is this hard? One issue is the lack of clear, authoritative documentation on the process, which is part of the reason I’m writing this post. So it should come as no surprise that our first obstacle is one of vocabulary.

Modules, Packages, and Distributions

At first, our wrapper is just a Python “module”: some .py source files. A module can be a single .py file or, as in our case, a “multi-file module”: several .py files in a directory (here, “cicero”) with a special __init__.py file that tells Python to treat the whole directory as one module. Because it is a multi-file module, users will be able to import everything necessary for the wrapper with a simple “from cicero import *”, or even “from cicero import CiceroRestConnection”.
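
To make the role of __init__.py concrete, here is a minimal sketch that builds a throwaway multi-file module in a temporary directory and imports from it. The names “demo_wrapper” and “DemoConnection” are hypothetical stand-ins I made up for illustration; they are not part of python-cicero.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway multi-file module on disk (all names are hypothetical).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_wrapper"
pkg_dir.mkdir()

# One ordinary source file inside the directory.
(pkg_dir / "connection.py").write_text(
    "class DemoConnection:\n"
    "    def ping(self):\n"
    "        return 'pong'\n"
)

# __init__.py marks the directory as importable and re-exports the class,
# which is what lets `from demo_wrapper import DemoConnection` work.
(pkg_dir / "__init__.py").write_text(
    "from .connection import DemoConnection\n"
)

# Make the parent directory importable, then import like any other module.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_wrapper = importlib.import_module("demo_wrapper")

from demo_wrapper import DemoConnection

print(DemoConnection().ping())
```

Without the re-export line in __init__.py, users would have to reach into the submodule themselves with “from demo_wrapper.connection import DemoConnection”.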

To make it so others can easily download and install it, with either Python’s “easy_install” command or the far superior “pip”, we have to make our module a proper Python “package” and upload a version of that package (called a “distribution” file) to the Python Package Index.

Those three terms bear repeating. A module is one or many Python source files. A package is one or many modules, plus some supporting files which we’ll get into below. A distribution is an archive file (think tarball) of a package that is uploaded to PyPI; it is what your users will actually download from the internet and install with easy_install or pip.

Having gone through this process, I believe the Python community does not take sufficient care to distinguish among these three terms when discussing packaging. Often, Pythonistas will refer to pretty much everything as a “package”. This results in unnecessary confusion and contradiction for newcomers as they try to understand the already messy packaging process. “pip” stands for Pip Installs Packages, when really it’s often downloading/installing distribution files. The Python Package Index is not called the Python Distribution Index, when it probably should be. Folks will refer to a directory of Python files as a package, when they probably really mean a multi-file module.

The Packaging Process

With our terminology settled, what are the “supporting files” I mentioned that go into a package? I’m glad you asked! Here’s a list of the key ones:

  • The modules to be packaged
  • A changelog – CHANGES.txt is the convention
  • A license if the package is open source – LICENSE.txt is the convention
  • A readme file written in reStructuredText, and that’s more than just a convention (see below)
  • A MANIFEST.in file
  • A setup.py file
  • Other non-essential but related files: documentation, example scripts, tests, etc

I’ll assume you know about changelogs, licenses, and readme files. If not, they’re easy to find out about, and no specific formatting is required for your package; it’s just “A Good Idea™” to have them. However, you should write your readme file in reStructuredText if you can, because it will form the basis of your project’s page on PyPI. PyPI will automatically read and format reStructuredText with headings and all that good jazz. You can write your readme file in Markdown or just plain text, but it won’t look as nice.

Finally, we already have a module[s], and a “docs” folder that Pycco generated with our documentation files, as well as a “cicero_examples.py” file. So let’s move on to the two files we haven’t encountered yet: MANIFEST.in and setup.py.

MANIFEST.in

Whichever Python packaging utility you use to create your distribution file and submit your software to PyPI (more on that in a moment) will include some files by default: the .py source files it can find, for one. Invariably, however, those will not be the only files you want to include as part of your package and/or distribution! Documentation, the changelog, and example files are all commonly overlooked by the packaging utilities, yet they are critical parts of your finished package and distribution. The MANIFEST.in file’s job is to identify all these extra files to be included. To take python-cicero’s MANIFEST.in as an example:

[github file = “/azavea/python-cicero/blob/master/MANIFEST.in”]

You can just put all the files you want to include in your package/distribution in this file, with a preceding “include” statement. If you have a whole directory you want to include, save yourself some typing and use a “recursive-include” statement and asterisk to include all that directory’s files, like I do above for “docs”.
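
If the embedded file doesn’t load for you, here is a rough sketch of what such a MANIFEST.in might contain, wrapped in a few lines of Python so the example runs on its own. The file names are assumptions based on the conventions listed earlier (README.rst in particular is a guess at a readme name), not a copy of python-cicero’s actual file.

```python
# Hypothetical MANIFEST.in contents. The file names follow the conventions
# mentioned above; README.rst is an assumed name for the readme.
MANIFEST_IN = """\
include CHANGES.txt
include LICENSE.txt
include README.rst
recursive-include docs *
"""


def summarize_manifest(text):
    """Split each non-empty line into (command, arguments) for a quick review."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        command, *args = line.split()
        entries.append((command, args))
    return entries


entries = summarize_manifest(MANIFEST_IN)
for command, args in entries:
    print(command, args)
```

Each “include” line names a single file, while “recursive-include” takes a directory followed by one or more filename patterns.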

setup.py

This is the real glue that finally puts your package together. It’s actually a short Python program that is run when you first register your package on PyPI, again when you build a distribution file, and finally when you upload that distribution to PyPI. It’s usually pretty simple, with just an import statement to bring in your packaging utility and a call to that packaging utility’s setup() function, with many metadata parameters passed to that function:

[github file = “/azavea/python-cicero/blob/master/setup.py”]

Sidebar: what’s this “packaging utility” I’ve been referring to? I used a utility called “setuptools.” If you just want to get up and running, I recommend you use setuptools as well. If you’re using pip and virtualenv, you surely already have it in your virtualenv. Unless you have strange edge cases, it will probably work fine for building your package. But there are alternative packaging utilities out there with different edge cases and compatibilities, and this is one of the reasons Python packaging is so confusing. If you see references to other utilities by the names of distutils, distribute, distutils2, or even “bento”, don’t fret. They all accomplish roughly the same thing as setuptools. The first and second answers to this Stack Overflow post give a great overview of what all these other utilities are, the open source community minutiae behind why they exist, and even why they are merging back with each other. Again, no need to stress over it; just go with setuptools for now if you can.

Back to setup.py: there are only two setup() parameters that are really essential, “name” and “packages”. “name” tells setuptools what the name of your package is, and “packages” tells setuptools what packages (really, multi-file modules and modules; again with Python’s terminology inconsistency!) are included in the package you’re creating. If you don’t have many packages, you can just list them. If you have a lot, or want a shortcut, you can import and use setuptools’ “find_packages()” function like I did, which searches the directories under setup.py recursively for all Python multi-file modules. In my case, it found both my “cicero” module and my “test” module under it.
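
To get a feel for what find_packages() is doing, here is a rough standard-library approximation. This is my own sketch of the idea (walk the tree, keep directories that contain an __init__.py), not setuptools’ actual implementation, and the “cicero_like” layout is a hypothetical stand-in for a real project.

```python
import os
import tempfile
from pathlib import Path


def find_packages_sketch(where="."):
    """Rough stdlib approximation of what setuptools' find_packages() returns.

    Collects dotted names for every directory under `where` that contains an
    __init__.py, and does not descend into directories that lack one.
    """
    where = os.fspath(where)
    found = []
    for dirpath, dirnames, filenames in os.walk(where):
        if dirpath == where:
            continue  # the project root itself is not a package
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package, so do not look any deeper
            continue
        relative = os.path.relpath(dirpath, where)
        found.append(relative.replace(os.sep, "."))
    return sorted(found)


# Hypothetical project layout: a package, a test package inside it, and docs.
root = Path(tempfile.mkdtemp())
(root / "cicero_like" / "test").mkdir(parents=True)
(root / "cicero_like" / "__init__.py").write_text("")
(root / "cicero_like" / "test" / "__init__.py").write_text("")
(root / "docs").mkdir()
(root / "docs" / "index.html").write_text("<html></html>")

found = find_packages_sketch(root)
print(found)  # ['cicero_like', 'cicero_like.test']
```

The “docs” directory is skipped because it has no __init__.py, which is exactly why files like documentation need MANIFEST.in to make it into the distribution.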

All the other parameters I used, while not essential, are really useful both for your listing on PyPI and for your users. Let’s go over a few:

  • version – As you fix bugs and add new features, you’ll likely upload and release new versions of your package. So give it a version number!
  • author and maintainer and email fields – You wrote it, give yourself credit! And if you’d like, give your email so your users can contact you with questions.
  • url – Your project’s PyPI page is likely not the only or even the best location for information about your package. If you have another URL for the project, put it here.
  • description and long_description – Your PyPI listing will be built from these. You can use Python to open and read your README file directly – again, if you wrote it in RST format, your PyPI page will be nicely formatted.
  • extras_require and/or install_requires – Use these if your project has other Python packages as dependencies. In the case of python-cicero, the wrapper itself is implemented entirely with the standard library, so nothing else is required. But if someone anticipates wanting to edit the documentation, they should install Pycco too. And this is what our extras_require entry would allow them to do:
    $ pip install "python-cicero[docs]"

    If you anticipate your users using pip to install your package, then you might also want a requirements.txt file. More information on handling requirements is available here and here.

  • classifiers – PyPI has an extensive list of classifiers for package listings. These are sort of like tags, and will help people find your project and understand a bit about it. Pick a few like a development status, license, and topic from this list exactly as they appear.

The list of options that can go into setup.py is quite extensive; look at the official docs for more but the above is certainly enough to get you started.
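
Pulling the parameters above together, a setup.py might pass something like the following to setup(). This is only a sketch under assumptions: every value is a hypothetical placeholder (it is not python-cicero’s real metadata), and the dictionary is printed rather than handed to setuptools so the example can run anywhere.

```python
# Sketch of the keyword arguments a setup.py might pass to setuptools.setup().
# Every value below is a hypothetical placeholder, not python-cicero's metadata.
SETUP_KWARGS = dict(
    name="example-wrapper",
    version="0.1.0",
    packages=["example_wrapper", "example_wrapper.test"],
    author="Your Name",
    author_email="you@example.com",
    url="https://example.com/example-wrapper",
    description="A thin wrapper around an example web API.",
    # In a real setup.py this would typically be open("README.rst").read().
    long_description="Example Wrapper\n===============\n\nLonger RST text.",
    # Nothing beyond the standard library is needed at runtime...
    install_requires=[],
    # ...but the optional "docs" extra pulls in the documentation generator.
    extras_require={"docs": ["Pycco"]},
    classifiers=[
        "Development Status :: 4 - Beta",
        "License :: OSI Approved :: BSD License",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
)

# In an actual setup.py the file would end with:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
for key in ("name", "version", "packages"):
    print(key, "=", SETUP_KWARGS[key])
```

Keeping the metadata in one dictionary is just a convenience for this example; most setup.py files pass the keyword arguments to setup() directly.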

Submission to PyPI

We’ve made it to our last step! Our package and all its files are written, and we’re ready to register the project with PyPI and upload a distribution for others.

First, make accounts at both the test PyPI and the real PyPI. Especially for your first time, you’ll want to try this process out first on the test site. It gets cleaned out and reset every so often, so there’s no risk if you mess up. You’ll also want to make sure you’ve given your package a name that is not already taken on the real PyPI before you try to upload there. Once you claim a name on the live PyPI, that name is no longer available to anyone else.

Next, create a ~/.pypirc file in your home directory (Windows users – you’ll need to set a HOME environment variable to point to the location of this file):

[distutils]
index-servers =
    test
    pypi

[test]
repository: https://testpypi.python.org/pypi
username:your_pypitest_username
password:your_pypitest_password

[pypi]
repository: https://pypi.python.org/pypi
username:your_pypi_username
password:your_pypi_password
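
A typo in this file tends to surface later as a confusing upload error, so a quick sanity check can help. The sketch below parses .pypirc-style text with the standard library’s configparser and confirms that every name listed under index-servers has its own section. It assumes the index list lives in a [distutils] section, which is the section name distutils looks for, and the check_pypirc helper is a made-up name for illustration.

```python
import configparser

# Same placeholder content as the file above, kept inline so this runs anywhere.
PYPIRC_TEXT = """\
[distutils]
index-servers =
    test
    pypi

[test]
repository: https://testpypi.python.org/pypi
username:your_pypitest_username
password:your_pypitest_password

[pypi]
repository: https://pypi.python.org/pypi
username:your_pypi_username
password:your_pypi_password
"""


def check_pypirc(text):
    """Return (server names, names that have no matching section)."""
    # RawConfigParser avoids '%' interpolation, which passwords could trip.
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    servers = parser.get("distutils", "index-servers").split()
    missing = [name for name in servers if not parser.has_section(name)]
    return servers, missing


servers, missing = check_pypirc(PYPIRC_TEXT)
print("index servers:", servers)
print("missing sections:", missing)
```

To check a real file instead, you could read it first with something like open(os.path.expanduser("~/.pypirc")).read() and pass that text in.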

With your login info saved in .pypirc, we have a few simple commands left:

$ python setup.py register -r test

The above should have registered your project with the test PyPI and created a page for it. See if you can get there by going to https://testpypi.python.org/pypi/name_of_your_package. If it worked, now you can build a source distribution file (sdist) and upload it to the test PyPI:

$ python setup.py sdist upload -r test
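
Since a source distribution is just a tarball, it is easy to peek inside one and confirm that the extra files from MANIFEST.in actually made it in. The sketch below builds a tiny stand-in archive so it can run on its own; the archive name and the list_sdist_members helper are hypothetical, and with a real project you would point it at the .tar.gz that sdist leaves in your dist/ directory.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_sdist_members(archive_path):
    """Return the sorted file names stored inside a .tar.gz distribution."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


# Build a tiny stand-in archive so the example runs anywhere.
workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "example-wrapper-0.1.0.tar.gz"
with tarfile.open(archive_path, "w:gz") as archive:
    for name in ("setup.py", "CHANGES.txt", "docs/index.html"):
        data = b"placeholder\n"
        info = tarfile.TarInfo("example-wrapper-0.1.0/" + name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))

members = list_sdist_members(archive_path)
print(members)

# Flag any expected extras that did not end up in the archive.
expected = ("CHANGES.txt", "LICENSE.txt")
absent = [n for n in expected if not any(m.endswith("/" + n) for m in members)]
print("not in archive:", absent)
```

In this made-up archive the license file was never added, so it shows up in the “not in archive” list, which is the kind of omission a missing MANIFEST.in line would cause.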

Look at your package’s test page – is there a tar.gz file listed near the end to download? Great! Now we can do the same process for real:

$ python setup.py register -r pypi
$ python setup.py sdist upload -r pypi

And we’re finally done. Your users should now be able to install your package easily with:

$ pip install your_package
$ #OR
$ easy_install your_package

Overview

Congratulations, you’ve just released some Python software! Now you know about:

  • The differences between a Python module, multi-file module, package, and distribution, and how they’re frequently confused
  • The Python Package Index
  • Creating key files like MANIFEST.in and setup.py which, in addition to Python modules, make up your Python package
  • The steps needed to register and upload your package to both the test and live PyPI instances

If you’re lost or curious, I found these resources incredibly helpful when going through this process for the first time:

Additionally, you can look to the packages Azaveans have contributed to PyPI as examples: django-queryset-csv, python-cicero, and python-omgeo. By all means, pip install them and try them out!