Making your Python tool as easy to install as possible

Have you ever tried to use someone else’s code and spent a whole day trying to install it? Have you ever decided not to use a tool because installing it was a massive pain? Both of those have happened to me and, to be honest, it is a massive shame. The authors may spend large amounts of time developing these tools and in the end, no one uses them because they can’t get them to work. So I have decided to try and make all code I develop as easy and painless as possible to install and use.

Say you have just developed a new deep-learning tool in Python (let's call it ABlooper). To maximize usage, you want to make it as easy to install as possible. What could be easier than this? 😍

$ pip install ABlooper

Okay, so now that we know where we want to get to, the question is how to get there.

Python actually makes it quite straightforward for us to achieve this with packages such as setuptools, build and twine. It even provides a server, the Python Package Index (PyPI), for people to easily share Python packages.

Package structure and required files

The first thing we need to do is organise our code in the right way. I would recommend the following structure for a Python package. This is just a skeleton, however, and I am sure many people would disagree on some of the details. But this will work, so unless you have a better way of doing it, it is good enough.

ABlooper/
├── ABlooper/
│   ├── some_code.py
│   ├── more_code.py
│   ├── __init__.py
│   └── data/
│       └── model_weights
├── LICENSE
├── README.md
├── MANIFEST.in
└── setup.py

If you are planning on sharing your code you should probably already have some code files, a README.md and a LICENSE. If the code you are sharing is machine learning based, you will probably have weights stored in a non-python file. We will have to explicitly tell setuptools to include these in the package.

The key file that will be doing most of the work is setup.py. Here is a template of what it should look like:

from setuptools import setup, find_packages

with open("README.md", "r", encoding="utf-8") as readme:
    long_description = readme.read()

setup(
    name='ABlooper',     # Name of the package. This is what people will type after `pip install`
    version='1.0.0',     # Version. 1.0.0 is a sensible choice if you do not plan any major changes after release
    description='Set of functions to predict CDR structure',     # Short description
    license='BSD 3-clause license',
    maintainer='Brennan Abanades',
    maintainer_email='youremail@stats.ox.ac.uk',
    long_description=long_description,     # This makes your README.md the description shown on PyPI
    long_description_content_type='text/markdown',
    include_package_data=True,     # Set to True if you ship extra (non .py) data files
    packages=find_packages(include=('ABlooper', 'ABlooper.*')),     # Where to look for the Python package
    install_requires=[     # All Requirements
        'numpy',
        'torch>=1.6',
    ],
)

Not all of the parameters used here are necessary, and there are many additional keywords that can be added to the setup function. A full list of keywords can be found here. For example, you can make a Python function runnable from the command line using the entry_points keyword:

entry_points={'console_scripts': ['ABlooper=ABlooper.more_code:main']},

This way users will not even need to know any Python to use your code!
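For the entry point above to work, ABlooper/more_code.py needs a main function that reads its arguments from the command line. Here is a minimal sketch of what that could look like, assuming a single positional argument for the input PDB file. The argument name and the printed message are placeholders, not ABlooper's real interface:

```python
import argparse


def main(argv=None):
    """Hypothetical entry point; the real prediction code would be called here."""
    parser = argparse.ArgumentParser(
        prog="ABlooper",
        description="Set of functions to predict CDR structure",
    )
    # Assumed interface: one positional argument with the path to a PDB file.
    parser.add_argument("pdb_file", help="path to the antibody PDB file")
    args = parser.parse_args(argv)  # argv=None means "read sys.argv[1:]"

    # Placeholder for the actual work.
    print(f"Predicting CDR loops for {args.pdb_file}")
    return 0
```

The script that pip generates calls main() with no arguments and passes its return value to sys.exit, so returning 0 signals success. Accepting an optional argv also makes the function easy to call from tests.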

There are two other files left to discuss:

  • __init__.py – Lets Python know that the directory is a Python package. It can be an empty file.
  • MANIFEST.in – Lists all additional (non-Python) data needed to run the package. In our case this would contain one line: include ABlooper/data/*.
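Once the weights are shipped inside the package, your code has to find them wherever pip ended up installing it. One way to do this is sketched here with the standard library's importlib.resources (Python 3.9 or later). The helper name data_file is made up for illustration:

```python
from importlib.resources import files  # available from Python 3.9


def data_file(package, *parts):
    """Return a path-like handle to a file shipped inside an installed package."""
    resource = files(package)
    for part in parts:
        resource = resource / part
    return resource


# Inside ABlooper this could be used as (assuming the layout shown earlier):
# weights = data_file("ABlooper", "data", "model_weights")
```

This avoids hard-coding paths that only exist on your own machine.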

Uploading to PyPI

At this point, you nearly have all you need to publish your Python package. The last missing step is to create a PyPI account. This takes two seconds and can be done from here. Once that is done, you just have to run the following commands from the package base directory (the first ABlooper) to build and upload your package:

$ pip install --upgrade build twine
$ python3 -m build
$ twine upload dist/*

The dist directory will be generated by build, and twine will prompt you for your PyPI credentials when it runs. Note that PyPI now expects an API token (created from your account settings) rather than your account password.
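Before uploading, it can be worth checking that the non-Python files actually made it into the built package, since a mistake in MANIFEST.in is easy to miss. A wheel is just a zip archive, so a rough check is possible with the standard library alone. The helper name wheel_contains is an assumption for this sketch:

```python
import zipfile


def wheel_contains(wheel_path, suffix):
    """Return True if any file inside the wheel has a path ending in `suffix`."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return any(name.endswith(suffix) for name in wheel.namelist())


# Example (replace the first argument with the .whl file that build put in dist/):
# wheel_contains(path_to_wheel, "ABlooper/data/model_weights")
```

If this returns False for your weights file, fix the packaging before running twine.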

Reap the rewards

And we are done!! Now anyone, anywhere (assuming they have Python installed and a good internet connection), will be able to easily install your package! All they have to do is: 😍

$ pip install ABlooper

And if you added an entry point, they can also run your Python functions by simply calling:

$ ABlooper antibody.pdb

Now no one can say that they didn’t use your code because they couldn’t get it to work. If you got this far, I hope this was not a complete waste of your time.
