Testing Python (or any!) command line applications

In OPIG, many of our projects take the form of code bases written in Python. These can be many different things, such as databases, machine learning models, and other software tools. Often, the user interface for these tools is developed as both a web app and a command line application. Here, I will discuss one of my favourite tools for testing command-line applications: prysk!

Why test?

Testing is an important part of the code development process for several reasons. First, we can ensure the correctness of our code by setting up inputs and expected outputs and then comparing these to the outputs our code produces. Second, testing allows our code to be adaptable. When we (or a completely different developer) need to change something in the code, pre-existing tests give us confidence that the changes we make are not disrupting any of the expected behaviours of the code. Third, testing provides developers with examples of how to use the software: tests show what inputs a piece of code takes in, what it produces, and potentially the different ways the code can be used. In a way, tests can form part of the documentation. Finally, (in my opinion at least) testing generally produces better quality code. If something is hard to test, it usually means it is not a great piece of software: the inputs are too complicated, the outputs are difficult to parse, there are too many modalities, etc.

A lot of the resources online focus on unit testing, that is, writing tests for individual functions and classes within our code base, and these are great! This post is not to say don’t unit test; it is to highlight an additional type of testing we can do: command-line application testing.

Prysk: Test your CLIs (not just unit test!)

It can be really useful to test your entire user interface and not just the code within it. This can help ensure that the code works as intended on different machines and under different conditions. My favourite library for writing command line tests is prysk. It is based on an earlier library called cram, but more effort goes into its plugin development and upkeep. Prysk integrates nicely with pytest using the pytest-prysk plugin.

We write prysk tests in .t files. Prysk runs the commands listed in a file and compares what they print to the expected output recorded underneath them. There is very little to the syntax of prysk/cram tests:

  • lines that do not start with two spaces are considered comments
  • lines that start with two spaces followed by a $ are considered commands
    • lines that start with two spaces followed by a > are continuations of the command on the previous line
  • all other lines that start with two spaces are considered expected outputs (either stdout or stderr)

For a test to pass, the output written underneath each command must match the output generated by running that command. In this way, tests are composed from multiple Unix operations and CLIs.
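To make the line rules above concrete, here is a minimal Python sketch that labels each line of a small .t snippet. This is only an illustration of the rules as described in this post, not prysk's actual parser; the function name classify_line and the example snippet are made up for this sketch.

```python
def classify_line(line: str) -> str:
    """Label one line of a .t file using the rules described in the post.

    Illustrative only: this is not how prysk itself is implemented.
    """
    if line.startswith("  $ "):
        return "command"
    if line.startswith("  > "):
        return "continuation"
    if line.startswith("  "):
        return "output"
    # Anything that does not start with two spaces (including blank lines).
    return "comment"


# A tiny, made-up test file: one comment, one two-line command, one output line.
example = """\
Check that echo prints its argument.
  $ echo \\
  > hello
  hello
"""

for line in example.splitlines():
    print(f"{classify_line(line):<12} | {line}")
```

Running the sketch labels the four lines of the snippet as comment, command, continuation, and output, in that order.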

An Example

Here is an example of a CLI test from a recent project. The test directory looks as follows:

> tree tests/apps/sample_ots
.
├── data
│   ├── SRR13113582_1_Paired_All.csv
│   ├── SRR13113801_1_Paired_All.csv
│   └── SRR24837722_1_Paired_All.csv
├── reference
│   └── sample.csv
└── test_sample_ots.t

The data folder is a common convention for the input data of a test, and the reference folder is a common convention for its expected outputs, but neither is required. The test_sample_ots.t file is where the actual test is:

> cat tests/apps/sample_ots/test_sample_ots.t
Test app.
  $ python -m tcr_pmhc_interface_analysis.apps.sample_ots \
  > --seed 123 \
  > -n 1 \
  > --sample-size 5 \
  > -o test.csv \
  > $TESTDIR/data

  $ diff test.csv $TESTDIR/reference/sample.csv

The test works by running a custom CLI (python -m tcr_pmhc_interface_analysis.apps.sample_ots) on the input data and then comparing the output to the reference data with the diff command. Here, the two files should be identical, so no output is expected after the diff command. If the files differ, running the test with pytest (pytest tests/apps/sample_ots/) gives:

> pytest tests/apps/sample_ots
======================================================================== test session starts ========================================================================
platform linux -- Python 3.10.14, pytest-7.4.4, pluggy-1.5.0
plugins: anyio-3.6.2, prysk-0.2.0
collected 1 item                                                                                                                                                    

tests/apps/sample_ots/test_sample_ots.t F                                                                                                                     [100%]

============================================================================= FAILURES ==============================================================================
_____________________________________________________________________ [prysk] test_sample_ots.t _____________________________________________________________________
--- /ceph/project/koohylab/bmcmaste/projects/tcr-pmhc-interface-analysis/tests/apps/sample_ots/test_sample_ots.t
+++ /ceph/project/koohylab/bmcmaste/projects/tcr-pmhc-interface-analysis/tests/apps/sample_ots/test_sample_ots.t.err
@@ -7,3 +7,8 @@
   > $TESTDIR/data
 
   $ diff test.csv $TESTDIR/reference/sample.csv
+  6c6
+  < NIATNDY,GYKTK,LVGYNNNDMR,MNHEY,SVGAGI,ASSPGRGKYEQY,TRAV4*01,TRBV6-5*01,TRAJ43*01,TRBJ2-7*02
+  ---
+  > NIATNDY,GYKTK,VGYNNNDMR,MNHEY,SVGAGI,ASSPGRGKYEQY,TRAV4*01,TRBV6-5*01,TRAJ43*01,TRBJ2-7*02
+  [1]

====================================================================== short test summary info ======================================================================
FAILED tests/apps/sample_ots/test_sample_ots.t::test_sample_ots.t
======================================================================== 1 failed in 11.22s =========================================================================

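The lines prefixed with + are output that the commands produced but the test file did not expect: here, the diff itself and its non-zero exit code, [1]. When we fix the bug in our CLI and re-run, we get: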

> pytest tests/apps/sample_ots
==================================================================================== test session starts ====================================================================================
platform linux -- Python 3.10.14, pytest-7.4.4, pluggy-1.5.0
rootdir: /ceph/project/koohylab/bmcmaste/projects/tcr-pmhc-interface-analysis
plugins: anyio-3.6.2, prysk-0.2.0
collected 1 item                                                                                                                                                                            

tests/apps/sample_ots/test_sample_ots.t .                                                                                                                                             [100%]

===================================================================================== 1 passed in 0.86s =====================================================================================

And that is all there is to it!

A word of caution on CLI tests

Unit tests are great because they are quick to run and debug. Often when writing command line tests, I find my lazy habit is to include too much data to mock the application inputs and outputs. This can both increase the time it takes to run the test and affect things like git if the file sizes are too big.

It is fine for CLI tests to take longer than unit tests, since our CLIs are often composed of many different functions and classes under the hood, but they shouldn’t take more than a few minutes to run. If a test becomes too lengthy to run, we inevitably end up skipping it. As with any test, try to strip your inputs and outputs down to the minimally required components, and include options in your applications that allow for shorter execution pathways (e.g. only one training loop for an ML model).
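One way to offer a shorter execution pathway is to expose the amount of work as a command line option, so a test can ask for the smallest possible run. The sketch below is a toy CLI written with argparse; the --epochs and --seed flags, the train function, and the fake "loss" values are all hypothetical and stand in for whatever your real application does.

```python
import argparse
import random


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Toy training CLI (hypothetical).")
    # Default is a long run; a test can override it with --epochs 1.
    parser.add_argument("--epochs", type=int, default=100, help="number of training loops")
    # A fixed seed keeps the output reproducible so it can be compared to a reference.
    parser.add_argument("--seed", type=int, default=0, help="random seed")
    return parser


def train(epochs: int, seed: int) -> list[float]:
    """Stand-in for a real training loop: returns one fake loss value per epoch."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(epochs)]


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    losses = train(args.epochs, args.seed)
    print(f"ran {len(losses)} epoch(s)")
    return 0


# A CLI test would pass these flags on the command line;
# here main() is called directly with the same arguments.
main(["--epochs", "1", "--seed", "123"])
```

In a .t file, the equivalent command would simply pass the short-run flags, keeping the test fast while still exercising the whole interface.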

I would also recommend only storing plain text files as mock data. Binary-encoded files create a lot of bloat in a git repository and can cause issues when size limits are reached. Get creative about how to store and compare the outputs if your applications rely on binary file formats.
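As one possible approach (an assumption on my part, not the only option), a test can compare a checksum of a binary output instead of storing the binary reference file itself. The sketch below computes a SHA-256 digest with Python's standard library; the helper name sha256_of and the file name model.bin are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo with a throwaway binary file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    binary_output = Path(tmp) / "model.bin"
    binary_output.write_bytes(bytes(range(256)))
    # A test could print this one line and compare it to a stored plain-text digest.
    print(sha256_of(binary_output))
```

This only works when the binary output is byte-for-byte reproducible. If the format embeds timestamps or other run-specific metadata, converting the relevant contents to plain text and comparing that is likely a better fit.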

Happy testing!
