Slurm and Snakemake: a match made in HPC heaven

Snakemake is an incredibly useful workflow management tool that allows you to run pipelines in an automated way. Simply put, you define inputs and outputs for different steps that depend on each other, and Snakemake then runs each job only once the required inputs have been generated by the previous steps. A previous blog post by Tobias is a good introduction to it – https://www.blopig.com/blog/2021/12/snakemake-better-workflows-with-your-code/

However, pipelines are often computationally intensive and we would like to run them on our HPC. Snakemake allows us to do this on Slurm using an extension package called snakemake-executor-plugin-slurm:

pip install snakemake-executor-plugin-slurm

Once set up, each part of the pipeline can be sent off as an individual Slurm job, so the jobs can run across as many CPUs or GPUs as you have access to.

As a toy example, we have two scripts: the first downloads a random number of SMILES strings from the ZINC database; once these SMILES have been retrieved, a second script uses RDKit to generate 3D conformations for the compounds. The second script therefore depends on the output of the first. We can define this in our .smk file (the Snakemake file format):

import random

# Generate 10 random SMILES counts between 5 and 50
counts = random.sample(range(5, 51), 10)

# Export the counts so Snakemake knows the valid wildcard values
SMILES_COUNTS = [str(n) for n in counts]

# Final target rule: checks that all outputs of the pipeline are generated
rule all:
    input:
        expand("conf_{num}/done.txt", num=SMILES_COUNTS)

# Each rule below only runs if its output is missing or out of date
rule get_smiles:
    output:
        "smiles_{num}.txt"
    shell:
        "python -m scripts.get_smiles --num {wildcards.num} --output {output}"

rule generate_conformers:
    input:
        smiles="smiles_{num}.txt"
    output:
        done="conf_{num}/done.txt"
    shell:
        "python -m scripts.generate_conf --input_file {input.smiles} --outdir conf_{wildcards.num}"

The second script writes a done.txt file so that Snakemake knows that all conformers have been generated.
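As a sketch of this sentinel-file pattern, the second script might look something like the following. The function and file names here simply mirror the flags in the shell command above, and the actual RDKit embedding step is stubbed out as a placeholder:

```python
import argparse
from pathlib import Path


def embed_conformer(smiles: str) -> str:
    """Placeholder for the real RDKit embedding step
    (e.g. Chem.MolFromSmiles + AllChem.EmbedMolecule + an SDF writer)."""
    return f"(3D coordinates for {smiles})\n"


def generate_conformers(input_file: str, outdir: str) -> None:
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    smiles_list = [s for s in Path(input_file).read_text().splitlines() if s]
    for i, smi in enumerate(smiles_list):
        (out / f"mol_{i}.sdf").write_text(embed_conformer(smi))
    # Sentinel file: this is what rule generate_conformers declares as its
    # output, so its existence tells Snakemake the job finished successfully.
    (out / "done.txt").write_text("done\n")


def main() -> None:
    # Flags mirror the shell command in the generate_conformers rule.
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_file", required=True)
    parser.add_argument("--outdir", required=True)
    args = parser.parse_args()
    generate_conformers(args.input_file, args.outdir)
```

The sentinel is only written after every conformer file, so a job killed mid-run leaves no done.txt and Snakemake will rerun the rule.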

This Snakemake pipeline can be run locally using the simple command:

snakemake -j 4 -s {snakemake_filename}.smk

However, to run this on Slurm, we need a config.yaml file in which we define our variables:

max-status-checks-per-second: 0.01
executor: slurm
conda-prefix: {CONDA_ENVS_PATH}
jobscript: slurm_job.sh

default-resources:
  runtime: 1h
  time: 30:00:00
  slurm_account: {USERNAME}
  mem: 4G
  cpus_per_task: 1
  slurm_partition: {PARTITION}
  clusters: {CLUSTER}

printshellcmds: true

This defines the default resources for each Slurm job and the path for the conda-prefix. To let the pipeline know which conda environment to use, we additionally have to add the conda environment YAML file to each rule. I haven’t managed to get the conda environment to be downloaded automatically (which Snakemake claims is possible), but it would be possible to add it as a rule within the pipeline; I just installed the environment from the command line. An example of a single rule is below:

rule get_smiles:
    conda: {PATH_TO_ENV_YAML}
    output:
        "smiles_{num}.txt"
    shell:
        "python -m scripts.get_smiles --num {wildcards.num} --output {output}"
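For reference, the environment YAML pointed to by the conda: directive could look something like this (a hypothetical minimal environment for these scripts; pin versions as your cluster requires):

```yaml
name: confgen
channels:
  - conda-forge
dependencies:
  - python=3.11
  - rdkit
```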

Furthermore, resources can be defined for specific rules, which is useful if some jobs need GPUs while others only need CPUs. For example, we can change one rule to request only 1G of memory:

rule get_smiles:
    conda: {PATH_TO_ENV_YAML}
    output:
        "smiles_{num}.txt"
    resources:
        mem="1G"
    shell:
        "python -m scripts.get_smiles --num {wildcards.num} --output {output}"
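In the same way, a rule that needs a GPU can request one. The exact keys depend on your cluster setup; as a sketch, the Slurm executor plugin’s slurm_extra resource passes arbitrary sbatch flags such as --gres (the partition name below is a placeholder):

```
rule generate_conformers:
    input:
        smiles="smiles_{num}.txt"
    output:
        done="conf_{num}/done.txt"
    resources:
        mem="8G",
        slurm_partition="{GPU_PARTITION}",
        slurm_extra="'--gres=gpu:1'"
    shell:
        "python -m scripts.generate_conf --input_file {input.smiles} --outdir conf_{wildcards.num}"
```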

We can run this all using the following command:

snakemake -s generate_random_mols.smk --configfile config/config.yaml --jobs 100 --latency-wait 10 --keep-going --rerun-incomplete

Here we define the maximum number of jobs to run at once and how long Snakemake waits to check for the existence of output files. The last two flags ensure Snakemake keeps going even if a single job fails, and reruns any jobs that were interrupted in previous runs.

Happy snake-making on Slurm!
