Environment Modules is a great tool for high-performance computing: it is a modular system for quickly and painlessly enabling preset configurations of environment variables. For example, a user may be provided with a modulefile for an antiquated version of a tool and another for a bleeding-edge alpha version of that same tool, and can easily load whichever they wish. In many clusters the modules are created with a tool called EasyBuild, which delivers an out-of-the-box installation. This works for things like a single binary, but for conda it falls severely short, as many configuration changes are needed.
Activating conda
Conda is a curious fish to start with. It is not distributed in Linux package managers. It does have a licence that needs accepting (automatically accepted with the -b flag), but so do many Linux packages. Out of the box it needs activating. If it is installed without the -b (batch) flag, or if conda init is run, a bunch of messy bash commands get appended to .bashrc; these are primarily failsafes around the following command.
eval "$("$CONDA_PREFIX/bin/conda" shell.bash hook 2> /dev/null)"
Namely, the conda binary is run inside a command substitution, $(...), with the arguments shell.bash and hook; its error messages are sent to the null bucket (/dev/null), while its output is a shell script, which gets evaluated.
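The eval-of-command-output pattern can be seen in isolation with a stand-in for the hook. This is only a sketch: fake_hook, DEMO_ROOT, DEMO_ENV and demo_activate are made-up names, not anything conda provides.

```shell
# fake_hook is a hypothetical stand-in for `conda shell.bash hook`:
# it prints shell code on stdout instead of running it.
fake_hook() {
    echo 'export DEMO_ROOT=/opt/demo'
    echo 'demo_activate() { export DEMO_ENV="${1:-base}"; }'
}

# Calling it directly only prints text; nothing gets defined.
fake_hook > /dev/null
before="${DEMO_ROOT:-unset}"

# Command substitution captures the text, eval runs it in THIS shell.
eval "$(fake_hook 2> /dev/null)"
demo_activate myenv

echo "before=$before after=$DEMO_ROOT env=$DEMO_ENV"
```

The quotes around the substitution matter: they hand the whole multi-line script to eval as one argument.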
In bash, running a script with source (or its synonym ., or eval on the script's text) and running it with bash (or ./script, or exec, kind of) can have different effects: source runs the shell script in the current shell, while bash runs it in a separate one. The environment initialisation therefore needs to be sourced, an obvious but important detail.
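A minimal sketch of the difference, using a throwaway script in a temporary folder (setvar.sh and DEMO_VAR are made-up names for illustration):

```shell
# Write a throwaway script that sets a variable.
tmpdir="$(mktemp -d)"
printf 'DEMO_VAR=from_script\n' > "$tmpdir/setvar.sh"

# 1. Run it in a child shell: the assignment dies with the child.
unset DEMO_VAR
bash "$tmpdir/setvar.sh"
after_child="${DEMO_VAR:-unset}"

# 2. Source it (`.` is the portable spelling of bash's `source`):
#    the assignment happens in the current shell.
. "$tmpdir/setvar.sh"
after_source="${DEMO_VAR:-unset}"

echo "after child:  $after_child"
echo "after source: $after_source"
rm -rf "$tmpdir"
```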
There are three ways to initialise conda.
- Allowing the messy snippet to be added to $HOME/.bashrc and sourcing it, which happens on logging in. I.e. one is sourcing/evaluating the output of conda shell.bash hook.
- Sourcing the $CONDA_PREFIX/etc/profile.d/conda.sh script, which is basically the same evaluation. Personally, I prefer this option.
- Adding the variables manually.
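The first two options could be wrapped in a small helper along these lines. This is a sketch, not part of conda: init_conda is a hypothetical function name, and its argument is wherever conda happens to be installed.

```shell
# init_conda is a hypothetical helper; $1 is the conda install folder.
init_conda() {
    conda_root="$1"
    if [ -r "$conda_root/etc/profile.d/conda.sh" ]; then
        # Preferred: source the profile script shipped with conda.
        . "$conda_root/etc/profile.d/conda.sh"
    elif [ -x "$conda_root/bin/conda" ]; then
        # Fallback: evaluate the hook output, as the .bashrc snippet does.
        eval "$("$conda_root/bin/conda" shell.bash hook 2> /dev/null)"
    else
        echo "no conda installation found under $conda_root" >&2
        return 1
    fi
}
```

It would be called as init_conda "$HOME/miniconda3" (that path is an assumption) before any conda activate.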
Conda variables
Once this is done one can activate the base environment, via conda activate or conda activate base, or a virtual environment, via conda activate ENVNAME. Doubly misleadingly, conda activate without base, when run on an already activated base environment, will fail telling you that the environment variables are missing, which is a lie.
Various environment variables get set in doing so. printenv prints your environment variables, which makes it really handy.
- The key one is the addition of the conda bin folder to PATH. PATH can also be tampered with outside of conda. Folders prepended at the front get priority; those appended at the back are the last resort.
- CONDA_ROOT: the folder where conda lives, say $HOME/.conda.
- CONDA_PREFIX: the folder of the current environment; for base, $CONDA_ROOT = $CONDA_PREFIX.
- CONDA_EXE: the conda executable, $CONDA_PREFIX/bin/conda when base is active.
- CONDA_PYTHON_EXE: the Python executable, $CONDA_PREFIX/bin/python when base is active.
- PKG_CONFIG_PATH: an alternative search path for system packages (pkg-config), nothing to do with Python packages. But you might have a $CONDA_PREFIX/lib/pkgconfig folder.
- CONDA_SHLVL: conda shell level. 1 is base.
- CONDA_PROMPT_MODIFIER: the text that gets prepended to $PS1, which is the text that appears before your cursor. In my .bashrc I have export PS1="[\u@\h \W]\$", which makes my prompt remind me of my username ($USER), the hostname ($HOSTNAME) and my working directory ($PWD).
- CONDA_DEFAULT_ENV: your environment name.
- There are many possible conda environment variables, as any config in a .condarc file can (on paper) be used as an environment variable by going uppercase and underscored. For example $CONDA_SOLVER, $CONDA_CHANNELS, $CONDA_YES, $CONDARC and $CONDA_ENVS_PATH etc.
A big caveat needs raising regarding the last one: always check. For example, $CONDA_CHANNELS does not work in all versions.
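The uppercase-and-underscored naming rule for .condarc keys can be sketched as a one-line helper. condarc_key_to_env is a made-up name, and the rule is only the on-paper one described above, so the result still needs checking against your conda version.

```shell
# condarc_key_to_env is a hypothetical helper: it uppercases a .condarc key
# and adds the CONDA_ prefix, e.g. solver -> CONDA_SOLVER.
condarc_key_to_env() {
    printf 'CONDA_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

for key in solver channels; do
    condarc_key_to_env "$key"
done
```

To check whether a given variable is honoured, one can set it and then run conda config --show solver (or whichever key is in question) to see the value conda actually resolved.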
Modulefile
Now that we have gone over how to get conda activated, we need to configure the various environment variables for the module command to use.
A modulefile is a file that tells the module command how to load a module. It is written in Tcl. There is generally a set of modulefiles written by the sys-admin of your cluster, but you can add your own by appending the folder of your modulefiles to $MODULEPATH, e.g. export MODULEPATH="$MODULEPATH:my-path-with-modulefiles". The main commands to remember are setenv, set-alias, prepend-path, system and puts stderr/stdout.
#%Module
proc ModulesHelp { } {}
module-whatis {}
puts stderr "This is shown to the user on module load and unload"
# set variables within this code: env variables from shell are called via $env(...)
set root path-where-conda-lives
set userhome $env(HOME)
set userconda $env(HOME)/.conda
conflict conda
# conda deactivate on unload
if {[module-info command unload]} {
    system AUTO_ACTIVATE_BASE=false $root/bin/conda deactivate
}
# add envs (on unload they will be unset or replaced)
prepend-path MANPATH $root/man
prepend-path MANPATH $root/share/man
# See footnote?
# prepend-path PATH $root/bin
# prepend-path PATH $root/sbin
prepend-path PKG_CONFIG_PATH $root/lib/pkgconfig
# `pip install --user` by default
setenv PYTHONUSERBASE $userhome/.local
# user created envs go here:
setenv CONDA_ENVS_PATH $userconda/envs
setenv JUPYTER_CONFIG_PATH $userhome/.jupyter
# base config for jupyter.
setenv JUPYTER_CONFIG_DIR $root/.jupyter
# CONDA_ENVS_DIR is not the base config
# if not using a $CONDA_ENVS_PATH environment variables can do the job
setenv CONDARC $userhome/.conda
setenv CONDA_SOLVER libmamba
setenv CONDA_YES true
setenv CONDA_CHANNELS "conda-forge nvidia bioconda"
# etc.
if {[module-info command load]} {
    # give hints to the user
    system touch $env(HOME)/.condarc
    system mkdir -p $userconda/envs
    system mkdir -p $userhome/.local
    system mkdir -p $userhome/.jupyter
    # enable
    puts stdout "source $root/etc/profile.d/conda.sh ;"
}
The footnote is that one could add to one's PATH that way. Conda, however, has its own system, which even allows one to “subclass” one virtual environment into another.
conda env config vars set PATH=$PATH:/Users/matteo/.conda/bin:some-other-env-bin
In the above, the virtual environment will search its own folders, then /usr/local/bin etc., and lastly the other environment.
The big catch is that this variable is stored in the conda-meta/state file in the environment and not in an environment-specific .condarc, which is not a thing, as conda env export output is instead generated on the fly. This means that the user would need to be made aware of the alteration, as there is no way they are going to check, so the modulefile way (along with a puts stderr call maybe) is far clearer.
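The search order that makes this work (front of PATH wins, back of PATH is the last resort) can be checked with two throwaway folders. This is only a sketch: demo-tool and the folder names are made up, and no conda is involved.

```shell
tmp="$(mktemp -d)"
mkdir -p "$tmp/front" "$tmp/back"
# Two different executables sharing the hypothetical name demo-tool.
printf '#!/bin/sh\necho front\n' > "$tmp/front/demo-tool"
printf '#!/bin/sh\necho back\n'  > "$tmp/back/demo-tool"
chmod +x "$tmp/front/demo-tool" "$tmp/back/demo-tool"

old_path="$PATH"
# The prepended folder is searched before the appended one.
PATH="$tmp/front:$old_path:$tmp/back"
hash -r
first="$(demo-tool)"

# Swap the two folders and the other copy is found first.
PATH="$tmp/back:$old_path:$tmp/front"
hash -r
second="$(demo-tool)"

PATH="$old_path"
echo "$first then $second"
rm -rf "$tmp"
```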
Footnote: Can I borrow this?
As mentioned, adding a folder to $MODULEPATH results in the modulefiles therein being visible when running module avail. This means one can have one’s personal modulefile collection. One could copy these from other clusters: the headers in the output of module avail tell you the path. The file itself will say where the binary files are.
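A personal collection could be registered from .bashrc with something like the following sketch. add_modulepath is a hypothetical helper, not part of the module command; it only avoids appending the same folder twice.

```shell
# add_modulepath is a hypothetical helper that appends a folder to
# MODULEPATH only if it is not already listed.
add_modulepath() {
    case ":${MODULEPATH:-}:" in
        *":$1:"*) ;;                                   # already present
        *) MODULEPATH="${MODULEPATH:+$MODULEPATH:}$1" ;;
    esac
    export MODULEPATH
}
```

After add_modulepath "$HOME/modulefiles" (the folder name is an assumption), module avail should list the personal modulefiles under their own header. Where the module command supports it, module use --append does much the same job.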
Then one can share in turn. Except permissions get in the way. To make a folder visible to someone with a common group, one first needs to change the group ownership to that group, chgrp -R COMMONGROUPNAME FILEPATH, and then change the group permissions, chmod -R g+rX FILEPATH. Uppercase X means give x only to directories and to files that are already executable by someone (equivalent to g=u,g-w basically). To make a folder visible to all, chmod -R o+rX FILEPATH.
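The effect of the uppercase X can be seen on a throwaway folder. This is a sketch with made-up file names; the chgrp step is left commented out because it needs a real group name.

```shell
tmp="$(mktemp -d)"
mkdir "$tmp/shared"
printf 'data\n' > "$tmp/shared/notes.txt"
printf '#!/bin/sh\necho hi\n' > "$tmp/shared/run.sh"
# Start from owner-only permissions so the effect is visible.
chmod 700 "$tmp/shared" "$tmp/shared/run.sh"
chmod 600 "$tmp/shared/notes.txt"

# chgrp -R COMMONGROUPNAME "$tmp/shared"   # needs a real group, skipped here
chmod -R g+rX "$tmp/shared"

# Characters 5-7 of `ls -l` output are the group permission bits.
group_bits() { ls -ld "$1" | cut -c5-7; }
dir_bits="$(group_bits "$tmp/shared")"
file_bits="$(group_bits "$tmp/shared/notes.txt")"
script_bits="$(group_bits "$tmp/shared/run.sh")"
echo "dir=$dir_bits file=$file_bits script=$script_bits"
rm -rf "$tmp"
```

The folder and the script gain group read and execute, while the plain data file gains group read only.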