Category Archives: How To

How to do things. doh.

On being cool: arrows on an R plot

Recently I needed a schematic graph of traditional vs on-demand computing (don’t ask) – and in this hand waving setting I just wanted the axes to show arrows and no labels.  So, here it is:

x <- c(1:5)
y <- rnorm(5)
plot(x, y, axes = FALSE)
u <- par("usr") 
arrows(u[1], u[3], u[2], u[3], code = 2, xpd = TRUE) 
arrows(u[1], u[3], u[1], u[4], code = 2, xpd = TRUE)

And here is the output

Arrowed plot

(I pinched this off a mailing list post, so this is my due reference)

Next thing I am toying with are these xkcd like graphs in R here.

A free, sweet, valid HTML4 “Site Maintenance” page

So today we have moved servers from the cloud to a physically local server and we needed a “Site Maintenance” page.  A few google searches turned up a simple HTML5 template which I converted to HTML4 and is reproduced hereunder (could not find the original source, aargh):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>
    <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
    <title>Site Maintenance</title>
    <style type="text/css">
      body { text-align: center; padding: 150px; }
      h1 { font-size: 50px; }
      body { font: 20px Helvetica, sans-serif; color: #333; }
      #article { display: block; text-align: left; width: 650px; margin: 0 auto; }
      a { color: #dc8100; text-decoration: none; }
      a:hover { color: #333; text-decoration: none; }
    </style>

</head>
<body>
    <div id="article">
    <h1>We&rsquo;ll be back soon!</h1>
    <div>
        <p>Sorry for the inconvenience but we&rsquo;re performing some maintenance at the moment. If you need to you can always contact us on <b>opig AT stats.ox.ac.uk</b>, otherwise we&rsquo;ll be back online shortly!  Site should be back up on Friday 1st March 2013, 16:00 GMT.</p>
        <p>&mdash; OPIG</p>
    </div>
    </div>
</body>
</html>

And here is what it looks like… (nothing glamorous, you have been warned)

SiteMaintenance

How to install RDKit on Ubuntu 12.04 / 12.10 / 13.04 / 13.10 / 14.04 / 14.10 (with InChI support)

I make extensive use of this brilliant piece of cheminformatics software (RD)kit, and it has saved me writing my own Molecule, Atom, Bond, Conformer, Fingerprint, SimilarityMetric, Descriptor etc. classes time and time again.  It is really neat, and works with C++ and python (and Java I think).  Here are the instructions on how to install it on a relatively recent Ubuntu version (12.04 / 12.10 / 13.04 / 13.10 / 14.04 / 14.10).

Pre-requisite software (this is why I love Debian based systems)

sudo apt-get install flex bison build-essential python-numpy cmake python-dev sqlite3 libsqlite3-dev libboost-dev  libboost-python-dev libboost-regex-dev

Get the latest RDKit goodie from here (watch out – version number has been replaced by ‘X’ below)

wget http://downloads.sourceforge.net/project/rdkit/rdkit/QX_20XX/RDKit_20XX_XX_X.tgz

Unzip the beast, save it to /opt

sudo tar xzvf RDKit_20XX_XX_X.tgz -C /opt

Add some environment salt, vim ~/.bashrc

export RDBASE=/opt/RDKit_20XX_XX_X
export LD_LIBRARY_PATH=$RDBASE/lib:$LD_LIBRARY_PATH
export PYTHONPATH=$RDBASE:$PYTHONPATH

Resource your .bashrc

. ~/.bashrc

if you want the InChI stuff (trust me you do), first:

cd $RDBASE/External/INCHI-API/
./download-inchi.sh

Build (compile), install & test

cd $RDBASE
mkdir build
cd build
cmake .. # if you do not care for InChI support OR
cmake -DRDK_BUILD_INCHI_SUPPORT=ON .. # to install InChI generation code 
make # -j 4 to use multiple processors
make install
ctest

If all your tests passed successful you are good to go.  Otherwise, get in touch via the post comments below.

 

 

How to make a custom latex bibliography style

Imagine you are writing up your latest thrilling piece of science in your favourite odt or docx format. Nothing comes from nothing so you need to cite the 50 or so people whose ideas you built on, or who came to conclusions that contradict yours. Suddenly you realize that your second sentence needs a reference… and this will require you to renumber the subsequent 50. What a drag! There goes 5 minutes of your life that could have been better spent drinking beer.

Had you written your research in latex instead, this drudgery would have been replaced by a range of much more interesting and intractable difficulties. In latex it is easy to automagically renumber everything just by recompiling the document. Unfortunately, it is often hard to direct latex’s magic… just try moving a picture an inch to the right, or reformatting a reference.

Moving figures around is still a black art as far as I’m concerned… but I’ve recently found out an easy way to reformat references. This might be especially handy when you find out that your sort of proteins fall out of the scope of the International Journal of Eating Disorders and you now want to submit to a journal that requires you to list authors in small-caps, and the dates of publication in seconds from the Unix epoch.

A good way of including references in latex is with a “.bib” file and a “.bst” file. An example of the end of a document is shown below.


\bibliographystyle{myfile}
\bibliography{mycollection}

\end{document}

What’s happening here? All my references are stored in bibtex format in a database file called “mycollection.bib”. A separate file “myfile.bst” says how the information in the database should be presented. For example, are references in the text of the form (Blogs et al 2005) or are they numbered (1)? At the end of the text are they grouped in order of appearance, by date of publication or alphabetically? If alphabetically does “de Ville” come under “d” or “v”? To reformat a reference, we simply need to change “myfile.bst”.

Most latex distributions come with a set of bibliography styles. Some examples can be found here (a page which also explains all of the above much better than I have). However, it is very easy to produce a custom file using the custom-bib package. After a one-click download it is as simple as typing:


latex makebst.ins
latex makebst.tex

Here’s a screenshot to prove it. At the bottom is the first of thirty or forty multiple-choice questions about how you want your references to look. If in doubt, just close your eyes and press return to select the default.

Screen shot 2013-02-17 at 00.34.45

The problem with a multiple-choice fest is that if you make a poor decision at question 28 you have to go through the whole process again. Fortunately, this can be circumvented — as well as generating a pretty “myfile.bst” file, the custom-bib package generates an intermediate file “myfile.dbj”. Changing your multiple-choice answers is just a matter of commenting out the relevant parts and typing “latex myfile.dbj”. A snippet of a “dbj” file is below:

Screen shot 2013-02-17 at 00.41.42

Selected options are those without a “%” sign on the left hand side. Who would have thought that Latex could be so cuddly?

How to make environment variables persist in your web app (Apache2)

Last Sunday, with the Memoir gang, we were talking about using this blog as a technical notebook. This post is in that spirit.

We have moved most of the validation of Memoir before submitting the job to the queue.  This has the advantage that the user knows immediately the submission has failed (instead of waiting for the job to be processed in the queue).  The only issue with this is that the apache user (on most systems www-data) is going to run the validation pipeline.  Memoir requires a ton of PATH variables addition.

In most cases you cannot add variables in .bashrc as the apache user does not have a home directory.  The easiest way how to add environment variables for the apache user is:

sudo vim /etc/apache2/envvars

And add the following line at the bottom (these have to be customized according to the server):

export PATH=$PATH:/opt/tmalign:/opt/muscle:/opt/joy-5.10:/opt/joy-5.10/psa:/opt/joy_related-1.06:/opt/joy_related-1.06/sstruc:/opt/joy_related-1.06/hbond:/opt/medeller-dev/bin:/opt/usearch6.0.307:/opt/ncbi-blast-2.2.27+:/opt/MPT/bin

After this restart apache – and you should be laughing. Or, as is often the case, apache will be laughing at you.

sudo service apache2 restart

The apache user should now “see” the amended path variable.

Here it is this technical note – so in a few weeks time when I forget all of the above I can just copy and paste it …

(This post is short, also because there is the superbowl!!)

Anatomy of a blog post

Now, shall I use the first person singular or plural to write this?  Active or passive voice? …

It doesn’t really matter.  This isn’t a formal article, and you can even use abbreviations.  This group blog, like anything else during our time in Oxford, is an experiment.  We will give it a few months and see what happens.  If it pans out, we will have a, more or less, detailed research journal for the group.  Not to mention a link with the outside world (prospective students? employers?) and proof that we can “communicate” with others. And since this is an exploratory exercise we should have freedom to explore what we want to write about.

We should have plenty of fodder.  Let us face it, if we do not do some mildly interesting science every week then we are probably not having enough fun.  But even if you are working on a hushed up, undercover project (e.g. the next blockbuster drug against Malaria) – there are still so many interesting bits of our D.Phil. which would otherwise never see the light of day.

For inspiration, have a look at other popular scientific blogs – the chembl one is both educational and humourous in equal measures (Post Idea #1: list of bio/cheminformatics blogs which every grad student should read).  Blogs are a great way to survey literature without actually doing any reading (Post Idea #2: tricks to increase grad student productivity… what do you mean you don’t use Google Alerts to surprise your supervisor with a link to a paper published the day before?); and for a TL;DR version there is twitter (Post Idea #3: Idea #1 but for twitter instead).  I only found out about the four stranded DNA in human cells by following @biomol_info.

And of course, we are mostly a computational group – so software is what we churn out on a daily basis.  How much of the software we write ends up resting forever on our disks, never to be used again.  The masses want splitchain!  (Idea #4: post software you wrote).  And there is benefit in not only giving out software, but also explaining the internals with snippets  (Idea #5: a clever algorithm explained line-by-line).

And then there is the poster you hung up once (Idea: #6) or the talk you gave and prepared for hours on your disposable, use-once-only slides (Idea: #7).  There is the announcement of publishing a paper – that solemn moment in academia when someone else thinks what you have done is worthy (Idea: #8 – btw well done to our own Jamie Hill for his recent MP-T work).

And if your an athelete, like Anna (Dr. Lewis) who crossed the atlantic in a rowing boat or Eleanor who used to row for the blues – what can I say, this is how we roll, or row [feeble attempt at humour] – thats a non-scientific but unique and interesting experience too (Idea #8).  .

If you’ve read a paper and you think it’s interesting comment on it – people will follow your posts just because it acts like a literature filter (Idea #9).  You can probably even have a rant (Idea #10); as long as its more positive and less bitter than Fred Ross’ Farewell to Bioinformatics.

Finally, this post is long and tedious for the reader.  But that is ok too – like everything else here, it is a learning experience and the more I write the more I will improve.  So hey, I’m also doing this to write a better thesis (i.e. to make the writing less painful).

An addendum; my initial intention was to discuss the bits which make a good blog post.  You can find lots of articles about this – so it is less interesting; but here are the main points

cover_real_conformers

If a picture is really worth a thousand words, 30 of these is all I need for my thesis.