I recently spent some time working out how to include mini inset plots within ggplot2 facets, and I thought I would share my code in case anyone else wants to achieve a similar thing. The resulting plot looks something like this:
Category Archives: Code
How to Iterate in PyMOL
Sometimes pointing-and-clicking just doesn’t cut it. With PyMOL’s built-in Python interpreter, repetitive actions are made simple.
Trying out some code from the Eighth Joint Sheffield Conference on Chemoinformatics: finding the most common functional groups present in the DSPL library
Last month a bunch of us attended the Sheffield Chemoinformatics Conference. We heard many great presentations and there were many invitations to check out one’s GitHub page. I decided now is the perfect time to try out some code that was shown by one presenter.
Peter Ertl from Novartis presented his work on The encyclopedia of functional groups. He presented a method that automatically detects functional groups without the use of a pre-defined list (which is what most other methods use for detecting functional groups). His method involves recursive searching through the molecule to identify groups of atoms that meet certain criteria. He used his method to answer questions such as: how many functional groups are there, and what are the most common functional groups found in common synthetic molecules versus bioactive molecules versus natural products? Since I, like many others in the group, am interested in fragment libraries (possibly due to a supervisor in common), I thought I could try it out on one of these.
Searching through large databases with bloom filter
Searching through large databases is often a linear time problem. Here I compare the performance of applying a Bloom filter and using the regular std::find algorithm in C++. The code is from: https://codereview.stackexchange.com/questions/179135/bloom-filter-implementation-in-c
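To make the idea concrete, here is a minimal Python sketch of a Bloom filter used as a pre-check in front of a linear scan. It is not the C++ code linked above: the class name `BloomFilter`, the helper `contains_exact`, the SHA-256 based hashing and the sizes are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k salted hash functions."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False means definitely absent; True only means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


def contains_exact(records, bloom, query):
    """Skip the linear scan whenever the filter rules the query out."""
    if query not in bloom:
        return False
    return query in records  # linear scan over a list, analogous to std::find


# Hypothetical database of record names.
records = [f"mol_{i}" for i in range(200)]
bloom = BloomFilter()
for record in records:
    bloom.add(record)
```

The filter can return false positives but never false negatives, so the exact scan is still needed to confirm a hit; the saving comes from queries that the filter rejects outright.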
Constrained Embedding with RDKit
This blog post explores the RDKit function ConstrainedEmbed.
A Brief Introduction to ggpairs
In this blog post I will introduce a fun R plotting function, ggpairs, that’s useful for exploring distributions and correlations.
Should scientists learn C++?
Conventional wisdom dictates that compiled languages are slow to develop, can be slow to compile, but are fast to run. Interpreted languages are easy to use and do not require compilation but have sluggish performance. Like most people in scientific computing, the first two languages I learned were C++ and Python; I use Python every day but when, if ever, would I use C++?
Quick Python tricks
It’s always fun when you stumble across something in your programming toolkit that you had never noticed. Here are three things I’ve recently enjoyed learning.
- Ternary syntax
a = int(input())  # Python 3; use raw_input() on Python 2
is_even = True if a % 2 == 0 else False
- Enumerate
I’ve been looping over the length of my list, all these years, like a chump. It turns out you can do this:
for index, item in enumerate(some_list):
    # now the index of each item is available as well as the item
    print(index, item)

# Don't do this
for index in range(len(some_list)):
    item = some_list[index]
- for… else
Every so often, you really need to know that a for loop has run to completion. That’s what for…else is for!
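for item in iterable:
    if item % 2 == 0:
        first_even_number = item
        break  # leaving the loop early skips the else branch
else:
    # runs only if the loop finished without hitting break
    raise ValueError('No even numbers')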
Property based testing in Python with Hypothesis : how to break your own code before someone else does
Traceback (most recent call last):
ZeroDivisionError: integer division or modulo by 0
We’ve all been there. You’ve written your code and tested it on some toy data, and then, when you move to the real data, you hit something you didn’t expect.
Maybe some samples have been truncated to zero. Maybe the input arrays are the wrong shape. Suddenly your code comes crashing down around you, and you’re left thinking: well, how could I have known that was going to happen? I can’t test everything.
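As a rough illustration of the idea behind property-based testing, here is a hand-rolled sketch that uses only the standard library's `random` module. It does not use the Hypothesis API, and the function under test, `normalise`, and the checker, `check_property`, are hypothetical names invented for this example.

```python
import random


def normalise(values):
    """Scale values so they sum to 1 (hypothetical function under test)."""
    total = sum(values)
    if total == 0:
        # Guard against the "samples truncated to zero" case.
        raise ValueError("cannot normalise values that sum to zero")
    return [v / total for v in values]


def check_property(trials=1000, seed=0):
    """Throw random inputs at normalise() and check an invariant each time."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Small non-negative integers, so all-zero inputs turn up regularly.
        values = [rng.randint(0, 5) for _ in range(rng.randint(1, 4))]
        try:
            result = normalise(values)
        except ValueError:
            # The only accepted failure is the documented all-zero case.
            assert sum(values) == 0
            continue
        # Property: a normalised list always sums to 1.
        assert abs(sum(result) - 1.0) < 1e-9
    return True
```

Rather than listing test cases by hand, the check states a property that must hold for every input and lets random generation find the awkward cases. A dedicated library such as Hypothesis automates the input generation and also shrinks failing inputs to minimal examples.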
Some useful tools
For my blog post this week, I thought I would share, as the title suggests, a small collection of tools and packages (mainly Python-based) that have made my work a bit easier over the last few months. I might add to this list as I find new tools that I think deserve a shout-out.
Biopandas
Reading in .pdb files for processing and writing your own parser (while being a good exercise to familiarize yourself with the format) is a pain and clutters your code with boilerplate.
Luckily for us, Sebastian Raschka has written a neat package called biopandas [1] which enables quick I/O of .pdb files via the pandas DataFrame class.