Category Archives: Code

Trying out some code from the Eighth Joint Sheffield Conference on Chemoinformatics: finding the most common functional groups present in the DSPL library

Last month a bunch of us attended the Sheffield Chemoinformatics Conference. We heard many great presentations and there were many invitations to check out one’s GitHub page. I decided now is the perfect time to try out some code that was shown by one presenter.

Peter Ertl from Novartis presented his work on the The encyclopedia of functional groups. He presented a method that automatically detects functional groups, without the use of a pre-defined list (which is what most other methods use for detecting functional groups). His method involves recursive searching through the molecule to identify groups of atoms that meet certain criteria. He used his method to answer questions such as: how many functional groups are there and what are the most common functional groups found in common synthetic molecules versus bioactive molecules versus natural products. Since I, like many others in the group, are interested in fragment libraries (possibly due to a supervisor in common), I thought I could try it out on one of these.

Continue reading

Should scientists learn C++?

Conventional wisdom dictates that compiled languages are slow to develop, can be slow to compile, but are fast to run. Interpreted languages are easy to use and do not require compilation but have sluggish performance. Like most people in scientific computing, the first two languages I learned were C++ and Python; I use Python every day but when, if ever, would I use C++?

Continue reading

Quick Python tricks

It’s always fun when you stumble across something in your programming toolkit that you had never noticed. Here are three things I’ve recently enjoyed learning.

  • Ternary syntax
    a = int(raw_input())
    is_even = True if a % 0 == 0 else False
  • Enumerate

I’ve been looping over the length of my list, all these years, like a chump. It turns out you can do this:

for index, item in enumerate(some_list):
    # now the index of each item is available as well as the item    
# Don't do do this
for index in range(len(some_list)):
    item = some_list[index]
  • for… else

Every so often, you really need to know that a for loop has run to completion. That’s what for…else is for!

for item in iterable:
if item % 0 == 0:
first_even_number = item
else:
raise ValueError('No even numbers')

Property based testing in Python with Hypothesis : how to break your own code before someone else does

Traceback (most recent call last):
ZeroDivisionError: integer division or modulo by 0

We’ve all been there. You’ve written your code, tested it out on some toy data and then when you make the move to the real data, there was something you didn’t expect.

Maybe some samples have been truncated to zero. Maybe the input arrays are the wrong shape. Suddenly your code comes crashing down around you, and you’re left thinking: well how could I have known that was going to happen? I can’t test everything

Continue reading

Some useful tools

For my blog post this week, I thought I would share, as the title suggests, a small collection of tools and packages that I found to make my work a bit easier over the last few months (mainly python based). I might add to this list as I find new tools that I think deserve a shout-out.

Biopandas

Reading in .pdb files for processing and writing your own parser (while being a good exercise to familiarize yourself with the format) is a pain and clutters your code with boilerplate.

Luckily for us, Sebastian Raschka has written a neat package called biopandas [1] which enables quick I/O of .pdb files via the pandas DataFrame class.

Continue reading