I really hope my compounds get the green light

As a cheminformatician in a drug discovery campaign or an algorithm developer making the perfect Figure 1, when one generates a list of compounds for a given target there is a deep desire that the compounds are well received by the reviewer, be it a med chemist on the team or a peer reviewer. This is despite scientific rigour and training and is due to the time invested. So to avoid the slightest shadow of med chem grey zone, here is a hopefully handy filter against common medchem grey-zone groups.


Roche Continents 2024

This July I had the opportunity to be part of the Roche Continents programme [1]. The programme was organised by Roche and LUMA Arles and took place in the beautiful city of Arles in the south of France. Together with 40 students from various disciplines and European universities we discussed and explored the connection between arts, science, and sustainability. The theme of the week was resourcefulness.  

For students considering applying to Roche Continents next year, I’d like to offer some insights on what to expect, as well as share a few of my personal highlights from the experience. 


The wider applications of nanobodies

This week, it was my turn to give the short talk at our group meeting. I chose to present a recently published paper on thermostability prediction for nanobodies. The motivation for this work, at least in part, is the need for thermostability in the diverse applications of nanobodies. At OPIG, our research primarily revolves around the therapeutic uses of nanobodies, but their potential extends beyond this. I thought it would be interesting to highlight some of these broader applications here:


Making your code pip installable

aka when to use a CustomBuildCommand or a CustomInstallCommand when building Python packages with setup.py

Bioinformatics software is complicated, and often a little bit messy. Recently I found myself wading through a Python package building quagmire and thought I could share something I learnt about when to use a custom build command and when to use a custom install command. I have also provided some information about how to copy executables to your package installation bin. ChatGPT wrote the initial skeleton draft of this post, and I have corrected and edited it.

Next time you need to create a pip installable package yourself, hopefully this can save you some time!


Five-word stories about a world where AI dominates the world

Creative AI writing 🤖🖊️

“For sale: baby shoes, never worn.” ~ Ernest Hemingway??

This is a six-word story famously misattributed to Ernest Hemingway. According to Wikipedia, this story first appeared in 1906, when Hemingway was 7 years old, and was only attributed to him in 1991, 30 years after his death. So, no chance it was his.

Regardless of its origin, I found this type of story very creative.

In this blog post, as the title says, I will dare to push the boundary to present 5-word stories on the topic of AI taking over the world, BUT with a humorous spin.


My CCDC Science Day Experience

In June, I had the opportunity to visit the Cambridge Crystallographic Data Centre (CCDC) for Science Day to give a lightning talk on my rotation project with OPIG. The day was packed with presentations from researchers and PhD students collaborating with the CCDC, offering a great opportunity to hear about some of the fascinating work happening there in the fields of Structural and Computational Chemistry.

We kicked off with a dinner at the University Arms in Cambridge. This was a great opportunity to meet people who were attending Science Day in a relaxed environment, complemented by the lovely food and drink.

The next day was all about the talks. The first part of the day was filled with longer talks by more senior PhD students and CCDC researchers, followed by lightning talks from first-year PhD or master’s students. These shorter presentations provided a fast-paced overview of each project.


Happily hallucinating (for humans)

Many of us in academia face worries about an uncertain future. As an undergraduate, there are exams, assignments, and exchanging information via auditory and visual cues with other members of the species; then, as one moves through the pipeline, there’s funding, publications, the expectation that you know something about something, what will I be when I eventually grow up, and I haven’t even mentioned the perennial question that is, what am I going to cook tonight?!

I have faced all of these worries and more, and will no doubt continue to, but through talking to my peers, mentors and family, I’ve learnt a few lessons that have proved invaluable for me, and perhaps will be for you as well.


Memory-mapped files for efficient data processing

Memory management is a key concern when working with large datasets. Many researchers and developers will load entire datasets into memory for processing. Although this is a straightforward approach that allows for quick access and manipulation of data, it has its drawbacks. When the dataset size approaches or exceeds the available physical memory, performance degrades rapidly due to excessive swapping, leading to increased latency and reduced throughput. Memory-mapped files are an alternative strategy to access and manipulate large datasets without the need to load them fully into memory.


A background on memory-mapped files

Memory mapping is the process of mapping a file or a portion of a file directly into virtual memory. This mapping establishes a one-to-one correspondence between the file’s contents on disk and specific addresses in the process’s memory space. Instead of relying on traditional I/O operations, such as read() and write(), which involve copying data between kernel space and user space, the process can access the file’s contents directly through memory addresses. The operating system then uses page faults to decide which chunks (pages) to load into physical memory, and only when they are first touched. These chunks are significantly smaller than the whole file contents. This direct access reduces overhead and can significantly speed up data processing, especially for large files or applications that require high-throughput I/O operations.
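To make the idea concrete, here is a minimal sketch using Python's standard-library mmap module. The helper names count_newlines_mmap and read_slice, and the 1 MiB chunk size, are illustrative assumptions rather than anything from the post itself.

```python
import mmap


def count_newlines_mmap(path, chunk_size=1 << 20):
    """Count newline bytes in a file through a read-only memory map.

    Note: mmap raises ValueError for an empty file, so this sketch
    assumes the file has at least one byte.
    """
    with open(path, "rb") as fh:
        # Length 0 maps the whole file. Nothing is read yet: pages are
        # pulled from disk only when the corresponding bytes are touched.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            total = 0
            for start in range(0, len(mm), chunk_size):
                # Slicing copies just this window into a bytes object.
                total += mm[start:start + chunk_size].count(b"\n")
            return total


def read_slice(path, offset, length):
    """Return `length` bytes starting at `offset` without reading the rest."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]
```

Only the pages backing the bytes you actually touch need to be resident, so read_slice on a very large file should not require loading the whole file into physical memory.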


OPunting 2024

This week (2024-08-07) instead of our usual group meeting, OPIG took to the high seas. The OPIGlets pooled our resources and procured punts from many different berths. Organised by Admiral Nele, we departed from the Cherwell Boathouse and shipped out the 0.5 nautical miles (3.0888e-6 light seconds for those playing along in metric) upriver to the Vicky Arms.

Despite visiting the odd bush on the way, scurvy scallywags one and all were herded in a generally upstream direction with Matt and Eoin leading the way. With the first two punts having safely reached dry land and refuelled their ethanol fuel cells, the question remained where on earth everyone had got to. Sagely concluding they’d probably all sunk, another pint was had in their honour.


Converting or renaming files, whilst still maintaining the directory structure

For various reasons we might need to convert files from one format to another, for instance from lossless FLAC to MP3. For example:

ffmpeg -i lossless-audio.flac -acodec libmp3lame -ab 128k compressed-audio.mp3

This could be any conversion, but it implies that the input file and the output file are in the same directory. What if we have a carefully curated directory structure and we want to convert (or rename) every file within that structure?

find . -name "*.whateveryouneed" -exec somecommand {} \; is the tool for you.
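For readers who would rather script this, the same walk-and-convert idea can be sketched in Python with pathlib. The function names plan_conversions and convert_tree are hypothetical, and the convert callback is a placeholder for whatever command you actually need (for instance a wrapper that calls ffmpeg through subprocess.run).

```python
from pathlib import Path


def plan_conversions(src_root, dst_root, old_suffix, new_suffix):
    """Pair each matching file under src_root with an output path under
    dst_root that keeps the same relative sub-directories.

    Suffixes must include the leading dot, e.g. ".flac" and ".mp3".
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    pairs = []
    # rglob searches recursively; sorted() materialises the matches
    # before anything is written, so new outputs are never re-scanned.
    for src in sorted(src_root.rglob(f"*{old_suffix}")):
        if not src.is_file():
            continue
        dst = (dst_root / src.relative_to(src_root)).with_suffix(new_suffix)
        pairs.append((src, dst))
    return pairs


def convert_tree(src_root, dst_root, old_suffix, new_suffix, convert):
    """Call convert(src, dst) for every planned pair, creating the
    destination directories as needed."""
    for src, dst in plan_conversions(src_root, dst_root, old_suffix, new_suffix):
        dst.parent.mkdir(parents=True, exist_ok=True)
        convert(src, dst)
```

Passing shutil.copy2 as convert gives a harmless dry run that only copies and renames, which is a reasonable way to check the resulting directory layout before swapping in a real converter.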
