Tag Archives: How to

Navigating Hallucinations in Large Language Models: A Simple Guide

AI is moving fast, and large language models (LLMs) are at the centre of it all, doing everything from generating coherent, human-like text to tackling complex coding challenges. And this is only scratching the surface: LLMs are popping up everywhere, and their list of talents grows by the day.

However, these models aren’t infallible. One of their most intriguing and concerning quirks is the phenomenon known as “hallucination”: instances where the AI confidently produces information that is fabricated or factually incorrect. As we increasingly rely on AI-powered systems in our daily lives, understanding hallucinations is crucial. This post briefly explores what LLM hallucinations are, why they occur, and how we can navigate them to get the most out of our new favourite tools.

Memory-mapped files for efficient data processing

Memory management is a key concern when working with large datasets. Many researchers and developers load entire datasets into memory for processing. Although this straightforward approach allows quick access to and manipulation of the data, it has its drawbacks. When the dataset size approaches or exceeds the available physical memory, performance degrades rapidly due to excessive swapping, leading to increased latency and reduced throughput. Memory-mapped files offer an alternative strategy: they let you access and manipulate large datasets without loading them fully into memory.

A background on memory-mapped files

Memory mapping is the process of mapping a file, or a portion of a file, directly into a process’s virtual memory. This mapping establishes a one-to-one correspondence between the file’s contents on disk and specific addresses in the process’s memory space. Instead of relying on traditional I/O operations such as read() and write(), which involve copying data between kernel space and user space, the process can access the file’s contents directly through memory addresses. The operating system then loads data on demand: when the process touches a page that is not yet in physical memory, a page fault causes the kernel to load just that page. Pages are typically only a few kilobytes each, far smaller than the whole file, so only the parts that are actually accessed need to occupy physical memory. This direct access reduces overhead and can significantly speed up data processing, especially for large files or applications that require high-throughput I/O operations.
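
To make this concrete, here is a minimal sketch using Python’s standard-library `mmap` module, chosen here for illustration. The helper `patch_in_place` and the temporary “dataset” file are hypothetical, invented for this example. The helper maps a file, reads a few bytes by slicing, and overwrites them in place, without ever reading the whole file into a Python object.

```python
import mmap
import os
import tempfile


def patch_in_place(path: str, offset: int, new: bytes) -> bytes:
    """Hypothetical helper: overwrite len(new) bytes at `offset` via a memory map.

    Returns the bytes that were there before. Only the pages covering
    the requested range have to be brought into physical memory.
    """
    with open(path, "r+b") as f:
        # Length 0 maps the whole file into the process's virtual address space;
        # the OS typically loads pages lazily, on the page faults caused by first access.
        with mmap.mmap(f.fileno(), 0) as mm:
            end = offset + len(new)
            old = mm[offset:end]   # plain slicing, no explicit read() call
            mm[offset:end] = new   # same-length slice assignment edits the file
            mm.flush()             # ask the OS to write dirty pages back to disk
    return old


# Hypothetical stand-in for a large dataset: ~20 kB of filler around a marker.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"A" * 10_000 + b"HELLO" + b"B" * 10_000)

old = patch_in_place(path, 10_000, b"WORLD")

# Re-read the file normally to confirm the change reached the disk copy.
with open(path, "rb") as f:
    data = f.read()
os.remove(path)

print(old)                  # b'HELLO'
print(data.find(b"WORLD"))  # 10000
print(len(data))            # 20005
```

Slice assignment on an `mmap` object cannot change the file’s size, so the replacement must be exactly as long as the bytes it overwrites.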

Streamlining Your Terminal Commands With Custom Bash Functions and Aliases

If you’ve ever found yourself typing out the same long commands over and over again, or if you’ve ever wished you could teleport directly to your favourite directories, then this post is for you.

Before we jump into some useful examples, let’s go over what bash functions and aliases are, and how to set them up.

Bash Functions vs Aliases

A bash function is like a mini script stored in your .bashrc or .bash_profile file. It can accept arguments, execute a series of commands, and even return a value (an exit status, or output that the caller captures with command substitution).

Making your Python tool as easy to install as possible

Have you ever tried to use someone else’s code and spent a whole day trying to install it? Have you ever decided not to use a tool because installing it was a massive pain? Both have happened to me and, to be honest, it is a real shame. Authors may spend a great deal of time developing these tools, only for no one to use them because they can’t get them to work. So I have decided to try to make all the code I develop as easy and painless as possible to install and use.

Linux Horror Stories and Protection Spells (Volume I)

Don’t get me wrong. I love Linux. After many years of using it, I have come to appreciate how flexible, powerful, and even beautiful it is. However, using Linux has never been a bed of roses, and every Linux user I know has had to deal with many problems from the very beginning. Indeed, I still remember how frustrating installing my first Linux machine was, especially after realizing that my network card was not working. Had I given up, I would never have written this post.

Although many of the problems I faced while using Linux were related to updates and drivers (NVIDIA driver updates can be so painful that I will write a separate post about them in the future), I must admit that on many other occasions I alone was responsible. Consequently, I want to warn the reader about a couple of the mistakes I made in the past and provide some tips on how to deal with them.

My worst nightmare: rm -r *

Oxford Protein Informatics Group

or "OPIG" to friends
