Technical | Oxford Protein Informatics Group

Memory management is a key concern when working with large datasets. Many researchers and developers will load entire datasets into memory for processing. Although this is a straightforward approach that allows for quick access and manipulation of data, it has its drawbacks. When the dataset size approaches or exceeds the available physical memory, performance degrades rapidly due to excessive swapping, leading to increased latency and reduced throughput. Memory-mapped files are an alternative strategy to access and manipulate large datasets without the need to load them fully into memory.

A background on memory-mapped Files

Memory mapping is the process of mapping a file or a portion of a file directly into virtual memory. This mapping establishes a one-to-one correspondence between the file’s contents on disk and specific addresses in the process’s memory space. Instead of relying on traditional I/O operations, such as read() an write(), which involve copying data between kernel space and user space, the process can access the file’s contents directly through memory addresses. Then, page faults are used to determine which chunks to load into physical memory. However, this chunks are significantly smaller than the whole file contents. This direct access reduces overhead and can significantly speed up data processing, especially for large files or applications that require high-throughput I/O operations.

Continue reading →

In this blog post, I’ll show you guys how to make your own shiny container for your tool! Zero fuss(*) and in FOUR simple steps.

As an example, I will show how to make a singularity container for one of our public tools, ANARCI, the antibody numbering tool everyone in OPIG and external users are familiar with – If not, check the web app and the GitHub repo here and here.

(*) Provided you have your own Linux machine with sudo permissions, otherwise, you can’t do it – sorry. Same if you have a Mac or Windows – sorry again.
BUT, there are workarounds for these cases such as using the remote singularity builder here, for which you only need to sign up and create an account, and the use of Virtual Machines (VMs), as described here.

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Tag Archives: Technical

Memory-mapped files for efficient data processing

How to make your own singularity container zero fuss!