Author Archives: Adelaide Punt

AI in Academic Writing: Ethical Considerations and Best Practices

I don’t need to tell you how popular AI, in particular LLMs, have become in recent years. Alongside this rapid growth comes uncharted territory, especially with respect to plagiarism and integrity. As we adapt to a rapidly changing technological climate, we become increasingly reliant on AI. Need some help phrasing an email? Ask ChatGPT. Need to create a packing list for an upcoming camping trip? Get an AI-based task manager. So naturally when we’re faced with the daunting, and admittedly tedious task of writing papers, we question whether we can offload some of that work to a machine. As with many things, the question is not simply whether or not you should use AI in your writing process, it’s how you choose to do so.

When To Use AI

  1. Grammar and readability
    Particularly useful for those who are writing in a second language, or for those who struggle to write in their first language (my high school English teachers would place me firmly in this category), LLMs can be used beneficially to identify awkward phrasing, flag excessively complex sentences, and identify and fix grammatical errors.
  2. Formatting and structure
    LLMs can take care of some of the tedious, repetitive work with respect to formatting and structure. They are particularly useful for generating LaTeX templates for figures, tables, equations, and general structure. You can also use them to check that you’re matching a specific journal’s standards. For example, you can give an LLM a sample of articles from a target publication, and ask it to note the structure of these papers. Then, give it your work and ask it to make general, larger-scale suggestions to ensure that your work aligns structurally with articles typical of that journal.
  3. Reference management
    Although the references should be read and cited by an author, various management tasks like creating properly formatted BibTeX entries can be handled by LLMs. Additionally, you can use LLMs to do a sanity check to ensure that your references are an accurate reflection of the source material they are referring to. However, they should not to be used to summarise the source and create references on their own. 
  4. Summarising large volumes of literature
    If you’re reviewing large volumes of literature, LLMs can help summarise papers efficiently and point you in the right direction. Although you should always cite and refer back to the original source, LLMs can distill key points from long, dense papers, organise notes, and extract important takeaways from datasets and figures.

Regardless of how you use AI, it is importance to keep a record of all instances of such AI use throughout your research, including use during coding, Some journals will make you explicitly declare the use of AI tools, but even if it is not required this kind of record-keeping is considered good practice. 

When Not to Use AI

  1. Big-picture thinking and narrative development
    Academic papers are not solely about presenting information, they are about constructing an argument, building a narrative flow, and presenting a compelling case. LLMs are not particularly good at replicating human creativity, that work is best left to the authors. Additionally, it is dishonest to claim these important aspects of writing as your own if they are not written directly by you.
  2. Direct copy-paste
    Although AI tools may suggest minor edits, you should never directly copy-and-paste larger selections of AI-generated text. If the ethical concerns described in (1) do not persuade you as they should, there are now plenty of tools being used to detect AI-generated text by various academic institutions and journals. Although some scholars do tend to lean on AI as a more collaborative tool, transparency is key. 
  3. Source of knowledge
    LLMs don’t actually “know” anything; they generate responses based on probability. As a result, they have a tendency to “hallucinate,” or present false information as fact, misrepresent or oversimplify complex concepts, and do not have precise technical accuracy. They may also be biased based on the sources they were trained on. Peer-reviewed sources should be the go-to for actual information. If you use LLMs to summarise something, always refer back to the original text when using that information in your work.
  4. Full citation generation
    As discussed above, although AI can be used to summarise sources, it is not a reliable source of direct citations. All references should be created by hand and verified manually.
  5. General over-reliance
    From the research design process to the final stages of writing and editing, you should generally refrain from an over-reliance on AI. Although LLMs can be powerful tools that can be used to automate various lower-level tasks, they are not a substitute for critical thinking, originality, or domain expertise, and they are not a full-fledged co-author of your work. The intellectual contribution and ownership of ideas remains in the hands of the human authors. 

For further and more official guidance, check out the ethical framework for the use of AI in academic research published in Nature Machine Intelligence. This framework outlines three criteria for the responsible use of LLMs in scholarship, summarised as follows:

  1. Human vetting and guaranteeing the accuracy and integrity 
  2. Substantial human contribution across all areas of the work
  3. Acknowledgement and transparency of AI use

Memory-mapped files for efficient data processing

Memory management is a key concern when working with large datasets. Many researchers and developers will load entire datasets into memory for processing. Although this is a straightforward approach that allows for quick access and manipulation of data, it has its drawbacks. When the dataset size approaches or exceeds the available physical memory, performance degrades rapidly due to excessive swapping, leading to increased latency and reduced throughput. Memory-mapped files are an alternative strategy to access and manipulate large datasets without the need to load them fully into memory.


A background on memory-mapped Files

Memory mapping is the process of mapping a file or a portion of a file directly into virtual memory. This mapping establishes a one-to-one correspondence between the file’s contents on disk and specific addresses in the process’s memory space. Instead of relying on traditional I/O operations, such as read() an write(), which involve copying data between kernel space and user space, the process can access the file’s contents directly through memory addresses. Then, page faults are used to determine which chunks to load into physical memory. However, this chunks are significantly smaller than the whole file contents. This direct access reduces overhead and can significantly speed up data processing, especially for large files or applications that require high-throughput I/O operations.

Continue reading

Exploring multilingual programming

Python is a prominent language in the ML and scientific computing space, and for good reason. Python is easy-to-learn and readable, and it offers a vast selection of libraries such as NumPy for numerical computation, Pandas for data manipulation, SciPy for scientific computing, TensorFlow, and PyTorch for deep learning, along with RDKit and Open Babel for cheminformatics. It is understandably an appealing choice for developers and researchers alike. However, a closer look at many common Python libraries reveals their foundations in C++. 

Revisiting C++ Advantages

Many of Python libraries including TensorFlow, PyTorch, and RDKit are all heavily-reliant on C++. C++ allows developers to manage memory and CPU resources more effectively than Python, making it a good choice when handling large volumes of data at a fast pace. A previous post on this blog discusses C++’s speed, its utility in GPU programming through CUDA, and the complexities of managing its libraries. Despite the steeper learning curve and verbosity compared to Python, the performance benefits of C++ are undeniable, especially in contexts where execution speed and memory management are critical.

Rust: A New Contender for High-Performance Computing

Continue reading