Same science, different stories: writing papers vs writing grants

As a PhD student, you will write a lot of papers, and at a certain point, you will also start writing grants. It took me a moment to realise that these are not the same skill. While they draw on the same science and sometimes the same project, they are different genres with different rules. Treating a grant proposal like a mini-paper is one of the most common and avoidable ways in which people damage their own applications. Here’s what I’ve learnt so far, mostly through trial and error.

Step one: know the landscape before you write a word

Before you open your text editor, research the funding opportunity you’re interested in. Find out when the deadline is, which documents are required, what makes you eligible or ineligible, and which requirements are mandatory and which are merely desirable. It may sound obvious, but it’s surprisingly easy to miss a deadline or, even worse, to base your entire career plan on a fellowship that you don’t qualify for or don’t have the right documents for.

Not all funders play by the same rules. Read each charter carefully before assuming that your project fits.

Make sure you read the call text properly. Funders differ widely, particularly in their funding criteria. Some have narrow, clearly defined objectives, while others have specific requests, such as nominating multiple mentors, or are only obliged to fund work “of general interest”. Be aware of mobility requirements too (your first postdoc may need to be somewhere you don’t intend to stay! Think two steps ahead), as well as limits on the number of years since your PhD (“PhD age”). If there is a maximum limit, make sure you apply while you still can. If there is a minimum number of years’ experience and the call is annual, use the waiting time to work on the weakest part of your profile, so that you become a stronger candidate by the time you are eligible. Bear in mind that bureaucracy takes time, too. If you need documents from your current or future institution, request them well in advance, as some can easily take a month or more to arrive.

Academic writing and grant writing pull in opposite directions

The objective is the same, but the approach is very different: using the active voice, specific numbers and clear causality can turn vague intentions into a concrete plan.

We spend years training ourselves to write papers in the past tense, adopting a neutral tone and using field-specific jargon to explain what we did. Grant applications require almost the opposite approach: they should be written in the future tense, be persuasive and personal enough to convey real enthusiasm, and be written for a broader audience than usual. Evaluators aren’t only assessing your previous work; they’re being asked to invest in your future. If your proposal reads like a methods section, you’ve lost sight of your objective before you’ve even made your case.

It helps to remember who you are actually writing for. A proposal has several stakeholders: the funder, who wants their R&D objectives met; the evaluators, who are the gatekeepers you need to convince; your future supervisor or department; your collaborators; and, eventually, society itself. Each of them is reading your text looking for something slightly different, and a good proposal addresses all these groups without losing sight of its main objective.

Remember that evaluators are often busy. They look for any excuse to stop reading, particularly if the first page of your proposal does not grab their attention. They will keep reading if your project is significant and original, if your research plan is internally consistent and if your deliverables are SMART (specific, measurable, achievable, relevant and time-bound). They will also keep reading if your writing doesn’t waste their time. That’s it. Everything in your proposal should work towards these goals.

Write a structure that actually holds together

A good proposal has a solid structure to hold everything together. Begin with the overall problem and explain why it is important. Then break it down into sub-problems and, for each one, include the following: a state-of-the-art overview, an explanation of the gap, a clear objective to close the gap, a work package to deliver the objective, concrete deliverables and a contingency plan in case things go wrong. Everything funnels back up into impact. If a sentence in your proposal cannot be traced back to one of these categories, ask yourself why it is there.

Skeleton structure of a basic proposal.

When writing the text, ensure that the order of your sections matches that of the items in the guide for applicants, and use the same wording as in the call for your headings. Remember that evaluators are looking for specific information; make it easy for them to find what they need in your text.

Last but not least, don’t forget to add a visual timeline. A Gantt chart showing work packages, deliverables and milestones over time is much more convincing to evaluators than a paragraph; it’s often the fastest way for a busy evaluator to check that your plan is feasible and that you are aware of the workload each package requires.

The biggest difference is made at the sentence level

This is where most people lose the most value, and it is also the easiest part to fix, because it’s mechanical. A few habits:

  • Write active sentences and use “I”, not “we”. Funders are investing in you, not your collaborators: stating that ‘this will be done in collaboration with X’ can undermine your case.
  • Use verbs, not nominalisations. ‘I will decide’ is better than ‘I will make a decision’.
  • Avoid words such as ‘understand’, ‘explore’ and ‘investigate’ in your objectives. Funders are often explicitly not in the business of paying for open-ended intellectual exploration; they want defined outcomes.
  • Topic sentences first. The main point of a paragraph should be included in the first one or two sentences, rather than being buried at the end.
  • Get feedback, including on your English, before you submit.
Academic writing explains what you’ve done. Grant writing sells what you’re going to do.

Grammatically, the bad version of each example I have shown you above is fine. This is the kind of thing I’d have written just based on my paper writing experience. However, they are vague, passive and evasive: ‘Will be investigated’, ‘it is anticipated that’, ‘a deeper understanding’. The good version specifies a number, a method and an ‘I’, and explains why the objective matters before explaining how it will be achieved.

Finally, link each action to a stated gap with “because”: writing “I will do X because [gap] currently blocks [outcome]” is more effective than “I will do X. I will also do Y,” as it turns your proposal from a wish list into a plan.

The short version

A grant proposal is not a paper about work you have already done; it is a sales pitch for work you want to do, aimed at people who are looking for reasons to stop reading it. Familiarise yourself with the landscape, read the call properly and structure your proposal around the following: problem, objective, work package, deliverable and impact. Write in the active voice and be specific about what you will deliver and why it matters. The science doesn’t change, but the way you tell the story does.

OPIGlet’s First Conference: PEGS Boston 2026

Hello everyone!

For my very first blog post, I’m excited to share my experience attending and speaking at my first conference.

I was invited to present at the Protein & Antibody Engineering Summit (PEGS) Boston 2026 on behalf of OPIG. The conference ran from May 10–15, and like many antibody-stream OPIGlets before me, I delivered the OPIG tools short course. This formed part of a three-hour session titled “In silico and Machine Learning Tools for Antibody Design and Developability Predictions”. The short courses took place on the Sunday, before the main conference officially began on Monday, so, somewhat terrifyingly, I gave a conference talk before I had ever attended one.

I arrived in Boston on Saturday after a long day of travel and was pleasantly surprised by a free bus into downtown (thanks, Boston MBTA). After a much-needed sleep, I headed to the conference centre, checked into my hotel (thank you, PEGS team), and tried not to explode from nerves. Thankfully, the talk went well; there were plenty of questions, and I quickly settled into speaking in front of the group. With an audience of around 50 people—only slightly larger than an OPIG group meeting—it felt like familiar territory.

Solo conference tip

Attending a conference on your own can be a little intimidating, and the first day can feel a bit lonely. If the conference has a young scientists’ meetup or networking event, go to it! It’s a great way to meet people early on, and the whole experience is much more enjoyable when you have a conference buddy (shout-out to Rucha 👋😊).

Science highlight: The Immunogenicity Database Consortium (IDC)

As a self-confessed database enthusiast (watch this space for OAS-nano), I was particularly impressed by the work of the Immunogenicity Database Consortium (IDC). Immunogenicity data is notoriously difficult to work with: it’s highly variable, hard to compare across studies, and often not easily accessible. The IDC is tackling this by consolidating data from sources such as the FDA into a unified, searchable database.

So far, they have compiled 4,146 anti-drug antibody (ADA) datapoints across 218 therapeutics. Using multivariate regression, they found that mechanism of action is the strongest predictor of ADA frequency. This kind of cross-study insight simply wouldn’t be possible without data at this scale.

Clean, well-curated datasets are essential for training meaningful machine learning models, and the IDC is laying the groundwork for this in immunogenicity research. It’s exciting to think about the potential for accurate immunogenicity prediction in the future. Even more impressive is that the IDC is a volunteer-driven effort, built purely for the advancement of science.

To read more about the IDC, see their preprint on bioRxiv:

S. Agnihotri, B. Gonzalez-Nolasco, B. Monian, S. Pattijn, C. Ackaert, P. Wu, H. Kettenberger, S. Tourdot, T. Hickling, Z. Hu, R. E. Higgs, D. S. Leventhal, The Immunogenicity Database Collaborative (IDC): A Standardized, Publicly Available Database for Clinical Immunogenicity Observations and Insights. bioRxiv [Preprint] (2025). https://doi.org/10.64898/2025.12.08.692993.

Conference takeaways

Antibodies take centre stage

Despite not making it into the acronym, antibodies are undeniably the stars of PEGS. The vast majority of the talks I attended, along with many of the booths in the exhibit hall, focused on antibodies in one way or another.

Miniproteins are hot right now

With advances in machine learning and AI, miniproteins (the smallest proteins that still fold and function like a protein) are gaining traction due to their designability. They featured in several talks, including those from BindCraft and AI Proteins.

A strong industry presence

With only about 8% of attendees coming from academia, I occasionally felt like the odd one out. On the other hand, it was fascinating to see so much innovation coming out of biotech startups, and the conference provided plenty of opportunities to connect with people working in industry.

Plushies!

I didn’t expect this, but I came home with seven stuffed toys from the exhibit hall! Highlights include three Giant Microbes, a Highland cow, and a yellow polka-dot llama. Special thanks to the company that gave me a Nalgene bottle, which I used to smuggle them all home.

One essential Boston recommendation

If you ever find yourself in Boston: go to the Harvard Museum of Natural History.

I was told this repeatedly, and it absolutely lived up to the hype. The glass flower exhibit is astonishing—it genuinely doesn’t look real. Knowing it was created by a father-son duo between 1886 and 1936 makes it even more impressive.

The rest of the museum is also well worth exploring. Some of the taxidermy is unintentionally hilarious, and the museum’s age means it houses some fascinating (if ethically questionable) historical specimens, including a thylacine.

These are made of GLASS. I can’t explain to you how real they look in person.

Final thoughts

Overall, PEGS Boston 2026 couldn’t have been a better first conference. I was definitely thrown into the deep end by giving a talk before I’d ever attended a conference session, but I’m incredibly grateful for the opportunity. Throughout the week, I met so many brilliant people and learned a huge amount. A big thank you to Christina Lingham for organising everything and making the experience possible. It was a fantastic introduction to the world of scientific conferences.

OPIG Retreat, 2026: Heatwave Edition

Last week, a sizable fraction of OPIG headed to “The Plough” near Bradford on Avon, Wiltshire, for our OPIG Retreat (a.k.a. “OPIGtreat”). Some of us travelled by train on Monday and met a biblical deluge: a darkness like a train tunnel, but with pelting rain and bathtubs of water washing over the train windows, and lightning strikes every 30 seconds while we changed trains at Bath Spa…

The red monster storm with black eye is what greeted us as we were arriving in Bath Spa. (Credit: UK Thunderstorm Updates: “This is what the satellite imagery is currently showing from the storm affecting Somerset/ Wiltshire. Cloud tops on this monster have exceeded 40,000ft….”)

Undeterred, and either shuttled from Bradford on Avon station by the fabulous Anita or on foot, we arrived at our lovely destination and home for the week.


SAbDab2: The structural antibody database in the age of machine learning 

Henriette L. Capel, Odysseas Vavourakis, Benjamin H. Williams, Christopher R. Taylor, and Charlotte M. Deane

The Structural Antibody Database 

The Structural Antibody Database (SAbDab) [1] is a publicly available repository of experimentally determined antibody structures, first released in 2013. Explicit support for single-domain antibodies was added in 2021, with SAbDab-nano [2]. Detailed annotations and consistent maintenance have made SAbDab a central resource supporting important advances in the field. SAbDab has been used to study antibody-antigen interactions, including those involving SARS-CoV-2; to predict antibody structure; to design antibodies de novo; and to investigate antibody flexibility.


Building an Agent – Practical Notes for Beginners

For the last few months, I’ve been building an agent around OPIG’s antibody analysis and design tools, and I thought I’d share some practical notes from the process.

An agent is a language model that doesn’t just answer questions but can also decide what to do, call tools, and follow workflows. I’m using Claude in these notes, but most of the ideas apply equally well to other agent frameworks.

Rather than building an agent from scratch, we’re starting with one that already comes with useful capabilities out of the box. For example, Claude Code can search files, edit code, execute commands, and run scripts. Everything below is really about adapting that behaviour to a specific domain and workflow.

How to start?

Start with the `CLAUDE.md` file. It’s a special file Claude reads at the start of every conversation, and it’s where you define the behaviour of the agent (other agents have their own equivalent — for example `AGENTS.md`). In this file, include things like bash commands, code style preferences, and workflow rules. This gives Claude a persistent context that it can’t infer from the codebase alone. Since it’s loaded every session, it sets the baseline for how the agent behaves.

Start simple – especially if it’s your first time. Define clear tools, write lightweight instructions in the markdown (md) file, and create realistic evaluations before adding complexity.

Then run a loop where the agent gathers context, takes actions, and verifies the outputs. Think about how you’ll verify them first: if you can’t tell whether a run was good, you can’t tell whether your changes helped.
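One way to make “verify first” concrete is a tiny evaluation harness written before you touch the agent. This is an illustration, not the author’s setup: it assumes the agent can be wrapped as a Python function that takes a prompt and returns a structured dict, and `EvalCase`, `run_evals` and `fake_agent` are hypothetical names (the stand-in agent only exists so the harness can run).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str     # what we ask the agent
    expected: dict  # fields the structured answer must contain, with their values


def run_evals(agent: Callable[[str], dict], cases: list[EvalCase]) -> float:
    """Run every case and return the fraction whose output matches expectations."""
    passed = 0
    for case in cases:
        output = agent(case.prompt)
        # A case passes only if every expected field is present with the expected value.
        if all(output.get(key) == value for key, value in case.expected.items()):
            passed += 1
    return passed / len(cases) if cases else 0.0


# Hypothetical stand-in for a real agent call, used only to exercise the harness.
def fake_agent(prompt: str) -> dict:
    return {"is_antibody": "EVQLV" in prompt}


cases = [
    EvalCase("Is EVQLVESGGGLVQPGG an antibody?", {"is_antibody": True}),
    EvalCase("Is MKTAYIAKQRQISFVK an antibody?", {"is_antibody": False}),
]
print(run_evals(fake_agent, cases))  # 1.0
```

Re-running the same cases after every change to the markdown files or tools gives you a number to compare rather than an impression.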

In research, you don’t always know how a project will evolve, so you’ll often end up making many changes along the way. But for projects that are relatively well-defined, I’ve found it’s worth spending some time upfront with pen and paper, specifying what you want the agent to do before writing it all out.

From there, most development becomes an iterative process of improving the md files and adjusting tools when needed.

What is a tool?

A tool gives the agent a capability. It executes an action and returns a result — calling an API, running code, querying a database, and so on.

The key idea is that tools are deterministic: given the same input, they produce the same output. So if I ask, “Can you check whether this is an antibody?”, the agent should reach for the same tool every time, `execute_anarci_number()`, and, because the tool is deterministic, it will get the same result.

A tool can be an MCP server or simply a Python function; either works, as long as it gives the agent a reliable way to perform a specific action.

For example, I implemented execute_anarci_number() as a Python function — a thin wrapper around ANARCI — and it returns a structured JSON output with the results and the execution status. All the tools follow the same general structure, which makes them easier for the agent to use consistently.

The signature and docstring are really all the agent needs to decide when to reach for it:

```python
def execute_anarci_number(sequence: str, chain_name: str = "Chain") -> dict:
    """Identify and number an antibody/TCR sequence using ANARCI.

    Returns chain type, species, numbering, and whether it's a valid antibody.
    Chain types: H=Heavy, K=Kappa light, L=Lambda light, A=TCR-alpha, B=TCR-beta
    """
    ...  # implementation omitted: run ANARCI, parse numbering, extract CDRs
```


The function itself is simple: it runs ANARCI, parses the numbering, extracts the CDRs, and checks whether the input looks like a real, complete variable domain. Instead of returning a bare error when numbering fails, the tool returns a structured verdict the agent can reason about:

```python
# Excerpt from inside execute_anarci_number():
# numbering failed → the sequence just isn't an antibody (not a tool error)
return {
    "success": True,
    "chain_name": chain_name,
    "is_antibody": False,
    "is_tcr": False,
    "chain_type": None,
    "species": None,
    "message": "ANARCI could not number this sequence. "
               "It is likely not an antibody or TCR variable domain.",
    "sequence_length": len(sequence),
}
```
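The CDR-extraction step mentioned above can be sketched as follows. This is a hypothetical helper, not part of ANARCI or the author’s tool. It assumes IMGT numbering (CDR1 = 27-38, CDR2 = 56-65, CDR3 = 105-117) and a numbered domain given as a list of `((position, insertion_code), residue)` pairs with `'-'` for gaps, which is the per-domain layout we assume ANARCI’s Python interface returns.

```python
# Standard IMGT CDR boundaries (inclusive); IMGT uses the same positions for every V domain.
IMGT_CDRS = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}


def extract_cdrs(numbered: list[tuple[tuple[int, str], str]]) -> dict[str, str]:
    """Collect CDR residues from IMGT-numbered ((position, insertion_code), residue) pairs."""
    cdrs = {name: "" for name in IMGT_CDRS}
    for (position, _insertion), residue in numbered:
        if residue == "-":  # alignment gap, not a real residue
            continue
        for name, (start, end) in IMGT_CDRS.items():
            if start <= position <= end:  # insertion codes (e.g. 111A) share their position
                cdrs[name] += residue
    return cdrs


# Toy numbered fragment (hypothetical), just to show the shape of input and output.
toy = [((26, " "), "S"), ((27, " "), "G"), ((28, " "), "F"), ((38, " "), "-"),
       ((39, " "), "M"), ((105, " "), "A"), ((111, "A"), "R"), ((117, " "), "Y")]
print(extract_cdrs(toy))  # {'CDR1': 'GF', 'CDR2': '', 'CDR3': 'ARY'}
```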

One thing I found useful is having tools return an explicit verdict, not just output, so the agent knows whether it received an answer, encountered an error, or was given an invalid input.
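A minimal sketch of that idea as a generic wrapper. It assumes a convention (ours, not necessarily the author’s) that tools raise `ValueError` for bad input and that any other exception means the tool itself failed; `run_tool` and the toy `count_residues` tool are hypothetical.

```python
from typing import Any, Callable


def run_tool(fn: Callable[..., dict], **kwargs: Any) -> dict:
    """Call a tool and always return an explicit verdict the agent can branch on.

    status is one of:
      "ok"            the tool ran and produced an answer
      "invalid_input" the arguments were rejected before any work was done
      "error"         the tool itself failed unexpectedly
    """
    try:
        result = fn(**kwargs)
    except ValueError as exc:  # convention assumed here: ValueError means bad input
        return {"status": "invalid_input", "message": str(exc)}
    except Exception as exc:  # anything else is treated as a genuine tool failure
        return {"status": "error", "message": f"{type(exc).__name__}: {exc}"}
    return {"status": "ok", **result}


def count_residues(sequence: str) -> dict:
    """Hypothetical toy tool used to exercise the wrapper."""
    if not sequence.isalpha():
        raise ValueError("sequence must contain only letters")
    return {"length": len(sequence)}


print(run_tool(count_residues, sequence="EVQLV"))  # {'status': 'ok', 'length': 5}
print(run_tool(count_residues, sequence="EV1"))
# {'status': 'invalid_input', 'message': 'sequence must contain only letters'}
```

Because the status is always one of three fixed strings, the workflow file can tell the agent what to do in each case (report the answer, ask the user for corrected input, or flag a failure) instead of leaving it to guess.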

A few things that helped:

  • Use the agent itself to help write the tools. It’s good at it, especially if you give Claude documentation for any software libraries, APIs, or SDKs you’re wrapping.
  • Don’t forget to document the tool in the markdown workflow file so the agent knows it exists and when to use it.
  • Open a fresh session and check the agent can actually call the tools correctly before building on top of them.

What is a skill?

Skills extend Claude with procedural knowledge. They teach the agent how to perform a task, not just what tools are available.

I think of tools as capabilities and skills as workflows. Tools let the agent do something; skills tell it how to approach a task. A tool might let Claude number an antibody sequence. A skill tells it how to carry out an antibody analysis workflow: which tools to use, in what order, what outputs to expect, and how to interpret the results.

Without skills, the model has to rediscover that workflow from scratch each time. Skills package it once and make it reusable.

A skill is just a folder containing a SKILL.md file (instructions plus metadata) and optional scripts or reference material. One nice advantage is portability: because a skill is just a folder of markdown and scripts, you can write it once and reuse it across different projects, environments, and even different agent frameworks.
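Because a skill is only a folder, finding skills is little more than scanning for `SKILL.md` files and reading the metadata at the top of each. A minimal sketch, assuming the YAML frontmatter layout (`name` and `description` between `---` lines) used by the ab-diversity-select header in this post; `read_frontmatter` and `discover_skills` are hypothetical names, and the hand-rolled parser covers only that simple layout. As we understand Anthropic’s description of skills, this name-and-description metadata is what the agent sees up front, with the full instructions loaded only when a skill looks relevant.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Parse simple frontmatter between the leading '---' lines of a SKILL.md.

    Handles plain 'key: value' pairs and folded ('>' or '>-') values continued on
    indented lines; literal blocks, lists or quoting need a real YAML parser.
    """
    lines = skill_md.read_text().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    key = None
    for line in lines[1:]:
        if line.strip() == "---":  # end of the frontmatter
            break
        if line.startswith((" ", "\t")) and key:  # continuation of a folded value
            meta[key] = (meta[key] + " " + line.strip()).strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            meta[key] = "" if value in (">", ">-") else value
    return meta


def discover_skills(root: Path) -> list[dict[str, str]]:
    """Return the metadata of every <root>/<skill>/SKILL.md, sorted by path."""
    return [read_frontmatter(path) for path in sorted(root.glob("*/SKILL.md"))]
```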

To make it concrete, here’s one of mine: ab-diversity-select. After an optimization run, I’m left with dozens of candidate antibodies and need to select a small, maximally diverse subset where the retained mutations remain structurally safe. Rather than re-explaining that workflow every time, I captured it as a skill:

ab-diversity-select/
├── SKILL.md # when to use it + the procedure
├── structural_pipeline.py
├── pipeline.py
└── config_template.py

The SKILL.md header tells Claude when the skill is relevant:


```yaml
---
name: ab-diversity-select
description: >-
  Select a structurally-validated, maximally-diverse subset of antibody candidates from a results CSV…
```

The rest of the file describes the procedure, while the accompanying scripts do the heavy lifting. When Claude encounters a task like “pick 20 diverse antibody candidates,” it can automatically apply my workflow instead of inventing a new selection strategy from scratch.
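For illustration, here is the kind of selection step such a skill might wrap. This is a sketch under stated assumptions, not the author’s `pipeline.py`: “diverse” is taken to mean Hamming distance between equal-length (for example aligned) sequences, structural safety is reduced to a hypothetical yes/no `is_safe` check per candidate, and the selection is greedy max-min (farthest-point) picking, one standard way to choose a spread-out subset.

```python
from typing import Callable


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (e.g. aligned)")
    return sum(x != y for x, y in zip(a, b))


def select_diverse(candidates: dict[str, str], k: int,
                   is_safe: Callable[[str], bool] = lambda name: True) -> list[str]:
    """Greedy max-min selection: repeatedly add the candidate farthest from those chosen."""
    pool = sorted(name for name in candidates if is_safe(name))  # structural filter first
    if not pool or k <= 0:
        return []
    chosen = [pool[0]]  # deterministic seed; a real workflow might start from the best scorer
    while len(chosen) < min(k, len(pool)):
        best = max(
            (name for name in pool if name not in chosen),
            key=lambda name: min(hamming(candidates[name], candidates[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


cands = {"A": "AAAA", "B": "AAAT", "C": "TTTT", "D": "AATT"}
print(select_diverse(cands, 2))                            # ['A', 'C']
print(select_diverse(cands, 2, is_safe=lambda n: n != "C"))  # ['A', 'D']
```

Writing the distance function down explicitly is also a cheap way to pin down what “diverse” means before the agent has to guess.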

Practices that worked for me

There’s already a lot of useful information out there, for example:

anthropic.com/engineering

Claude Code best practices

A few things I’d highlight:

Keep the markdown files organized. `CLAUDE.md` is loaded every session, so only put things in it that apply broadly. For domain-specific knowledge or workflows that are only relevant sometimes, use skills instead. There’s no required format for `CLAUDE.md`; just keep it short and human-readable. Mine roughly covers: setup & environment, architecture & code map, and failure handling.

Use subagents to protect the context. Once the basic agent is working, most improvements come from managing context effectively. Subagents run in their own context with their own set of allowed tools. They’re useful for subtasks that require a lot of context, such as summarizing a paper. In practice, though, I mostly used them for tools that generate large outputs, where it becomes difficult for a single agent to process everything cleanly within one context window.

I defined small operator agents that return only compact summaries. The main agent stays focused on planning and interpretation, large tool outputs stay outside its context, and cheaper, faster models handle parsing and batch work.
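What a “compact summary” looks like depends on the task. As a deterministic illustration of the same idea (in the setup described here, a subagent model writes the summary), the sketch below reduces a large results table to a row count, the top few rows and a pointer to the full output on disk; `compact_summary` and its field names are hypothetical.

```python
import json
from typing import Optional


def compact_summary(rows: list[dict], score_key: str, top_k: int = 3,
                    full_output: Optional[str] = None) -> str:
    """Reduce a large tool output to a short JSON string the main agent can reason over."""
    ranked = sorted(rows, key=lambda row: row[score_key], reverse=True)
    summary = {
        "n_rows": len(rows),
        "top": ranked[:top_k],       # only the best few rows travel upstream
        "full_output": full_output,  # the rest stays on disk, referenced by path
    }
    return json.dumps(summary)


rows = [{"id": i, "score": i % 7} for i in range(1000)]
print(compact_summary(rows, "score", top_k=2, full_output="results.csv"))
```

A thousand rows become one short line in the main context, while the full table stays available if the agent decides it needs to look.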

Prompts matter a lot. Performance changes significantly depending on the prompt. In my experience, when building longer workflows, improving the prompt often helps more than editing the markdown files.

For example, explicitly defining the expected output format and level of detail can reduce lazy behaviour and make the agent more consistent across runs.

One approach I like is building a skill that uses the built-in `AskUserQuestion` tool to interview the user up front about the information you care about, and then generates the prompt from the user’s answers in a structured way.
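The second half of that approach, turning the collected answers into a structured prompt, might look something like this. `AskUserQuestion` does the interviewing; `build_prompt`, the section headings and the example answers are hypothetical, and the point is only that the expected output format is stated explicitly every time.

```python
def build_prompt(answers: dict[str, str], output_format: str) -> str:
    """Turn interview answers into a structured prompt with an explicit output contract."""
    lines = ["## Task context"]
    for question, answer in answers.items():
        # Make missing information visible instead of letting the agent fill it in.
        lines.append(f"- {question}: {answer.strip() or 'not specified'}")
    lines += [
        "",
        "## Expected output",
        output_format,
        "Report every step you took and which tool produced each number.",
    ]
    return "\n".join(lines)


answers = {"Goal": "pick diverse antibody candidates", "Number of candidates": "20",
           "Diversity metric": ""}
print(build_prompt(answers, "A CSV with columns name, sequence, score."))
```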

Use the agent to explain its own failures. The agent is actually pretty good at explaining where it failed and why. Use it to help debug and improve itself. Ask it what went wrong, have it suggest edits to the markdown files, or ask what it learned during the session. Some of my best improvements came from just asking the agent why a run failed.

A few bio-specific lessons

First, watch the jargon and define your terms. “Diverse” might mean sequence distance, V-gene spread, or structural diversity. Say exactly what you mean, or define it explicitly in your workflow files.

Second, the agent will always give you an answer, so make sure it is grounded in tools rather than invented. A language model can easily produce a confident, plausible-looking sequence or numbering out of thin air. If you do not explicitly tell the agent to use the available tools, it may continue without them, even when they exist.
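One cheap guard is to check that every sequence in the agent’s final answer actually appeared in some tool output. A heuristic sketch, assuming tool outputs are logged as text and treating any run of at least 20 upper-case amino-acid letters as a sequence; `ungrounded_sequences` is a hypothetical helper and will not catch invented numbering or scores.

```python
import re


def ungrounded_sequences(answer: str, tool_outputs: list[str], min_len: int = 20) -> list[str]:
    """Return protein-like strings in the answer that never appeared in any tool output."""
    # Runs of >= min_len standard amino-acid letters are treated as sequences.
    pattern = re.compile(rf"[ACDEFGHIKLMNPQRSTVWY]{{{min_len},}}")
    seen = "\n".join(tool_outputs)
    return [seq for seq in pattern.findall(answer) if seq not in seen]


log = ["numbered: EVQLVESGGGLVQPGGSLRLSCAAS"]
answer = "Candidate 1: EVQLVESGGGLVQPGGSLRLSCAAS; Candidate 2: QVQLQESGPGLVKPSETLSLTCTVS"
print(ungrounded_sequences(answer, log))  # ['QVQLQESGPGLVKPSETLSLTCTVS']
```

Anything this returns is a prompt to go back to the logs, not proof of fabrication: the sequence may simply have been reformatted.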

Finally, keep a human in the loop. Read the logs yourself, understand what happened, and do not trust a clean-looking summary on its own. Ask the agent to explain each step and justify its decisions — that is often the fastest way to catch a wrong assumption before it ends up in your results.

Agents are surprisingly capable, but I still found it challenging to get them to reliably execute long workflows without intervention. In practice, I had the most success when treating the agent as a collaborator rather than a fully autonomous system, giving it clear tools, workflows, and checkpoints along the way.

Building agents is still a fast-moving area, and there are many ways to approach it. It can feel confusing at first, but once you start experimenting and building real projects, things become much clearer. My advice would be to start simple, build something useful, and learn by doing.


Networks beyond proteins: a Lake Como summer school

My DPhil uses network representations of protein complexes to predict drug targets, so when a summer school on complex networks came up, I wanted to see what tools and ideas from the broader field I might be missing. The Lake Como School on Complex Networks brought together students and postdocs from universities around the world to discuss recent applications and future possibilities using networks. This was the school’s 10-year anniversary, so we were honoured to have many of the lectures given by founding members of the society.


How Unusual Is Your Generated Molecule? Let The CCDC Tell You

In this post I’ll walk through how to set up the CCDC Python API and use the CSD Geometry Analyser to evaluate the geometric quality of molecules from three representative structure-based de novo design models. I’ve put together a small GitHub repo with the full analysis code where we look at bond lengths, angles, torsions, and ring conformations across the three methods, and compare these against their PoseBusters validity scores to see what each metric is really capturing.


Peering Inside the Black Box: A Beginner’s Introduction to Mechanistic Interpretability

Over the last few years, large language models (LLMs) have gone from being curiosities tucked away in research labs to something most of us interact with on a daily basis, whether for drafting emails, debugging code, or simply pondering the meaning of life at 2am. And yet, for all our reliance on these systems, a rather inconvenient truth lingers in the background: nobody, not even the people who built them, can fully explain what is going on inside.

This is where mechanistic interpretability comes in.

In essence, mechanistic interpretability is the approach of explaining complex machine learning systems through the behaviour of their functional units (Kästner and Crook, 2024) by reverse-engineering them into their more elementary computations (Rai et al., 2025). The aim is not simply to know that a model gives the right answer, but to pull apart the underlying machinery and uncover the causal relationships between input and output. Think of it as neuroscience for neural networks, except we can read every neuron at any moment, rewind, replay, and intervene mid-thought.


A timeline of sampling methods of diffusion models

When approaching the methods used in de novo protein design, one is quickly confronted with a plethora of overlapping formulations of what looks superficially like “the same thing”. One paper trains an $\boldsymbol{\epsilon}$-prediction network with a simple MSE loss; another trains a score network with a stochastic-differential-equation justification; a third trains a clean-data predictor under yet another schedule. Each formulation carries its own notation, its own variance schedule, and its own sampler. Qualitatively, this zoo of formulations is doing the same thing: it starts from some unstructured noise and iteratively refines it to eventually produce a protein structure similar to (but different from!) the proteins we have experimentally determined in the past. What is not immediately obvious to a newcomer is that all of these formulations are historical descendants of a small number of foundational ideas, and that essentially every architectural and algorithmic decision in a modern protein-design diffusion model has a specific paper of origin and a specific motivation for being there.

This post is my attempt to put these formulations onto a single timeline. I trace the trajectory of the field through four foundational works: DDPM (Ho et al., 2020), DDIM (Song et al., 2021a), the score-based SDE unification (Song et al., 2021b), and EDM (Karras et al., 2022), explaining at each step what specific problem with the previous formulation the next paper was attacking and how the new formulation generalises or simplifies the old one. The goal is coherent motivation rather than exhaustive coverage; the reader interested in implementation details is referred to the original papers and the references at the end.
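To make the starting point of the timeline concrete, here is a minimal NumPy sketch of the DDPM forward (noising) process and its ε-prediction objective. It assumes the linear β schedule from 1e-4 to 0.02 over T = 1000 steps used by Ho et al. (2020), with 0-based step indices; the function names are illustrative and no network is involved, so the “prediction” fed to the loss would come from a trained model.

```python
import numpy as np


def linear_alpha_bars(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def noise_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray, rng: np.random.Generator):
    """Draw x_t ~ q(x_t | x_0) in closed form; t is a 0-based step index.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps  # eps is the regression target for the network


def eps_prediction_loss(eps_pred: np.ndarray, eps: np.ndarray) -> float:
    """DDPM's simplified objective: mean squared error between true and predicted noise."""
    return float(np.mean((eps_pred - eps) ** 2))


rng = np.random.default_rng(0)
alpha_bars = linear_alpha_bars()
x0 = rng.standard_normal(8)  # stand-in for clean data (e.g. flattened coordinates)
x_t, eps = noise_sample(x0, t=500, alpha_bars=alpha_bars, rng=rng)
print(eps_prediction_loss(np.zeros_like(eps), eps))  # loss of a model that always predicts 0
```

Later formulations change how the noise is parameterised, what the network predicts and how sampling proceeds; this is only the DDPM starting point.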


Spin Lattices and Proteins – How state-based discretisations have enabled modern protein modelling

I got into protein modelling not long before AlphaFold2 was first released. At that time, some of the prevailing methods for protein structure prediction came from highly interpretable energy functionals that arose from a particularly beautiful intersection of statistical mechanics and biology. These “Potts” models are the centre of a larger discussion in this blog post on state-based discretisations of proteins: how they’ve shaped modern deep learning methods and whether there is still more to learn from them.

In the age of black box deep learning, does the Potts model still have a place?

The Potts/Ising Model

The Ising model is a well-established model of ferromagnetism from theoretical physics. Simply put, given a lattice of atoms, each able to adopt one of two spin states (up or down), ferromagnetism arises when the spins align and their associated magnetic moments point in the same direction. The Ising model parameterises the local and non-local relationships between atoms and their spin states, so that we can learn the Hamiltonian of the system and describe its different configurations under a magnetic field. For a system of N atoms, the Hamiltonian takes the following form:


$$
E = -\sum_{i=1}^{N} h_i x_i - \sum_{i<j}^{N} J_{ij} x_i x_j,
$$

where J_ij is the “coupling energy” between the spins x_i and x_j of atoms i and j, and h_i represents the magnetic field or, more appropriately for our purposes, a single-site field dictating how atom i behaves independently within the model. You might recognise the form this binary spin model takes, as it arises naturally across the sciences, including in Hopfield networks and graphical models.
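The energy above translates almost line for line into code. A small sketch, assuming the common ±1 spin convention and a coupling matrix J of which only the upper triangle (i < j) is read, so each pair is counted once; `ising_energy` is an illustrative name.

```python
import numpy as np


def ising_energy(x: np.ndarray, h: np.ndarray, J: np.ndarray) -> float:
    """E = -sum_i h_i x_i - sum_{i<j} J_ij x_i x_j for spins x_i in {-1, +1}."""
    # np.triu(J, k=1) keeps only entries with i < j, so each pair is counted once.
    pair_term = np.sum(np.triu(J, k=1) * np.outer(x, x))
    field_term = h @ x
    return float(-field_term - pair_term)


J = np.ones((3, 3)) - np.eye(3)  # every pair coupled ferromagnetically (J_ij = 1)
h = np.zeros(3)                  # no external field
print(ising_energy(np.array([1, 1, 1]), h, J))   # -3.0 (all aligned)
print(ising_energy(np.array([1, -1, 1]), h, J))  # 1.0 (one spin flipped)
```

With all couplings positive, aligned spins give the lowest energy, which is exactly the ferromagnetic behaviour the model is meant to capture.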

Everything is an Ising-like model if you’re brave enough
