AI is moving fast, and large language models (LLMs) are at the centre of it all, doing everything from generating coherent, human-like text to tackling complex coding challenges. And this is just scratching the surface—LLMs are popping up everywhere, and their list of talents keeps growing by the day.
However, these models aren’t infallible. One of their most intriguing and concerning quirks is the phenomenon known as “hallucination” – instances where the AI confidently produces information that is fabricated or factually incorrect. As we increasingly rely on AI-powered systems in our daily lives, understanding what hallucinations are is crucial. This post takes a brief look at LLM hallucinations: what they are, why they occur, and how we can navigate them to get the most out of our new favourite tools.
Types & Causes of Hallucinations
Hallucinations come in a range of forms and have been described in a myriad of ways. For this post, we can broadly use Huang et al.’s categorisation into factuality and faithfulness hallucinations.
Factuality hallucinations are when the model generates false or inaccurate information and presents it as fact. Faithfulness hallucinations, on the other hand, are when the output deviates from the user’s query, input, or provided context. You may also have seen hallucinations characterised as intrinsic and extrinsic.
So what causes hallucinations? Hallucinations in LLMs are often caused by issues in the training data, such as errors, biases, or inconsistencies, as well as shifts in data distribution between training and real-world use. Additionally, the randomness of the text generation process, combined with the model’s tendency to prioritise learned knowledge over the context you give it, can lead to factually incorrect or irrelevant outputs. And last but not least, let’s not forget: if your prompt is vague or unclear, you’re basically inviting the model to make stuff up!
Practical Ways to Mitigate Hallucinations in Everyday Use
Let’s dive into a few simple ways to improve your everyday use of LLMs and reduce hallucinations. We’ll touch on prompt strategies, keeping context relevant, and tweaking some easy-to-control parameters to get better, more accurate responses, along with some general advice.
Be Smart With Your Prompts
When it comes to mitigating hallucinations day-to-day, one key strategy is to be smart with your prompts. The clearer and more specific your input, the less likely the model is to drift off into hallucinated territory. Giving examples of what you’re looking for, such as the tone or level of creativity you want in a piece of writing, works very well. This is known as few-shot prompting: you provide the model with a few examples of the kind of output you want before making your actual request. For instance:
Write me a short poem about a rainy autumn day in a similar style to this poem by Edgar Allan Poe:
"To —
I heed not that my earthly lot
Hath—little of Earth in it—
That years of love have been forgot
In the hatred of a minute[...]"
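The same idea carries over if you’re calling a model from code rather than a chat window. Here’s a minimal sketch of few-shot prompting, assuming you’re using OpenAI’s Python SDK; the model name, system message, and example poem below are placeholders, not recommendations:

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Few-shot prompting: show the model an example of the style you want
# before making the real request.
messages = [
    {"role": "system", "content": "You write short poems in the requested style."},
    # Example exchange (placeholder) demonstrating the desired tone
    {"role": "user", "content": "Write a short poem about the sea in the style of Edgar Allan Poe."},
    {"role": "assistant", "content": "Dark waters call in mournful tone, / Beneath a sky of slate and stone..."},
    # The actual request
    {"role": "user", "content": "Write a short poem about a rainy autumn day in the same style."},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=messages,
)
print(response.choices[0].message.content)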
Another prompting tip: don’t ask for a vague “summary” of an article; instead, ask for a focused analysis of its key points, and you’ll see better results. There are many prompting guides out there; check out the excellently named Prompt Engineering Guide for an introduction and more advanced prompting techniques.
Note: if you’re using the new o1-preview or o1-mini models, avoid prompting techniques such as Chain of Thought (CoT) or Tree of Thought. These models already perform internal chain-of-thought reasoning, so all you’ll do is confuse them and increase the likelihood of nonsensical outputs.
Reduce Long Inputs
One thing to remember is that even the smartest models around struggle with long context. You may notice that when you feed in a whole article, the model doesn’t handle it well and can’t answer your questions without hallucinating. Long contexts can cause a “lost in the middle” phenomenon, where LLMs are unable to reliably retrieve information from the middle of your article/input. The solution is to keep it short: instead of feeding in a whole article, work through a few paragraphs at a time with more specific questions, as in the sketch below.
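If you’re doing this programmatically, the fix is the same: chunk the text and ask a focused question per chunk. A minimal sketch, again assuming OpenAI’s Python SDK (the file name, chunk size, and model name are just placeholders):

from openai import OpenAI

client = OpenAI()

def ask_about_chunk(chunk: str, question: str) -> str:
    # One focused question about one small piece of the article
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": f"Using only the text below, {question}\n\n{chunk}"}],
    )
    return response.choices[0].message.content

article = open("article.txt").read()  # placeholder input file
paragraphs = [p for p in article.split("\n\n") if p.strip()]

# A few paragraphs at a time keeps the context short and focused.
answers = [
    ask_about_chunk("\n\n".join(paragraphs[i:i + 3]),
                    "summarise the key claims in two bullet points.")
    for i in range(0, len(paragraphs), 3)
]
print("\n".join(answers))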
Complex queries should also be broken down into multiple tasks. If you have a task consisting of more than one step, break it up into separate prompts to avoid overwhelming the model. I normally ask myself “how would I do this myself?” and create subtasks that way.
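As a rough illustration of what that decomposition might look like in code (the steps, file name, and model name here are hypothetical):

from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    # A single, focused prompt per subtask
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

report = open("report.txt").read()  # placeholder input file

# Each step gets its own prompt, and the output of one feeds the next.
key_points = ask(f"List the five key findings in this report:\n\n{report}")
summary = ask(f"Write a 100-word summary based only on these findings:\n\n{key_points}")
email = ask(f"Turn this summary into a short, friendly email to a colleague:\n\n{summary}")
print(email)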
After altering your approach, always double-check that the information still aligns with your first few prompts before you head off and present the results to your professor.
Don’t Give Irrelevant Information
Additionally, irrelevant context has been shown to degrade model performance and can result in increased hallucinations. The first and easiest fix is to avoid adding irrelevant context to your prompt or input. The second is to let the model ignore irrelevant information in the context you give it by adding something like “feel free to ignore irrelevant information” to your prompt. The linked article above showed this actually works!
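In code, that just means appending the instruction to your prompt. A small sketch, assuming OpenAI’s Python SDK (the file name and model name are placeholders):

from openai import OpenAI

client = OpenAI()

notes = open("meeting_notes.txt").read()  # placeholder file with off-topic chatter mixed in

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user",
               "content": ("Using the notes below, list the action items agreed for next week. "
                           "Feel free to ignore irrelevant information.\n\n" + notes)}],
)
print(response.choices[0].message.content)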
Adjust Those Hyper-parameters
Another trick? If you have access, adjust the model’s hyper-parameters. Play with parameters like top-k, top-p and temperature to find the right balance between creativity and factual accuracy—lower randomness usually means fewer hallucinations but also less creativity.
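As a concrete sketch, most hosted APIs expose temperature and top-p directly (top-k is offered by some providers, but not, for example, by OpenAI’s chat endpoint). Assuming OpenAI’s Python SDK, with placeholder values and model name:

from openai import OpenAI

client = OpenAI()

# Lower temperature and top-p mean less random sampling, which usually
# means fewer hallucinations but also more conservative, less creative text.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user",
               "content": "List three well-documented facts about the Amazon river."}],
    temperature=0.2,  # 0 = near-deterministic, higher = more creative/random
    top_p=0.9,        # sample only from the top 90% of the probability mass
)
print(response.choices[0].message.content)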
Picking the Right LLM
Finally, as model size increases, there is a reduced likelihood of hallucinations, so pick a model that has been trained on more (good) data and for longer. To see the online community’s current rating of the most capable LLM, check out the open-source project ‘Chatbot Arena‘ (formerly LMSYS), where users vote for the most capable models, with a leaderboard showcasing the results! Allegedly, OpenAI pre-release their models on there…
General Sound Advice
Lastly, here’s some general advice to help you get the most out of LLMs and reduce hallucinations. The first and most practical suggestion is to employ LLMs for their intended use case. For example, to find sources, use Perplexity; I recently used it to help me find an interactive reader (Immersive Reader by Microsoft). If you need a hand coding, I’ve found Claude very helpful alongside Copilot. Secondly, if you’re using a model without internet access, find out its knowledge cut-off date and commit it to memory. Finally, remember that LLMs are just a tool, so always double-check generated content and do not accept it blindly without considering the possibility of hallucinations!
P.S. Do you have a favourite LLM?