Picture the following: the year is 1923, and it’s a sunny afternoon at a posh garden party in Cambridge. Among the polite chatter, one Muriel Bristol, a phycologist studying the mechanisms by which algae acquire nutrients, mentions she has a preference for tea poured over milk, as opposed to milk poured over tea. In a classic example of women not being able to express even the most insignificant preference without an opinionated man telling them they’re wrong, Ronald A. Fisher, a local statistician (later turned eugenicist who dismissed the notion of smoking cigarettes being dangerous as ‘propaganda’, mind you), decides to put her claim to the test with an experiment. Bristol is given eight cups of tea and asked to classify them as milk first or tea first. Luckily, she correctly identifies all eight of them, and gets to happily continue about her life (presumably until the next time she dares mention a similarly outrageous and consequential opinion like a preferred toothpaste brand or a favourite method for filing papers). Fisher, on the other hand, is incentivized to develop Fisher’s exact test, a statistical significance test used in the analysis of contingency tables.
Now picture a similar image, except the year is 2024, and it’s a rainy afternoon in the fluorescent-lit department of Statistics, Oxford. Amongst the lunch chatter, an opiglet mentions their preference for Yorkshire tea. Unfortunately for them, I am present, and am also two things: one, a proud Lancastrian, and two, a deeply insufferable person. And thus, the birth of this blog post. The aim? To get my Fisher on (exclusively regarding tea blind tasting) and find the top dog of the English breakfast tea world, specifically by asking a handful of willing participants to taste test Yorkshire tea alongside its rival, Lancashire tea (some of the less cultured amongst you might be questioning whether Lancashire tea actually exists – it certainly does, and has been about since 2002, though I concede that it is certainly less of a household name than its rival).
So, onto the experiment: seven opiglets agreed to be my Muriel Bristols. I prepared each of them three cups of tea—one from the vibrant heart of England, home to the Blackpool Tower, Lancashire hotpot, and Eric Morecambe—and one from Yorkshire. For the sake of fairness, I also included a cup of the most neutral tea I could find, Twinings (if you were really enjoying the county rhetoric, just imagine this as somewhere nice and nondescript, like Surrey).
Our seven opiglets each sipped, savoured, and ranked the three teas: Twinings (A), Yorkshire (B), and Lancashire (C). Here’s the raw data, how the rankings turned out, and the code that was used:
Participant | 1st Choice | 2nd Choice | 3rd Choice |
---|---|---|---|
1 | C | A | B |
2 | B | A | C |
3 | A | B | C |
4 | B | C | A |
5 | B | A | C |
6 | B | A | C |
7 | C | B | A |
Or as a frequency distribution table, built with pandas:

    import pandas as pd

    # Raw rankings: one row per participant, one column per rank position
    data = {
        "Participant": [1, 2, 3, 4, 5, 6, 7],
        "1st Choice": ["C", "B", "A", "B", "B", "B", "C"],
        "2nd Choice": ["A", "A", "B", "C", "A", "A", "B"],
        "3rd Choice": ["B", "C", "C", "A", "C", "C", "A"],
    }
    df = pd.DataFrame(data)

    # Reshape to long format, then count how often each tea landed in each position
    freq_df = df.melt(
        id_vars=["Participant"],
        value_vars=["1st Choice", "2nd Choice", "3rd Choice"],
        var_name="Rank",
        value_name="Tea",
    )
    freq_table = freq_df.groupby(["Tea", "Rank"]).size().unstack(fill_value=0)

Tea | 1st Choice | 2nd Choice | 3rd Choice |
---|---|---|---|
Twinings (A) | 1 | 4 | 2 |
Yorkshire (B) | 4 | 2 | 1 |
Lancashire (C) | 2 | 1 | 4 |
The results of our taste test reveal some shocking truths. Yorkshire tea emerged as the clear favourite, taking four of the seven 1st choice rankings. Twinings, being the Switzerland of teas, held a strong position in the middle, with four 2nd choice votes. And Lancashire tea, despite my fervent and undeniable bias, fell to the 3rd position more often than not (four times out of seven).
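One way to boil the frequency table down to a single number per tea is the mean rank, where lower is better. Here is a quick sketch, with the counts typed in by hand from the table; `counts` and `mean_rank` are names I have made up for illustration rather than anything from the original analysis:

```python
# Counts of 1st, 2nd and 3rd choices per tea, copied from the frequency table
counts = {
    "Twinings": [1, 4, 2],
    "Yorkshire": [4, 2, 1],
    "Lancashire": [2, 1, 4],
}


def mean_rank(position_counts):
    """Average rank position, weighting positions 1..k by how often each occurred."""
    total = sum(position_counts)
    weighted = sum(pos * c for pos, c in enumerate(position_counts, start=1))
    return weighted / total


mean_ranks = {tea: mean_rank(c) for tea, c in counts.items()}

# Print teas from best (lowest mean rank) to worst
for tea, r in sorted(mean_ranks.items(), key=lambda kv: kv[1]):
    print(f"{tea}: {r:.2f}")
```

On these counts Yorkshire averages about 1.57, Twinings 2.14 and Lancashire 2.29, so the gap between second and third place is a good deal smaller than the gap to first.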
In an effort to comfort myself, I calculated Kendall’s coefficient of concordance, a measure of the agreement between the guinea (o)piglets:
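    # Turn each participant's ordering into a rank (1 = favourite) per tea
    ranks = df.melt(id_vars="Participant", var_name="Rank", value_name="Tea")
    ranks["Rank"] = ranks["Rank"].str[0].astype(int)
    ranks = ranks.pivot(index="Participant", columns="Tea", values="Rank")

    # Kendall's W from the per-tea rank sums (no ties): W = 12 S / (m^2 (n^3 - n))
    m, n = ranks.shape  # 7 participants, 3 teas
    S = ((ranks.sum() - m * (n + 1) / 2) ** 2).sum()
    kendall_w = 12 * S / (m**2 * (n**3 - n))

This resulted in a Kendall’s W of 0.143 (on a scale from 0, no agreement, to 1, perfect agreement), indicating that there was very little consensus among the participants, and providing a thin blanket of comfort for my bruised Lancastrian ego.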
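For a sense of whether agreement this low can be told apart from seven people ranking at random, W can be turned into a Friedman chi-square statistic, χ² = m(n − 1)W for m raters and n teas. What follows is a minimal sketch, assuming no tied ranks and computing W from the per-tea rank sums; the `orderings` list is my own transcription of the raw table, and the p-value leans on a large-sample approximation that is rough for only seven tasters, so treat it as indicative at best:

```python
import math

# Each participant's ordering, best to worst, transcribed from the raw table
# (A = Twinings, B = Yorkshire, C = Lancashire)
orderings = ["CAB", "BAC", "ABC", "BCA", "BAC", "BAC", "CBA"]
teas = "ABC"

m = len(orderings)  # number of raters
n = len(teas)       # number of items being ranked

# Rank sum per tea: its position in each ordering, counted from 1
rank_sums = {t: sum(o.index(t) + 1 for o in orderings) for t in teas}

# Kendall's W without tie correction: W = 12 S / (m^2 (n^3 - n)),
# where S is the squared deviation of the rank sums from their mean
mean_sum = m * (n + 1) / 2
s = sum((r - mean_sum) ** 2 for r in rank_sums.values())
w = 12 * s / (m ** 2 * (n ** 3 - n))

# Friedman chi-square statistic, with n - 1 degrees of freedom
chi2 = m * (n - 1) * w

# With exactly 2 degrees of freedom (n = 3), the chi-square survival
# function is exp(-x / 2); this shortcut does not hold for other n
p_value = math.exp(-chi2 / 2)

print(rank_sums)
print(f"W = {w:.3f}, chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

On this data the rank sums are 15, 11 and 16, giving W = 1/7 ≈ 0.14, χ² = 2 and p ≈ 0.37: nowhere near enough to crown a winner with a straight face.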
A mature person would now conclude something about personal taste, trusting people’s opinions, and perhaps even not making everyone demonstrate their personal preferences via blind taste test. I’ve come away with some of that, but in all honesty, the only reason this blog post is seeing the light of day is that I found out that Lancashire tea is actually a product of Merseyside, so I can completely and safely dissociate myself from it—I preferred Lidl’s own brand anyway.