When showcasing an approach in computational chemistry, an example molecule is required as a placeholder. But which one to choose? I would classify three different approaches: choosing a recognisable molecule, a top-selling drug, or a randomly sketched compound.
At a recent conference, Sheffield Cheminformatics 2023, I saw examples of all three, and one problem I had was that some placeholders distracted me into trying to figure out what they were.
Recognisable compounds
The point of a placeholder molecule is that it is an example whose identity does not matter. As a result, a common practice is to choose a compound everyone can recognise. The cliché compound is caffeine, with aspirin a close second. When loading PyMOL, there is a splash screen with an easel with aspirin on it, which is odd as it’s primarily a great protein viewer.
Opting for a recognisable compound means the audience will recognise it and pat themselves on the back, whereas when the compound is not recognisable the audience may get distracted trying to remember where they saw it. I am particularly bad at this, but I believe it’s fairly common: I spoke to a med chemist about this and they actively tune out if they disagree with the placeholder molecule choice…
When folk in industry present they frequently use the compounds from their company, which may not be universally recognisable, but it’s their product, so their pride is fully understandable. In academia or in start-ups, one does not have a company example compound, so this route is unavailable.
When one needs a compound that addresses a specific objective, a recognisable compound often isn’t an option, so a randomly drawn molecule is used. I will admit that I have random compounds with no labels on some slides, but I make sure to name them (in order: cateshol, cumarate, caffeinate, “this naphathalene-thinggy”, “this fake-natural-product with a lactam”).
Generally it’s something randomly scribbled that looks like a derivatised polyaromatic hydrocarbon, which is not great. However, these clearly look like random scribbles.
Personally, I prefer recognisable compounds (I would not write a blog post about it otherwise), but I think a simple solution against distraction is to caption a placeholder compound with its name, prefixed with “example” to make it clear it’s an example. This, however, does not answer the question of what makes a good placeholder compound.
Cliché compound: Aspirin
In the figure for the lead of this post, one detail is instantly noticeable: size. The molecular weight of aspirin is only 180 Da. If it were not a pharmaceutical compound, it would be a building block, not the final product, as it has endless off-target effects. It was developed at the end of the 1800s by Bayer to improve on salicylic acid, an extract from the bark of willows (whose genus is named Salix, hence the name acetylsalicylic acid). As a result it looks nothing like a drug. If one uses it as an example compound for an algorithm, odd things happen as it sets very few fingerprint bits (for example, in the lightning-fast search tool Arthor, odd analogues appear with it).
It has a carboxylate, which gets Lipinski-rule zealots riled up, especially if they did not recognise the compound (the two personal traits could be said to go hand in hand). Personally, I think carboxylates are awesome for the hit discovery stage. Candesartan is a modern, beautiful-looking compound that has a carboxylate, so a carboxylate is not a deal-breaker for a drug.
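To put a number on the size point, here is a minimal sketch that sums atomic weights over a molecular formula. The formulae (C9H8O4 for aspirin, C7H6O3 for salicylic acid) and the rounded atomic weights are my additions rather than something from the figure, and the helper name `molecular_weight` is purely illustrative; a cheminformatics toolkit would normally do this from the SMILES directly.

```python
import re

# Rounded standard atomic weights; only the elements needed here.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def molecular_weight(formula: str) -> float:
    """Sum atomic weights for a flat formula such as 'C9H8O4' (no brackets)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom of that element.
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


aspirin_mw = molecular_weight("C9H8O4")
salicylic_acid_mw = molecular_weight("C7H6O3")
print(f"aspirin: {aspirin_mw:.1f} Da")                # aspirin: 180.2 Da
print(f"salicylic acid: {salicylic_acid_mw:.1f} Da")  # salicylic acid: 138.1 Da
```

The acetyl group is the only difference between the two, which is roughly 42 Da.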
Natural products: caffeine, serotonin & adrenaline
Caffeine is also small and is a molecule that appears on T-shirts and jewellery. It is very much a cliché compound. The T-shirts might have captions like “the smell of coffee”, which is actually incorrect as other compounds, such as 2-furfurylmercaptan, cause the smell of coffee, as shown in the laminated infographic “The Aroma of Coffee” found in most office kitchenettes. Another Etsy/Pinterest compound is serotonin, the neurotransmitter of happiness. Personally, I prefer its metabolite, melatonin, the sleep hormone. Another common hormone is adrenaline (epinephrine), although this could be mistaken for an amphetamine.
The recognisable compounds could be split into natural products/metabolites and pharmaceutical compounds. Secondary metabolism biochemistry is amazing in its diversity, especially as it’s achieved from what amounts to half a dozen reactions and three or four basic building block types (e.g. terpenes, polyketides and shikimate-derivatives); however, natural products have scalability issues if harvested and synthetic accessibility issues if made. Id est, the opposite of drug-like.
Popular…
Wikipedia releases information on page views, which makes for great reading (e.g. the Ed Sheeran article has had more readers than Jesus). In the week after Barbenheimer, the Top 25 report had Oppenheimer (film) at 5.9B and Barbie (film) at 3.9B, yet the latter had a better box office outcome. Wikipedia readership correlates with importance for sure, but in the eyes of a heterogeneous demographic. Plus, one is more likely to look up the unknown than the known. I datamined Wikipedia for the most read-about compounds and the outcome was (ehrm) “interesting”. Limonene was the most read-about, because of a strain of Cannabis: narcotics feature heavily.
I was told that someone giving a talk while visiting the United States used fentanyl as a placeholder and it was poorly received. I do not know if there were other factors, such as the units being left as metric and not converted to freedom units. Personally, I would confuse the structures of fentanyl and cetirizine (two benzene rings around a piperidine or piperazine), so I’d assume it was the latter. But it stands to reason that using narcotics as placeholders is a bad idea. In terms of comedy, sildenafil would probably be a better choice, especially as it’s quite easy to recognise.
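For the curious, page views per article can be pulled from the public Wikimedia Pageviews REST API. The sketch below is not necessarily how the ranking above was produced: it assumes the `per-article` endpoint with daily granularity and human (`user`) traffic, and the helper names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia"):
    """Build the per-article URL; dates are YYYYMMDD strings."""
    title = urllib.parse.quote(article.replace(" ", "_"), safe="")
    return f"{API}/{project}/all-access/user/{title}/daily/{start}/{end}"


def total_views(payload):
    """Sum the 'views' field over the 'items' list of a decoded response."""
    return sum(item["views"] for item in payload.get("items", []))


def fetch_total_views(article, start, end):
    """Download and total the views for one article (needs network access)."""
    request = urllib.request.Request(
        pageviews_url(article, start, end),
        # Hypothetical identifier; Wikimedia asks clients to send a User-Agent.
        headers={"User-Agent": "placeholder-molecule-demo/0.1 (example script)"},
    )
    with urllib.request.urlopen(request) as response:
        return total_views(json.load(response))
```

Something like `fetch_total_views("Limonene", "20230701", "20230731")` would then give one month of views for a single compound page; the date range here is arbitrary.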
But in terms of easy to recognise, the prize goes to penicillins in my opinion. They save lives and the beta-lactam ring gives its name to the whole family of drugs. Penicillin G is the original from Penicillium chrysogenum, while amoxicillin is the one commonly prescribed today.
Therefore, next time I need a placeholder compound, I am going to ditch caffeine and aspirin and, without doubt, use penicillin G instead.
Footnote 1: Clar sextet
Parenthetically, one thing to make sure of is not to use unusual styles. In ChemDraw, the default is to show π-sextet rings, which can get some people angry, especially as naphthalene has one sextet ring, not two, according to Clar’s rule.
Here is fluorescein in its anhydride form, which is confusing when thinking of its resonance, and the π-sextet looks even more confusing in my opinion.
Footnote 2: SMILES of compounds mentioned
I know that by blogging about what compounds to choose as an example I am also choosing compounds to act as examples in a meta way.
Name | SMILES | Role |
Aspirin | O=C(C)Oc1ccccc1C(=O)O | Nonsteroidal anti-inflammatory drug |
Salicylic acid | O=C(O)c1ccccc1O | Willow bark extract |
Paracetamol | CC(=O)Nc1ccc(O)cc1 | Analgesic drug |
Ibuprofen | CC(C)Cc1ccc(cc1)[C@@H](C)C(=O)O | Nonsteroidal anti-inflammatory drug |
Caffeine | CN1C=NC2=C1C(=O)N(C(=O)N2C)C | Essential vitamin |
2-Furfurylmercaptan | SCc1ccco1 | Aroma of coffee |
Melatonin | CC(=O)NCCC1=CNC2=C1C=C(C=C2)OC | Sleep hormone |
Adrenaline | CNC[C@H](O)c1ccc(O)c(O)c1 | Fight-or-flight hormone |
Methamphetamine | CNC(C)Cc1ccccc1 | Narcotic |
Cetirizine | Clc1ccc(cc1)C(c2ccccc2)N3CCN(CC3)CCOCC(=O)O | Antihistamine |
Serotonin | C1=CC2=C(C=C1O)C(=CN2)CCN | Mood neurotransmitter |
Candesartan | CCOc2nc1cccc(C(=O)O)c1n2Cc5ccc(c3ccccc3c4nn[nH]n4)cc5 | Hypertension medication |
Dolutegravir (Tivicay) | C[C@@H]1CCO[C@@H]2N1C(=O)c3c(c(=O)c(cn3C2)C(=O)NCc4ccc(cc4F)F)O | HIV antiviral drug (GSK) |
Penicillin G | O=C(Cc1ccccc1)NC1C(=O)N2C1SC(C2C(=O)O)(C)C | Antibiotic |
Fentanyl | O=C(CC)N(C1CCN(CC1)CCc2ccccc2)c3ccccc3 | Opioid |
Sildenafil (Viagra) | CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C | Impotence medication (Pfizer) |
Fluorescein anhydride | c1cc2c(cc1)C(=O)OC23c4ccc(cc4Oc5c3ccc(c5)O)O | Dye |
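If you copy SMILES out of a table like this one, a quick sanity check catches copy-paste damage. The following is a crude, standard-library-only sketch: it only checks that parentheses balance and that ring-closure digits appear an even number of times, so it is no substitute for parsing with a real toolkit. The function name `looks_balanced` is hypothetical and only three rows are repeated for brevity.

```python
import re
from collections import Counter


def looks_balanced(smiles: str) -> bool:
    """Crude check: parentheses balance and ring-closure digits pair up."""
    # Drop bracket atoms such as [C@@H] or [nH] so their contents are not
    # mistaken for ring closures. Two-digit %nn closures are not handled.
    bare = re.sub(r"\[[^\]]*\]", "", smiles)
    depth = 0
    for char in bare:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:  # a branch closed before it was opened
                return False
    ring_digits = Counter(c for c in bare if c.isdigit())
    return depth == 0 and all(n % 2 == 0 for n in ring_digits.values())


rows = """
Aspirin | O=C(C)Oc1ccccc1C(=O)O
Caffeine | CN1C=NC2=C1C(=O)N(C(=O)N2C)C
Penicillin G | O=C(Cc1ccccc1)NC1C(=O)N2C1SC(C2C(=O)O)(C)C
"""

# Split each "name | SMILES" row into a dictionary entry.
table = {}
for line in rows.strip().splitlines():
    name, smiles = (field.strip() for field in line.split("|"))
    table[name] = smiles

for name, smiles in table.items():
    print(name, looks_balanced(smiles))
```

A truncated string such as `c1ccccc` fails the check because its ring-closure digit is never paired.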