Last Tuesday marked two exciting milestones for me in OPIG! Not only had I been looking forward to group socials since the beginning of lockdown, but I’d never met anyone other than Charlotte in person since starting in the group in April. As such, the annual cycling pub trip was an apt introduction to several OPIG members (who are now exempt from the game I play by myself during weekly Zoom group meetings: “Guess how tall this person is in real life!”) and a chance to interact with people other than my housemates!
Monthly Archives: August 2020
Understanding Conformational Entropy in Small Molecules
While entropy is a major driving force in many chemical changes and is a key component of the free energy of a molecule, it can be challenging to calculate with standard quantum thermochemical methods. For flexible molecules, the total entropy can be broken down into different components, including vibrational, translational, rotational and conformational entropy. The calculation of conformational entropy is the most time-consuming, as we have to sample all thermally accessible conformers. Here, we attempt to understand the components that contribute to the conformational entropy of a molecule, and develop a physically motivated statistical model to rapidly predict the conformational entropies of small molecules.
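To make the cost of conformer sampling concrete, here is a minimal sketch of how a conformational entropy could be computed once the conformer energies are known. It assumes the textbook Gibbs entropy over Boltzmann-weighted conformers, S_conf = -R Σ p_i ln p_i, which is not necessarily the statistical model developed in the post. The function name `conformational_entropy` and its input format (relative energies in kJ/mol) are hypothetical choices for illustration.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)


def conformational_entropy(rel_energies_kj_mol, temperature=298.15):
    """Gibbs entropy (J/(mol K)) over Boltzmann-weighted conformers.

    Assumption: each entry is one distinct conformer's relative energy
    in kJ/mol. This is the standard textbook expression, not the
    post's own model.
    """
    rt = R * temperature / 1000.0  # RT in kJ/mol
    e_min = min(rel_energies_kj_mol)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in rel_energies_kj_mol]
    partition = sum(weights)
    probs = [w / partition for w in weights]
    # Skip zero-probability conformers, since p * ln(p) -> 0 as p -> 0
    return -R * sum(p * math.log(p) for p in probs if p > 0.0)


# Two degenerate conformers give R * ln(2); a single conformer gives zero.
s_two = conformational_entropy([0.0, 0.0])
s_one = conformational_entropy([0.0])
```

The expensive part in practice is not this sum but obtaining the list of energies, since every thermally accessible conformer has to be found and evaluated first.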
Learning from Biased Datasets
Both the beauty and the downfall of learning-based methods are that the data used for training will largely determine the quality of any model or system.
While there have been numerous algorithmic advances in recent years, the most successful applications of machine learning have been in areas where either (i) you can generate your own data in a fully understood environment (e.g. AlphaGo/AlphaZero), or (ii) data is so abundant that you’re essentially training on “everything” (e.g. GPT2/3, CNNs trained on ImageNet).
This covers only a narrow range of applications, with most data not falling into one of these two categories. Unfortunately, when this is true (and even sometimes when you are in one of those rare cases) your data is almost certainly biased – you just may or may not know it.