I recently had the pleasure of working for 11 weeks with the wonderful people in OPIG. I studied protein interaction networks and how we might discern the parts of the network that are important for disease (and otherwise). In the past, people have looked at differential gene expression or used community detection to this end, but both of these approaches have drawbacks. The former misses the fact that biological systems are rarely a matter of simple binary states or interactions. Community detection addresses this, but it in turn does not take into account the dynamic nature of proteins in the cell: how do their interactions change over time? What about interactions or proteins that are only present in some cells? Community detection tries to look at all proteins at once and ignores important context like this.
My aim was to develop approaches that combined these elements. We used Pearson’s correlation coefficient on gene expression data and community detection on an interaction network. We showed that the distribution of pairwise gene correlations is shifted towards 1.0 for pairs that interact compared to pairs that do not, and for pairs in the same community compared to pairs in different communities (see the figure above). We went on to assign a “score” to each community based on the correlation of its genes in each set of expression data. For example, one community might have a high score in expression data from cells undergoing amino acid starvation. We ended up with a list of communities which seemed to be important in certain environmental conditions. We made use of functional enrichment, drawing on the lovely Malte’s work, to try to verify these scores.
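To make the scoring idea a bit more concrete, here is a minimal sketch in Python. It is not the code from the project. It assumes that a community's score is simply the mean Pearson correlation over all pairs of its genes, which is one plausible reading of the description above rather than the exact definition we used. The function names, gene labels and expression values are all made up for illustration.

```python
import itertools

import numpy as np


def pairwise_correlation(expression):
    """Return gene names and their Pearson correlation matrix.

    `expression` maps a gene name to its expression values across samples
    (hypothetical input format, chosen for this sketch).
    """
    genes = sorted(expression)
    matrix = np.array([expression[g] for g in genes], dtype=float)
    # np.corrcoef treats each row as one variable (here, one gene).
    return genes, np.corrcoef(matrix)


def community_score(community, genes, corr):
    """Mean pairwise correlation within a community (assumed score definition)."""
    index = {g: i for i, g in enumerate(genes)}
    members = [index[g] for g in community if g in index]
    pairs = list(itertools.combinations(members, 2))
    if not pairs:
        # A community with fewer than two measured genes has no pairs to score.
        return float("nan")
    return float(np.mean([corr[i, j] for i, j in pairs]))


# Toy expression data for one condition (invented values).
expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, 3.0, 2.0, 4.0],
}
# Toy communities, as might come out of any community detection method.
communities = {"c1": ["A", "B"], "c2": ["A", "C"], "c3": ["D"]}

genes, corr = pairwise_correlation(expression)
scores = {name: community_score(members, genes, corr)
          for name, members in communities.items()}
```

Running the same scoring over several expression data sets (one per environmental condition) would give each community one score per condition, which is the kind of table the ranking described above could be built from.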
I had a great time with some lovely people and produced something that I thought was very interesting. I really hope I see this work pop up again and get taken to interesting places! So long, and thanks for all the cookies!
Click here for some more pretty plots and a code repository (by request only).