My take on the Collaborations Workshop (CW) 2024

At the end of April, I attended the CW 2024. This yearly hybrid event organised by the Software Sustainability Institute (SSI) has been running since 2011! The event brings people together to discuss best practices and the future of software in research. This year’s event themes were (1) AI/ML tools for Science, (2) Citizen Science and (3) Environmental sustainability.

As a Research Software Engineer (RSE) working with OPIG, I was curious to attend and find out what I could bring back to the group, as most of its members work on AI/ML applications. In this blog post, I share a few parts of the event that resonated with me and that I found most interesting and relevant to my group.

I left a huge amount of amazing stuff out, but all the talks are available on YouTube. Have your pick.

5 Principles for Building Generative AI (GAI) Products

In this keynote, Arfon summarised five lessons drawn from his experience developing GitHub Copilot.

These include avoiding delegating crucial decision-making to chatbots (Large Language Models) and being aware of the flaws that stem from limitations in their training data. He also noted that these models aren’t yet at the point where we can blindly insert them into our workflows, and probably never will be. Finally, he emphasised the great potential of these tools to be personalised to our needs.

With the benefit of hindsight, the risks he talked about may seem glaringly obvious to many in computational research. Think of all those instances where ChatGPT, Gemini and other chatbots have been roasted on social media.

These tools are not perfect. However, assuming this is common sense is dangerous. Think of newcomers to the field with little programming experience, copying and pasting AI-generated code. Or people without a technical or scientific background using GPT-3 to solve maths problems or write science essays. You get the point.

Although, to be fair, OpenAI has since made amazing progress in providing custom GPTs for domain-specific tasks; see the screenshot below. The full list of GPTs is here. But I would still argue that some degree of literacy is needed to distinguish correct from hallucinated answers.

The full video is here.

Environmental Sustainability and Digital Research Infrastructure

DRI here means digital assets such as computational models, tools and computer servers.

In her keynote, Kelly talked about ARINZRIT, one of many projects that feed into the UKRI Net Zero 2023 Review.

ARINZRIT aimed to look at the impact of DRIs at the societal and environmental levels. For this, 25 research groups owning DRIs were surveyed to gather their opinions on the impacts of DRIs. This was done through a workshop, whose main outcome was a set of recommendations formulated for stakeholders in policy, funding and research.

To get the broader context of her talk, I would recommend reading Kelly’s papers on the impact of Information and Communication Technology (ICT) and on Systems Thinking as an approach to addressing ICT’s environmental impacts.

My take from her talk is that the computational research community is still figuring out concrete best practices that can be adopted sustainably to mitigate negative socio-environmental impacts. However, several ongoing projects and initiatives are working towards it.

The full video is here.

The Digital Footprint Revolution: A Call to Action for Sustainable Research Computing

Environmental sustainability seemed to be the most popular topic across the talks. Unsurprisingly, the event's only discussion panel addressed it too.

I think the underlying question of the panel was:

What are the necessary ingredients to make computational research environmentally sustainable?

During the discussion, there was a great deal of brainstorming and exchange of ideas about ongoing efforts to achieve environmental sustainability. I summarised the major points I could gather in the concept map above.

I believe some of the above points are self-explanatory, but to properly elaborate on these and other points I might have missed, I would recommend watching the full video here.

For the remaining part of this blog, I will share some of the cool resources the panellists and participants shared throughout the conversation, which connect to some of the above points in my summary slide.

The Bottom Line

The CW 2024 was a stimulating event where I had the opportunity to explore the work of other research software engineers (RSEs) in the computational research community across various fields, including STEM, the Humanities, and Social Sciences. The event also highlighted the growing green movement in computational research and underscored the critical role of AI and Machine Learning (ML) as powerful allies in addressing environmental sustainability challenges. However, it also drew attention to the environmental impact of these technologies and the increasing awareness and efforts to mitigate their ecological footprint through ongoing projects and initiatives.
