Environmentally sustainable computing 

Did you know that, as a scientist, your carbon footprint is estimated to be between 2 and 12 times higher than the per-person carbon budget set to keep global warming below 1.5 °C [1]?

Background

Global temperatures are rising. This has direct effects on the planet and contributes to a growing number of humanitarian emergencies, including more frequent and intense heatwaves, wildfires, and floods [2]. The impact of climate change is already severe: around 20 million people were internally displaced by such disasters in 2023 alone [3].

Global warming and climate change are caused by emissions of greenhouse gases, chiefly carbon dioxide and methane, commonly referred to as carbon emissions. There are different ways to reduce your carbon footprint. For example, I try to reduce energy use at home, eat mainly plant-based food, and travel by train instead of by plane to visit family and for holidays and conferences. However, until I organised a Green Lecture with the Department of Statistics Green Team, I had never thought of my computational PhD as a major contributor to my carbon footprint. That doesn’t mean the work that I, and all other scientists, do is not important and necessary. But the lecture on principles for environmentally sustainable research, given by Loic Lannelongue, made me aware of the carbon costs of computing, which I would like to share with you.

The Carbon Footprint of Computing

Carbon footprint is measured in grams of CO2-equivalent (gCO2e) [4]: for any given mix of greenhouse gases, this is the amount of CO2 that would have the same global warming impact. To give an indication of scale, flying in Economy from Paris to London emits 50,000 gCO2e, and streaming Netflix for an hour emits 55 gCO2e. The average carbon footprint of a person in the UK in 2022 was 4.7 tonnes of CO2e per year, whereas the Intergovernmental Panel on Climate Change carbon budget is set at less than 2 tonnes of CO2e per person per year [1, 5]. Data centres have a carbon footprint of around 100 megatonnes (10^14 g) of CO2e per year, and a medium-sized data centre uses as much water as three average-sized hospitals [6]. For some scientists, their contribution to climate change is visible, for example a wet lab scientist who disposes of equipment every day. For computational scientists, running code can feel low-impact. However, this is not true. The carbon footprint of a scientist is estimated to be between 4 and 25 tonnes of CO2e per year [1]. This includes the cost of computational tasks, data storage, and the life cycle of the hardware.

The European Molecular Biology Laboratory (EMBL) estimated that 65 tonnes of CO2e were generated per publication that came out of the institute [7]. Luccioni et al. estimated the carbon footprint of large machine learning models and reported that training GPT-3 emitted an estimated 502 tonnes of CO2e [8]. Training AlphaFold 2 is estimated to have emitted 3.92 tonnes of CO2e; this covers only the training of the final model, not the research and development needed to get there. The carbon footprint of training the even bigger ESMFold is estimated at 106.29 tonnes of CO2e [7]. EMBL’s European Bioinformatics Institute hosts the AlphaFold DB of predicted protein structures, which reduces researchers’ carbon footprint because they can query the database instead of running inference themselves [9].

Reducing Your Carbon Footprint

Because the environmental impact of computing is rarely discussed, there is a lot of low-hanging fruit. Based on the “Ten simple rules” paper by Lannelongue et al. [4], I will discuss some steps that early-career researchers can easily implement. Reading this blog post and becoming aware of the carbon cost is a good starting point. Several Python packages can be integrated into your own code to estimate the carbon footprint of your models, tools, or code, for example CarbonTracker [10, 11] and CodeCarbon [12]. You can also make estimates with an online tool such as Green Algorithms [13, 14].

To reduce the cost, it is important to make your code efficient [4]. As scientists, we often want to produce results as quickly as possible, which can mean skipping software best practices. Making your code more efficient reduces not only its running time but also its computational cost. This includes using up-to-date versions of software. Another simple way to reduce your footprint is to debug and test your code on a small dataset before running it on the full dataset. Once you have reached the stage of training your model on the full dataset, think about which parameters you want to optimise, and perform a random search to find the best values more efficiently. When training takes a long time, checkpoints can save you from rerunning the same code because of issues later in the pipeline [4].
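To make the checkpointing idea concrete, here is a minimal sketch using only the Python standard library. The function name, the JSON state format, and the "training step" itself are hypothetical stand-ins; a real model would save its weights and optimiser state with whatever its framework provides.

```python
import json
import os


def train_with_checkpoints(n_steps, checkpoint_path, save_every=10):
    """Run a toy training loop that can resume from a saved checkpoint."""
    # Start from scratch unless a checkpoint from an earlier run exists.
    state = {"step": 0, "total": 0.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)

    while state["step"] < n_steps:
        # Stand-in for one expensive training step (hypothetical).
        state["total"] += state["step"] * 0.5
        state["step"] += 1

        # Save periodically and at the very end.
        if state["step"] % save_every == 0 or state["step"] == n_steps:
            # Write to a temporary file first, then rename, so an
            # interrupted write never leaves a corrupt checkpoint.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as fh:
                json.dump(state, fh)
            os.replace(tmp_path, checkpoint_path)

    return state
```

If the job is killed after step 20, calling the function again with the same `checkpoint_path` picks up from step 20 instead of recomputing everything from step 0.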

Allocating the appropriate amount of memory to your job is another easy way to reduce the energy usage of your code, because the energy drawn by memory depends mainly on the memory made available to the job, not on the memory it actually uses [4]. So before submitting a job to the server, adjust your memory request to what that specific job needs. The hardware you use is also important to consider. For example, parallelisation, or using GPUs instead of CPUs, reduces running time but can increase your total energy usage. It is therefore also important to provide clear instructions on memory and hardware requirements when releasing your software [4].
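One way to find out what a job needs is to measure a small test run first. The sketch below uses Python's built-in `tracemalloc` module; `peak_memory_mib` and `build_table` are hypothetical names used for illustration. Note that `tracemalloc` only sees memory allocated through Python's own allocator, so treat the result as a lower bound and cross-check it against the peak memory your workload manager reports for the job.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / 2**20


def build_table(n_rows):
    """Hypothetical workload: build a list of small rows."""
    return [list(range(10)) for _ in range(n_rows)]


result, peak = peak_memory_mib(build_table, 10_000)
print(f"Peak traced memory: {peak:.1f} MiB")
```

Adding a safety margin on top of the measured peak is usually sensible, since a job that runs out of memory and has to be rerun costs more than a slightly generous request.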

When your models are ready to generate results, make sure to remove generated data that is no longer needed. This will not only make your supervisor and all your colleagues happy, but it will also reduce your carbon footprint. It is estimated that storing 1 TB of data costs around 10 kg CO2e per year [9]. Finally, take care of your devices. The carbon footprint of using a device is small compared to the carbon cost of producing it [4, 9].
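As a rough illustration of the storage figure, the following sketch adds up the size of a directory and converts it into an annual footprint using the estimate of around 10 kg CO2e per TB per year quoted above. The function names are hypothetical, and I assume 1 TB = 10^12 bytes; the result is an order-of-magnitude indication, not a measurement.

```python
import os

# Estimate quoted in the text: about 10 kg CO2e per TB stored per year.
KG_CO2E_PER_TB_YEAR = 10.0


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return total


def storage_footprint_kg_per_year(n_bytes, kg_per_tb_year=KG_CO2E_PER_TB_YEAR):
    """Convert a size in bytes to kg CO2e per year (assumes 1 TB = 1e12 bytes)."""
    return n_bytes / 1e12 * kg_per_tb_year
```

For example, `storage_footprint_kg_per_year(directory_size_bytes("results"))` gives an estimate for a hypothetical `results` folder, which can help decide which intermediate outputs are worth keeping.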

Other steps towards more environmentally sustainable computing require a bit more effort. Examples are using computing facilities in countries where green energy is the main energy source (think about collaborations here as well) and including expected carbon footprints in project and funding proposals. Ideally, journals and funding bodies would promote transparency on carbon usage [1, 4, 9]. In addition, environmentally sustainable computing should be standardised and become normal practice when writing applications, grants, and publications, similar to how ethics is standard practice in medical studies and clinical trials [15]. Guidelines for environmentally sustainable computational science, such as the GREENER set of principles [9], could help lead the way.

There are also several tools that can help you become a more environmentally sustainable scientist. As already mentioned, CarbonTracker [10, 11] and CodeCarbon [12] are Python packages that track the energy consumption of your models. The latter has an integrated dashboard to visualise the outputs. For Nextflow pipelines, the nf-co2footprint plugin [18] can be used. GreenAlgorithms4HPC [16, 17], the Green Algorithms tool for high-performance computing, takes information from the workload manager’s logs to summarise the carbon footprint of a project, including the cost of failed jobs. Green Algorithms [14] is an online calculator that considers, among other things, run time, number of cores, and the type of CPU/GPU used. Other webpages that might be of interest are the carbon intensity forecast per region in the UK [19] and the real-time electricity usage per country [20].
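To show how inputs such as run time, number of cores, and memory combine into a footprint, here is a minimal sketch in the spirit of the estimate described in the Green Algorithms paper [13]: energy is run time multiplied by the power drawn by cores and memory, scaled by the data centre's power usage effectiveness (PUE), and the footprint is that energy multiplied by the carbon intensity of the electricity. The function name is hypothetical, and every number in the example call is an illustrative placeholder; look up real values for your hardware and region, or use the online calculator.

```python
def estimate_footprint(runtime_h, n_cores, power_per_core_w, core_usage,
                       memory_gb, power_per_gb_w, pue,
                       carbon_intensity_g_per_kwh):
    """Return (energy in kWh, footprint in gCO2e) for one job."""
    # Power drawn by the cores (scaled by how busy they are) plus memory.
    power_w = n_cores * power_per_core_w * core_usage + memory_gb * power_per_gb_w
    # PUE accounts for data centre overheads such as cooling.
    energy_kwh = runtime_h * power_w * pue / 1000.0
    footprint_g = energy_kwh * carbon_intensity_g_per_kwh
    return energy_kwh, footprint_g


# Placeholder inputs (assumptions, not measurements).
energy, footprint = estimate_footprint(
    runtime_h=10,
    n_cores=8,
    power_per_core_w=10.0,
    core_usage=1.0,
    memory_gb=32,
    power_per_gb_w=0.3725,
    pue=1.5,
    carbon_intensity_g_per_kwh=200.0,
)
print(f"Energy: {energy:.2f} kWh, footprint: {footprint:.0f} gCO2e")
```

Because requested memory enters the power term directly, rerunning the function with a smaller `memory_gb` shows how much an oversized memory request adds to the estimate.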

Hopefully these tools and tricks will get you started and make you more aware of the carbon footprint of your compute. To end on a positive note, a recent study succeeded in drastically reducing the initial settling period of models describing Earth processes and their interactions [21]. Reducing this time from many months to under a week not only reduces the carbon footprint of the model, but also allows more accurate climate change predictions [21, 22].

Sources 

1. Lannelongue, L., Inouye, M. Carbon footprint estimation for computational research. Nat Rev Methods Primers 3, 9 (2023). https://doi.org/10.1038/s43586-023-00202-5  

2. https://www.who.int/news-room/fact-sheets/detail/climate-change-and-health  

3. https://www.migrationdataportal.org/themes/environmental_migration_and_statistics

4. Lannelongue L, Grealey J, Bateman A, Inouye M. Ten simple rules to make your computing more environmentally sustainable. PLoS Comput Biol. 2021 Sep 20;17(9):e1009324. doi: 10.1371/journal.pcbi.1009324. PMID: 34543272; PMCID: PMC8452068.

5. https://ourworldindata.org/per-capita-co2

6. Mytton, D. (2021). Data centre water consumption. npj Clean Water, 4(1), 11. 

7. https://www.youtube.com/watch?v=PMu483_5f1c&t=322s  

8. Luccioni, A. S., Viguier, S., & Ligozat, A. L. (2023). Estimating the carbon footprint of BLOOM, a 176B parameter language model. Journal of Machine Learning Research, 24(253), 1-15.

9. Lannelongue, L., Aronson, HE.G., Bateman, A. et al. GREENER principles for environmentally sustainable computational science. Nat Comput Sci 3, 514–521 (2023). https://doi.org/10.1038/s43588-023-00461-y  

10. Anthony, L. F. W., Kanding, B. & Selvan, R. Carbontracker: tracking and predicting the carbon footprint of training deep learning models. Preprint at https://arxiv.org/abs/2007.03051 (2020). 

11. https://github.com/lfwa/carbontracker  

12. https://github.com/mlco2/codecarbon  

13. Lannelongue, L., Grealey, J. & Inouye, M. Green algorithms: quantifying the carbon footprint of computation. Adv. Sci. 8, 2100707 (2021).

14. https://www.green-algorithms.org/  

15. CW24 Panel – The Digital Footprint Revolution: A Call to Action for Sustainable Research Computing. https://www.youtube.com/watch?v=VhSqtsNVki0 

16. Lannelongue, L. GreenAlgorithms4HPC. GitHub https://github.com/GreenAlgorithms/GreenAlgorithms4HPC (2022). 

17. https://www.green-algorithms.org/GA4HPC/ 

18. https://github.com/nextflow-io/nf-co2footprint?tab=readme-ov-file 

19. https://carbonintensity.org.uk/  

20. https://app.electricitymaps.com/map

21. Khatiwala, S. (2024). Efficient spin-up of Earth System Models using sequence acceleration. Science Advances, 10(18), eadn2839.

22. https://www.ox.ac.uk/news/2024-05-02-new-computer-algorithm-supercharges-climate-models-and-could-lead-better-predictions  

Author