A lot of work has gone into making AMD-designed GPUs work nicely with GPU-accelerated frameworks like PyTorch. Despite this, getting performant code on non-NVIDIA graphics cards can be challenging for both users and developers. Even when a developer has optimised appropriately for each platform, gaps in performance often remain because, at the driver level, instructions to the GPU may not be fully optimised. Software developed using CUDA can benefit from optimisations like operation fusion without the developer having to specify them in many cases.
This may not be much of a concern for most researchers, as we simply use what is available to us. Most of the time that means NVIDIA GPUs, and there is hardly a choice in it. NVIDIA is aware of this and prices its products accordingly. Part of the problem is that system designers just don't have an incentive to build AMD platforms other than for highly specialised machines.
This does not mean that AMD is not competitive: as of this post, the fastest supercomputer in the world is powered by AMD Instinct GPUs. AMD products comparable to NVIDIA's in price often exceed them in important specs like memory bandwidth, which is key for compute-heavy applications like deep learning. This matters because NVIDIA has been continuously skimping on memory bandwidth to increase separation from its halo-tier products like the DGX A100 systems. An example of this is the RTX 2000, a workstation card present in many systems for its CUDA capability but also its small footprint. It has been quietly re-released as the RTX A2000 with more CUDA cores (and a proportionally higher MSRP) but also a cut-down memory bus.
To convince system designers to provide more options we need better benchmarks, but given the problems mentioned above, how do we actually compare performance between AMD and NVIDIA cards in an apples-to-apples manner?
This is where ZLUDA comes in. ZLUDA is an open-source port of CUDA onto AMD's ROCm platform. It is not a total rewrite of CUDA but rather a translation layer that allows software to interface with the GPU as if it were a CUDA device. This means that any supported underlying hardware can benefit from the software optimisation that has gone on under the hood in CUDA.
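To sketch what this looks like in practice: on Linux, ZLUDA ships a replacement `libcuda.so`, so an unmodified CUDA binary can be redirected to it through the dynamic linker's search path. The paths and application name below are hypothetical and will depend on where ZLUDA was built and which program you are running; a working ROCm install is assumed.

```shell
# Hypothetical example: paths and binary name are placeholders.
# ZLUDA provides a drop-in libcuda.so; prepending its directory to
# LD_LIBRARY_PATH makes the unmodified CUDA application load ZLUDA's
# library instead of NVIDIA's, so CUDA calls are translated to ROCm.
LD_LIBRARY_PATH="$HOME/ZLUDA/target/release:$LD_LIBRARY_PATH" ./my_cuda_app
```

No recompilation of the application is needed, which is what makes it useful for like-for-like benchmarking.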
NAMD is a molecular dynamics engine known for its GPU support; here, AMD GPUs perform comparably at equivalent price brackets.
And here we can see that the ZLUDA implementation (top) actually performs better than AMD's HIP implementation (below).
Overall, ZLUDA on AMD GPUs often performs better in raw compute than OpenCL. It's not perfect, but given that its true killer feature is being a drop-in replacement for the CUDA driver, I was quite impressed.
The story behind this project is also interesting. ZLUDA was originally written by Andrzej Janik, a developer at Intel, as an open-source driver for Intel integrated graphics. When Intel started to make its move into the discrete GPU market, he was recruited to develop ZLUDA for their Arc GPUs. The Intel Arc team eventually pulled out of the deal, and he also had to stop working on it for personal reasons.
In 2022 AMD contracted Andrzej Janik to develop ZLUDA for AMD GPUs, clearly with success, but in February 2024 AMD ended the contract, halting development. Andrzej's contract stated that upon termination the software was to be made open source (where can I find lawyers like this?) and available for anyone with a compatible GPU to try.