Our paper “Toward GPU-centric Networking on Commodity Hardware” has been accepted to appear at EdgeSys 2024 in Athens. The paper summarizes the last year's efforts to implement an RDMA stack directly on a CUDA GPU, and it is joint work with Mariano Scazzariello, Gerald Q. Maguire Jr., and Dejan Kostić.

This paper fits into the grand scheme of GPU-accelerated systems, tackling the problem of expensive CPU cycles being wasted just to move data in and out of GPUs, often while polling for completion. We believe this is a key enabler for the new era of GPU-accelerated computing, which is not limited to AI applications. Using standard RDMA semantics and making only minimal modifications to the stack (e.g. without porting the whole code to the CUDA architecture) would allow easier and more flexible adoption by current applications, without the need for proprietary, single-vendor stacks (e.g. NCCL). Although the work was developed on NVIDIA hardware (and Mellanox NICs), I believe the future is diverse and multi-vendor, hence the need for a more standard and easy-to-deploy stack.

The paper is available on the ACM Digital Library with Open Access, from DiVA, or directly here. The article is released as Open Access under the Creative Commons BY-NC-SA license, in the hope that this will help scientific progress. For the same reason, we plan to publish the software artifacts as an Open Source project.

More details about the implementation and the reasons behind it are described in my Licentiate thesis. If you are interested in knowing more (including what is not written there), get in touch! My institutional email is mylastname@kth.se.