CUDA | Bits and Bytes

At work, we focus on optimizing LLM serving, and one topic that comes up repeatedly is kernel optimization. I want to share some insights into what a kernel actually is and where it fits in the stack, because believe it or not, virtually every modern LLM and diffusion model is ultimately powered by kernels running on a GPU or a similar accelerator. I have some familiarity with Compute Unified Device Architecture (CUDA), and I also happen to have an NVIDIA Blackwell GPU in my workstation, so in this post I will explain what a kernel is and walk through writing one from scratch in CUDA. ...