From Weights to Token - Qwen3 Implementation

Hugging Face is the GitHub of AI models. It uses git with LFS (and xet) to serve large model weights, but at the end of the day, you can’t use those weights directly. The code required to run them is typically implemented in popular Python packages like transformers (for LLMs) or diffusers (for diffusion models). In this blog post, I will explain how to take a model weights file and use it to generate your very first token using your own custom inference implementation. Let’s get started—this will be quite technical and will likely involve some math! While it is always helpful to have a background in AI and ML, you can still follow along without it. ...
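The full post builds the inference loop itself; as a taste of the starting point it describes, here is a minimal sketch (not taken from the post) that pulls a Qwen3 weights file from Hugging Face and lists the tensors inside it. The repo id and filename are assumptions for illustration.

```python
# Minimal sketch: download a Qwen3 safetensors file and inspect its tensors.
# "Qwen/Qwen3-0.6B" and "model.safetensors" are assumed values, not from the post.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download(repo_id="Qwen/Qwen3-0.6B", filename="model.safetensors")

with safe_open(path, framework="pt") as weights:
    # Print each tensor name and its shape; these raw arrays are what a
    # custom inference implementation has to wire into actual computation.
    for name in weights.keys():
        print(name, weights.get_slice(name).get_shape())
```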

January 26, 2026 · 13 min · 2640 words · coder3101

Understanding Load balancing in LLMs

With the rise of LLMs and inference workloads, a new type of routing has gained traction. In this post, I will share some insights on how it works and why it's needed. It will be a technical deep dive, so let's get started!

Load balancing in LLMs

Load balancing always needs a set of targets, and a load-balancing policy decides which target to choose for a given request. In traditional web services, these targets are web servers, which are often stateless themselves or backed by some shared state, so a simple round-robin or load-based routing policy works nicely. ...
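To make the baseline concrete, here is a minimal sketch (not from the post) of the round-robin policy that works well for stateless web servers; the backend URLs are hypothetical.

```python
from itertools import cycle

# Hypothetical backends; a real deployment would discover targets dynamically.
targets = ["http://backend-1:8000", "http://backend-2:8000", "http://backend-3:8000"]

# Classic stateless policy: hand out targets in a fixed rotation,
# ignoring how loaded each backend currently is.
round_robin = cycle(targets)

def pick_target() -> str:
    """Return the next backend for an incoming request."""
    return next(round_robin)

for request_id in range(5):
    print(request_id, pick_target())
```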

January 14, 2026 · 4 min · 796 words · coder3101