Hugging Face is the GitHub of AI models. It uses git with LFS (and xet) to serve large model weights, but at the end of the day, you can’t use those weights directly. The code required to run them is typically implemented in popular Python packages like transformers (for LLMs) or diffusers (for diffusion models). In this blog post, I will explain how to take a model weights file and use it to generate your very first token using your own custom inference implementation. Let’s get started—this will be quite technical and will likely involve some math! While it is always helpful to have a background in AI and ML, you can still follow along without it. ...
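As a quick preview of what "just the weights" means, a minimal sketch might look like the following; the repo id, filename, and the huggingface_hub/safetensors calls here are only illustrative examples, not the exact steps from the post.

```python
# Minimal sketch: a "model" on the Hub is mostly a bag of named tensors.
# The repo id and filename below are illustrative; pick any model you like.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download just the weights file -- no modelling code comes with it.
path = hf_hub_download(repo_id="openai-community/gpt2", filename="model.safetensors")

with safe_open(path, framework="pt", device="cpu") as f:
    for name in list(f.keys())[:5]:  # peek at a few entries
        print(name, tuple(f.get_tensor(name).shape))

# All you get are raw matrices (embeddings, attention and MLP weights).
# The forward pass that turns them into tokens is exactly what a custom
# inference implementation has to supply.
```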
Understanding Load Balancing in LLMs
With the rise of LLMs and inference workloads, a new type of routing has gained traction. In this post, I will share some insights on how it works and why it’s needed. It will be a technical deep dive, so let’s get started! Load balancing always needs a set of targets, and a load-balancing policy decides which target serves a given request. In traditional web services, these targets are web servers, which are often stateless or backed by shared state, so a simple round-robin or load-based routing policy works nicely. ...
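To make the "targets plus policy" idea concrete, here is a toy sketch of the two classic policies mentioned above; the target names and structure are purely illustrative.

```python
# Toy illustration of "a set of targets + a policy that picks one per request".
import itertools

targets = ["server-a", "server-b", "server-c"]

# Round robin: rotate through targets regardless of their current load.
_rr = itertools.cycle(targets)
def round_robin(_request):
    return next(_rr)

# Load-based: pick the target with the fewest in-flight requests.
in_flight = {t: 0 for t in targets}
def least_loaded(_request):
    return min(in_flight, key=in_flight.get)

for i in range(5):
    print(i, round_robin(f"req-{i}"))
```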
Hybrid Homelab and Cloud Setup
I wanted to share the applications and services running in my homelab, but I realized that understanding my network configuration is essential for grasping the complete picture. This blog post serves as a necessary introduction before diving into the specifics of my homelab setup. To keep this post focused, I won’t try to list every service running in my homelab — that’s for another post. Instead, I’ll walk through a few unique configuration details, such as where this site is hosted and how external ingress reaches my network. ...
Creating a Language Server for Protocol Buffers
As a software engineer, there’s a particular satisfaction that comes from scratching your own itch by building the tool you need. That’s exactly how Protols—my Language Server Protocol (LSP) implementation for Protocol Buffers—came to life. At work, we use a lot of protobuf files. And I mean a lot. While protobuf is fantastic for defining APIs and data structures, navigating between dozens (sometimes hundreds) of .proto files was becoming a genuine pain point. I found myself constantly using grep or Vim’s search to jump between message definitions, enum declarations, and imports across different packages. ...
Accessing VSOL from LAN: A Raspberry Pi Bridge Setup
If you’ve ever wanted to access your VSOL ONU’s web interface (usually at 192.168.1.1) from your LAN (say 192.168.0.0/24), you might have hit a wall, especially if your router doesn’t allow assigning multiple WAN IPs. In this guide, I’ll walk you through how to bridge that gap using a Raspberry Pi or any other device with at least two network interfaces. To prepare the Raspberry Pi, connect its eth0 to your router’s LAN port and its eth1 (a USB-to-Ethernet adapter or secondary port) to the ONU’s bridge port. The Pi now has two interfaces: one in your LAN and one going to the ONU. ...
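As a rough preview of the Pi-side plumbing, one way it could look is sketched below; the 192.168.1.2 address, the interface names, and the NAT-based approach are assumptions for illustration rather than the exact commands from the guide.

```python
# Rough sketch (illustrative only, run as root): give the Pi a foothold in the
# ONU's subnet, enable forwarding, and masquerade LAN traffic towards the ONU.
# eth0 is assumed to already have an address on the 192.168.0.0/24 LAN.
import subprocess

def sh(*cmd):
    subprocess.run(cmd, check=True)

# Address on the ONU-facing interface (192.168.1.2 is just an example).
sh("ip", "addr", "add", "192.168.1.2/24", "dev", "eth1")

# Let the Pi forward packets between the two interfaces.
sh("sysctl", "-w", "net.ipv4.ip_forward=1")

# Masquerade traffic leaving towards the ONU so 192.168.1.1 can reply to the
# Pi's address without needing a route back to 192.168.0.0/24.
sh("iptables", "-t", "nat", "-A", "POSTROUTING", "-o", "eth1", "-j", "MASQUERADE")

# LAN clients (or the router) still need a route to 192.168.1.0/24 via the
# Pi's LAN address, e.g.: ip route add 192.168.1.0/24 via <Pi's 192.168.0.x>
```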