Creating a Language Server for Protocol Buffers

Protols is a Language Server Protocol (LSP) implementation for Protocol Buffers that I wrote in Rust. In this post I will talk about why I built it and some of the interesting bits from the implementation.

Why

At work, we use a lot of protobuf files. While protobuf is great for defining APIs and data structures, navigating between dozens (sometimes hundreds) of .proto files was painful. I was constantly running grep or using Vim’s search to jump between message definitions, enum declarations, and imports scattered across packages.

I looked for a decent Language Server for protobuf. At the time, the options were incomplete, unmaintained, or missing features like go-to-definition across package boundaries. So I decided to write one.

Getting started

I had never built a Language Server before, but I had worked with LSP from the client side and had a good understanding of the spec from my time working on cpeditor. I also had enough Rust experience to be comfortable with it, so that was the obvious language choice.

The idea is simple enough: parse protobuf files, understand their structure, and provide code intelligence. The devil is in the details.

Tree-sitter

I went with Tree-sitter for parsing. It gives you incremental parsing with solid error recovery, which is exactly what you need when users are actively typing incomplete code. For the LSP framework itself, I used async-lsp.

Tree-sitter has a query system, but I wasn’t aware of it when I started, so I ended up implementing all the tree walking by hand with recursive traversals:

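/// Collects every node in the tree for which `filter` returns true.
pub fn find_all_nodes(&self, filter: fn(&Node) -> bool) -> Vec<Node> {
    let mut result = Vec::new();
    self.visit_nodes(self.tree.root_node(), &filter, &mut result);
    result
}

/// Pre-order traversal: test the node itself, then recurse into its children.
/// The explicit lifetime ties the collected nodes to the tree they came from.
fn visit_nodes<'a>(&self, node: Node<'a>, filter: &fn(&Node) -> bool, result: &mut Vec<Node<'a>>) {
    if filter(&node) {
        result.push(node);
    }

    for child in node.children(&mut node.walk()) {
        self.visit_nodes(child, filter, result);
    }
}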

All those tree algorithms from CS courses finally found a practical use. Symbol collection, scope resolution, reference finding – all built on top of recursive AST traversals.
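To make the symbol collection idea concrete, here is a minimal sketch on a toy tree rather than real tree-sitter nodes. The `ToyNode` type, the `collect_symbols` function, and the sample tree are all hypothetical names made up for illustration; this is not the code Protols uses. It shows how a recursive walk can carry a scope stack so that nested types end up with qualified names.

```rust
/// Toy stand-in for a syntax tree node (hypothetical; the real server
/// works on tree-sitter nodes).
#[derive(Debug)]
struct ToyNode {
    kind: &'static str, // e.g. "file", "message", "enum", "field"
    name: &'static str, // identifier, empty for nodes without one
    children: Vec<ToyNode>,
}

/// Recursively collect qualified names of messages and enums.
/// `scope` holds the names of the enclosing type declarations.
fn collect_symbols(node: &ToyNode, scope: &mut Vec<&'static str>, out: &mut Vec<String>) {
    let declares_type = node.kind == "message" || node.kind == "enum";
    if declares_type {
        scope.push(node.name);
        out.push(scope.join("."));
    }
    for child in &node.children {
        collect_symbols(child, scope, out);
    }
    if declares_type {
        // Leave the scope again once the subtree has been visited.
        scope.pop();
    }
}

fn leaf(kind: &'static str, name: &'static str) -> ToyNode {
    ToyNode { kind, name, children: Vec::new() }
}

/// Sample tree for: message Outer { id; message Inner {} enum Kind {} } message Other {}
fn sample_tree() -> ToyNode {
    ToyNode {
        kind: "file",
        name: "",
        children: vec![
            ToyNode {
                kind: "message",
                name: "Outer",
                children: vec![
                    leaf("field", "id"),
                    leaf("message", "Inner"),
                    leaf("enum", "Kind"),
                ],
            },
            leaf("message", "Other"),
        ],
    }
}
```

Calling `collect_symbols(&sample_tree(), &mut Vec::new(), &mut out)` fills `out` with `Outer`, `Outer.Inner`, `Outer.Kind`, and `Other`, in that order.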

Multi-file state

The most interesting challenge was managing state across files. Protobuf files import from each other, creating dependency graphs. The language server needs to parse all files in a workspace, resolve imports, build a complete symbol table, keep everything in sync when files change, and provide diagnostics across file boundaries.

pub struct ProtoLanguageState {
    documents: Arc<RwLock<HashMap<Url, String>>>,
    trees: Arc<RwLock<HashMap<Url, ParsedTree>>>,
    parser: Arc<Mutex<ProtoParser>>,
    parsed_workspaces: Arc<RwLock<HashSet<String>>>,
    protoc_diagnostics: Arc<Mutex<ProtocDiagnostics>>,
}
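As a rough illustration of why the fields are wrapped in `Arc<RwLock<...>>`, here is a small sketch of the same pattern with toy data. `ToyState`, `upsert`, and `message_count` are hypothetical names, plain `String` keys stand in for `Url`, and counting the word `message` stands in for real parsing. The point is only that cloned handles share the same maps, and that derived data gets refreshed whenever the text changes.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified state: document text plus one piece of
/// derived data per document (a count of `message` keywords).
#[derive(Clone, Default)]
struct ToyState {
    documents: Arc<RwLock<HashMap<String, String>>>,
    message_counts: Arc<RwLock<HashMap<String, usize>>>,
}

impl ToyState {
    /// Store new text and refresh the derived data for the same URI.
    fn upsert(&self, uri: &str, text: &str) {
        // Stand-in for the real parse step.
        let count = text.split_whitespace().filter(|w| *w == "message").count();
        self.documents
            .write()
            .unwrap()
            .insert(uri.to_string(), text.to_string());
        self.message_counts
            .write()
            .unwrap()
            .insert(uri.to_string(), count);
    }

    /// Readers only take a shared lock, so lookups do not block each other.
    fn message_count(&self, uri: &str) -> Option<usize> {
        self.message_counts.read().unwrap().get(uri).copied()
    }
}
```

Because `clone()` only copies the `Arc` pointers, a handle passed to another task sees every update made through the original.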

Import resolution was the tricky part. When you parse a file, you need to recursively parse all its dependencies. You need to handle circular dependencies without looping, and avoid re-parsing files that haven’t changed.
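One common way to handle this is a depth-first walk guarded by a set of already visited files. The sketch below assumes a hypothetical in-memory `ImportGraph` (file path to imported paths) and a made-up `parse_with_imports` function; it is not the actual Protols implementation. The same set serves as the cycle guard and as the "already parsed" cache.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory workspace: file path -> the paths it imports.
/// A real server would read these from the parsed `import` statements.
type ImportGraph = HashMap<String, Vec<String>>;

/// Visit `file` and everything it transitively imports, each exactly once.
/// `order` records the sequence in which files would be parsed.
fn parse_with_imports(
    file: &str,
    graph: &ImportGraph,
    parsed: &mut HashSet<String>,
    order: &mut Vec<String>,
) {
    // `insert` returns false if the file was already seen, which covers
    // both circular imports and files parsed by an earlier call.
    if !parsed.insert(file.to_string()) {
        return;
    }
    order.push(file.to_string()); // stand-in for the actual parse step

    if let Some(imports) = graph.get(file) {
        for import in imports {
            parse_with_imports(import, graph, parsed, order);
        }
    }
}
```

With `a.proto` importing `b.proto` and `c.proto`, `b.proto` importing `c.proto`, and `c.proto` importing `a.proto` again, starting from `a.proto` visits each file once. When a file changes, removing its path from `parsed` is enough to make the next call parse it again.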

Features

Protols supports:

Auto-completion: Messages, enums, and keywords within the current package
Diagnostics: Tree-sitter syntax errors combined with protoc validation
Go to Definition: Across package boundaries and imports
Hover: Documentation and type information
Document / Workspace Symbols: Navigable outline of file and workspace structure
Find References: All usages of types and fields
Rename: Across the entire codebase
Formatting: Via clang-format

Rename was probably the most involved – it needs to find every reference to a symbol across potentially dozens of files and update them atomically.
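A simplified sketch of the "compute everything first, then apply" idea follows. It matches identifiers textually, which is an assumption made to keep the example short; a real server would use references resolved from the syntax tree. `find_occurrences` and `plan_rename` are hypothetical names, not Protols functions.

```rust
use std::collections::BTreeMap;

/// Byte ranges of whole-identifier occurrences of `name` in `text`.
fn find_occurrences(text: &str, name: &str) -> Vec<(usize, usize)> {
    let is_ident = |c: char| c.is_ascii_alphanumeric() || c == '_';
    let mut hits = Vec::new();
    if name.is_empty() {
        return hits;
    }
    for (start, _) in text.match_indices(name) {
        let end = start + name.len();
        // Reject matches that are part of a longer identifier.
        let before_ok = text[..start].chars().next_back().map_or(true, |c| !is_ident(c));
        let after_ok = text[end..].chars().next().map_or(true, |c| !is_ident(c));
        if before_ok && after_ok {
            hits.push((start, end));
        }
    }
    hits
}

/// Compute the new contents of every affected file without touching the input.
/// The caller can then apply the whole result or none of it.
fn plan_rename(
    files: &BTreeMap<String, String>,
    old: &str,
    new: &str,
) -> BTreeMap<String, String> {
    let mut changed = BTreeMap::new();
    for (path, text) in files {
        let hits = find_occurrences(text, old);
        if hits.is_empty() {
            continue;
        }
        let mut updated = String::with_capacity(text.len());
        let mut cursor = 0;
        for (start, end) in hits {
            updated.push_str(&text[cursor..start]);
            updated.push_str(new);
            cursor = end;
        }
        updated.push_str(&text[cursor..]);
        changed.insert(path.clone(), updated);
    }
    changed
}
```

Renaming `User` to `Account` this way rewrites `message User { UserId id = 1; }` to `message Account { UserId id = 1; }` and leaves `UserId` alone. In LSP terms, the rename request returns a single workspace edit covering all files, and the client applies it.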

Current state

Protols does what I need it to do day-to-day, so I’m not actively adding major features. It has over 100 stars on GitHub, is published on crates.io, has CI/CD, and even got a VS Code extension through community contributions.

The protobuf LSP ecosystem has also grown since I started. There are now several other implementations available, which is good to see.

If you work with protobuf files, you can install it with:

cargo install protols

Neovim setup:

require'lspconfig'.protols.setup{}
