Creating a Language Server for Protocol Buffers

As a software engineer, there’s a particular satisfaction that comes from scratching your own itch by building the tool you need. That’s exactly how Protols—my Language Server Protocol (LSP) implementation for Protocol Buffers—came to life.

The Problem: Navigating Protobuf Hell

At work, we use a lot of protobuf files. And I mean a lot. While protobuf is fantastic for defining APIs and data structures, navigating between dozens (sometimes hundreds) of .proto files was becoming a genuine pain point. I found myself constantly using grep or Vim’s search to jump between message definitions, enum declarations, and imports across different packages.

You know that feeling when you’re trying to trace through a complex type definition that spans multiple files, and you end up with 15 terminal windows open, each showing different grep results? Yeah, that was my daily reality.

I thought, “Surely there’s a good Language Server for protobuf files out there.” So I went hunting. This was a while back, and the LSP ecosystem for protobuf was pretty barren. The few options I found were either incomplete, unmaintained, or didn’t support the features I desperately needed—like proper go-to-definition across package boundaries.

That’s when the software engineer mindset kicked in: if a tool doesn’t exist, you can create one.

Taking the Plunge: First LSP Implementation

I’d never built a Language Server before, though I had some experience with LSP from the client side and a solid understanding of the specification thanks to my previous work on another project, cpeditor. I also had enough Rust experience to feel confident tackling this in my favorite systems language. The challenge was exciting. How hard could it be, right? (Famous last words…)

The core idea was straightforward: parse protobuf files, build an understanding of their structure, and provide intelligent code assistance. Simple in concept, complex in execution.

The Technical Journey: Tree-Sitter and Recursive Traversals

The first major decision was choosing the right parsing library. I wanted to learn tree-sitter, which I’d heard great things about, and it seemed perfect for this use case. Tree-sitter provides robust, incremental parsing with excellent error recovery—exactly what you need for a language server that needs to work with potentially incomplete or syntactically incorrect code.

For the LSP framework, I went with async-lsp, which felt like the most mature and well-designed option available at the time. It provided clean abstractions for handling LSP requests and notifications without getting bogged down in protocol details.

The real fun began with implementing the tree-walking algorithms. Tree-sitter also supports queries for matching patterns in the syntax tree, but honestly, I wasn’t even aware of that feature when I started (and I probably wouldn’t have used it anyway, since I wanted to understand the tree traversal intimately).

This led to implementing a lot of recursive tree traversal algorithms by hand:

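use tree_sitter::Node;

// Both methods live in an `impl` block on a type that owns `tree: tree_sitter::Tree`.

// Collect every node in the tree for which `filter` returns true.
pub fn find_all_nodes(&self, filter: fn(&Node) -> bool) -> Vec<Node<'_>> {
    let mut result = Vec::new();
    self.visit_nodes(self.tree.root_node(), filter, &mut result);
    result
}

// Depth-first walk. The explicit lifetime `'a` ties each visited node to the
// nodes stored in `result`; with elided lifetimes the `push` is rejected.
fn visit_nodes<'a>(&self, node: Node<'a>, filter: fn(&Node) -> bool, result: &mut Vec<Node<'a>>) {
    if filter(&node) {
        result.push(node);
    }

    // Iterate over the direct children using a cursor created for this node.
    let mut cursor = node.walk();
    for child in node.children(&mut cursor) {
        self.visit_nodes(child, filter, result);
    }
}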

Finally, some practical application for all those computer science tree algorithms! It was genuinely satisfying to implement features like symbol collection, scope resolution, and reference finding using these fundamental data structure operations.
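To make the symbol-collection idea concrete, here is a minimal sketch that runs on a toy tree rather than a real tree-sitter syntax tree. `ToyNode`, `collect_symbols`, and `demo_symbols` are hypothetical names invented for this illustration, not Protols' actual types; the point is only how the enclosing scope is carried down the recursion to build qualified names.

```rust
/// A toy stand-in for a syntax node: a named declaration with nested children.
/// (Hypothetical type for illustration; this is not tree-sitter's `Node`.)
struct ToyNode {
    kind: &'static str, // e.g. "message", "enum", "field"
    name: &'static str,
    children: Vec<ToyNode>,
}

/// Recursively collects the qualified names of messages and enums,
/// passing the enclosing scope down through the recursion.
fn collect_symbols(node: &ToyNode, scope: &str, out: &mut Vec<String>) {
    let qualified = if scope.is_empty() {
        node.name.to_string()
    } else {
        format!("{}.{}", scope, node.name)
    };
    if node.kind == "message" || node.kind == "enum" {
        out.push(qualified.clone());
    }
    // In this sketch only messages open a new scope for their children.
    let child_scope: &str = if node.kind == "message" { &qualified } else { scope };
    for child in &node.children {
        collect_symbols(child, child_scope, out);
    }
}

/// Builds a small nested declaration and returns the collected symbols.
fn demo_symbols() -> Vec<String> {
    let tree = ToyNode {
        kind: "message",
        name: "Book",
        children: vec![
            ToyNode { kind: "field", name: "title", children: vec![] },
            ToyNode {
                kind: "message",
                name: "Author",
                children: vec![ToyNode { kind: "enum", name: "Role", children: vec![] }],
            },
        ],
    };
    let mut out = Vec::new();
    collect_symbols(&tree, "", &mut out);
    out
}
```

Calling `demo_symbols()` yields `Book`, `Book.Author`, and `Book.Author.Role`, in that order. A real implementation would also need to take the file's `package` declaration into account.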

The “Aha!” Moment

The standout moment came when I got my first working “go to definition” feature. I still remember the exact moment—I was working on a complex protobuf file with multiple imports, placed my cursor on a message type, hit the keybinding, and boom—my editor jumped directly to the definition in another file.

It was magical. All those hours of debugging tree traversal logic, figuring out import resolution, and managing state across multiple files suddenly paid off in this one smooth interaction.

Architecture: State Management and Multi-File Coordination

One of the most interesting challenges was managing state across multiple files. Protobuf files can import from each other, creating complex dependency graphs.

The language server needs to:

Parse all relevant files in a workspace
Resolve imports and build a complete symbol table
Keep everything in sync when files change
Provide diagnostics that span multiple files

I implemented a state management system that tracks parsed trees, document contents, and workspace-level metadata:

pub struct ProtoLanguageState {
    documents: Arc<RwLock<HashMap<Url, String>>>,
    trees: Arc<RwLock<HashMap<Url, ParsedTree>>>,
    parser: Arc<Mutex<ProtoParser>>,
    parsed_workspaces: Arc<RwLock<HashSet<String>>>,
    protoc_diagnostics: Arc<Mutex<ProtocDiagnostics>>,
}
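As a rough sketch of how such shared state stays in sync, the pared-down version below stores the raw text and re-derives a stand-in "parse result" on every change. `ToyState`, `upsert`, and `messages` are hypothetical names, the keys are plain strings instead of `Url`, and a naive line scan replaces the real parser; treat it as an illustration of the locking pattern, not as Protols' implementation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified state: document text plus derived data.
/// The "parse result" here is just the list of message names in a file.
#[derive(Clone, Default)]
struct ToyState {
    documents: Arc<RwLock<HashMap<String, String>>>,
    trees: Arc<RwLock<HashMap<String, Vec<String>>>>,
}

impl ToyState {
    /// Called when a document is opened or changed: store the new text
    /// and replace the derived data so both maps stay consistent.
    fn upsert(&self, uri: &str, text: &str) {
        let names: Vec<String> = text
            .lines()
            .filter_map(|line| {
                line.trim()
                    .strip_prefix("message ")?
                    .split(|c: char| c.is_whitespace() || c == '{')
                    .next()
                    .map(str::to_string)
            })
            .collect();
        self.documents.write().unwrap().insert(uri.to_string(), text.to_string());
        self.trees.write().unwrap().insert(uri.to_string(), names);
    }

    /// Read-only lookups take the shared read lock.
    fn messages(&self, uri: &str) -> Vec<String> {
        self.trees.read().unwrap().get(uri).cloned().unwrap_or_default()
    }
}

/// Shows that a change made through one handle is visible through another.
fn demo_state() -> (Vec<String>, Vec<String>) {
    let state = ToyState::default();
    let handle = state.clone(); // clones share the same maps through `Arc`
    handle.upsert("file:///a.proto", "message A {}\nmessage B {}");
    let before = state.messages("file:///a.proto");
    handle.upsert("file:///a.proto", "message Renamed {}");
    let after = state.messages("file:///a.proto");
    (before, after)
}
```

Wrapping each map in `Arc<RwLock<…>>` lets request handlers clone the state cheaply and read concurrently, while writes from change notifications briefly take exclusive access.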

The trickiest part was handling import resolution. When you parse a file, you need to recursively parse all its dependencies to build a complete picture. But you also need to avoid infinite loops in case of circular dependencies and prevent re-parsing the same files unnecessarily.
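One common way to handle both concerns is a visited set shared across the recursion. The sketch below assumes an in-memory map standing in for the file system and a naive line scan for `import` statements; `imports_of`, `parse_with_imports`, and `demo_imports` are hypothetical helpers, not functions from Protols.

```rust
use std::collections::{HashMap, HashSet};

/// Extracts import paths from lines such as `import "foo/bar.proto";`.
/// A naive line-based scan, used here in place of a real parser.
fn imports_of(source: &str) -> Vec<String> {
    source
        .lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix("import")?;
            let start = rest.find('"')? + 1;
            let end = start + rest[start..].find('"')?;
            Some(rest[start..end].to_string())
        })
        .collect()
}

/// Visits `path` and everything it transitively imports, each file once.
/// `files` is an in-memory stand-in for the file system.
fn parse_with_imports(
    path: &str,
    files: &HashMap<&str, &str>,
    visited: &mut HashSet<String>,
    order: &mut Vec<String>,
) {
    // `insert` returns false when the path was already seen. That single
    // check breaks import cycles and avoids re-parsing shared dependencies.
    if !visited.insert(path.to_string()) {
        return;
    }
    // A missing file is simply skipped in this sketch.
    let Some(source) = files.get(path) else { return };
    order.push(path.to_string());
    for import in imports_of(source) {
        parse_with_imports(&import, files, visited, order);
    }
}

/// `a.proto` and `b.proto` import each other; `c.proto` is shared by both.
fn demo_imports() -> Vec<String> {
    let files = HashMap::from([
        ("a.proto", "import \"b.proto\";\nimport \"c.proto\";\nmessage A {}"),
        ("b.proto", "import \"c.proto\";\nimport \"a.proto\";\nmessage B {}"),
        ("c.proto", "message C {}"),
    ]);
    let mut visited = HashSet::new();
    let mut order = Vec::new();
    parse_with_imports("a.proto", &files, &mut visited, &mut order);
    order
}
```

Despite the cycle between `a.proto` and `b.proto`, `demo_imports()` terminates and visits each file exactly once, in the order `a.proto`, `b.proto`, `c.proto`.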

Features That Made It Real

Protols ended up supporting a comprehensive set of LSP features:

Auto-completion: Suggests messages, enums, and keywords within the current package
Diagnostics: Combines tree-sitter syntax errors with protoc validation
Go to Definition: Works across package boundaries and handles imports
Hover Information: Shows documentation and type information
Document Symbols: Provides a navigable outline of file structure
Find References: Locates all usages of types and fields
Rename Symbols: Safely renames symbols across the codebase
Code Formatting: Integrates with clang-format for consistent styling
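To illustrate the diagnostics item, here is one way compiler output could be folded into editor diagnostics. It assumes protoc's default error lines of the form `file:line:column: message`; `ToyDiagnostic`, `parse_protoc_line`, and `merge_diagnostics` are hypothetical names, and the sketch does not handle paths that themselves contain a colon (such as Windows drive letters).

```rust
/// Hypothetical diagnostic record; a real server would use the LSP types.
#[derive(Debug, PartialEq)]
struct ToyDiagnostic {
    file: String,
    line: u32,
    column: u32,
    message: String,
}

/// Parses one line shaped like `foo.proto:3:9: Expected ";".`.
/// Returns `None` for lines that do not carry a position.
fn parse_protoc_line(raw: &str) -> Option<ToyDiagnostic> {
    let mut parts = raw.splitn(4, ':');
    let file = parts.next()?.to_string();
    let line = parts.next()?.trim().parse().ok()?;
    let column = parts.next()?.trim().parse().ok()?;
    let message = parts.next()?.trim().to_string();
    Some(ToyDiagnostic { file, line, column, message })
}

/// Combines diagnostics from the syntax tree with those parsed from the
/// compiler output, ordered by position.
fn merge_diagnostics(mut syntax: Vec<ToyDiagnostic>, protoc_output: &str) -> Vec<ToyDiagnostic> {
    syntax.extend(protoc_output.lines().filter_map(parse_protoc_line));
    syntax.sort_by_key(|d| (d.line, d.column));
    syntax
}
```

Lines without a parseable position are dropped here; a fuller version might attach them to the start of the file instead.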

Each feature required understanding a different part of the LSP specification and implementing nontrivial tree analysis. The rename functionality, for instance, needs to find all references to a symbol across potentially dozens of files and update them atomically.
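The rename flow can be sketched as follows. This is a deliberately naive, text-based version: it matches whole identifiers in raw text, so it would also touch comments and string literals, which a syntax-tree-based implementation avoids. `find_references`, `rename_all`, and `demo_rename` are hypothetical names for illustration.

```rust
use std::collections::BTreeMap;

fn is_ident(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Byte ranges in `text` where `name` occurs as a whole identifier.
fn find_references(text: &str, name: &str) -> Vec<(usize, usize)> {
    text.match_indices(name)
        .filter(|(start, _)| {
            let before = text[..*start].chars().next_back();
            let after = text[*start + name.len()..].chars().next();
            !before.map_or(false, is_ident) && !after.map_or(false, is_ident)
        })
        .map(|(start, _)| (start, start + name.len()))
        .collect()
}

/// Computes the new contents of every affected file without modifying the
/// input, so a caller can apply all changes together or none at all.
fn rename_all(files: &BTreeMap<String, String>, old: &str, new: &str) -> BTreeMap<String, String> {
    let mut changed = BTreeMap::new();
    for (path, text) in files {
        let refs = find_references(text, old);
        if refs.is_empty() {
            continue;
        }
        let mut updated = text.clone();
        // Apply edits back to front so earlier byte offsets stay valid.
        for (start, end) in refs.into_iter().rev() {
            updated.replace_range(start..end, new);
        }
        changed.insert(path.clone(), updated);
    }
    changed
}

/// Renames `Book` to `Novel` across three in-memory files.
fn demo_rename() -> BTreeMap<String, String> {
    let files = BTreeMap::from([
        ("a.proto".to_string(), "message Book {}".to_string()),
        (
            "b.proto".to_string(),
            "message Shelf { Book book = 1; BookStore store = 2; }".to_string(),
        ),
        ("c.proto".to_string(), "message Other {}".to_string()),
    ]);
    rename_all(&files, "Book", "Novel")
}
```

In this demo `BookStore` and the field name `book` are left alone, and `c.proto` does not appear in the result because it contains no references. In LSP terms, the server would return the computed changes as a `WorkspaceEdit` for the client to apply.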

Real-World Impact

The difference in day-to-day productivity has been substantial. What used to be a tedious process of grep-searching and manual file navigation is now as simple as Ctrl+clicking on a symbol. Code reviews became faster because I could quickly understand complex type hierarchies. Refactoring protobuf schemas went from being a dreaded task to something I could do confidently.

It’s also been gratifying to see the protobuf LSP ecosystem grow. When I started Protols, there were very few options. Now there are several quality implementations available, which is fantastic for the community.

Lessons Learned

Building Protols taught me several valuable lessons:

Start Simple: I began with basic parsing and gradually added features. Trying to implement everything at once would have been overwhelming.
Tree-Sitter is Powerful: Once you understand the mental model, tree-sitter makes parsing robust and efficient. The error recovery is particularly impressive.
LSP is Well-Designed: The request/response model with capability negotiation makes it straightforward to build a server incrementally.
State Management is Critical: In a language server, you’re essentially building a long-running service that needs to stay in sync with a constantly changing codebase. Getting the state management right is crucial for correctness and performance.
Testing with Real Codebases: The sample files I created for testing were useful, but nothing beats testing against real, messy production codebases with complex import hierarchies.

What’s Next?

Protols has reached a level of functionality that satisfies my daily needs, so I’m not actively adding major features. There are always more LSP capabilities that could be implemented—semantic highlighting, code actions, workspace symbols—but the core functionality is solid and stable.

The most rewarding aspect has been seeing other developers adopt and contribute to the project.

Protols is published on crates.io, has CI/CD set up, and even has VS Code extension support through community contributions.

The Bigger Picture

This project reinforced something I love about being a software engineer: when you encounter a problem, you have the power to build a solution. It doesn’t matter if you haven’t done it before—the combination of determination, existing knowledge, and the willingness to learn new concepts can take you surprisingly far.

Protols started as a solution to my own productivity problem and ended up being something that helps other developers too. That’s the kind of impact that makes all those late nights debugging recursive tree traversals worthwhile.

If you work with protobuf files and want to try Protols, you can install it via:

cargo install protols

And if you’re a Neovim user like me, configuration is as simple as:

require'lspconfig'.protols.setup{}

Building your own tools isn’t just about solving immediate problems—it’s about deepening your understanding of the technologies you use every day and contributing back to the community that supports your work. Plus, there’s nothing quite like the satisfaction of using a tool you built yourself to solve the exact problem that motivated you to build it in the first place.

The source code for Protols is available on GitHub under the MIT license. Contributions and feedback are always welcome!
