The unexpected productivity boost of Rust

TL;DR

Lubeno's backend is 100% Rust code, and it has grown to a size where it's impossible for me to keep all parts of the codebase in my head at the same time.

In my experience, projects typically hit a significant slowdown at this stage. Just making sure your changes don't have any unforeseen consequences becomes very difficult.

I have found that Rust's strong safety guarantees give me much more confidence when touching the codebase. With that extra confidence I'm much more willing to refactor even critical parts of the app, which has a very positive effect on my productivity and on long-term maintainability.

Rust just saved me again today!

I recently ran into an issue that got me thinking and ultimately inspired me to write this post. I needed to wrap a structure in a mutex, because it was being accessed concurrently. To access the internal structure, you first need to acquire a lock on the mutex.

let lock = mutex.lock();
// … The locked data is used to generate a commit …
db.insert_commit(commit).await;

This change looked completely fine to me, and rust-analyzer agreed. No errors were shown in that file. But suddenly another file in my editor lit up red, indicating a compilation error: the router definition. This didn't make any sense to me. How could my lock influence which handler the router picks?

.route("/api/git/post-receive", post(git::post_receive))
                                     ^^^^^^^^^^^^^^^^^
error: future cannot be sent between threads safely
help: within 'impl Future<Output = Result<Response<Body>>', the trait 'Send' is not implemented for "MutexGuard<'_, GitInternal>"

It took me much longer than I would like to admit to figure out what was going on here. Let me break it down for you!

When a new HTTP connection comes in, the web framework we are using spawns a new async task for it. The async tasks are executed on a work-stealing scheduler. This means that if a thread finishes all of its work, it will “steal” tasks from other threads to balance out the workload. A task can only move to another thread at '.await' points.

There is another important rule: if a mutex is locked on one thread, it needs to be released on the same thread, or we would have undefined behavior.

Now, Rust keeps track of all lifetimes and knows that the lock is still held across the '.await' point. This means that the lock might be released on a different thread than the one that acquired it, and that is not allowed, because it might result in undefined behavior.

The solution is very simple: just release the lock before the '.await' statement.
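Here is a minimal sketch of that fix using only the standard library. It assumes the mutex is a 'std::sync::Mutex'; 'GitInternal' with a 'branch' field, 'insert_commit', 'require_send' and 'block_on' are hypothetical stand-ins I made up for illustration, not Lubeno's actual code. 'require_send' plays the role of the router: it only accepts futures that are 'Send', which is the kind of bound that made the error show up in the router file instead of in the handler.

```rust
use std::future::Future;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for the structure wrapped in the mutex.
pub struct GitInternal {
    pub branch: String,
}

// Hypothetical stand-in for `db.insert_commit`: an async call that
// simply hands back the commit message it was given.
pub async fn insert_commit(commit: String) -> String {
    std::future::ready(commit).await
}

pub async fn post_receive(git: Arc<Mutex<GitInternal>>) -> String {
    // The guard only lives inside this block, so it is released
    // before the `.await` below is reached.
    let commit = {
        let lock = git.lock().unwrap();
        format!("commit on {}", lock.branch)
    };
    insert_commit(commit).await
}

// Mimics the `Send` bound that a multi-threaded router places on
// handler futures (an assumption about the framework).
pub fn require_send<F: Future + Send>(future: F) -> F {
    future
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Tiny single-threaded executor that polls in a loop.
// Only good enough for this demo, not for real workloads.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}
```

Calling 'block_on(require_send(post_receive(git)))' compiles and runs. If you move the 'insert_commit(commit).await' call inside the block, so that the guard is still alive across the '.await', the call to 'require_send' is rejected with the same "future cannot be sent between threads safely" error.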

Bugs like this are the worst! It's almost impossible to catch them in development, because there is never enough load on the system to force the scheduler to move the execution to another thread. So, you end up with one of these "impossible to reproduce, fails sometimes, but never for you" bugs.

It's mind-blowingly cool that the Rust compiler can detect something like this, and that seemingly unrelated parts of the language, like mutexes, lifetimes and async operations, form such a coherent system.

TypeScript on the other hand is scary

In contrast, a recent asynchronous bug in our TypeScript codebase went undetected long after being shipped to production. Here's the culprit.

// User logged in successfully!
if (redirect) {
    window.location.href = redirect;
}

let content = await response.json();
if (content.onboardingDone) {
    window.location.href = "/dashboard";
} else {
    window.location.href = "/onboarding";
}

Very simple. When logging in, see if there is a redirect. If yes, redirect to the specific page. If not, go to the dashboard or onboarding page. Assigning a value to 'window.location.href' will redirect your browser to a location.

I believe I tested it, and it worked. But suddenly it didn't work anymore. Did it ever work? What is going on here? We always got redirected to the dashboard, even when a redirect was present.

There is a scheduling race condition here. Assigning a value to 'window.location.href' doesn't immediately redirect you, like I thought it would. It just sets the value and schedules a redirect as soon as possible. But the code doesn't stop executing! This means that the next assignment might execute before the browser starts the redirect, redirecting you to the wrong location. It took me forever to figure out that was the case. The solution is to just add a return statement to the if block and never let it get to the rest.

if (redirect) {
    window.location.href = redirect;
    return;
}

I feel like both issues, the Rust and the TypeScript one, are similar. They are both related to async scheduling and both show some undefined behavior that is very much not obvious. But the Rust type checker is much more useful: it prevented the buggy code from ever compiling. The TypeScript compiler doesn't track lifetimes or have any borrowing rules, and is just not capable of catching this kind of issue.

Fearless refactoring

Rust is often recommended as a great language for systems programming, but it's not usually the first choice when it comes to web applications. Python, Ruby and JavaScript/Node.js are always perceived as more “productive” for web development. I think that this is true if you are just starting out! You get so much out of the box with these languages, and the initial progress is very fast.

But once the project reaches a certain size, everything grinds to a halt. There is just so much implicit coupling between parts of the codebase that it becomes very hard to change anything.

We have all been there. You change something and everything works great, but then two days later you get a ping that your change broke another (completely unrelated) page. After the third time this happens, your willingness to touch the codebase drops dramatically.

With Rust I just worry much less, and this allows me to try out more things. I have the feeling that my productivity has even increased as the codebase has grown. There is much more code I can build upon, reuse and change without worrying that I will accidentally break the existing stuff.

Rust is just so good at telling you “Yeah, that change you are making affects another part of the project that you are probably not thinking about at all, because you are six function calls deep and the deadline is approaching fast, but here is exactly why this might cause issues”.

What about tests?

I think that tests are great! They are a very powerful tool if you are doing a big refactoring and need help with catching regressions. But they are not required by the compiler for the code to run. This means that you can easily decide not to add tests.

Some days are just more stressful than others. There is very little time and things need to get done. But with tests there is this additional mental overhead. I need to decide what the right level of abstraction is. Am I testing the behavior or the implementation details? Will this test actually prevent any errors in the future? Making all these decisions is very exhausting and error-prone.

Rust can sometimes be challenging to learn and write, but the nice thing about Rust is that it takes the burden of deciding away from me. The decisions have been made by people much smarter than me, who have worked on huge codebases and have encoded all the common mistakes into the compiler.

Of course, some properties of an app can't be part of the type system. In that case tests rock!

Bonus: Zig is scary too!

Zig is often compared with Rust; both are aiming to be systems programming languages. I think that Zig is very cool, and the language just sparks some nerd joy inside me. But then I remember again that it's scary. Let's just look at a simple error handling example.

const std = @import("std");

const FileError = error{
    AccessDenied,
};

fn doSomethingThatFails() FileError!void {
    return FileError.AccessDenied;
}

pub fn main() !void {
    doSomethingThatFails() catch |err| {
        if (err == error.AccessDenid) {
            std.debug.print("Access was denied!\n", .{});
        } else {
            std.debug.print("Unexpected error!\n", .{});
        }
    };
}

We have a function called 'doSomethingThatFails' that always fails with an error value of 'FileError.AccessDenied', then we catch the error and print out that access was denied.

Except, we don't do it. There is a typo in the error handling logic: 'AccessDenid != AccessDenied'. The code will compile just fine. The Zig compiler will generate a new number for each unique 'error.*' and doesn't care what types you are comparing. It's just numbers.

However, if you use a 'switch' statement instead of an 'if', then suddenly the Zig compiler goes "Oh this is obviously wrong! The returned error can never be of this value because it's not in 'FileError'", and refuses to compile the code. It's capable of detecting the bug, it just chooses not to care. If it looks like a number, the 'if' statement might as well compare it as a number.

These small design decisions in the language are very much in contrast with Rust. And for someone who mistypes names all the time, that can be scary.
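For comparison, here is a rough sketch of the same example in Rust. The names mirror the Zig code above, and 'describe' is a hypothetical helper I added so the outcome can be returned and checked. It is only meant to illustrate the difference, not to be the one idiomatic way of handling errors.

```rust
#[derive(Debug, PartialEq)]
pub enum FileError {
    AccessDenied,
}

pub fn do_something_that_fails() -> Result<(), FileError> {
    Err(FileError::AccessDenied)
}

// Hypothetical helper mirroring the `catch` block of the Zig example.
pub fn describe(result: Result<(), FileError>) -> &'static str {
    match result {
        Ok(()) => "No error",
        Err(err) => {
            // Misspelling the variant as `FileError::AccessDenid` here does
            // not compile, because `FileError` has no variant with that name.
            if err == FileError::AccessDenied {
                "Access was denied!"
            } else {
                "Unexpected error!"
            }
        }
    }
}
```

Because the error is a regular enum, the comparison in the 'if' is checked against the declared variants just like a 'match' arm would be, so the typo is caught in both forms.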