Rust vs Go (2026): One of Them Gave Us 5× Latency Wins That Forced Backend Migrations
The incident started the way these things always do: a customer escalation with screenshots of dashboards we’d already seen a hundred times. Average latency was fine. Error rate was fine. CPU wasn’t pegged. Nothing was “on fire.”
But p99 had quietly crept past 1.8 seconds during a peak window that should have been boring. It stayed there long enough to trip an SLA clause we’d negotiated assuming “that’ll never happen.”
We spent the first hour doing what experienced teams do when they don’t want a conclusion to be true: blaming queries, traffic shape, caches, and upstream jitter. We rolled back a deploy that hadn’t actually changed anything in the hot path. We added more replicas. The p99 moved a little. Then it came back.
Only later did we accept the uncomfortable part: the system was behaving exactly as designed. The design just wasn’t compatible with the latency budget we’d promised anymore.
Baseline Reality: A Service That Was “Good Enough” Until It Wasn’t
The existing service had been running for years. Steady-state traffic hovered around 18–22k requests per second. p50 latency lived comfortably under 20 ms. p95 rarely crossed 80 ms. For most of its…
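To make the tail-latency framing concrete, here is a minimal sketch of why dashboards showing a healthy average and p50 can coexist with a p99 that trips an SLA. The sample values are invented for illustration, not taken from the service described above; the nearest-rank percentile method is one common convention among several.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0–100) of samples,
// using the nearest-rank method on a sorted copy.
func percentile(samples []float64, p float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	rank := int(math.Ceil(p/100*float64(len(s)))) - 1
	if rank < 0 {
		rank = 0
	}
	return s[rank]
}

func main() {
	// Hypothetical latency distribution in ms: 98.5% of requests
	// are fast, 1.5% sit in a slow tail.
	var samples []float64
	for i := 0; i < 985; i++ {
		samples = append(samples, 15)
	}
	for i := 0; i < 15; i++ {
		samples = append(samples, 1900)
	}
	fmt.Println(percentile(samples, 50)) // 15  — p50 looks great
	fmt.Println(percentile(samples, 99)) // 1900 — p99 blows the budget
}
```

A tail this size is invisible in the mean (it adds about 28 ms here) and in p50, which is exactly why the dashboards in the story looked fine while the SLA clause tripped.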