go tool trace
Do you want to know what your Go program is really doing?
go tool trace can show you: it visualizes all the runtime events over a run of your
Go program, in exquisite detail.
This under-documented tool is one of the most useful in the Go ecosystem for diagnosing performance problems such as latency, poor parallelisation, and contention.
In my previous blog post I
mentioned that we used go tool trace at Pusher to track down long pause times in Go’s
garbage collector.
In this blog post, I’ll give you a deeper dive into go tool trace.
Tour of the go tool trace interface
go tool trace displays a lot of information, so it is hard to know where to
start. Let’s get a rough overview of the interface first, and then we’ll look at
how you can investigate specific problems with it.
The go tool trace UI is a web app. Below, I’ve embedded a live example of this web app! This example visualizes a trace of a parallel quicksort
implementation:
Play around with the example! For help navigating the UI, click the “?” in the top right. Click any event on the screen to get more information below the trace. Quiz time! Here are some things you can find out from this trace:
- How long does this program run for? (Find the answer by looking at the timeline along the top.) 2.78 milliseconds.
- How many goroutines were running 872 microseconds into the run? (Find the answer by zooming until individual microseconds are visible, then clicking on the “Goroutines” section.) three.
- When did the process first ramp up to using three OS threads? (You should browse the “Threads” section.) 87 microseconds in.
- When did main call qSortPar? (Expand the “PROCS” section, and look for “G1 runtime.main” in Proc 0. What happens after that event?) 510 microseconds in.
- What caused the additional procs (1, 2 and 3) to start doing work? (Under “View Options”, turn on “Flow events”. You can then see arrows which trace cause-and-effect between events.) They were all started by qSortPar on Proc 0.
- When did proc #2 stop? (Zoom in on the end of activity for proc #2. There’s an event called “proc stop”.) 2.67 milliseconds in.
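For reference, the traced program is a parallel quicksort roughly along these lines. This is a sketch of the general shape rather than the exact code behind the embedded trace; only the name qSortPar comes from the trace itself, and a real implementation would fall back to a serial sort for small slices instead of spawning a goroutine per partition:

package main

import (
    "fmt"
    "sync"
)

// qSortPar sorts xs with quicksort, handing each half of the partition to
// its own goroutine so that independent sub-slices are sorted in parallel.
func qSortPar(xs []int) {
    if len(xs) < 2 {
        return
    }
    p := partition(xs)
    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); qSortPar(xs[:p]) }()
    go func() { defer wg.Done(); qSortPar(xs[p+1:]) }()
    wg.Wait()
}

// partition moves elements smaller than the pivot (the last element) to the
// left and returns the pivot's final index (Lomuto scheme).
func partition(xs []int) int {
    pivot, i := xs[len(xs)-1], 0
    for j := 0; j < len(xs)-1; j++ {
        if xs[j] < pivot {
            xs[i], xs[j] = xs[j], xs[i]
            i++
        }
    }
    xs[i], xs[len(xs)-1] = xs[len(xs)-1], xs[i]
    return i
}

func main() {
    xs := []int{5, 2, 8, 1, 9, 3}
    qSortPar(xs)
    fmt.Println(xs) // [1 2 3 5 8 9]
}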
Great! So how do I use go tool trace with my program?
You must instrument your program to emit runtime events to a binary file. This
involves importing the runtime/trace
package from the standard library, and adding a few lines of boilerplate.
This quick video walks you through it:
Here’s the code to copy-paste to follow along:
package main

import (
    "os"
    "runtime/trace"
)

func main() {
    // Create the file that the trace will be written to.
    f, err := os.Create("trace.out")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // Start tracing; trace.Stop flushes the remaining events on exit.
    err = trace.Start(f)
    if err != nil {
        panic(err)
    }
    defer trace.Stop()

    // Your program here
}
This will cause your program to emit events in a binary
format
to the file trace.out.
You can then run go tool trace trace.out. This will parse the trace
file and open a browser with the visualizer. It will start a server which will
respond to the visualizer with the trace data.
Once the initial page is loaded in the browser, click “View trace”.
This will load a trace viewer like the one embedded above.
What problems can I solve with go tool trace?
Let’s run through examples of how you would track down typical problems using this tool.
Diagnosing latency problems
Latency problems can be introduced when a goroutine that is critical to completing an operation is blocked from running. This can happen for a number of reasons: it is blocked on a syscall; it is blocked on shared memory (a channel, mutex, etc.); it is blocked by the runtime system (e.g. the garbage collector); or the scheduler is simply not running the critical goroutine as frequently as you would like.
All of these can be identified using go tool trace. You can track the problem
down by looking at the PROCs timeline, and finding periods of time where a
critical goroutine is being blocked for an unacceptably long period of time.
Once you have identified this period of time, whatever is being run instead of
the critical goroutine should give you a clue as to what the root cause is.
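As a contrived illustration of the kind of thing to look for (this is not an example from the post), here is a program whose main goroutine is blocked behind a mutex held by another goroutine. In the trace, the main goroutine simply disappears from the timeline for the ~50 milliseconds it spends waiting:

package main

import (
    "os"
    "runtime/trace"
    "sync"
    "time"
)

func main() {
    f, err := os.Create("trace.out")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    if err := trace.Start(f); err != nil {
        panic(err)
    }
    defer trace.Stop()

    var mu sync.Mutex
    mu.Lock()
    go func() {
        // Simulate slow work done while holding the lock.
        time.Sleep(50 * time.Millisecond)
        mu.Unlock()
    }()

    // The critical goroutine blocks here until the lock is released.
    mu.Lock()
    mu.Unlock()
}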
As an example of a latency issue, let’s see what a long GC pause looks like from the previous blog post:
The red-colored events represent when the one and only program goroutine is
running. The goroutines running in parallel on all four threads are the MARK
phase of the garbage collector. This MARK phase blocks the main goroutine.
Can you see how long it blocks the runtime.main goroutine for? 12 milliseconds, from 626 milliseconds until 638 milliseconds.
I investigated this latency problem soon after
the Go team announced typical GC pause times of less than 100 microseconds
. So the long pause
times I was seeing with go tool trace looked odd, particularly because I could see
they were occurring in the concurrent phase of the collector.
I mentioned this to the go-nuts mailing list, and it appears to be related to this
issue, which is now fixed in Go 1.8.
My benchmark also surfaced another GC time pause
problem, which is actively being
worked on at the time of writing.
This investigative work would be impossible without go tool trace!
Diagnosing poor parallelism
Let’s say you’ve written a program which you expect to be using all CPUs, but it runs slower than you expect it to. This could be because your program is not as parallel as you expect. This can be caused by running too much of the critical path in serial, when parts could be run asynchronously (and in parallel).
Example time! Say we have a pub/sub message bus that we want running in a single
goroutine so that it can safely modify a map of subscribers without a mutex.
Request handlers write incoming messages to the message bus queue. The bus reads
messages off the queue, looks up in the map which subscribers to send them to,
and writes the message to their sockets. Let’s see what this looks like in go
tool trace for a single message:
The initial green-colored event is where the http handler reads the published message and writes it into the message bus event queue. After that, the message bus runs in a single thread – the second green-colored event – writing the message out to subscribers.
The red lines show where messages are written to subscribers’ sockets. How long does it take the process to write to all the subscribers? 16 milliseconds, from 4671 milliseconds until 4687 milliseconds.
The problem is three out of four threads are sitting idle while this is going on. Is there a way we can make use of them? The answer is yes! We don’t need to synchronously write to each subscriber; these could be run concurrently in separate goroutines. Let’s see what happens if we visualize it with this change:
As you can see, the period of time where messages are being written to subscribers is now happening on all threads, in many different goroutines.
But is it faster? The message writing time previously took 16ms. It now takes 10ms (at least for these two messages). So that’s a win.
It’s interesting to see that the speedup is modest, given that we are using 4X as much CPU. This is because there is a lot more overhead in running code in parallel: starting and stopping goroutines; sharing memory; separate caches. There is also a theoretical limit on the speedup which prevents us from achieving a 4X reduction in latency: Amdahl’s Law.
In fact, it can often be less efficient to run code in parallel, particularly if the goroutines are very short-lived or there is a lot of contention between them. This is another reason to use this tool: try both approaches and check which works best for your use case.
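To make the change concrete, here is a rough sketch of the two fan-out strategies. The bus and subscriber types are assumptions for illustration, not the actual Pusher code, and error handling is omitted:

package bus

import (
    "net"
    "sync"
)

type subscriber struct {
    conn net.Conn
}

type bus struct {
    subscribers map[string]*subscriber
}

// publishSerial writes to every subscriber in turn on the bus goroutine,
// so one slow socket delays all the others.
func (b *bus) publishSerial(msg []byte) {
    for _, sub := range b.subscribers {
        sub.conn.Write(msg) // error handling omitted
    }
}

// publishConcurrent gives each write its own goroutine, so the writes can
// proceed in parallel across the available threads.
func (b *bus) publishConcurrent(msg []byte) {
    var wg sync.WaitGroup
    for _, sub := range b.subscribers {
        wg.Add(1)
        go func(s *subscriber) {
            defer wg.Done()
            s.conn.Write(msg) // error handling omitted
        }(sub)
    }
    wg.Wait()
}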
When is go tool trace not appropriate?
Of course, go tool trace can’t solve everything.
It’s not appropriate if you want to track down slow functions, or
generally find where your program is spending most of its CPU time.
For that you should use go tool
pprof, which shows the
percentage of CPU time spent in each function.
go tool trace is better suited to finding out what your program is doing over
time, rather than in aggregate.
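For reference, CPU profiling is wired up in much the same way as tracing, but with the runtime/pprof package (a minimal sketch; the output filename is arbitrary):

package main

import (
    "os"
    "runtime/pprof"
)

func main() {
    f, err := os.Create("cpu.prof")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // Sample CPU usage for the lifetime of the program; inspect the
    // result afterwards with go tool pprof.
    if err := pprof.StartCPUProfile(f); err != nil {
        panic(err)
    }
    defer pprof.StopCPUProfile()

    // Your program here
}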
Also, there are other visualizations offered by go tool trace alongside the “View Trace” link, and these are particularly useful to diagnose contention issues.
There’s also no substitute for knowing your program’s performance in theory (with old-fashioned Big-O analysis).
Conclusion
Hopefully this post has given you a taste of how useful go tool trace can be for diagnostics.
Even if you don’t have a specific problem to solve, visualizing your program
is a great way of checking that its runtime characteristics
match up with your intuition.
I encourage you to fire up your Go program and give it a whirl.
The examples I showed in this post were simple, but the symptoms in your more complex programs should look surprisingly similar.
Appendix (or, show me the code)
This blog post gave you an introduction to using go tool trace, but you may wish to dive deeper.
The official docs for go tool trace are currently quite sparse.
There is a Google
Doc
that goes into more detail.
Beyond that, I found it useful to refer to source code in order to figure out how go tool trace worked:
- go tool trace source
- Source of binary trace parser
- Tracer source
- The web interface for go tool trace comes from the trace viewer of the Catapult project. This viewer can generate visualizations from a number of trace formats. go tool trace uses a JSON-based Trace Event Format.