Go, REST APIs, and Pointers

One of the more interesting design challenges with go-github (and the one that generates the most questions) is the use of pointers for almost all of the fields in our structs that are marshaled and passed to the GitHub API. After a fair amount of trial and error, I settled on the approach that I’m going to talk about below, and it’s something I think more API clients written in Go ought to consider. The original bug for this is google/go-github#19, and the full discussion there may be interesting for some; this post attempts to lay out the problem in a more consumable form. It’s a lesson in the interaction between Go’s zero values, the omitempty option on JSON or XML field tags, and the semantics of the PATCH HTTP method.

Starting Simple

The way Go handles most data encoding is very nice and simple. You define a standard Go struct, and for each field in the struct you can add a tag that specifies how that field should be encoded in particular formats. For example, here’s a simplified struct to represent a GitHub repository:

type Repository struct {
    Name        string `json:"name"`
    Description string `json:"description"`
    Private     bool   `json:"private"`
}

The tag on each field specifies the key its value should be marshaled to in the resulting JSON object. We could then build a new Repository and marshal it as JSON:

r := new(Repository)
b, _ := json.Marshal(r)
println(string(b))

outputs >>> {"name":"","description":"","private":false}

When we created the new Repository, each of its fields was set to its zero value: the empty string "" for string types, and false for bool types. There is no notion in Go of a declared but uninitialized string or bool. If a variable is not assigned an initial value at the time of declaration, it is initialized to its zero value. Remember that; it will be important in a moment.

Understanding PATCH

As its name implies, a REST-based API involves passing around the representation of the state of a resource. This is most commonly applied to HTTP, which is very straightforward: to read the current state of a resource, perform a GET operation on the resource’s URI. To update a resource, pass the new representation of the resource to its URI in a PUT operation. The PUT method is defined as a complete replacement of the resource at a given URI, meaning you must always provide the full representation that you want to set. But what if you only want to update a few fields in the resource? That’s done with the PATCH method.

The exact semantics of how the body of a PATCH request is applied to the requested resource are determined by the media type of the request. The way GitHub (and many other JSON APIs) handles PATCH requests is that you provide the JSON representation of the resource to update, omitting any fields that should be left unchanged. So for example, to update only the description of a repository, the HTTP request might look something like:

PATCH /repos/google/go-github HTTP/1.1
Host: api.github.com

{"description": "new description"}

To delete the description entirely, simply set it to an empty string:

PATCH /repos/google/go-github HTTP/1.1
Host: api.github.com

{"description": ""}

What if you were to perform a PATCH request with every field specified? That would actually be semantically equivalent to a PUT request with the same request body. In fact, because of this, all resource updates in the GitHub API are done using PATCH. They don’t even support (or at least, don’t document) using PUT at all for these types of requests.
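
The merge behavior described in this section can be sketched in a few lines of Go. This is only an illustration of the semantics, not how GitHub implements it: applyPatch and the map-based resource are hypothetical names chosen for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyPatch is a hypothetical helper that overlays the fields present in a
// JSON patch body onto a resource. Fields absent from the patch are left
// unchanged, which is the PATCH behavior described in this section.
func applyPatch(resource map[string]interface{}, patchBody []byte) error {
	var patch map[string]interface{}
	if err := json.Unmarshal(patchBody, &patch); err != nil {
		return err
	}
	for key, value := range patch {
		resource[key] = value
	}
	return nil
}

func main() {
	repo := map[string]interface{}{
		"name":        "go-github",
		"description": "old description",
		"private":     true,
	}

	// Only description appears in the patch, so only description changes.
	if err := applyPatch(repo, []byte(`{"description": ""}`)); err != nil {
		panic(err)
	}

	out, err := json.Marshal(repo)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// prints {"description":"","name":"go-github","private":true}
}
```

Note that the empty string is applied like any other value, while the private flag survives because it was never mentioned in the patch.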

Omitting empty values

The go-github library has a method for updating a repository named Edit which takes the owner and name of the repository to edit, as well as a Repository struct which contains the fields to be updated. So the Go code to update the description of a repository would simply be:

r := &github.Repository{Description: "new description"}
client.Repositories.Edit("google", "go-github", r)

What would the resulting HTTP request look like? If you recall the previous discussion about JSON marshaling, it would be something like:

PATCH /repos/google/go-github HTTP/1.1
Host: api.github.com

{"name":"", "description":"new description", "private":false}

Well, that’s not what was specified… the name and private fields were included even though they weren’t set in the struct literal. But remember that unset fields are initialized to their zero value, so as far as Go is concerned this really is what was specified. The name field is not actually a big deal since it’s immutable and GitHub will ignore it. However, the private field is a big problem. If this were a private repository, this seemingly innocuous change would have accidentally made it public!

To address this, we can update our Repository type to omit empty values when marshaling to JSON:

type Repository struct {
    Name        string `json:"name,omitempty"`
    Description string `json:"description,omitempty"`
    Private     bool   `json:"private,omitempty"`
}

Now the empty string for name and the false value for private are omitted, resulting in the desired HTTP request:

PATCH /repos/google/go-github HTTP/1.1
Host: api.github.com

{"description": "new description"}

So far so good.
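
For anyone who wants to confirm this locally, here is a minimal, self-contained sketch using only encoding/json. The encode helper is a hypothetical convenience for the example and is not part of go-github.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Repository mirrors the simplified struct from this post, with omitempty
// on every field.
type Repository struct {
	Name        string `json:"name,omitempty"`
	Description string `json:"description,omitempty"`
	Private     bool   `json:"private,omitempty"`
}

// encode is a hypothetical helper that returns the JSON encoding of r.
func encode(r *Repository) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Name and Private hold their zero values, so omitempty drops them.
	fmt.Println(encode(&Repository{Description: "new description"}))
	// prints {"description":"new description"}
}
```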

Intentionally empty values

Now let’s go back to a previous example and see what it would look like in code. Let’s delete the description for a repository by setting it to an empty string:

r := &github.Repository{Description: ""}
client.Repositories.Edit("google", "go-github", r)

Given the omitempty option we added to our struct fields, what will happen? Unfortunately, not what we want:

PATCH /repos/google/go-github HTTP/1.1
Host: api.github.com

{}

Because all fields on our Repository struct are now set to their zero value, this marshals to an empty JSON object. This request would have no effect whatsoever.
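
The root of the problem is that, with plain value fields, a struct whose description was explicitly cleared is indistinguishable from one that was never touched. The following sketch (again with a hypothetical encode helper) makes that concrete:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Repository uses plain value fields with omitempty, as in the section above.
type Repository struct {
	Name        string `json:"name,omitempty"`
	Description string `json:"description,omitempty"`
	Private     bool   `json:"private,omitempty"`
}

// encode is a hypothetical helper that returns the JSON encoding of r.
func encode(r Repository) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	untouched := Repository{}
	cleared := Repository{Description: ""} // the caller meant "delete the description"

	// The two values are identical, so the encoder cannot tell them apart.
	fmt.Println(untouched == cleared) // prints true
	fmt.Println(encode(cleared))      // prints {}
}
```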

What we need is a way to identify which fields are set to their zero value simply because that’s how they were initialized (and omit those from our JSON serialization), versus those that were intentionally set to a zero value by the developer (and include those in our JSON serialization). And that’s where pointers come in.

Pointers

The zero value for a pointer is nil, regardless of the type it points to. So by using pointers for our struct fields, we can unambiguously differentiate between an unset value (nil) and an intentional zero value such as "", false, or 0. This is exactly what goprotobuf does, for exactly this reason. So this results in our final Repository type of:

type Repository struct {
    Name        *string `json:"name,omitempty"`
    Description *string `json:"description,omitempty"`
    Private     *bool   `json:"private,omitempty"`
}
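
A quick way to see the difference is to marshal this type in each state. The sketch below relies only on encoding/json; the encode helper and the variable names are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Repository is the pointer-based struct from this post.
type Repository struct {
	Name        *string `json:"name,omitempty"`
	Description *string `json:"description,omitempty"`
	Private     *bool   `json:"private,omitempty"`
}

// encode is a hypothetical helper that returns the JSON encoding of r.
func encode(r *Repository) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	empty := ""
	makePublic := false

	// nil pointers are omitted; pointers to zero values are sent as-is.
	fmt.Println(encode(&Repository{}))                     // prints {}
	fmt.Println(encode(&Repository{Description: &empty}))  // prints {"description":""}
	fmt.Println(encode(&Repository{Private: &makePublic})) // prints {"private":false}
}
```

With pointer fields, omitempty only drops nil, so an intentional "" or false now reaches the wire.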

This does come at a cost, however, since it’s a little annoying to have to create pointers to a string or bool. You end up with overly verbose code such as:

d := "new description"
r := &github.Repository{Description: &d}
client.Repositories.Edit("google", "go-github", r)

To make this easier, go-github provides a handful of convenience functions copied over from goprotobuf for creating pointer types:

r := &github.Repository{Description: github.String("new description")}
client.Repositories.Edit("google", "go-github", r)

Using pointers also means that clients of the library will need to perform their own nil checks where appropriate to prevent panics. goprotobuf generates accessor methods to help make this a little easier, but go-github hasn’t added those yet.
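
To give a sense of how little code is involved, here is a sketch of both ideas: a String helper in the style of the one mentioned above, and a nil-safe accessor. GetDescription is a hypothetical method written for this example, and the Repository type is trimmed to one field to keep it short.

```go
package main

import "fmt"

// Repository is trimmed to a single pointer field for this example.
type Repository struct {
	Description *string `json:"description,omitempty"`
}

// String returns a pointer to a copy of v, so callers can write
// String("text") instead of declaring a temporary variable.
func String(v string) *string { return &v }

// GetDescription is a hypothetical nil-safe accessor. It returns the zero
// value when either the receiver or the field is nil, so callers can skip
// their own nil checks.
func (r *Repository) GetDescription() string {
	if r == nil || r.Description == nil {
		return ""
	}
	return *r.Description
}

func main() {
	var unset Repository
	set := Repository{Description: String("new description")}

	fmt.Printf("%q\n", unset.GetDescription()) // prints ""
	fmt.Printf("%q\n", set.GetDescription())   // prints "new description"
}
```

The accessor deliberately collapses nil and "" into the same return value, so code that needs to tell them apart still has to inspect the pointer directly.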

Other libraries

So does any of this matter for your Go API client? Well, it depends. If the API doesn’t do any kind of partial updates like PATCH, then you can probably leave off omitempty, not worry about pointers, and go on about your way. If you never need to send a zero value such as the empty string, false, or 0 in a JSON or XML request (not likely), then you can set omitempty and move on. But for most modern APIs, those won’t be the case, and you should experiment to see if your current library prevents you from performing certain actions.

(I’ll also note that google/go-github#19 discusses alternative solutions that aren’t covered here, such as using a field mask or using goprotobuf directly. It may be worth looking at those. Pointers just made sense for this library; use what works for you.)

Comments

  1. Chris

    I think it’s worth noting that you don’t NEED to have the struct tags for the json equivalent field if you’re decoding the json (you do for encoding, though, if you want to keep them lower case). Without the json tags, the json package will still decode into the struct appropriately. It first matches a perfect case match to the api but secondarily, the package will then make a match to case insensitive json field names. So it will successfully match json “name” to the struct’s Name string variable. That being said, it’s still probably a good idea to have the json tags regardless, but just wanted to point out it’s not required, haha.

    • sure, it’s not technically required in all cases. But in practice, when you’re writing a client for an existing API, it’s almost always necessary (in my experience, anyway).

  2. There is another solution. Do not unmarshal your PATCH data directly into a repository.

    If you first unmarshal to a map[string]interface{} then you can use reflection to get only the values (or empty strings) that were explicitly set in the JSON.

    • The problem is marshaling, not unmarshaling. Sure, the Edit method could take a map[string]interface{} instead of a Repository struct, but then you lose the type-safety Go provides, and it doesn’t feel very idiomatic to me.

  3. Troy Kruthoff

    I hope this does not become idiomatic. I agree keeping type safety is good but the friction is from trying to use a struct as a map[string]interface{} for only the sake of catching compile time errors, so why not make a RepositoryPatch type that embeds the Repository and keeps track of what was set, something like: https://gist.github.com/troyk/896c045094ec93c1578b

    • MattK

      Marshal => Unmarshal => Filter => Marshal (again)? For every marshal? I hope that doesn’t become idiomatic either!

      Sure the OP’s suggested solution isn’t exactly cuspy but at least it doesn’t uselessly waste CPU (indirect addressing isn’t expensive (mostly) in modern CPUs).

      • Troy Kruthoff

        I agree, it’s a hack, but the lesser of two evils. “Using pointers also means that clients of the library will need to perform their own nil checks where appropriate to prevent panics”. In other words, I’d rather use a little more CPU to ensure keeping things easy and resilient.

        I suppose you could use the proper field name and then use reflection to create a map and then marshal the map, but I’d bet the json would outperform it because it is caching the reflection. Or it would be great if the json pkg had a MarshalToMap method!!!

        • It’s certainly a trade off… I just disagree on which is the lesser evil :)

          I figure developers are most likely going to be dealing with pointers elsewhere in their code, so are very possibly going to be doing nil checks anyway. And if it’s really a problem, a helper method is as simple as http://play.golang.org/p/9tpIodrqhp. That’s really all goprotobuf’s generated getter methods do. I still have an open bug to add these to go-github, it just honestly hasn’t proven to be a big enough problem.

          The other big consideration is that this is a client library. It’s one thing to make performance decisions for your own application, but for a library like this that is intended to be used by others, I wouldn’t want to force them to pay a performance penalty if I don’t have to.

        • it’s also interesting to note that I actually do use a technique somewhat like what you’re suggesting, but for a tool I wrote to identify which data fields GitHub is returning that aren’t being mapped into my Go structs.

  4. MattK

    How about an “omitunwritten” modifier? :) Then of course that would lead to requiring a way to store/reset the written flag.
